Most Cited 2025 "grounded pre-training" Papers

22,274 papers found • Page 17 of 112

#3201

Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence

Yuankai Luo, Lei Shi, Xiao-Ming Wu

ICML 2025arXiv:2502.09263
9
citations
#3202

Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection

Farzad Beizaee, Gregory A. Lodygensky, Christian Desrosiers et al.

CVPR 2025arXiv:2503.19357
9
citations
#3203

MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning

Ylli Sadikaj, Hongkuan Zhou, Lavdim Halilaj et al.

ICCV 2025arXiv:2504.06740
9
citations
#3204

BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

Zenghui Yuan, Jiawen Shi, Pan Zhou et al.

CVPR 2025arXiv:2503.16023
9
citations
#3205

Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2

Ziqi Zhou, Yifan Hu, Yufei Song et al.

NEURIPS 2025spotlightarXiv:2510.24195
9
citations
#3206

Accurate and Regret-Aware Numerical Problem Solver for Tabular Question Answering

Yuxiang Wang, Jianzhong Qi, Junhao Gan

AAAI 2025paperarXiv:2410.12846
9
citations
#3207

STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks

Tianqing Zhang, Kairong Yu, Xian Zhong et al.

CVPR 2025arXiv:2503.02689
9
citations
#3208

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

Liyan Tang, Grace Kim, Xinyu Zhao et al.

NEURIPS 2025arXiv:2505.13444
9
citations
#3209

Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation

Chanyoung Kim, Dayun Ju, Woojung Han et al.

CVPR 2025arXiv:2411.17150
9
citations
#3210

PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution

Zhu Li Bo, Jianze Li, Haotong Qin et al.

CVPR 2025arXiv:2411.17106
9
citations
#3211

Deformable Radial Kernel Splatting

Yihua Huang, Mingxian Lin, Yangtian Sun et al.

CVPR 2025arXiv:2412.11752
9
citations
#3212

When Do LLMs Help With Node Classification? A Comprehensive Analysis

Xixi Wu, Yifei Shen, Fangzhou Ge et al.

ICML 2025arXiv:2502.00829
9
citations
#3213

Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?

Simon Park, Abhishek Panigrahi, Yun Cheng et al.

ICML 2025arXiv:2501.02669
9
citations
#3214

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation

Jiaxin Huang, Runnan Chen, Ziwen Li et al.

NEURIPS 2025arXiv:2503.18135
9
citations
#3215

Constrained Fair and Efficient Allocations

Benjamin Cookson, Soroush Ebadian, Nisarg Shah

AAAI 2025paperarXiv:2411.00133
9
citations
#3216

Markov Persuasion Processes: Learning to Persuade From Scratch

Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni et al.

NEURIPS 2025arXiv:2402.03077
9
citations
#3217

InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment

Yunhong Lu, Qichao Wang, Hengyuan Cao et al.

CVPR 2025highlightarXiv:2503.18454
9
citations
#3218

Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints

Mihaela Stoian, Eleonora Giunchiglia

ICLR 2025arXiv:2502.18237
9
citations
#3219

Distilling Monocular Foundation Model for Fine-grained Depth Completion

Yingping Liang, Yutao Hu, Wenqi Shao et al.

CVPR 2025arXiv:2503.16970
9
citations
#3220

Probability Density Geodesics in Image Diffusion Latent Space

Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.

CVPR 2025arXiv:2504.06675
9
citations
#3221

LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization

Alessio Spagnoletti, Jean Prost, Andres Almansa et al.

ICCV 2025arXiv:2503.12615
9
citations
#3222

GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

CVPR 2025arXiv:2407.17929
9
citations
#3223

Language Guided Concept Bottleneck Models for Interpretable Continual Learning

Lu Yu, HaoYu Han, Zhe Tao et al.

CVPR 2025arXiv:2503.23283
9
citations
#3224

Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization

Luca Masserano, Abdul Fatir Ansari, Boran Han et al.

ICML 2025oralarXiv:2412.05244
9
citations
#3225

O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models

Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah et al.

CVPR 2025highlightarXiv:2503.12096
9
citations
#3226

Fine-Tuning Visual Autogressive Models for Subject-Driven Generation

Jiwoo Chung, Sangeek Hyun, Hyunjun Kim et al.

ICCV 2025
9
citations
#3227

REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning

Jihyun Lee, Weipeng Xu, Alexander Richard et al.

CVPR 2025arXiv:2504.04956
9
citations
#3228

Objective drives the consistency of representational similarity across datasets

Laure Ciernik, Lorenz Linhardt, Marco Morik et al.

ICML 2025arXiv:2411.05561
9
citations
#3229

MagCache: Fast Video Generation with Magnitude-Aware Cache

Zehong Ma, Longhui Wei, Feng Wang et al.

NEURIPS 2025arXiv:2506.09045
9
citations
#3230

Neural Video Compression with Context Modulation

Chuanbo Tang, Zhuoyuan Li, Yifan Bian et al.

CVPR 2025arXiv:2505.14541
9
citations
#3231

Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency

Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.

NEURIPS 2025oralarXiv:2506.07497
9
citations
#3232

Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models

Dilxat Muhtar, Enzhuo Zhang, Zhenshi Li et al.

NEURIPS 2025arXiv:2503.00743
9
citations
#3233

MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction

Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.

CVPR 2025arXiv:2412.03103
9
citations
#3234

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

Andy Zhang, Joey Ji, Celeste Menders et al.

NEURIPS 2025arXiv:2505.15216
9
citations
#3235

PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement

ZhanFeng Feng, Long Peng, Xin Di et al.

NEURIPS 2025oralarXiv:2505.12266
9
citations
#3236

All-in-One: Transferring Vision Foundation Models into Stereo Matching

Jingyi Zhou, Haoyu Zhang, Jiakang Yuan et al.

AAAI 2025paperarXiv:2412.09912
9
citations
#3237

Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)

Leander Girrbach, Stephan Alaniz, Yiran Huang et al.

ICLR 2025arXiv:2410.19314
9
citations
#3238

An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models

Wentao Qu, Jing Wang, Yongshun Gong et al.

CVPR 2025arXiv:2411.16308
9
citations
#3239

SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement

Yuqi Lin, Hengjia Li, Wenqi Shao et al.

ICLR 2025arXiv:2502.06756
9
citations
#3240

RNG: Relightable Neural Gaussians

Jiahui Fan, Fujun Luan, Jian Yang et al.

CVPR 2025arXiv:2409.19702
9
citations
#3241

HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder

Qi Yang, Le Yang, Geert Van der Auwera et al.

ICML 2025arXiv:2505.01938
9
citations
#3242

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Leigang Qu, Haochuan Li, Wenjie Wang et al.

CVPR 2025arXiv:2412.05818
9
citations
#3243

Model Provenance Testing for Large Language Models

Ivica Nikolic, Teodora Baluta, Prateek Saxena

NEURIPS 2025arXiv:2502.00706
9
citations
#3244

Counterfactual Generative Modeling with Variational Causal Inference

Yulun Wu, Louis McConnell, Claudia Iriondo

ICLR 2025arXiv:2410.12730
9
citations
#3245

PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations

Benjamin Holzschuh, Qiang Liu, Georg Kohl et al.

ICML 2025oralarXiv:2505.24717
9
citations
#3246

SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars

Jaeseong Lee, Taewoong Kang, Marcel Buehler et al.

ICLR 2025arXiv:2410.11682
9
citations
#3247

QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation

Yehui Tang, Mabiao Long, Junchi Yan

ICLR 2025
9
citations
#3248

ADIFF: Explaining audio difference using natural language

Soham Deshmukh, Shuo Han, Rita Singh et al.

ICLR 2025arXiv:2502.04476
9
citations
#3249

Controllable Generation via Locally Constrained Resampling

Kareem Ahmed, Kai-Wei Chang, Guy Van den Broeck

ICLR 2025arXiv:2410.13111
9
citations
#3250

Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning

Hai-Ming Xu, Qi Chen, Lei Wang et al.

AAAI 2025paperarXiv:2412.10840
9
citations
#3251

MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining

Yunze Liu, Li Yi

CVPR 2025arXiv:2410.00871
9
citations
#3252

Learning World Models for Interactive Video Generation

Taiye Chen, Xun Hu, Zihan Ding et al.

NEURIPS 2025oralarXiv:2505.21996
9
citations
#3253

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Shikai Qiu, Lechao Xiao, Andrew Wilson et al.

ICML 2025oralarXiv:2507.02119
9
citations
#3254

DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID

Xin Liang, Yogesh S. Rawat

CVPR 2025arXiv:2503.22912
9
citations
#3255

Fast Summation of Radial Kernels via QMC Slicing

Johannes Hertrich, Tim Jahn, Michael Quellmalz

ICLR 2025arXiv:2410.01316
9
citations
#3256

MegActor-Sigma: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer

Shurong Yang, Huadong Li, Juhao Wu et al.

AAAI 2025paper
9
citations
#3257

CoA: Towards Real Image Dehazing via Compression-and-Adaptation

Long Ma, Yuxin Feng, Yan Zhang et al.

CVPR 2025arXiv:2504.05590
9
citations
#3258

Test-time Adaptation for Cross-modal Retrieval with Query Shift

Haobin Li, Peng Hu, Qianjun Zhang et al.

ICLR 2025arXiv:2410.15624
9
citations
#3259

HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset

Zedong Chu, Feng Xiong, Meiduo Liu et al.

CVPR 2025highlightarXiv:2412.02317
9
citations
#3260

ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation

Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.

AAAI 2025paperarXiv:2412.18216
9
citations
#3261

GAS: Generative Avatar Synthesis from a Single Image

Yixing Lu, Junting Dong, YoungJoong Kwon et al.

ICCV 2025arXiv:2502.06957
9
citations
#3262

Universal Video Temporal Grounding with Generative Multi-modal Large Language Models

Zeqian Li, Shangzhe Di, Zhonghua Zhai et al.

NEURIPS 2025oralarXiv:2506.18883
9
citations
#3263

DreamText: High Fidelity Scene Text Synthesis

Yibin Wang, Weizhong Zhang, honghui xu et al.

CVPR 2025arXiv:2405.14701
9
citations
#3264

LIBA: Language Instructed Multi-granularity Bridge Assistant for 3D Visual Grounding

Yuan Wang, Ya-Li Li, W U Eastman Z Y et al.

AAAI 2025paper
9
citations
#3265

Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings

Hossein Mirzaei Sadeghlou, Mackenzie Mathis

ICLR 2025
9
citations
#3266

Aligned Better, Listen Better for Audio-Visual Large Language Models

Yuxin Guo, Shuailei Ma, Shijie Ma et al.

ICLR 2025oralarXiv:2504.02061
9
citations
#3267

Towards Transformer-Based Aligned Generation with Self-Coherence Guidance

Shulei Wang, Wang Lin, Hai Huang et al.

CVPR 2025arXiv:2503.17675
9
citations
#3268

Discrete Codebook World Models for Continuous Control

Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.

ICLR 2025arXiv:2503.00653
9
citations
#3269

Rethinking Query-based Transformer for Continual Image Segmentation

Yuchen Zhu, Cheng Shi, Dingyou Wang et al.

CVPR 2025arXiv:2507.07831
9
citations
#3270

Solving Inequality Proofs with Large Language Models

Jiayi Sheng, Luna Lyu, Jikai Jin et al.

NEURIPS 2025spotlightarXiv:2506.07927
9
citations
#3271

Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning

Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.

CVPR 2025highlightarXiv:2412.00175
9
citations
#3272

EventGPT: Event Stream Understanding with Multimodal Large Language Models

shaoyu liu, Jianing Li, guanghui zhao et al.

CVPR 2025arXiv:2412.00832
9
citations
#3273

RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

Chengrui Wang, Pengfei Liu, Min Zhou et al.

AAAI 2025paperarXiv:2404.13984
9
citations
#3274

GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency

Dongyue Lu, Lingdong Kong, Tianxin Huang et al.

CVPR 2025arXiv:2412.09511
9
citations
#3275

Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering

Yibo Zhang, Lihong Wang, Changqing Zou et al.

ICLR 2025arXiv:2405.15305
9
citations
#3276

Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift

Yanru Sun, Zongxia Xie, Emadeldeen Eldele et al.

NEURIPS 2025oralarXiv:2410.09836
9
citations
#3277

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

Fan LIU, Zherui Yang, Cancheng Liu et al.

NEURIPS 2025arXiv:2505.14148
9
citations
#3278

Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach

Haiyun He, Yepeng Liu, Ziqiao Wang et al.

NEURIPS 2025arXiv:2410.02890
9
citations
#3279

BodyGen: Advancing Towards Efficient Embodiment Co-Design

Haofei Lu, Zhe Wu, Junliang Xing et al.

ICLR 2025oralarXiv:2503.00533
9
citations
#3280

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

Weiming Ren, Huan Yang, Jie Min et al.

CVPR 2025arXiv:2412.00927
9
citations
#3281

Diffusion Tree Sampling: Scalable inference‑time alignment of diffusion models

Vineet Jain, Kusha Sareen, Mohammad Pedramfar et al.

NEURIPS 2025arXiv:2506.20701
9
citations
#3282

Relieving Universal Label Noise for Unsupervised Visible-Infrared Person Re-Identification by Inferring from Neighbors

Xiao Teng, Long Lan, Dingyao Chen et al.

AAAI 2025paperarXiv:2412.12220
9
citations
#3283

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation

Koichi Saito, Dongjun Kim, Takashi Shibuya et al.

ICLR 2025arXiv:2405.18503
9
citations
#3284

How Transformers Learn Structured Data: Insights From Hierarchical Filtering

Jerome Garnier-Brun, Marc Mezard, Emanuele Moscato et al.

ICML 2025arXiv:2408.15138
9
citations
#3285

Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment

Yifan Zhang, Ge Zhang, Yue Wu et al.

ICML 2025arXiv:2410.02197
9
citations
#3286

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

Fali Wang, Hui Liu, Zhenwei Dai et al.

NEURIPS 2025arXiv:2508.00890
9
citations
#3287

LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging

Maximilian Rokuss, Yannick Kirchhoff, Seval Akbal et al.

CVPR 2025arXiv:2502.20985
9
citations
#3288

SketchVideo: Sketch-based Video Generation and Editing

Feng-Lin Liu, Hongbo Fu, Xintao Wang et al.

CVPR 2025arXiv:2503.23284
9
citations
#3289

Multi-Focus Image Fusion via Explicit Defocus Blur Modelling

Yuhui Quan, Xi Wan, Zitao Tang et al.

AAAI 2025paper
9
citations
#3290

CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation

Han He, Qianchu Liu, Lei Xu et al.

AAAI 2025paperarXiv:2410.02748
9
citations
#3291

Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors

Emile Pierret, Bruno Galerne

ICML 2025arXiv:2405.14250
9
citations
#3292

Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models

Jiuming Liu, Jinru Han, Lihao Liu et al.

CVPR 2025
9
citations
#3293

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.

CVPR 2025arXiv:2411.16832
8
citations
#3294

FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs

Xiaoqin Wang, Xusen Ma, Xianxu Hou et al.

CVPR 2025arXiv:2503.21457
8
citations
#3295

Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion

Honglei Miao, Fan Ma, Ruijie Quan et al.

AAAI 2025paperarXiv:2408.00352
8
citations
#3296

MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context

Shuai Lyu, Rongchen Zhang, Zeqi Ma et al.

AAAI 2025paperarXiv:2412.16897
8
citations
#3297

Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference

Wenhao Shen, Mingliang Zhou, Yu Chen et al.

CVPR 2025arXiv:2412.16939
8
citations
#3298

Modality-Specialized Synergizers for Interleaved Vision-Language Generalists

Zhiyang Xu, Minqian Liu, Ying Shen et al.

ICLR 2025arXiv:2407.03604
8
citations
#3299

Secant Line Search for Frank-Wolfe Algorithms

Deborah Hendrych, Sebastian Pokutta, Mathieu Besançon et al.

ICML 2025arXiv:2501.18775
8
citations
#3300

Fast and Slow Streams for Online Time Series Forecasting Without Information Leakage

Ying-yee Ava Lau, Zhiwen Shao, Dit-Yan Yeung

ICLR 2025oral
8
citations
#3301

Effective and Efficient Masked Image Generation Models

Zebin You, Jingyang Ou, Xiaolu Zhang et al.

ICML 2025arXiv:2503.07197
8
citations
#3302

Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Zeyuan Allen-Zhu

NEURIPS 2025arXiv:2512.17351
8
citations
#3303

Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning

Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi

ICLR 2025arXiv:2410.09101
8
citations
#3304

HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation

Tengfei Liu, Jiapu Wang, Yongli Hu et al.

AAAI 2025paperarXiv:2412.11070
8
citations
#3305

RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models

Haoran Hao, Jiaming Han, Changsheng Li et al.

CVPR 2025arXiv:2410.13360
8
citations
#3306

Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion

Alan Amin, Nate Gruver, Andrew Wilson

NEURIPS 2025arXiv:2506.08316
8
citations
#3307

ACE: Anti-Editing Concept Erasure in Text-to-Image Models

Zihao Wang, Yuxiang Wei, Fan Li et al.

CVPR 2025arXiv:2501.01633
8
citations
#3308

AMO Sampler: Enhancing Text Rendering with Overshooting

Xixi Hu, Keyang Xu, Bo Liu et al.

CVPR 2025arXiv:2411.19415
8
citations
#3309

DELIFT: Data Efficient Language model Instruction Fine-Tuning

Ishika Agarwal, Krishnateja Killamsetty, Lucian Popa et al.

ICLR 2025arXiv:2411.04425
8
citations
#3310

ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

Bencheng Liao, Xinggang Wang, Lianghui Zhu et al.

AAAI 2025paperarXiv:2405.18425
8
citations
#3311

INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning

Wujian Peng, Lingchen Meng, Yitong Chen et al.

NEURIPS 2025oralarXiv:2412.03565
8
citations
#3312

FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding

Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj et al.

CVPR 2025arXiv:2311.15965
8
citations
#3313

Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images

Yihui Li, Chengxin Lv, Hongyu Yang et al.

AAAI 2025paperarXiv:2501.14231
8
citations
#3314

Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects

Tai Hoang, Huy Le, Philipp Becker et al.

ICLR 2025arXiv:2502.07005
8
citations
#3315

Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning

Isma Hadji, Mehdi Noroozi, Victor Escorcia et al.

CVPR 2025arXiv:2412.06978
8
citations
#3316

BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting

Yiren Lu, Yunlai Zhou, Disheng Liu et al.

CVPR 2025arXiv:2503.15835
8
citations
#3317

3D-GSW: 3D Gaussian Splatting for Robust Watermarking

Youngdong Jang, Hyunje Park, Feng Yang et al.

CVPR 2025arXiv:2409.13222
8
citations
#3318

Feature Denoising Diffusion Model for Blind Image Quality Assessment

Xudong Li, Yan Zhang, Yunhang Shen et al.

AAAI 2025paperarXiv:2401.11949
8
citations
#3319

Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis

Chengzhi Liu, Zile Huang, Zhe Chen et al.

AAAI 2025paperarXiv:2502.11724
8
citations
#3320

Can Textual Gradient Work in Federated Learning?

Minghui Chen, Ruinan Jin, Wenlong Deng et al.

ICLR 2025arXiv:2502.19980
8
citations
#3321

SapiensID: Foundation for Human Recognition

Minchul Kim, Dingqiang Ye, Yiyang Su et al.

CVPR 2025arXiv:2504.04708
8
citations
#3322

Learning Chaos In A Linear Way

Xiaoyuan Cheng, Yi He, Yiming Yang et al.

ICLR 2025arXiv:2503.14702
8
citations
#3323

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration

Zilong Huang, Jun He, Junyan Ye et al.

CVPR 2025arXiv:2504.00387
8
citations
#3324

DataMan: Data Manager for Pre-training Large Language Models

Ru Peng, Kexin Yang, Yawen Zeng et al.

ICLR 2025arXiv:2502.19363
8
citations
#3325

Differentially Private Steering for Large Language Model Alignment

Anmol Goel, Yaxi Hu, Iryna Gurevych et al.

ICLR 2025arXiv:2501.18532
8
citations
#3326

VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning

Ji Soo Lee, Jongha Kim, Jeehye Na et al.

AAAI 2025paperarXiv:2501.06761
8
citations
#3327

Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking

Paria Rashidinejad, Yuandong Tian

ICLR 2025arXiv:2412.09544
8
citations
#3328

Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration

Qinglin Zhu, Runcong Zhao, Hanqi Yan et al.

ICML 2025spotlightarXiv:2505.24688
8
citations
#3329

From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots

Yuxuan Wang, Ming Yang, Gang Ding et al.

NEURIPS 2025oralarXiv:2506.12779
8
citations
#3330

HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts

Neil He, Rishabh Anand, Hiren Madhu et al.

NEURIPS 2025arXiv:2505.24722
8
citations
#3331

Adversarial Generative Flow Network for Solving Vehicle Routing Problems

Ni Zhang, Jingfeng Yang, Zhiguang Cao et al.

ICLR 2025arXiv:2503.01931
8
citations
#3332

VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Li Kang, Xiufeng Song, Heng Zhou et al.

NEURIPS 2025arXiv:2506.09049
8
citations
#3333

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Henry Zheng, Hao Shi, Qihang Peng et al.

ICLR 2025arXiv:2505.04965
8
citations
#3334

T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting

Yifei Qian, Zhongliang Guo, Bowen Deng et al.

CVPR 2025highlightarXiv:2502.20625
8
citations
#3335

Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection

Fanhu Zeng, Zhen Cheng, Fei Zhu et al.

ICLR 2025arXiv:2409.04796
8
citations
#3336

How do Transformers Learn Implicit Reasoning?

Jiaran Ye, Zijun Yao, Zhidian Huang et al.

NEURIPS 2025oralarXiv:2505.23653
8
citations
#3337

Focusing on Tracks for Online Multi-Object Tracking

Kyujin Shim, Kangwook Ko, YuJin Yang et al.

CVPR 2025
8
citations
#3338

Synthetic Prior for Few-Shot Drivable Head Avatar Inversion

Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas et al.

CVPR 2025arXiv:2501.06903
8
citations
#3339

DiffFNO: Diffusion Fourier Neural Operator

Xiaoyi Liu, Hao Tang

CVPR 2025arXiv:2411.09911
8
citations
#3340

Hypergraph Vision Transformers: Images are More than Nodes, More than Edges

Joshua Fixelle

CVPR 2025arXiv:2504.08710
8
citations
#3341

From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes

Long Ma, Zhiyuan Yan, Jin Xu et al.

NEURIPS 2025arXiv:2504.04827
8
citations
#3342

MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities

Kunxi Li, Tianyu Zhan, Kairui Fu et al.

AAAI 2025paperarXiv:2404.13322
8
citations
#3343

DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data

Ruiqi Wu, Xinjie wang, Liu.Liu et al.

NEURIPS 2025arXiv:2505.20460
8
citations
#3344

Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement

Hesong Li, Ziqi Wu, Ruiwen Shao et al.

CVPR 2025arXiv:2504.02555
8
citations
#3345

COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation

Xueqing Deng, Linjie Yang, Qihang Yu et al.

NEURIPS 2025arXiv:2502.02589
8
citations
#3346

Not all solutions are created equal: An analytical dissociation of functional and representational similarity in deep linear neural networks

Lukas Braun, Erin Grant, Andrew Saxe

ICML 2025spotlight
8
citations
#3347

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

Zaid Khan, Elias Stengel-Eskin, Jaemin Cho et al.

ICLR 2025arXiv:2410.06215
8
citations
#3348

Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach

Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li et al.

CVPR 2025highlightarXiv:2405.02700
8
citations
#3349

FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding

Chongjun Tu, Lin Zhang, pengtao chen et al.

NEURIPS 2025oralarXiv:2503.14935
8
citations
#3350

Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift

Seongho Son, William Bankes, Sayak Ray Chowdhury et al.

ICML 2025oralarXiv:2407.18676
8
citations
#3351

ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL

Yang Qin, Chao Chen, Zhihang Fu et al.

ICLR 2025arXiv:2412.10138
8
citations
#3352

V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer

Hangzhou He, Lei Zhu, Xinliang Zhang et al.

AAAI 2025paperarXiv:2501.04975
8
citations
#3353

Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency

Xinyu He, Dongqi Fu, Hanghang Tong et al.

ICLR 2025oral
8
citations
#3354

Combining Cost Constrained Runtime Monitors for AI Safety

Tim Hua, James Baskerville, Henri Lemoine et al.

NEURIPS 2025arXiv:2507.15886
8
citations
#3355

Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function

Maria-Florina Balcan, Anh Nguyen, Dravyansh Sharma

NEURIPS 2025arXiv:2501.13734
8
citations
#3356

Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video

Junkai Fan, Kun Wang, Zhiqiang Yan et al.

AAAI 2025paperarXiv:2412.11395
8
citations
#3357

Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies

Sijin Chen, Omar Hagrass, Jason Klusowski

ICLR 2025arXiv:2410.03968
8
citations
#3358

LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba

Yubo Cui, Zhiheng Li, Jiaqiang Wang et al.

AAAI 2025paperarXiv:2412.08388
8
citations
#3359

Efficient stagewise pretraining via progressive subnetworks

Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu et al.

ICLR 2025arXiv:2402.05913
8
citations
#3360

Bundle Neural Network for message diffusion on graphs

Jacob Bamberger, Federico Barbero, Xiaowen Dong et al.

ICLR 2025arXiv:2405.15540
8
citations
#3361

Alleviate and Mining: Rethinking Unsupervised Domain Adaptation for Mitochondria Segmentation from Pseudo-Label Perspective

Yujia Chen, Rui Sun, Wangkai Li et al.

AAAI 2025paper
8
citations
#3362

Filter or Compensate: Towards Invariant Representation from Distribution Shift for Anomaly Detection

Zining Chen, Xingshuang Luo, Weiqiu Wang et al.

AAAI 2025paperarXiv:2412.10115
8
citations
#3363

CONTRA: Conformal Prediction Region via Normalizing Flow Transformation

Zhenhan FANG, Aixin Tan, Jian Huang

ICLR 2025
8
citations
#3364

PokerBench: Training Large Language Models to Become Professional Poker Players

Richard Zhuang, Akshat Gupta, Richard Yang et al.

AAAI 2025paperarXiv:2501.08328
8
citations
#3365

MANTRA: The Manifold Triangulations Assemblage

Rubén Ballester, Ernst Roell, Daniel Bin Schmid et al.

ICLR 2025arXiv:2410.02392
8
citations
#3366

Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models

Ángela López-Cardona, Carlos Segura, Alexandros Karatzoglou et al.

ICLR 2025arXiv:2410.01532
8
citations
#3367

LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation

Can Jin, Ying Li, Mingyu Zhao et al.

ICLR 2025arXiv:2502.00896
8
citations
#3368

CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation

Matan Rusanovsky, Or Hirschorn, Shai Avidan

ICLR 2025arXiv:2406.00384
8
citations
#3369

Multi-Granular Multimodal Clue Fusion for Meme Understanding

Li Zheng, Hao Fei, Ting Dai et al.

AAAI 2025paperarXiv:2503.12560
8
citations
#3370

Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems

Junyi Ye, Jingyi Gu, Xinyun Zhao et al.

AAAI 2025paperarXiv:2410.18336
8
citations
#3371

RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training

Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun et al.

CVPR 2025highlightarXiv:2411.17662
8
citations
#3372

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

Joey Hong, Anca Dragan, Sergey Levine

ICLR 2025arXiv:2411.05193
8
citations
#3373

Quantum-PEFT: Ultra parameter-efficient fine-tuning

Toshiaki Koike-Akino, Francesco Tonin, Yongtao Wu et al.

ICLR 2025arXiv:2503.05431
8
citations
#3374

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

Wanhua Li, Yujie Zhao, Minghan Qin et al.

NEURIPS 2025arXiv:2507.07136
8
citations
#3375

TopoDiffusionNet: A Topology-aware Diffusion Model

Saumya Gupta, Dimitris Samaras, Chao Chen

ICLR 2025arXiv:2410.16646
8
citations
#3376

Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model

Yingying Fan, Quanwei Yang, Kaisiyuan Wang et al.

CVPR 2025arXiv:2503.16942
8
citations
#3377

Mitigating Social Bias in Large Language Models: A Multi-Objective Approach Within a Multi-Agent Framework

Zhenjie Xu, Wenqing Chen, Yi Tang et al.

AAAI 2025paperarXiv:2412.15504
8
citations
#3378

Generating Physically Stable and Buildable Brick Structures from Text

Ava Pun, Kangle Deng, Ruixuan Liu et al.

ICCV 2025arXiv:2505.05469
8
citations
#3379

DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry

Jing Li, Yihang Fu, Falai Chen

CVPR 2025arXiv:2503.13110
8
citations
#3380

Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models

Lei Tang, Jinghui Qin, Wenxuan Ye et al.

AAAI 2025paperarXiv:2501.01679
8
citations
#3381

MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models

Yujing Wang, Hainan Zhang, Liang Pang et al.

AAAI 2025paperarXiv:2408.17072
8
citations
#3382

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Yandan Yang, Baoxiong Jia, Shujie Zhang et al.

NEURIPS 2025arXiv:2509.20414
8
citations
#3383

MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification

Jimin Park, AHyun Ji, Minji Park et al.

AAAI 2025paperarXiv:2501.01110
8
citations
#3384

SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets

Yuhang Yang, Fengqi Liu, Yixing Lu et al.

ICCV 2025
8
citations
#3385

EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation

Yuqiao Wen, Behzad Shayegh, Chenyang Huang et al.

AAAI 2025paperarXiv:2403.00144
8
citations
#3386

Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation

Tianhao Qi, Jianlong Yuan, Wanquan Feng et al.

CVPR 2025
8
citations
#3387

SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback

Jingsheng Gao, Linxu Li, Ke Ji et al.

ICLR 2025arXiv:2410.18141
8
citations
#3388

HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts

Mengqi Liao, Wei Chen, Junfeng Shen et al.

ICLR 2025
8
citations
#3389

DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery

Jiadong Tang, Yu Gao, Dianyi Yang et al.

CVPR 2025highlightarXiv:2503.16964
8
citations
#3390

Revisiting Random Walks for Learning on Graphs

Jinwoo Kim, Olga Zaghen, Ayhan Suleymanzade et al.

ICLR 2025arXiv:2407.01214
8
citations
#3391

ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models

Heng Yin, Yuqiang Ren, Ke Yan et al.

CVPR 2025
8
citations
#3392

Beyond Sequence: Impact of Geometric Context for RNA Property Prediction

Junjie Xu, Artem Moskalev, Tommaso Mansi et al.

ICLR 2025arXiv:2410.11933
8
citations
#3393

VideoOrion: Tokenizing Object Dynamics in Videos

Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.

ICCV 2025arXiv:2411.16156
8
citations
#3394

EWMoE: An Effective Model for Global Weather Forecasting with Mixture-of-Experts

Lihao Gan, Xin Man, Chenghong Zhang et al.

AAAI 2025paperarXiv:2405.06004
8
citations
#3395

Near, far: Patch-ordering enhances vision foundation models' scene understanding

Valentinos Pariza, Mohammadreza Salehi, Gertjan J Burghouts et al.

ICLR 2025arXiv:2408.11054
8
citations
#3396

GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs

Maizhen Ning, Zihao Zhou, Qiufeng Wang et al.

AAAI 2025paper
8
citations
#3397

Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf’s Law

Frederik Kunstner, Francis Bach

NEURIPS 2025arXiv:2505.19227
8
citations
#3398

ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids

Hannes Stärk, Bowen Jing, Tomas Geffner et al.

ICLR 2025arXiv:2503.05025
8
citations
#3399

As large as it gets – Studying Infinitely Large Convolutions via Neural Implicit Frequency Filters

Margret Keuper, Julia Grabinski, Janis Keuper

ICLR 2025
8
citations
#3400

On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth

Gennadiy Averkov, Christopher Hojny, Maximilian Merkert

ICLR 2025arXiv:2502.06283
8
citations