Most Cited 2025 "llm capabilities transfer" Papers

22,274 papers found • Page 17 of 112

#3201

DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving

Hao LU, Tianshuo Xu, Wenzhao Zheng et al.

NEURIPS 2025arXiv:2412.09043
15
citations
#3202

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Jiatao Gu, Tianrong Chen, David Berthelot et al.

NEURIPS 2025spotlightarXiv:2506.06276
15
citations
#3203

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning

Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.

ICLR 2025oralarXiv:2405.13861
15
citations
#3204

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.

CVPR 2025arXiv:2411.18674
15
citations
#3205

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

Shiyao Li, Yingchun Hu, Xuefei Ning et al.

CVPR 2025arXiv:2412.19509
15
citations
#3206

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition

Chuanguang Yang, XinQiang Yu, Han Yang et al.

AAAI 2025paperarXiv:2502.18510
15
citations
#3207

Optimal transport-based conformal prediction

Gauthier Thurin, Kimia Nadjahi, Claire Boyer

ICML 2025arXiv:2501.18991
15
citations
#3208

Toward Understanding In-context vs. In-weight Learning

Bryan Chan, Xinyi Chen, Andras Gyorgy et al.

ICLR 2025arXiv:2410.23042
15
citations
#3209

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Ziyang Luo, Haoning Wu, Dongxu Li et al.

CVPR 2025arXiv:2411.13281
15
citations
#3210

Backdooring Vision-Language Models with Out-Of-Distribution Data

Weimin Lyu, Michael Yao, Saumya Gupta et al.

ICLR 2025arXiv:2410.01264
15
citations
#3211

TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning

Jingjing Xie, Yuxin Zhang, Jun Peng et al.

AAAI 2025paperarXiv:2412.08176
15
citations
#3212

Galileo: Learning Global & Local Features of Many Remote Sensing Modalities

Gabriel Tseng, Anthony Fuller, Marlena Reil et al.

ICML 2025arXiv:2502.09356
15
citations
#3213

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

Zichen Miao, Zhengyuan Yang, Kevin Lin et al.

ICLR 2025arXiv:2410.03190
15
citations
#3214

Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models

Hongyang Wei, Shuaizheng Liu, Chun Yuan et al.

ICCV 2025arXiv:2503.11073
15
citations
#3215

Learned Image Compression with Dictionary-based Entropy Model

Jingbo Lu, Leheng Zhang, Xingyu Zhou et al.

CVPR 2025arXiv:2504.00496
15
citations
#3216

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

Zilan Wang, Junfeng Guo, Jiacheng Zhu et al.

CVPR 2025arXiv:2412.04852
15
citations
#3217

Towards Universal Soccer Video Understanding

Jiayuan Rao, Haoning Wu, Hao Jiang et al.

CVPR 2025arXiv:2412.01820
15
citations
#3218

ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals

Utkarsh Saxena, Sayeh Sharify, Kaushik Roy et al.

ICML 2025spotlightarXiv:2412.14363
15
citations
#3219

Guided Diffusion Sampling on Function Spaces with Applications to PDEs

Jiachen Yao, Abbas Mammadov, Julius Berner et al.

NEURIPS 2025arXiv:2505.17004
15
citations
#3220

MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Yanqi Dai, Huanran Hu, Lei Wang et al.

ICLR 2025arXiv:2408.04203
15
citations
#3221

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Ling Yang, Zhaochen Yu, Tianjun Zhang et al.

ICLR 2025arXiv:2410.09008
15
citations
#3222

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models

JIACHENG RUAN, Wenzhen Yuan, Xian Gao et al.

ICCV 2025arXiv:2503.07478
15
citations
#3223

1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities

Kevin Wang, Ishaan Javali, Michał Bortkiewicz et al.

NEURIPS 2025oralarXiv:2503.14858
15
citations
#3224

Dynamic Camera Poses and Where to Find Them

Chris Rockwell, Joseph Tung, Tsung-Yi Lin et al.

CVPR 2025arXiv:2504.17788
15
citations
#3225

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

Haokun Chen, Hang Li, Yao Zhang et al.

CVPR 2025arXiv:2410.04810
15
citations
#3226

The Foundations of Tokenization: Statistical and Computational Concerns

Juan Luis Gastaldi, John Terilla, Luca Malagutti et al.

ICLR 2025arXiv:2407.11606
15
citations
#3227

Position: Editing Large Language Models Poses Serious Safety Risks

Paul Youssef, Zhixue Zhao, Daniel Braun et al.

ICML 2025arXiv:2502.02958
15
citations
#3228

V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video

Jianqi Chen, Biao Zhang, Xiangjun Tang et al.

ICCV 2025arXiv:2503.09631
15
citations
#3229

AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time computation

Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu

COLM 2025paperarXiv:2504.07532
15
citations
#3230

DON’T STOP ME NOW: EMBEDDING BASED SCHEDULING FOR LLMS

Rana Shahout, Eran Malach, Chunwei Liu et al.

ICLR 2025
15
citations
#3231

Mitigate the Gap: Improving Cross-Modal Alignment in CLIP

Sedigheh Eslami, Gerard de Melo

ICLR 2025
15
citations
#3232

Multi-Reward as Condition for Instruction-based Image Editing

Xin Gu, Ming Li, Libo Zhang et al.

ICLR 2025arXiv:2411.04713
15
citations
#3233

Efficient Track Anything

Yunyang Xiong, Chong Zhou, Xiaoyu Xiang et al.

ICCV 2025arXiv:2411.18933
15
citations
#3234

Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens

Xixian Yong, Xiao Zhou, Yingying Zhang et al.

NEURIPS 2025spotlightarXiv:2505.18237
15
citations
#3235

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Craig W Schmidt, Varshini Reddy, Chris Tanner et al.

COLM 2025paperarXiv:2504.00178
15
citations
#3236

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Seanie Lee, Haebin Seong, Dong Bok Lee et al.

ICLR 2025arXiv:2410.01524
15
citations
#3237

Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos

Chris Pedersen, Laure Zanna, Joan Bruna

ICML 2025oralarXiv:2503.18731
15
citations
#3238

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Zhihang Liu, Chen-Wei Xie, Pandeng Li et al.

CVPR 2025arXiv:2503.16036
15
citations
#3239

ILIAS: Instance-Level Image retrieval At Scale

Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko et al.

CVPR 2025arXiv:2502.11748
15
citations
#3240

Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation

Yuheng Shi, Minjing Dong, Chang Xu

ICCV 2025arXiv:2411.09219
15
citations
#3241

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

Xuesong Chen, Linjiang Huang, Tao Ma et al.

CVPR 2025arXiv:2505.16805
15
citations
#3242

4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video

Qiang Hu, Zihan Zheng, Houqiang Zhong et al.

CVPR 2025arXiv:2503.18421
15
citations
#3243

TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster

Kanghui Ning, Zijie Pan, Yu Liu et al.

NEURIPS 2025arXiv:2503.07649
15
citations
#3244

Ultra-High Resolution Segmentation via Boundary-Enhanced Patch-Merging Transformer

Haopeng Sun, Yingwei Zhang, Lumin Xu et al.

AAAI 2025paperarXiv:2412.10181
15
citations
#3245

Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning

Jiyuan Shi, Xinzhe Liu, Dewei Wang et al.

NEURIPS 2025arXiv:2504.14305
15
citations
#3246

Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter

Zhengyi Zhong, Weidong Bao, Ji Wang et al.

CVPR 2025arXiv:2502.20709
15
citations
#3247

RAGGED: Towards Informed Design of Scalable and Stable RAG Systems

Jennifer Hsia, Afreen Shaikh, Zhiruo Wang et al.

ICML 2025arXiv:2403.09040
15
citations
#3248

ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression

Wei Jiang, Junru Li, Kai Zhang et al.

CVPR 2025arXiv:2410.09706
15
citations
#3249

Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos

Chiara Plizzari, Alessio Tonioni, Yongqin Xian et al.

CVPR 2025arXiv:2503.13646
15
citations
#3250

Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs

Zijia Zhao, Haoyu Lu, Yuqi Huo et al.

ICLR 2025oralarXiv:2406.09367
15
citations
#3251

Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming

Haoyang Liu, Jie Wang, Zijie Geng et al.

ICLR 2025arXiv:2503.01129
15
citations
#3252

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

Cheol Jun Cho, Nicholas Lee, Akshat Gupta et al.

ICLR 2025arXiv:2410.07168
15
citations
#3253

Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues

Youngjoon Jang, Haran Raajesh, Liliane Momeni et al.

CVPR 2025arXiv:2501.09754
15
citations
#3254

Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections

Bo Wang, Qinyuan Cheng, Runyu Peng et al.

NEURIPS 2025arXiv:2507.00018
15
citations
#3255

AGENTIF: Benchmarking Large Language Models Instruction Following Ability in Agentic Scenarios

Yunjia Qi, Hao Peng, Xiaozhi Wang et al.

NEURIPS 2025spotlight
15
citations
#3256

Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models

Meghana Arakkal Rajeev, Rajkumar Ramamurthy, Prapti Trivedi et al.

COLM 2025paperarXiv:2503.01781
15
citations
#3257

Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging

Mengjie Qin, Yuchao Feng, Zongliang Wu et al.

AAAI 2025paperarXiv:2501.01262
15
citations
#3258

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

Zongxia Li, Xiyang Wu, Guangyao Shi et al.

NEURIPS 2025arXiv:2505.01481
15
citations
#3259

A Unified Theory of Quantum Neural Network Loss Landscapes

Eric Anschuetz

ICLR 2025arXiv:2408.11901
15
citations
#3260

xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

Qingchen Yu, Zifan Zheng, Shichao Song et al.

ICLR 2025arXiv:2405.11874
15
citations
#3261

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models

William Chen, Jinchuan Tian, Yifan Peng et al.

ICML 2025arXiv:2502.10373
15
citations
#3262

MangaNinja: Line Art Colorization with Precise Reference Following

Zhiheng Liu, Ka Leong Cheng, Xi Chen et al.

CVPR 2025highlightarXiv:2501.08332
15
citations
#3263

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Han Wang, Yuxiang Nie, Yongjie Ye et al.

ICCV 2025arXiv:2412.09530
15
citations
#3264

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

Tianjin Huang, Ziquan Zhu, Gaojie Jin et al.

ICLR 2025arXiv:2501.06842
15
citations
#3265

ALFA: Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning

Shuyue Stella Li, Jimin Mun, Faeze Brahman et al.

COLM 2025paperarXiv:2502.14860
15
citations
#3266

Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering

Kaixuan Jiang, Yang Liu, Weixing Chen et al.

ICCV 2025arXiv:2503.11117
15
citations
#3267

Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective

Zeyu Gan, Yong Liu

ICLR 2025arXiv:2410.01720
15
citations
#3268

ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks

Arth Shukla, Stone Tao, Hao Su

ICLR 2025arXiv:2412.13211
15
citations
#3269

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.

NEURIPS 2025arXiv:2506.03093
15
citations
#3270

Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection

Kabir Ahuja, Melanie Sclar, Yulia Tsvetkov

COLM 2025paperarXiv:2504.11900
15
citations
#3271

IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera

Jian Huang, Chengrui Dong, Xuanhua Chen et al.

CVPR 2025highlightarXiv:2410.08107
15
citations
#3272

Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints

Sam Bowyer, Laurence Aitchison, Desi Ivanova

ICML 2025spotlightarXiv:2503.01747
15
citations
#3273

GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation

Danny Wang, Ruihong Qiu, Guangdong Bai et al.

ICLR 2025arXiv:2502.05780
15
citations
#3274

BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Artem Zholus, Maksim Kuznetsov, Roman Schutski et al.

AAAI 2025paperarXiv:2406.03686
15
citations
#3275

MoE-LPR: Multilingual Extension of Large Language Models Through Mixture-of-Experts with Language Priors Routing

Hao Zhou, Zhijun Wang, Shujian Huang et al.

AAAI 2025paperarXiv:2408.11396
15
citations
#3276

MobileUse: A Hierarchical Reflection-Driven GUI Agent for Autonomous Mobile Operation

Ning Li, Xiangmou Qu, Jiamu Zhou et al.

NEURIPS 2025oral
15
citations
#3277

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Yarden As, Bhavya, Lenart Treven et al.

ICLR 2025arXiv:2410.09486
15
citations
#3278

TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees

Weibin Liao, Xu Chu, Yasha Wang

ICLR 2025arXiv:2410.12854
15
citations
#3279

Ensembling Diffusion Models via Adaptive Feature Aggregation

Cong Wang, kuan tian, Yonghang Guan et al.

ICLR 2025arXiv:2405.17082
15
citations
#3280

SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes

Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu et al.

CVPR 2025arXiv:2410.17249
15
citations
#3281

F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI

Xu Zheng, Farhad Shirani, Zhuomin Chen et al.

ICLR 2025arXiv:2410.02970
15
citations
#3282

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

Zihao Wang, Bin CUI, Shaoduo Gan

ICLR 2025arXiv:2404.04793
15
citations
#3283

RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers

Yan Gong, Yiren Song, Yicheng Li et al.

NEURIPS 2025arXiv:2506.02528
15
citations
#3284

Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

Dingkang Yang, Dongling Xiao, Jinjie Wei et al.

AAAI 2025paperarXiv:2408.12325
15
citations
#3285

Synthetic Data is an Elegant GIFT for Continual Vision-Language Models

Bin Wu, Wuxuan Shi, Jinqiao Wang et al.

CVPR 2025arXiv:2503.04229
15
citations
#3286

STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

Haiyi Qiu, Minghe Gao, Long Qian et al.

CVPR 2025arXiv:2412.00161
15
citations
#3287

Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models

Fu-Yun Wang, Yunhao Shui, Jingtan Piao et al.

ICLR 2025arXiv:2505.11245
15
citations
#3288

GenEx: Generating an Explorable World

TaiMing Lu, Tianmin Shu, Alan Yuille et al.

ICLR 2025arXiv:2412.09624
15
citations
#3289

Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks

Huanxuan Liao, Shizhu He, Yao Xu et al.

AAAI 2025paperarXiv:2409.13203
15
citations
#3290

MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control

Yuchen Zhu, Wei Guo, Jaemoo Choi et al.

NEURIPS 2025arXiv:2508.10684
15
citations
#3291

Probing Visual Language Priors in VLMs

Tiange Luo, Ang Cao, Gunhee Lee et al.

ICML 2025arXiv:2501.00569
15
citations
#3292

JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba

Xiaoyong Lu, Songlin Du

CVPR 2025arXiv:2503.03437
15
citations
#3293

PILAF: Optimal Human Preference Sampling for Reward Modeling

Yunzhen Feng, Ariel Kwiatkowski, Kunhao Zheng et al.

ICML 2025arXiv:2502.04270
15
citations
#3294

CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images

Chen Cheng, Jiacheng Wei, Tianrun Chen et al.

CVPR 2025arXiv:2504.04753
15
citations
#3295

Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift

Kaizheng Wang

NEURIPS 2025arXiv:2302.10160
15
citations
#3296

GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks

Sarp Aykent, Tian Xia

ICLR 2025
15
citations
#3297

Truncated Consistency Models

Sangyun Lee, Yilun Xu, Tomas Geffner et al.

ICLR 2025arXiv:2410.14895
15
citations
#3298

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

Andrei Panferov, Jiale Chen, Rush Tabesh et al.

ICML 2025arXiv:2502.05003
14
citations
#3299

Weak-to-Strong Generalization Through the Data-Centric Lens

Changho Shin, John Cooper, Frederic Sala

ICLR 2025arXiv:2412.03881
14
citations
#3300

Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint

Junwei Zhou, Xueting Li, Lu Qi et al.

ICLR 2025arXiv:2410.15391
14
citations
#3301

Editable Concept Bottleneck Models

Lijie Hu, Chenyang Ren, Zhengyu Hu et al.

ICML 2025arXiv:2405.15476
14
citations
#3302

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

Jeongseok Hyun, Sukjun Hwang, Su Ho Han et al.

ICCV 2025arXiv:2507.07990
14
citations
#3303

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

Guy Yariv, Yuval Kirstain, Amit Zohar et al.

CVPR 2025arXiv:2501.03059
14
citations
#3304

BingoGuard: LLM Content Moderation Tools with Risk Levels

Fan Yin, Philippe Laban, XIANGYU PENG et al.

ICLR 2025arXiv:2503.06550
14
citations
#3305

Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset

Yingzi Ma, Jiongxiao Wang, Fei Wang et al.

ICLR 2025arXiv:2411.03554
14
citations
#3306

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing

William June Suk Choi, Kyungmin Lee, Jongheon Jeong et al.

ICLR 2025arXiv:2410.05694
14
citations
#3307

Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments

Luke Rowe, Roger Girgis, Anthony Gosselin et al.

CVPR 2025arXiv:2503.22496
14
citations
#3308

Beyond Canonicalization: How Tensorial Messages Improve Equivariant Message Passing

Peter Lippmann, Gerrit Gerhartz, Roman Remme et al.

ICLR 2025arXiv:2405.15389
14
citations
#3309

Multi-Turn Jailbreaking Large Language Models via Attention Shifting

Xiaohu Du, Fan Mo, Ming Wen et al.

AAAI 2025paper
14
citations
#3310

Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models

Qingni Wang, Tiantian Geng, Zhiyuan Wang et al.

ICLR 2025arXiv:2410.08174
14
citations
#3311

Hyperspherical Normalization for Scalable Deep Reinforcement Learning

Hojoon Lee, Youngdo Lee, Takuma Seno et al.

ICML 2025spotlightarXiv:2502.15280
14
citations
#3312

Medical MLLM Is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Xijie Huang, Xinyuan Wang, Hantao Zhang et al.

AAAI 2025paperarXiv:2405.20775
14
citations
#3313

MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants

Zeyu Zhang, Quanyu Dai, Luyu Chen et al.

NEURIPS 2025arXiv:2409.20163
14
citations
#3314

MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs

Hui Sun, Shiyin Lu, Huanyu Wang et al.

ICCV 2025arXiv:2501.02885
14
citations
#3315

CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions

Matan Levi, Yair Allouche, Daniel Ohayon et al.

AAAI 2025paperarXiv:2408.09304
14
citations
#3316

SeePhys: Does Seeing Help Thinking? – Benchmarking Vision-Based Physics Reasoning

Kun Xiang, Heng Li, Terry Jingchen Zhang et al.

NEURIPS 2025arXiv:2505.19099
14
citations
#3317

Backdoor Cleaning without External Guidance in MLLM Fine-tuning

Xuankun Rong, Wenke Huang, Jian Liang et al.

NEURIPS 2025arXiv:2505.16916
14
citations
#3318

LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

Lukas Helff, Felix Friedrich, Manuel Brack et al.

ICML 2025arXiv:2406.05113
14
citations
#3319

Optimization with Access to Auxiliary Information

EL MAHDI CHAYTI, Sai Karimireddy

ICLR 2025arXiv:2206.00395
14
citations
#3320

Pitfalls of Evidence-Based AI Policy

Stephen Casper, David Krueger, Dylan Hadfield-Menell

ICLR 2025arXiv:2502.09618
14
citations
#3321

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models

Junzhe Chen, Tianshu Zhang, Shiyu Huang et al.

CVPR 2025arXiv:2411.15268
14
citations
#3322

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Yuliang Liu, Junjie Lu, Chaofeng Qu et al.

ICML 2025arXiv:2502.13943
14
citations
#3323

CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

Song Wang, Peng Wang, Tong Zhou et al.

ICLR 2025arXiv:2407.02408
14
citations
#3324

STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.

CVPR 2025highlightarXiv:2408.16807
14
citations
#3325

Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera

Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal

CVPR 2025highlightarXiv:2412.12861
14
citations
#3326

Scaling Laws for Optimal Data Mixtures

Mustafa Shukor, Louis Bethune, Dan Busbridge et al.

NEURIPS 2025arXiv:2507.09404
14
citations
#3327

Variational Rectified Flow Matching

Pengsheng Guo, Alex Schwing

ICML 2025arXiv:2502.09616
14
citations
#3328

Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo

Zachary Charles, Gabriel Teston, Lucio Dery et al.

NEURIPS 2025spotlightarXiv:2503.09799
14
citations
#3329

Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels

Ruitao Pu, Yuan Sun, Yang Qin et al.

AAAI 2025paperarXiv:2501.01699
14
citations
#3330

BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz et al.

CVPR 2025arXiv:2411.15232
14
citations
#3331

Contextual Bandits for Unbounded Context Distributions

Puning Zhao, Rongfei Fan, Shaowei Wang et al.

ICML 2025arXiv:2408.09655
14
citations
#3332

SITCOM: Step-wise Triple-Consistent Diffusion Sampling For Inverse Problems

Ismail Alkhouri, Shijun Liang, Cheng-Han Huang et al.

ICML 2025arXiv:2410.04479
14
citations
#3333

Mixture of Parrots: Experts improve memorization more than reasoning

Samy Jelassi, Clara Mohri, David Brandfonbrener et al.

ICLR 2025arXiv:2410.19034
14
citations
#3334

Weighted-Reward Preference Optimization for Implicit Model Fusion

Ziyi Yang, Fanqi Wan, Longguang Zhong et al.

ICLR 2025arXiv:2412.03187
14
citations
#3335

UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI

Fangwei Zhong, Kui Wu, Churan Wang et al.

ICCV 2025highlightarXiv:2412.20977
14
citations
#3336

Sum of Squares Circuits

Lorenzo Loconte, Stefan Mengel, Antonio Vergari

AAAI 2025paperarXiv:2408.11778
14
citations
#3337

PNVC: Towards Practical INR-based Video Compression

Ge Gao, Ho Man Kwan, Fan Zhang et al.

AAAI 2025paperarXiv:2409.00953
14
citations
#3338

Mixture of Attentions For Speculative Decoding

Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras et al.

ICLR 2025arXiv:2410.03804
14
citations
#3339

Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints

Ming Dai, Jian Li, Jiedong Zhuang et al.

AAAI 2025paperarXiv:2501.06710
14
citations
#3340

Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection

Jiawen Zhu, YEW-SOON ONG, Chunhua Shen et al.

ICCV 2025arXiv:2410.10289
14
citations
#3341

REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints

Di Wu, Liu Liu, Zhou Linli et al.

NEURIPS 2025arXiv:2503.06677
14
citations
#3342

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment

Jun Liu, Zhenglun Kong, Pu Zhao et al.

AAAI 2025paperarXiv:2403.10799
14
citations
#3343

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

Hao Zhong, Muzhi Zhu, Zongze Du et al.

NEURIPS 2025oralarXiv:2505.20256
14
citations
#3344

AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis

Khiem Vuong, Anurag Ghosh, Deva Ramanan et al.

CVPR 2025arXiv:2504.13157
14
citations
#3345

EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild

Yumeng Liu, Xiaoxiao Long, Zemin Yang et al.

CVPR 2025arXiv:2411.14280
14
citations
#3346

DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework

Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar Jr et al.

CVPR 2025arXiv:2503.14880
14
citations
#3347

Pippo: High-Resolution Multi-View Humans from a Single Image

Yash Kant, Ethan Weber, Jin Kyu Kim et al.

CVPR 2025highlightarXiv:2502.07785
14
citations
#3348

A Hitchhiker's Guide to Scaling Law Estimation

Leshem Choshen, Yang Zhang, Jacob Andreas

ICML 2025arXiv:2410.11840
14
citations
#3349

Battling the Non-stationarity in Time Series Forecasting via Test-time Adaptation

HyunGi Kim, Siwon Kim, Jisoo Mok et al.

AAAI 2025paperarXiv:2501.04970
14
citations
#3350

Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation

Jun Hyeong Kim, Seonghwan Kim, Seokhyun Moon et al.

ICLR 2025arXiv:2410.01500
14
citations
#3351

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation

Weijia Wu, Mingyu Liu, Zeyu Zhu et al.

CVPR 2025arXiv:2411.15262
14
citations
#3352

Improving Equivariant Networks with Probabilistic Symmetry Breaking

Hannah Lawrence, Vasco Portilheiro, Yan Zhang et al.

ICLR 2025arXiv:2503.21985
14
citations
#3353

CR-CTC: Consistency regularization on CTC for improved speech recognition

Zengwei Yao, Wei Kang, Xiaoyu Yang et al.

ICLR 2025oralarXiv:2410.05101
14
citations
#3354

Context Steering: Controllable Personalization at Inference Time

Zhiyang He, Sashrika Pandey, Mariah Schrum et al.

ICLR 2025arXiv:2405.01768
14
citations
#3355

SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting

Hui Chen, Viet Luong, Lopamudra Mukherjee et al.

ICLR 2025oral
14
citations
#3356

MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations

Kyungho Bae, Jinhyung Kim, Sihaeng Lee et al.

CVPR 2025highlightarXiv:2503.15871
14
citations
#3357

Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

Yuchen Zhu, Tianrong Chen, Lingkai Kong et al.

ICLR 2025arXiv:2405.16381
14
citations
#3358

Large Language Model Meets Graph Neural Network in Knowledge Distillation

Shengxiang Hu, Guobing Zou, Song Yang et al.

AAAI 2025paperarXiv:2402.05894
14
citations
#3359

Pareto Set Learning for Multi-Objective Reinforcement Learning

Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.

AAAI 2025paperarXiv:2501.06773
14
citations
#3360

The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions

Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele et al.

ICLR 2025arXiv:2406.12480
14
citations
#3361

How to Synthesize Text Data without Model Collapse?

Xuekai Zhu, Daixuan Cheng, Hengli Li et al.

ICML 2025arXiv:2412.14689
14
citations
#3362

ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts

Samar Khanna, Medhanie Irgau, David Lobell et al.

ICML 2025arXiv:2406.10973
14
citations
#3363

Deep Distributed Optimization for Large-Scale Quadratic Programming

Augustinos Saravanos, Hunter Kuperman, Alex Oshin et al.

ICLR 2025arXiv:2412.12156
14
citations
#3364

Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation

Zhenxin Lei, Man Yao, Jiakui Hu et al.

AAAI 2025paperarXiv:2412.14587
14
citations
#3365

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Yana Wei, Liang Zhao, Jianjian Sun et al.

NEURIPS 2025arXiv:2507.05255
14
citations
#3366

Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization

Kesen Zhao, Beier Zhu, Qianru Sun et al.

ICCV 2025arXiv:2504.18397
14
citations
#3367

MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert

Dapeng Zhang, Dayu Chen, Peng Zhi et al.

AAAI 2025paperarXiv:2412.12704
14
citations
#3368

Benchmarking LLMs' Judgments with No Gold Standard

Shengwei Xu, Yuxuan Lu, Grant Schoenebeck et al.

ICLR 2025arXiv:2411.07127
14
citations
#3369

NEST: A Neuromodulated Small-world Hypergraph Trajectory Prediction Model for Autonomous Driving

Chengyue Wang, Haicheng Liao, Bonan Wang et al.

AAAI 2025paperarXiv:2412.11682
14
citations
#3370

DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation

Yunbei Zhang, Akshay Mehra, Shuaicheng Niu et al.

ICML 2025arXiv:2406.10737
14
citations
#3371

When does compositional structure yield compositional generalization? A kernel theory.

Samuel Lippl, Kimberly Stachenfeld

ICLR 2025arXiv:2405.16391
14
citations
#3372

Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design

Masatoshi Uehara, su, Yulai Zhao et al.

ICML 2025arXiv:2502.14944
14
citations
#3373

Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models

Xin Zhang, Yanzhao Zhang, Wen Xie et al.

CVPR 2025
14
citations
#3374

Efficient Traffic Prediction Through Spatio-Temporal Distillation

Qianru Zhang, Xinyi Gao, Haixin Wang et al.

AAAI 2025paperarXiv:2501.10459
14
citations
#3375

Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking

Pengxiang Li, Shilin Yan, Jiayin Cai et al.

NEURIPS 2025arXiv:2505.20199
14
citations
#3376

NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training

Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou et al.

CVPR 2025arXiv:2412.02030
14
citations
#3377

Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions

Wei Yao, Haian Yin, Shangzhi Zeng et al.

ICLR 2025arXiv:2406.01992
14
citations
#3378

MiniMax-Remover: Taming Bad Noise Helps Video Object Removal

Bojia Zi, Weixuan Peng, Xianbiao Qi et al.

NEURIPS 2025arXiv:2505.24873
14
citations
#3379

SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing

Seokhyeon Hong, Chaelin Kim, Serin Yoon et al.

CVPR 2025arXiv:2503.13836
14
citations
#3380

Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts

Minh Le, Chau Nguyen, Huy Nguyen et al.

ICLR 2025arXiv:2410.02200
14
citations
#3381

LoLCATs: On Low-Rank Linearizing of Large Language Models

Michael Zhang, Simran Arora, Rahul Chalamala et al.

ICLR 2025arXiv:2410.10254
14
citations
#3382

Epistemic EFX Allocations Exist for Monotone Valuations

Hannaneh Akrami, Nidhi Rathi

AAAI 2025paperarXiv:2405.14463
14
citations
#3383

One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation

Jianze Li, Jiezhang Cao, Yong Guo et al.

ICML 2025arXiv:2502.01993
14
citations
#3384

MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM

Vladimir Yugay, Theo Gevers, Martin R. Oswald

CVPR 2025arXiv:2411.16785
14
citations
#3385

KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy

Qianxiong Xu, Cheng Long, Ziyue Li et al.

AAAI 2025paperarXiv:2311.02565
14
citations
#3386

Enhancing Time Series Forecasting through Selective Representation Spaces: A Patch Perspective

Xingjian Wu, Xiangfei Qiu, Hanyin Cheng et al.

NEURIPS 2025arXiv:2510.14510
14
citations
#3387

Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting

Runsong Zhu, Shi Qiu, ZHENGZHE LIU et al.

CVPR 2025arXiv:2503.14029
14
citations
#3388

OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain

Wenzhen Yue, Yong Liu, Hao Wang et al.

NEURIPS 2025oralarXiv:2505.08550
14
citations
#3389

SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding

Woohyeon Park, Woojin Kim, Jaeik Kim et al.

ICML 2025arXiv:2506.08391
14
citations
#3390

MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow

Hanzhuo Huang, Yuan Liu, Ge Zheng et al.

ICLR 2025oralarXiv:2502.11697
14
citations
#3391

Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis

Yanzuo Lu, Yuxi Ren, Xin Xia et al.

ICCV 2025highlightarXiv:2507.18569
14
citations
#3392

SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining

Mingjin Zhang, Xiaolong Li, Fei Gao et al.

CVPR 2025
14
citations
#3393

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

Yijun Liang, Ming Li, Chenrui Fan et al.

NEURIPS 2025arXiv:2504.10514
14
citations
#3394

Referring to Any Person

Qing Jiang, Lin Wu, Zhaoyang Zeng et al.

ICCV 2025arXiv:2503.08507
14
citations
#3395

Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging

Jinluan Yang, Dingnan Jin, Anke Tang et al.

NEURIPS 2025arXiv:2502.06876
14
citations
#3396

Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving

Yue Li, Meng Tian, Zhenyu Lin et al.

ICCV 2025arXiv:2503.21505
14
citations
#3397

Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy et al.

NEURIPS 2025arXiv:2508.09968
14
citations
#3398

Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis

Zikun Zhang, Zixiang Chen, Quanquan Gu

ICLR 2025arXiv:2410.02321
14
citations
#3399

Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization

XiangCheng Zhang, Fang Kong, Baoxiang Wang et al.

ICLR 2025arXiv:2302.06834
14
citations
#3400

Cluster-guided Contrastive Class-imbalanced Graph Classification

Wei Ju, Zhengyang Mao, Siyu Yi et al.

AAAI 2025paperarXiv:2412.12984
14
citations