Most Cited 2025 "scene understanding tasks" Papers

22,274 papers found • Page 6 of 112

#1001

Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement

Xueyao Zhang, Xiaohui Zhang, Kainan Peng et al.

ICLR 2025 • arXiv:2502.07243
41 citations
#1002

FastVLM: Efficient Vision Encoding for Vision Language Models

Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li et al.

CVPR 2025 • arXiv:2412.13303
41 citations
#1003

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

Zheyuan Zhang, Fengyuan Hu, Jayjun Lee et al.

ICLR 2025 • arXiv:2410.17385
41 citations
#1004

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

Xiao Fu, Xian Liu, Xintao Wang et al.

ICLR 2025 • arXiv:2412.07759
41 citations
#1005

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method

Xinshuai Song, Weixing Chen, Yang Liu et al.

CVPR 2025 • arXiv:2412.09082
41 citations
#1006

Scaling Speech-Text Pre-training with Synthetic Interleaved Data

Aohan Zeng, Zhengxiao Du, Mingdao Liu et al.

ICLR 2025 • arXiv:2411.17607
41 citations
#1007

PaPaGei: Open Foundation Models for Optical Physiological Signals

Arvind Pillai, Dimitris Spathis, Fahim Kawsar et al.

ICLR 2025 • arXiv:2410.20542
41 citations
#1008

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

Yaxi Lu, Shenzhi Yang, Cheng Qian et al.

ICLR 2025 • arXiv:2410.12361
41 citations
#1009

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang et al.

CVPR 2025 • arXiv:2411.18616
41 citations
#1010

Sparse Autoencoders Do Not Find Canonical Units of Analysis

Patrick Leask, Bart Bussmann, Michael Pearce et al.

ICLR 2025 • arXiv:2502.04878
41 citations
#1011

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Hang Yin, Xiuwei Xu, Linqing Zhao et al.

CVPR 2025 • arXiv:2503.10630
41 citations
#1012

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen et al.

CVPR 2025 • arXiv:2408.17065
41 citations
#1013

Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond

Chongyu Fan, Jinghan Jia, Yihua Zhang et al.

ICML 2025 • arXiv:2502.05374
41 citations
#1014

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025 • arXiv:2410.06916
41 citations
#1015

SUTrack: Towards Simple and Unified Single Object Tracking

Xin Chen, Ben Kang, Wanting Geng et al.

AAAI 2025 • arXiv:2412.19138
41 citations
#1016

PartField: Learning 3D Feature Fields for Part Segmentation and Beyond

Minghua Liu, Mikaela Uy, Donglai Xiang et al.

ICCV 2025 • arXiv:2504.11451
41 citations
#1017

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Zhi Hou, Tianyi Zhang, Yuwen Xiong et al.

ICCV 2025 • arXiv:2503.19757
40 citations
#1018

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.

CVPR 2025 • arXiv:2502.04144
40 citations
#1019

SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers

Enze Xie, Junsong Chen, Junyu Chen et al.

ICLR 2025
40 citations
#1020

Combining Induction and Transduction for Abstract Reasoning

Wen-Ding Li, Keya Hu, Carter Larsen et al.

ICLR 2025 • arXiv:2411.02272
40 citations
#1021

EG4D: Explicit Generation of 4D Object without Score Distillation

Qi Sun, Zhiyang Guo, Ziyu Wan et al.

ICLR 2025 (oral) • arXiv:2405.18132
40 citations
#1022

Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving

Xinji Mai, Haotian Xu, Xing W et al.

NeurIPS 2025
40 citations
#1023

Trajectory attention for fine-grained video motion control

Zeqi Xiao, Wenqi Ouyang, Yifan Zhou et al.

ICLR 2025 (oral) • arXiv:2411.19324
40 citations
#1024

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao et al.

NeurIPS 2025 (spotlight) • arXiv:2505.18875
40 citations
#1025

Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts

Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian et al.

ICML 2025 (spotlight) • arXiv:2503.02819
40 citations
#1026

MoH: Multi-Head Attention as Mixture-of-Head Attention

Peng Jin, Bo Zhu, Li Yuan et al.

ICML 2025 • arXiv:2410.11842
40 citations
#1027

On Evaluating the Durability of Safeguards for Open-Weight LLMs

Xiangyu Qi, Boyi Wei, Nicholas Carlini et al.

ICLR 2025 • arXiv:2412.07097
40 citations
#1028

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

Li Hu, Wang Yuan, Zhen Shen et al.

ICCV 2025 • arXiv:2502.06145
40 citations
#1029

Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks

Maya Bechler-Speicher, Ben Finkelshtein, Fabrizio Frasca et al.

ICML 2025 • arXiv:2502.14546
40 citations
#1030

CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities

Yuxuan Zhu, Antony Kellermann, Dylan Bowman et al.

ICML 2025 (spotlight) • arXiv:2503.17332
40 citations
#1031

Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression

Zichong Meng, Yiming Xie, Xiaogang Peng et al.

CVPR 2025 • arXiv:2411.16575
40 citations
#1032

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Clément Chadebec, Onur Tasar, Eyal Benaroche et al.

AAAI 2025 • arXiv:2406.02347
40 citations
#1033

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

Zicheng Zhang, Haoning Wu, Chunyi Li et al.

ICLR 2025 • arXiv:2406.03070
40 citations
#1034

From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D

Jiahui Zhang, Yurui Chen, Yueming Xu et al.

NeurIPS 2025 • arXiv:2503.22976
40 citations
#1035

The Diffusion Duality

Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan et al.

ICML 2025 • arXiv:2506.10892
40 citations
#1036

Multi-subject Open-set Personalization in Video Generation

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025 • arXiv:2501.06187
40 citations
#1037

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Kexun Zhang, Weiran Yao, Zuxin Liu et al.

ICLR 2025 • arXiv:2408.07060
40 citations
#1038

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Maohao Shen, Guangtao Zeng, Zhenting Qi et al.

ICML 2025 • arXiv:2502.02508
40 citations
#1039

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.

NeurIPS 2025 • arXiv:2506.14965
40 citations
#1040

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.

CVPR 2025 • arXiv:2501.09466
40 citations
#1041

DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control

Kaifeng Zhao, Gen Li, Siyu Tang

ICLR 2025 • arXiv:2410.05260
40 citations
#1042

PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting

Alex Hanson, Allen Tu, Vasu Singla et al.

CVPR 2025 • arXiv:2406.10219
40 citations
#1043

Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning

Shengyuan Hu, Yiwei Fu, Steven Wu et al.

ICLR 2025 • arXiv:2406.13356
40 citations
#1044

DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal et al.

ICLR 2025 • arXiv:2407.01725
40 citations
#1045

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity

Huaxin Zhang, Xiaohao Xu, Xiang Wang et al.

CVPR 2025 (highlight) • arXiv:2412.06171
40 citations
#1046

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

Jiajun Deng, Tianyu He, Li Jiang et al.

CVPR 2025 • arXiv:2501.01163
40 citations
#1047

Think Only When You Need with Large Hybrid-Reasoning Models

Lingjie Jiang, Xun Wu, Shaohan Huang et al.

NeurIPS 2025 • arXiv:2505.14631
40 citations
#1048

How Feature Learning Can Improve Neural Scaling Laws

Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

ICLR 2025 • arXiv:2409.17858
40 citations
#1049

GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng et al.

CVPR 2025 • arXiv:2412.04384
40 citations
#1050

Video-Guided Foley Sound Generation with Multimodal Controls

Ziyang Chen, Prem Seetharaman, Bryan Russell et al.

CVPR 2025 • arXiv:2411.17698
40 citations
#1051

Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction

Zeren Jiang, Chuanxia Zheng, Iro Laina et al.

ICCV 2025 (highlight) • arXiv:2504.07961
40 citations
#1052

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Junbo Niu, Yifei Li, Ziyang Miao et al.

CVPR 2025 • arXiv:2501.05510
40 citations
#1053

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NeurIPS 2025 • arXiv:2503.09501
40 citations
#1054

DrVideo: Document Retrieval Based Long Video Understanding

Ziyu Ma, Chenhui Gou, Hengcan Shi et al.

CVPR 2025 • arXiv:2406.12846
39 citations
#1055

FlatQuant: Flatness Matters for LLM Quantization

Yuxuan Sun, Ruikang Liu, Haoli Bai et al.

ICML 2025 • arXiv:2410.09426
39 citations
#1056

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Wenbin An, Feng Tian, Sicong Leng et al.

CVPR 2025 • arXiv:2406.12718
39 citations
#1057

Think while You Generate: Discrete Diffusion with Planned Denoising

Sulin Liu, Juno Nam, Andrew Campbell et al.

ICLR 2025 • arXiv:2410.06264
39 citations
#1058

Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions

Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.

AAAI 2025 • arXiv:2408.08781
39 citations
#1059

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

Lianghui Zhu, Zilong Huang, Bencheng Liao et al.

CVPR 2025 • arXiv:2405.18428
39 citations
#1060

Synthetic continued pretraining

Zitong Yang, Neil Band, Shuangping Li et al.

ICLR 2025 • arXiv:2409.07431
39 citations
#1061

What is Wrong with Perplexity for Long-context Language Modeling?

Lizhe Fang, Yifei Wang, Zhaoyang Liu et al.

ICLR 2025 • arXiv:2410.23771
39 citations
#1062

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Yiming Wang, Pei Zhang, Baosong Yang et al.

ICLR 2025 • arXiv:2410.13640
39 citations
#1063

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Siyuan Huang, Liliang Chen, Pengfei Zhou et al.

NeurIPS 2025 • arXiv:2501.01895
39 citations
#1064

Watermark Anything With Localized Messages

Tom Sander, Pierre Fernandez, Alain Oliviero Durmus et al.

ICLR 2025 • arXiv:2411.07231
39 citations
#1065

On Scaling Up 3D Gaussian Splatting Training

Hexu Zhao, Haoyang Weng, Daohan Lu et al.

ICLR 2025 • arXiv:2406.18533
39 citations
#1066

SafeArena: Evaluating the Safety of Autonomous Web Agents

Ada Tur, Nicholas Meade, Xing Han Lù et al.

ICML 2025 • arXiv:2503.04957
39 citations
#1067

Multi-Objective Evolution of Heuristic Using Large Language Model

Shunyu Yao, Fei Liu, Xi Lin et al.

AAAI 2025 • arXiv:2409.16867
39 citations
#1068

OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction

Huang Huang, Fangchen Liu, Letian Fu et al.

ICML 2025 • arXiv:2503.03734
39 citations
#1069

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Bojia Zi, Shihao Zhao, Xianbiao Qi et al.

AAAI 2025 • arXiv:2403.12035
39 citations
#1070

Number it: Temporal Grounding Videos like Flipping Manga

Yongliang Wu, Xinting Hu, Yuyang Sun et al.

CVPR 2025 • arXiv:2411.10332
39 citations
#1071

Attention with Markov: A Curious Case of Single-layer Transformers

Ashok Makkuva, Marco Bondaschi, Adway Girish et al.

ICLR 2025 • arXiv:2402.04161
39 citations
#1072

Improving Pretraining Data Using Perplexity Correlations

Tristan Thrush, Christopher Potts, Tatsunori Hashimoto

ICLR 2025 • arXiv:2409.05816
39 citations
#1073

Uni-Sign: Toward Unified Sign Language Understanding at Scale

Zecheng Li, Wengang Zhou, Weichao Zhao et al.

ICLR 2025 • arXiv:2501.15187
39 citations
#1074

RATT: A Thought Structure for Coherent and Correct LLM Reasoning

Jinghan Zhang, Xiting Wang, Weijieying Ren et al.

AAAI 2025 • arXiv:2406.02746
39 citations
#1075

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Angelika Romanou, Negar Foroutan, Anna Sotnikova et al.

ICLR 2025 • arXiv:2411.19799
39 citations
#1076

Strong Model Collapse

Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian et al.

ICLR 2025 • arXiv:2410.04840
39 citations
#1077

Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL

Songjun Tu, Jiahao Lin, Qichao Zhang et al.

NeurIPS 2025 • arXiv:2505.10832
39 citations
#1078

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

Wei Huang, Haotong Qin, Yangdong Liu et al.

ICML 2025 • arXiv:2405.14917
39 citations
#1079

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting

Yu Liu, Baoxiong Jia, Ruijie Lu et al.

ICLR 2025 • arXiv:2502.19459
39 citations
#1080

Improving the Diffusability of Autoencoders

Ivan Skorokhodov, Sharath Girish, Benran Hu et al.

ICML 2025 • arXiv:2502.14831
39 citations
#1081

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.

ICLR 2025 • arXiv:2409.04431
39 citations
#1082

Training-Free Activation Sparsity in Large Language Models

James Liu, Pragaash Ponnusamy, Tianle Cai et al.

ICLR 2025 • arXiv:2408.14690
39 citations
#1083

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models

Keda Tao, Can Qin, Haoxuan You et al.

CVPR 2025 • arXiv:2411.15024
39 citations
#1084

Human-Object Interaction from Human-Level Instructions

Zhen Wu, Jiaman Li, Pei Xu et al.

ICCV 2025 • arXiv:2406.17840
39 citations
#1085

MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration

Boyun Li, Haiyu Zhao, Wenxin Wang et al.

CVPR 2025 • arXiv:2412.20066
39 citations
#1086

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025 • arXiv:2412.12444
38 citations
#1087

Align Your Flow: Scaling Continuous-Time Flow Map Distillation

Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

NeurIPS 2025 • arXiv:2506.14603
38 citations
#1088

Restructuring Vector Quantization with the Rotation Trick

Christopher Fifty, Ronald Junkins, Dennis Duan et al.

ICLR 2025 • arXiv:2410.06424
38 citations
#1089

Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting

Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.

ICCV 2025 • arXiv:2507.02939
38 citations
#1090

No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models

Seyedmorteza Sadat, Manuel Kansy, Otmar Hilliges et al.

ICLR 2025 • arXiv:2407.02687
38 citations
#1091

WMAdapter: Adding WaterMark Control to Latent Diffusion Models

Hai Ci, Yiren Song, Pei Yang et al.

ICML 2025 • arXiv:2406.08337
38 citations
#1092

Persistent Pre-training Poisoning of LLMs

Yiming Zhang, Javier Rando, Ivan Evtimov et al.

ICLR 2025 • arXiv:2410.13722
38 citations
#1093

Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives

Alex Hanson, Allen Tu, Geng Lin et al.

CVPR 2025 • arXiv:2412.00578
38 citations
#1094

Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective

Sifan Wang, Ananyae Bhartari, Bowen Li et al.

NeurIPS 2025 • arXiv:2502.00604
38 citations
#1095

Vision-Language Models Do Not Understand Negation

Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.

CVPR 2025 • arXiv:2501.09425
38 citations
#1096

FG-CLIP: Fine-Grained Visual and Textual Alignment

Chunyu Xie, Bin Wang, Fanjing Kong et al.

ICML 2025 • arXiv:2505.05071
38 citations
#1097

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

Ruichuan An, Sihan Yang, Renrui Zhang et al.

NeurIPS 2025 • arXiv:2505.14671
38 citations
#1098

DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning

Pengcheng Jiang, Jiacheng Lin, Lang Cao et al.

COLM 2025 • arXiv:2503.00223
38 citations
#1099

Sequential Controlled Langevin Diffusions

Junhua Chen, Lorenz Richter, Julius Berner et al.

ICLR 2025 • arXiv:2412.07081
38 citations
#1100

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Zhihe Yang, Xufang Luo, Dongqi Han et al.

CVPR 2025 • arXiv:2501.09695
38 citations
#1101

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Yun Li, Yiming Zhang, Tao Lin et al.

ICCV 2025 • arXiv:2503.23765
38 citations
#1102

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.

AAAI 2025 • arXiv:2407.19548
38 citations
#1103

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Van Yang, Xiang Yue, Vipin Chaudhary et al.

COLM 2025 • arXiv:2504.12329
38 citations
#1104

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Wanshui Gan, Fang Liu, Hongbin Xu et al.

ICCV 2025 • arXiv:2408.11447
38 citations
#1105

Variational Best-of-N Alignment

Afra Amini, Tim Vieira, Elliott Ash et al.

ICLR 2025 • arXiv:2407.06057
38 citations
#1106

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025 • arXiv:2408.15881
38 citations
#1107

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Yucheng Li, Huiqiang Jiang, Qianhui Wu et al.

ICLR 2025 • arXiv:2412.10319
38 citations
#1108

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Zhi Gao, Bofei Zhang, Pengxiang Li et al.

ICLR 2025 • arXiv:2412.15606
38 citations
#1109

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Zhilin Wang, Jiaqi Zeng, Olivier Delalleau et al.

NeurIPS 2025 • arXiv:2505.11475
38 citations
#1110

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

Jianyi Wang, Zhijie Lin, Meng Wei et al.

CVPR 2025 (highlight) • arXiv:2501.01320
38 citations
#1111

Generalizing Verifiable Instruction Following

Valentina Pyatkin, Saumya Malik, Victoria Graf et al.

NeurIPS 2025 • arXiv:2507.02833
38 citations
#1112

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

Xubing Ye, Yukang Gan, Yixiao Ge et al.

CVPR 2025 • arXiv:2412.00447
38 citations
#1113

GFlow: Recovering 4D World from Monocular Video

Shizun Wang, Xingyi Yang, Qiuhong Shen et al.

AAAI 2025 • arXiv:2405.18426
38 citations
#1114

SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

Georg Hess, Carl Lindström, Maryam Fatemi et al.

CVPR 2025 • arXiv:2411.16816
38 citations
#1115

Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models

Linhao Luo, Zicheng Zhao, Reza Haffari et al.

ICML 2025 • arXiv:2410.13080
38 citations
#1116

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Chenrui Fan, Ming Li, Lichao Sun et al.

COLM 2025 • arXiv:2504.06514
38 citations
#1117

Dynamic Diffusion Transformer

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

ICLR 2025 • arXiv:2410.03456
38 citations
#1118

SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders

Bartosz Cywiński, Kamil Deja

ICML 2025 • arXiv:2501.18052
37 citations
#1119

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Ziyao Shangguan, Chuhan Li, Yuxuan Ding et al.

ICLR 2025 (oral) • arXiv:2410.23266
37 citations
#1120

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Size Wu, Wenwei Zhang, Lumin Xu et al.

ICCV 2025 • arXiv:2503.21979
37 citations
#1121

Informed Correctors for Discrete Diffusion Models

Yixiu Zhao, Jiaxin Shi, Feng Chen et al.

NeurIPS 2025 • arXiv:2407.21243
37 citations
#1122

Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths

Tianyu Fu, Haofeng Huang, Xuefei Ning et al.

COLM 2025 • arXiv:2406.14909
37 citations
#1123

Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community

Jiancheng Pan, Yanxing Liu, Yuqian Fu et al.

AAAI 2025 • arXiv:2408.09110
37 citations
#1124

Compositional Entailment Learning for Hyperbolic Vision-Language Models

Avik Pal, Max van Spengler, Guido D'Amely di Melendugno et al.

ICLR 2025 • arXiv:2410.06912
37 citations
#1125

FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs

Zhiting Fan, Ruizhe Chen, Tianxiang Hu et al.

ICLR 2025 • arXiv:2410.19317
37 citations
#1126

Preserving Diversity in Supervised Fine-Tuning of Large Language Models

Ziniu Li, Congliang Chen, Tian Xu et al.

ICLR 2025 • arXiv:2408.16673
37 citations
#1127

MALT: Improving Reasoning with Multi-Agent LLM Training

Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.

COLM 2025 • arXiv:2412.01928
37 citations
#1128

Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective

Neta Shaul, Itai Gat, Marton Havasi et al.

ICLR 2025 • arXiv:2412.03487
37 citations
#1129

ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

Peng Xu, Wei Ping, Xianchao Wu et al.

ICLR 2025 • arXiv:2407.14482
37 citations
#1130

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

Rui Pan, Yinwei Dai, Zhihao Zhang et al.

NeurIPS 2025 • arXiv:2504.07891
37 citations
#1131

Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models

Xin Zou, Yizhou Wang, Yibo Yan et al.

ICML 2025 • arXiv:2410.03577
37 citations
#1132

DeciMamba: Exploring the Length Extrapolation Potential of Mamba

Assaf Ben-Kish, Itamar Zimerman, Shady Abu-Hussein et al.

ICLR 2025 • arXiv:2406.14528
37 citations
#1133

From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu et al.

ICCV 2025 • arXiv:2503.06923
37 citations
#1134

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.

CVPR 2025 • arXiv:2411.15466
37 citations
#1135

Deconstructing What Makes a Good Optimizer for Autoregressive Language Models

Rosie Zhao, Depen Morwani, David Brandfonbrener et al.

ICLR 2025
37 citations
#1136

OpenCUA: Open Foundations for Computer-Use Agents

Xinyuan Wang, Bowen Wang, Dunjie Lu et al.

NeurIPS 2025 (spotlight) • arXiv:2508.09123
37 citations
#1137

Which Attention Heads Matter for In-Context Learning?

Kayo Yin, Jacob Steinhardt

ICML 2025 • arXiv:2502.14010
37 citations
#1138

Transformers Provably Solve Parity Efficiently with Chain of Thought

Juno Kim, Taiji Suzuki

ICLR 2025 • arXiv:2410.08633
37 citations
#1139

ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning

Xiao Yu, Baolin Peng, Vineeth Vajipey et al.

ICLR 2025 • arXiv:2410.02052
37 citations
#1140

ControlAR: Controllable Image Generation with Autoregressive Models

Zongming Li, Tianheng Cheng, Shoufa Chen et al.

ICLR 2025 • arXiv:2410.02705
37 citations
#1141

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

Shuang Wu, Youtian Lin, Feihu Zhang et al.

NeurIPS 2025 • arXiv:2505.17412
37 citations
#1142

Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions

Michael Zhang, W. Bradley Knox, Eunsol Choi

ICLR 2025 • arXiv:2410.13788
37 citations
#1143

Efficient Evolutionary Search Over Chemical Space with Large Language Models

Haorui Wang, Marta Skreta, Cher-Tian Ser et al.

ICLR 2025 • arXiv:2406.16976
37 citations
#1144

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.

ICLR 2025 • arXiv:2404.10775
37 citations
#1145

WorldModelBench: Judging Video Generation Models As World Models

Dacheng Li, Yunhao Fang, Yukang Chen et al.

NeurIPS 2025 • arXiv:2502.20694
37 citations
#1146

FaceXFormer: A Unified Transformer for Facial Analysis

Kartik Narayan, Vibashan VS, Rama Chellappa et al.

ICCV 2025 • arXiv:2403.12960
37 citations
#1147

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Tong Wu, Shujian Zhang, Kaiqiang Song et al.

ICLR 2025 • arXiv:2410.09102
37 citations
#1148

Towards General Visual-Linguistic Face Forgery Detection

Ke Sun, Shen Chen, Taiping Yao et al.

CVPR 2025 • arXiv:2307.16545
37 citations
#1149

SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

Mingjie Li, Wai Man Si, Michael Backes et al.

ICLR 2025 • arXiv:2501.01765
37 citations
#1150

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment

Yuancheng Xu, Udari Sehwag, Alec Koppel et al.

ICLR 2025 • arXiv:2410.08193
37 citations
#1151

LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning

Zhe Li, Weihao Yuan, Yisheng He et al.

ICLR 2025 • arXiv:2410.07093
37 citations
#1152

ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning

Ruchika Chavhan, Da Li, Timothy Hospedales

ICLR 2025 • arXiv:2405.19237
37 citations
#1153

Collapse or Thrive: Perils and Promises of Synthetic Data in a Self-Generating World

Joshua Kazdan, Rylan Schaeffer, Apratim Dey et al.

ICML 2025 • arXiv:2410.16713
37 citations
#1154

EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues

Sagar Soni, Akshay Dudhane, Hiyam Debary et al.

CVPR 2025 • arXiv:2412.15190
37 citations
#1155

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NeurIPS 2025 • arXiv:2505.10446
37 citations
#1156

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

Yong Liu, Zirui Zhu, Chaoyu Gong et al.

NeurIPS 2025 • arXiv:2402.15751
37 citations
#1157

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

Seil Kang, Jinyeong Kim, Junhyeok Kim et al.

CVPR 2025 (highlight) • arXiv:2503.06287
37 citations
#1158

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Erik Daxberger, Nina Wenzel, David Griffiths et al.

ICCV 2025 • arXiv:2503.13111
36 citations
#1159

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Xuehai He, Weixi Feng, Kaizhi Zheng et al.

ICLR 2025 • arXiv:2406.08407
36 citations
#1160

Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond

Qizhou Wang, Jin Zhou, (Andrew) Zhanke Zhou et al.

ICLR 2025 • arXiv:2502.19301
36 citations
#1161

xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition

Artyom Stitsyuk, Jaesik Choi

AAAI 2025 • arXiv:2412.17323
36 citations
#1162

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

Xilin Wei, Xiaoran Liu, Yuhang Zang et al.

ICML 2025 (oral) • arXiv:2502.05173
36 citations
#1163

AgentStudio: A Toolkit for Building General Virtual Agents

Longtao Zheng, Zhiyuan Huang, Zhenghai Xue et al.

ICLR 2025 • arXiv:2403.17918
36 citations
#1164

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

Hui Zhang, Dexiang Hong, Yitong Wang et al.

ICCV 2025 • arXiv:2412.03859
36 citations
#1165

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Zekun Qi, Wenyao Zhang, Yufei Ding et al.

NeurIPS 2025 (spotlight) • arXiv:2502.13143
36 citations
#1166

Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance

Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.

AAAI 2025 • arXiv:2412.12974
36 citations
#1167

Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning

Wenlin Zhang, Xiangyang Li, Kuicai Dong et al.

NeurIPS 2025 • arXiv:2505.14069
36 citations
#1168

Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting

Siru Zhong, Weilin Ruan, Ming Jin et al.

ICML 2025 (oral) • arXiv:2502.04395
36 citations
#1169

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

Vishwesh Nath, Wenqi Li, Dong Yang et al.

CVPR 2025 (highlight) • arXiv:2411.12915
36 citations
#1170

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

Yiyu Zhuang, Jiaxi Lv, Hao Wen et al.

CVPR 2025 • arXiv:2412.14963
36 citations
#1171

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.

NeurIPS 2025 (spotlight) • arXiv:2502.08640
36 citations
#1172

ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning

Kailin Li, Puhao Li, Tengyu Liu et al.

CVPR 2025 • arXiv:2503.21860
36 citations
#1173

PAD: Personalized Alignment of LLMs at Decoding-time

Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.

ICLR 2025 • arXiv:2410.04070
36 citations
#1174

ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions

Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.

CVPR 2025 • arXiv:2401.10232
36 citations
#1175

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Chaodong Xiao, Minghan Li, Zhengqiang Zhang et al.

ICLR 2025 • arXiv:2410.15091
36 citations
#1176

Read, Watch and Scream! Sound Generation from Text and Video

Yujin Jeong, Yunji Kim, Sanghyuk Chun et al.

AAAI 2025 • arXiv:2407.05551
36 citations
#1177

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Yunzhi Yan, Zhen Xu, Haotong Lin et al.

CVPR 2025 • arXiv:2412.13188
36 citations
#1178

SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling

Xianglong He, Zi-Xin Zou, Chia Hao Chen et al.

ICCV 2025 • arXiv:2503.21732
36 citations
#1179

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Yongxin Guo, Zhenglin Cheng, Xiaoying Tang et al.

ICLR 2025 • arXiv:2405.14297
36 citations
#1180

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.

CVPR 2025 • arXiv:2501.03218
36 citations
#1181

Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment

Congzhi Zhang, Linhai Zhang, Jialong Wu et al.

AAAI 2025 • arXiv:2403.02738
36 citations
#1182

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

Ji Qi, Ming Ding, Weihan Wang et al.

ICLR 2025 • arXiv:2402.04236
36 citations
#1183

Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries

Huakun Luo, Haixu Wu, Hang Zhou et al.

ICML 2025 • arXiv:2502.02414
36 citations
#1184

Diffusion Bridge Implicit Models

Kaiwen Zheng, Guande He, Jianfei Chen et al.

ICLR 2025 • arXiv:2405.15885
36 citations
#1185

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

Yiding Jiang, Allan Zhou, Zhili Feng et al.

ICLR 2025 • arXiv:2410.11820
36 citations
#1186

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

Haojun Shi, Suyu Ye, Xinyu Fang et al.

AAAI 2025 • arXiv:2408.12574
36 citations
#1187

Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models

Lingzhi Wang, Xingshan Zeng, Jinsong Guo et al.

AAAI 2025 • arXiv:2402.05813
36 citations
#1188

FreeVS: Generative View Synthesis on Free Driving Trajectory

Qitai Wang, Lue Fan, Yuqi Wang et al.

ICLR 2025 • arXiv:2410.18079
36 citations
#1189

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Hao Chen, Ze Wang, Xiang Li et al.

CVPR 2025 • arXiv:2412.10958
36 citations
#1190

AutoEval Done Right: Using Synthetic Data for Model Evaluation

Pierre Boyeau, Anastasios Angelopoulos, Tianle Li et al.

ICML 2025 • arXiv:2403.07008
36 citations
#1191

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

Chanwoo Park, Xiangyu Liu, Asuman Ozdaglar et al.

ICLR 2025 • arXiv:2403.16843
36 citations
#1192

Learning to Route LLMs with Confidence Tokens

Yu-Neng Chuang, Prathusha Sarma, Parikshit Gopalan et al.

ICML 2025 • arXiv:2410.13284
36 citations
#1193

Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

Bowen Jiang, Zhuoqun Hao, Young Min Cho et al.

COLM 2025 • arXiv:2504.14225
36 citations
#1194

AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation

Datao Tang, Xiangyong Cao, Xuan Wu et al.

CVPR 2025 • arXiv:2411.15497
36 citations
#1195

TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning

Andreas Auer, Patrick Podest, Daniel Klotz et al.

NeurIPS 2025 • arXiv:2505.23719
36 citations
#1196

MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL

Arian Askari, Christian Poelitz, Xinye Tang

AAAI 2025 • arXiv:2406.12692
36 citations
#1197

SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects

Jiayi Liu, Denys Iliash, Angel Chang et al.

ICLR 2025 • arXiv:2410.16499
36 citations
#1198

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Peng Jin, Bo Zhu, Yuan Li et al.

ICLR 2025 • arXiv:2410.07348
36 citations
#1199

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

Ziyu Zhao, Tao Shen, Didi Zhu et al.

ICLR 2025 • arXiv:2409.16167
35 citations
#1200

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Shenghao Fu, Qize Yang, Qijie Mo et al.

CVPR 2025 (highlight) • arXiv:2501.18954
35 citations