Most Cited 2025 "gradient reweighting" Papers

22,274 papers found • Page 4 of 112

#601

How efficient is LLM-generated code? A rigorous & high-standard benchmark

Ruizhong Qiu, Weiliang Zeng, James Ezick et al.

ICLR 2025posterarXiv:2406.06647
43
citations
#602

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025posterarXiv:2412.04383
43
citations
#603

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer

Zhen Han, Zeyinzi Jiang, Yulin Pan et al.

ICLR 2025posterarXiv:2410.00086
43
citations
#604

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

Qizhe Zhang, Aosong Cheng, Ming Lu et al.

ICCV 2025posterarXiv:2412.01818
43
citations
#605

Learning 4D Embodied World Models

Haoyu Zhen, Qiao Sun, Hongxin Zhang et al.

ICCV 2025posterarXiv:2504.20995
43
citations
#606

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana et al.

CVPR 2025highlightarXiv:2411.16508
42
citations
#607

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

Jaehun Jung, Faeze Brahman, Yejin Choi

ICLR 2025posterarXiv:2407.18370
42
citations
#608

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Andrew Williams, Arjun Ashok, Étienne Marcotte et al.

ICML 2025posterarXiv:2410.18959
42
citations
#609

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

Yao Teng, Han Shi, Xian Liu et al.

ICLR 2025posterarXiv:2410.01699
42
citations
#610

ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval

Zixu Li, Zhiwei Chen, Haokun Wen et al.

AAAI 2025paper
42
citations
#611

Robust LLM safeguarding via refusal feature adversarial training

Lei Yu, Virginie Do, Karen Hambardzumyan et al.

ICLR 2025posterarXiv:2409.20089
42
citations
#612

Universal Actions for Enhanced Embodied Foundation Models

Jinliang Zheng, Jianxiong Li, Dongxiu Liu et al.

CVPR 2025posterarXiv:2501.10105
42
citations
#613

Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design

Chenyu Wang, Masatoshi Uehara, Yichun He et al.

ICLR 2025posterarXiv:2410.13643
42
citations
#614

To Code or Not To Code? Exploring Impact of Code in Pre-training

Viraat Aryabumi, Yixuan Su, Raymond Ma et al.

ICLR 2025posterarXiv:2408.10914
42
citations
#615

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

Mariam Hassan, Sebastian Stapf, Ahmad Rahimi et al.

CVPR 2025posterarXiv:2412.11198
42
citations
#616

St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World

Haiwen Feng, Junyi Zhang, Qianqian Wang et al.

ICCV 2025posterarXiv:2504.13152
42
citations
#617

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun et al.

COLM 2025paperarXiv:2503.23157
42
citations
#618

Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding

Ye Wang, Ziheng Wang, Boshen Xu et al.

NEURIPS 2025oralarXiv:2503.13377
42
citations
#619

Real2Code: Reconstruct Articulated Objects via Code Generation

Mandi Zhao, Yijia Weng, Dominik Bauer et al.

ICLR 2025posterarXiv:2406.08474
42
citations
#620

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Yubo Wang, Xiang Yue, Wenhu Chen

COLM 2025paperarXiv:2501.17703
42
citations
#621

Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization

Audrey Huang, Wenhao Zhan, Tengyang Xie et al.

ICLR 2025posterarXiv:2407.13399
42
citations
#622

Towards Realistic Data Generation for Real-World Super-Resolution

Long Peng, Wenbo Li, Renjing Pei et al.

ICLR 2025posterarXiv:2406.07255
42
citations
#623

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.

CVPR 2025posterarXiv:2412.16334
42
citations
#624

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Yi Ding, Bolian Li, Ruqi Zhang

ICLR 2025posterarXiv:2410.06625
42
citations
#625

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Zhibo Yang, Jun Tang, Zhaohai Li et al.

ICCV 2025posterarXiv:2412.02210
42
citations
#626

Arctic-Embed 2.0: Multilingual Retrieval Without Compromise

Puxuan Yu, Luke Merrick, Gaurav Nuti et al.

COLM 2025paperarXiv:2412.04506
42
citations
#627

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Yibin Wang, li zhimin, Yuhang Zang et al.

NEURIPS 2025posterarXiv:2505.03318
42
citations
#628

Frame-Voyager: Learning to Query Frames for Video Large Language Models

Sicheng Yu, CHENGKAI JIN, Huanyu Wang et al.

ICLR 2025posterarXiv:2410.03226
42
citations
#629

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

Runjin Chen, Zhenyu Zhang, Junyuan Hong et al.

COLM 2025paperarXiv:2504.07986
41
citations
#630

Agents' Room: Narrative Generation through Multi-step Collaboration

Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki et al.

ICLR 2025posterarXiv:2410.02603
41
citations
#631

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.

ICLR 2025posterarXiv:2410.18252
41
citations
#632

Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition

Kun Li, Dan Guo, Guoliang Chen et al.

AAAI 2025paperarXiv:2412.14719
41
citations
#633

Learning to Prompt with Text Only Supervision for Vision-Language Models

Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.

AAAI 2025paperarXiv:2401.02418
41
citations
#634

WritingBench: A Comprehensive Benchmark for Generative Writing

Yuning Wu, Jiahao Mei, Ming Yan et al.

NEURIPS 2025posterarXiv:2503.05244
41
citations
#635

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Yuqian Yuan, Hang Zhang, Wentong Li et al.

CVPR 2025posterarXiv:2501.00599
41
citations
#636

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Guo Chen, Yicheng Liu, Yifei Huang et al.

ICLR 2025posterarXiv:2412.12075
41
citations
#637

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier et al.

ICLR 2025posterarXiv:2407.01509
41
citations
#638

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Marcus Williams, Micah Carroll, Adhyyan Narang et al.

ICLR 2025posterarXiv:2411.02306
41
citations
#639

Video World Models with Long-term Spatial Memory

Tong Wu, Shuai Yang, Ryan Po et al.

NEURIPS 2025oralarXiv:2506.05284
41
citations
#640

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Yuchen Zhou, Jiayuan Gu, Tung Chiang et al.

ICLR 2025posterarXiv:2406.17741
41
citations
#641

Scaling RL to Long Videos

Yukang Chen, Wei Huang, Baifeng Shi et al.

NEURIPS 2025posterarXiv:2507.07966
41
citations
#642

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

Xiao Fu, Xian Liu, Xintao WANG et al.

ICLR 2025posterarXiv:2412.07759
41
citations
#643

On the expressiveness and spectral bias of KANs

Yixuan Wang, Jonathan Siegel, Ziming Liu et al.

ICLR 2025posterarXiv:2410.01803
41
citations
#644

A Distractor-Aware Memory for Visual Object Tracking with SAM2

Alan Lukezic, Jovana Videnović, Matej Kristan

CVPR 2025posterarXiv:2411.17576
40
citations
#645

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.

CVPR 2025posterarXiv:2502.04144
40
citations
#646

Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Ziying Song, Caiyan Jia, Lin Liu et al.

CVPR 2025posterarXiv:2503.03125
40
citations
#647

SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers

Enze Xie, Junsong Chen, Junyu Chen et al.

ICLR 2025poster
40
citations
#648

Theory on Mixture-of-Experts in Continual Learning

Hongbo Li, Sen Lin, Lingjie Duan et al.

ICLR 2025posterarXiv:2406.16437
40
citations
#649

AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP

wenxin ma, Xu Zhang, Qingsong Yao et al.

CVPR 2025posterarXiv:2503.06661
40
citations
#650

Self-Evolving Multi-Agent Collaboration Networks for Software Development

Yue Hu, Yuzhu Cai, Yaxin Du et al.

ICLR 2025posterarXiv:2410.16946
40
citations
#651

Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving

Xinji Mai, Haotian Xu, Xing W et al.

NEURIPS 2025poster
40
citations
#652

PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang et al.

ICCV 2025posterarXiv:2503.17973
40
citations
#653

Multi-subject Open-set Personalization in Video Generation

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025posterarXiv:2501.06187
40
citations
#654

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Zhen Zhang, Xuehai He, Weixiang Yan et al.

NEURIPS 2025posterarXiv:2505.15778
40
citations
#655

Human-inspired Episodic Memory for Infinite Context LLMs

Zafeirios Fountas, Martin A Benfeghoul, Adnan Oomerjee et al.

ICLR 2025oralarXiv:2407.09450
40
citations
#656

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.

CVPR 2025posterarXiv:2501.09466
40
citations
#657

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

Zicheng Zhang, Haoning Wu, Chunyi Li et al.

ICLR 2025posterarXiv:2406.03070
40
citations
#658

Trajectory attention for fine-grained video motion control

Zeqi Xiao, Wenqi Ouyang, Yifan Zhou et al.

ICLR 2025oralarXiv:2411.19324
40
citations
#659

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen et al.

CVPR 2025posterarXiv:2408.17065
40
citations
#660

Looking Inward: Language Models Can Learn About Themselves by Introspection

Felix Jedidja Binder, James Chua, Tomek Korbak et al.

ICLR 2025oralarXiv:2410.13787
40
citations
#661

On the Role of Attention Heads in Large Language Model Safety

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang et al.

ICLR 2025posterarXiv:2410.13708
40
citations
#662

The Surprising Effectiveness of Test-Time Training for Few-Shot Learning

Ekin Akyürek, Mehul Damani, Adam Zweiger et al.

ICML 2025posterarXiv:2411.07279
40
citations
#663

PartField: Learning 3D Feature Fields for Part Segmentation and Beyond

Minghua Liu, Mikaela Uy, Donglai Xiang et al.

ICCV 2025posterarXiv:2504.11451
40
citations
#664

TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark

Kush Jain, Gabriel Synnaeve, Baptiste Roziere

ICLR 2025posterarXiv:2410.00752
40
citations
#665

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Zhi Hou, Tianyi Zhang, Yuwen Xiong et al.

ICCV 2025posterarXiv:2503.19757
40
citations
#666

Scaling Speech-Text Pre-training with Synthetic Interleaved Data

Aohan Zeng, Zhengxiao Du, Mingdao Liu et al.

ICLR 2025posterarXiv:2411.17607
40
citations
#667

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Hui Li, Mingwang Xu, Qingkun Su et al.

CVPR 2025highlightarXiv:2412.00115
40
citations
#668

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

Shangzhe Di, Zhelun Yu, Guanghao Zhang et al.

ICLR 2025posterarXiv:2503.00540
40
citations
#669

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

Zheyuan Zhang, Fengyuan Hu, Jayjun Lee et al.

ICLR 2025posterarXiv:2410.17385
40
citations
#670

Diffusion Feedback Helps CLIP See Better

Wenxuan Wang, Quan Sun, Fan Zhang et al.

ICLR 2025posterarXiv:2407.20171
39
citations
#671

An Architecture Search Framework for Inference-Time Techniques

Jon Saad-Falcon, Adrian Lafuente, Shlok Natarajan et al.

ICML 2025posterarXiv:2409.15254
39
citations
#672

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

Jiajun Deng, Tianyu He, Li Jiang et al.

CVPR 2025posterarXiv:2501.01163
39
citations
#673

Test-time Alignment of Diffusion Models without Reward Over-optimization

Sunwoo Kim, Minkyu Kim, Dongmin Park

ICLR 2025posterarXiv:2501.05803
39
citations
#674

Scaling Language-Free Visual Representation Learning

David Fan, Shengbang Tong, Jiachen Zhu et al.

ICCV 2025highlightarXiv:2504.01017
39
citations
#675

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Core Francisco Park, Ekdeep Singh Lubana, Hidenori Tanaka

ICLR 2025posterarXiv:2412.01003
39
citations
#676

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities

Zhaofeng Wu, Xinyan Yu, Dani Yogatama et al.

ICLR 2025posterarXiv:2411.04986
39
citations
#677

TabArena: A Living Benchmark for Machine Learning on Tabular Data

Nick Erickson, Lennart Purucker, Andrej Tschalzev et al.

NEURIPS 2025spotlightarXiv:2506.16791
39
citations
#678

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.

ICLR 2025posterarXiv:2409.04431
39
citations
#679

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Junsong Chen, Shuchen Xue, Yuyang Zhao et al.

ICCV 2025highlightarXiv:2503.09641
39
citations
#680

DrVideo: Document Retrieval Based Long Video Understanding

Ziyu Ma, Chenhui Gou, Hengcan Shi et al.

CVPR 2025posterarXiv:2406.12846
39
citations
#681

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Junbo Niu, Yifei Li, Ziyang Miao et al.

CVPR 2025posterarXiv:2501.05510
39
citations
#682

MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation

Zhongshen Zeng, Pengguang Chen, Shu Liu et al.

ICLR 2025posterarXiv:2312.17080
39
citations
#683

RATT: A Thought Structure for Coherent and Correct LLM Reasoning

Jinghan Zhang, Xiting Wang, Weijieying Ren et al.

AAAI 2025paperarXiv:2406.02746
39
citations
#684

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.

CVPR 2025highlightarXiv:2503.16429
39
citations
#685

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Kexun Zhang, Weiran Yao, Zuxin Liu et al.

ICLR 2025posterarXiv:2408.07060
39
citations
#686

EG4D: Explicit Generation of 4D Object without Score Distillation

Qi Sun, Zhiyang Guo, Ziyu Wan et al.

ICLR 2025oralarXiv:2405.18132
39
citations
#687

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025posterarXiv:2410.06916
39
citations
#688

Attention with Markov: A Curious Case of Single-layer Transformers

Ashok Makkuva, Marco Bondaschi, Adway Girish et al.

ICLR 2025posterarXiv:2402.04161
39
citations
#689

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting

Yu Liu, Baoxiong Jia, Ruijie Lu et al.

ICLR 2025posterarXiv:2502.19459
39
citations
#690

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.

NEURIPS 2025spotlightarXiv:2505.24760
39
citations
#691

Combining Induction and Transduction for Abstract Reasoning

Wen-Ding Li, Keya Hu, Carter Larsen et al.

ICLR 2025posterarXiv:2411.02272
38
citations
#692

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

Lianghui Zhu, Zilong Huang, Bencheng Liao et al.

CVPR 2025posterarXiv:2405.18428
38
citations
#693

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

Li Hu, wang yuan, Zhen Shen et al.

ICCV 2025posterarXiv:2502.06145
38
citations
#694

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

Yiming Wang, Pei Zhang, Siyuan Huang et al.

NEURIPS 2025spotlightarXiv:2503.01422
38
citations
#695

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Wenxuan Huang, Zijie Zhai, Yunhang Shen et al.

ICLR 2025posterarXiv:2412.00876
38
citations
#696

5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

Dongshuo Yin, Leiyi Hu, Bin Li et al.

CVPR 2025posterarXiv:2408.08345
38
citations
#697

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

Xiantao Hu, Ying Tai, Xu Zhao et al.

AAAI 2025paperarXiv:2412.15691
38
citations
#698

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Zehuan Huang, Yuanchen Guo, Xingqiao An et al.

CVPR 2025posterarXiv:2412.03558
38
citations
#699

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Clément Chadebec, Onur Tasar, Eyal Benaroche et al.

AAAI 2025paperarXiv:2406.02347
38
citations
#700

Watermark Anything With Localized Messages

Tom Sander, Pierre Fernandez, Alain Oliviero Durmus et al.

ICLR 2025posterarXiv:2411.07231
38
citations
#701

Strong Model Collapse

Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian et al.

ICLR 2025posterarXiv:2410.04840
38
citations
#702

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Bojia Zi, Shihao Zhao, Xianbiao Qi et al.

AAAI 2025paperarXiv:2403.12035
38
citations
#703

Uni-Sign: Toward Unified Sign Language Understanding at Scale

Zecheng Li, Wengang Zhou, Weichao Zhao et al.

ICLR 2025posterarXiv:2501.15187
38
citations
#704

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Wanshui Gan, Fang Liu, Hongbin Xu et al.

ICCV 2025posterarXiv:2408.11447
38
citations
#705

FastVLM: Efficient Vision Encoding for Vision Language Models

Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li et al.

CVPR 2025posterarXiv:2412.13303
38
citations
#706

EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis

Alexander Mai, Peter Hedman, George Kopanas et al.

ICCV 2025posterarXiv:2410.01804
38
citations
#707

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025paperarXiv:2412.12444
38
citations
#708

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.

NEURIPS 2025posterarXiv:2506.14965
38
citations
#709

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Zhi Gao, Bofei Zhang, Pengxiang Li et al.

ICLR 2025posterarXiv:2412.15606
38
citations
#710

GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng et al.

CVPR 2025posterarXiv:2412.04384
38
citations
#711

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Yufan He, Pengfei Guo, Yucheng Tang et al.

CVPR 2025posterarXiv:2406.05285
38
citations
#712

PaPaGei: Open Foundation Models for Optical Physiological Signals

Arvind Pillai, Dimitris Spathis, Fahim Kawsar et al.

ICLR 2025posterarXiv:2410.20542
38
citations
#713

Video-Guided Foley Sound Generation with Multimodal Controls

Ziyang Chen, Prem Seetharaman, Bryan Russell et al.

CVPR 2025posterarXiv:2411.17698
38
citations
#714

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Chenrui Fan, Ming Li, Lichao Sun et al.

COLM 2025paperarXiv:2504.06514
38
citations
#715

DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

Chenguo Lin, Panwang Pan, Bangbang Yang et al.

ICLR 2025posterarXiv:2501.16764
38
citations
#716

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

Koichi Namekata, Sherwin Bahmani, Ziyi Wu et al.

ICLR 2025posterarXiv:2411.04989
38
citations
#717

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.

AAAI 2025paperarXiv:2407.19548
37
citations
#718

Training-Free Activation Sparsity in Large Language Models

James Liu, Pragaash Ponnusamy, Tianle Cai et al.

ICLR 2025posterarXiv:2408.14690
37
citations
#719

Looped Transformers for Length Generalization

Ying Fan, Yilun Du, Kannan Ramchandran et al.

ICLR 2025posterarXiv:2409.15647
37
citations
#720

Synthetic continued pretraining

Zitong Yang, Neil Band, Shuangping Li et al.

ICLR 2025posterarXiv:2409.07431
37
citations
#721

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Size Wu, Wenwei Zhang, Lumin Xu et al.

ICCV 2025posterarXiv:2503.21979
37
citations
#722

Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression

Zichong Meng, Yiming Xie, Xiaogang Peng et al.

CVPR 2025posterarXiv:2411.16575
37
citations
#723

MoH: Multi-Head Attention as Mixture-of-Head Attention

Peng Jin, Bo Zhu, Li Yuan et al.

ICML 2025posterarXiv:2410.11842
37
citations
#724

Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths

Tianyu Fu, Haofeng Huang, Xuefei Ning et al.

COLM 2025paperarXiv:2406.14909
37
citations
#725

No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models

Seyedmorteza Sadat, Manuel Kansy, Otmar Hilliges et al.

ICLR 2025posterarXiv:2407.02687
37
citations
#726

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He et al.

NEURIPS 2025posterarXiv:2506.09350
37
citations
#727

Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions

Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.

AAAI 2025paperarXiv:2408.08781
37
citations
#728

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao et al.

NEURIPS 2025spotlightarXiv:2505.18875
37
citations
#729

MALT: Improving Reasoning with Multi-Agent LLM Training

Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.

COLM 2025paperarXiv:2412.01928
37
citations
#730

SUTrack: Towards Simple and Unified Single Object Tracking

Xin Chen, Ben Kang, Wanting Geng et al.

AAAI 2025paperarXiv:2412.19138
37
citations
#731

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

linwei dong, Qingnan Fan, Yihong Guo et al.

CVPR 2025posterarXiv:2411.18263
37
citations
#732

On Scaling Up 3D Gaussian Splatting Training

Hexu Zhao, Haoyang Weng, Daohan Lu et al.

ICLR 2025posterarXiv:2406.18533
37
citations
#733

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Van Yang, Xiang Yue, Vipin Chaudhary et al.

COLM 2025paperarXiv:2504.12329
37
citations
#734

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

Rui Pan, Yinwei Dai, Zhihao Zhang et al.

NEURIPS 2025posterarXiv:2504.07891
37
citations
#735

SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models

Daniel Levy, Siba Smarak Panigrahi, Sékou-Oumar Kaba et al.

ICLR 2025posterarXiv:2502.03638
37
citations
#736

Large Language Models Assume People are More Rational than We Really are

Ryan Liu, Jiayi Geng, Joshua Peterson et al.

ICLR 2025posterarXiv:2406.17055
37
citations
#737

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Siyuan Huang, Liliang Chen, Pengfei Zhou et al.

NEURIPS 2025posterarXiv:2501.01895
37
citations
#738

Variational Best-of-N Alignment

Afra Amini, Tim Vieira, Elliott Ash et al.

ICLR 2025posterarXiv:2407.06057
37
citations
#739

Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions

Michael Zhang, W. Bradley Knox, Eunsol Choi

ICLR 2025posterarXiv:2410.13788
37
citations
#740

SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

Georg Hess, Carl Lindström, Maryam Fatemi et al.

CVPR 2025posterarXiv:2411.16816
37
citations
#741

Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations

Yuhao Yang, ZhI JI, Zhaopeng Li et al.

NEURIPS 2025posterarXiv:2503.02453
37
citations
#742

VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation

Wei Zhao, Pengxiang Ding, Zhang Min et al.

ICLR 2025posterarXiv:2502.13508
37
citations
#743

Deconstructing What Makes a Good Optimizer for Autoregressive Language Models

Rosie Zhao, Depen Morwani, David Brandfonbrener et al.

ICLR 2025poster
37
citations
#744

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

Yong Liu, Zirui Zhu, Chaoyu Gong et al.

NEURIPS 2025posterarXiv:2402.15751
37
citations
#745

Sparse Autoencoders Do Not Find Canonical Units of Analysis

Patrick Leask, Bart Bussmann, Michael Pearce et al.

ICLR 2025posterarXiv:2502.04878
37
citations
#746

How Does Critical Batch Size Scale in Pre-training?

Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.

ICLR 2025posterarXiv:2410.21676
37
citations
#747

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Han Shu, Wenshuo Li, Yehui Tang et al.

AAAI 2025paperarXiv:2312.13789
37
citations
#748

Real-Time Execution of Action Chunking Flow Policies

Kevin Black, Manuel Galliker, Sergey Levine

NEURIPS 2025oralarXiv:2506.07339
37
citations
#749

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

Zhaorun Chen, Mintong Kang, Bo Li

ICML 2025posterarXiv:2503.22738
37
citations
#750

Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives

Alex Hanson, Allen Tu, Geng Lin et al.

CVPR 2025posterarXiv:2412.00578
36
citations
#751

Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

Bowen Jiang, Zhuoqun Hao, Young Min Cho et al.

COLM 2025paperarXiv:2504.14225
36
citations
#752

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NEURIPS 2025posterarXiv:2503.09501
36
citations
#753

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

Yiyu Zhuang, Jiaxi Lv, Hao Wen et al.

CVPR 2025posterarXiv:2412.14963
36
citations
#754

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Ziyao Shangguan, Chuhan Li, Yuxuan Ding et al.

ICLR 2025oralarXiv:2410.23266
36
citations
#755

Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective

Sifan Wang, Ananyae bhartari, Bowen Li et al.

NEURIPS 2025posterarXiv:2502.00604
36
citations
#756

MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL

Arian Askari, Christian Poelitz, Xinye Tang

AAAI 2025paperarXiv:2406.12692
36
citations
#757

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Yun Li, Yiming Zhang, Tao Lin et al.

ICCV 2025posterarXiv:2503.23765
36
citations
#758

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

Dongya Jia, Zhuo Chen, Jiawei Chen et al.

ICML 2025posterarXiv:2502.03930
36
citations
#759

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.

CVPR 2025posterarXiv:2411.15466
36
citations
#760

Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye, Zihan Wang, Haosen Sun et al.

CVPR 2025posterarXiv:2504.02259
36
citations
#761

FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs

Zhiting Fan, Ruizhe Chen, Tianxiang Hu et al.

ICLR 2025posterarXiv:2410.19317
36
citations
#762

ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning

Ruchika Chavhan, Da Li, Timothy Hospedales

ICLR 2025posterarXiv:2405.19237
36
citations
#763

Human-Object Interaction from Human-Level Instructions

Zhen Wu, Jiaman Li, Pei Xu et al.

ICCV 2025posterarXiv:2406.17840
36
citations
#764

OpenCUA: Open Foundations for Computer-Use Agents

Xinyuan Wang, Bowen Wang, Dunjie Lu et al.

NEURIPS 2025spotlightarXiv:2508.09123
36
citations
#765

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

Weikang Meng, Yadan Luo, Xin Li et al.

ICLR 2025posterarXiv:2501.15061
36
citations
#766

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Qianhui Wu, Kanzhi Cheng, Rui Yang et al.

NEURIPS 2025posterarXiv:2506.03143
36
citations
#767

Vision-Language Models Do Not Understand Negation

Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.

CVPR 2025posterarXiv:2501.09425
36
citations
#768

FreeVS: Generative View Synthesis on Free Driving Trajectory

Qitai Wang, Lue Fan, Yuqi Wang et al.

ICLR 2025posterarXiv:2410.18079
36
citations
#769

DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control

Kaifeng Zhao, Gen Li, Siyu Tang

ICLR 2025posterarXiv:2410.05260
36
citations
#770

Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment

Congzhi Zhang, Linhai Zhang, Jialong Wu et al.

AAAI 2025paperarXiv:2403.02738
36
citations
#771

Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting

Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.

ICCV 2025posterarXiv:2507.02939
36
citations
#772

DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal et al.

ICLR 2025posterarXiv:2407.01725
36
citations
#773

FaceXFormer: A Unified Transformer for Facial Analysis

Kartik Narayan, Vibashan VS, Rama Chellappa et al.

ICCV 2025posterarXiv:2403.12960
36
citations
#774

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Yunzhi Yan, Zhen Xu, Haotong Lin et al.

CVPR 2025posterarXiv:2412.13188
35
citations
#775

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

Yiding Jiang, Allan Zhou, Zhili Feng et al.

ICLR 2025posterarXiv:2410.11820
35
citations
#776

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

Shuang Wu, Youtian Lin, Feihu Zhang et al.

NEURIPS 2025posterarXiv:2505.17412
35
citations
#777

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.

ICLR 2025posterarXiv:2408.11811
35
citations
#778

Towards General Visual-Linguistic Face Forgery Detection

Ke Sun, Shen Chen, Taiping Yao et al.

CVPR 2025posterarXiv:2307.16545
35
citations
#779

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.

COLM 2025paperarXiv:2504.04823
35
citations
#780

Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Robert Hönig, Javier Rando, Nicholas Carlini et al.

ICLR 2025posterarXiv:2406.12027
35
citations
#781

Think Only When You Need with Large Hybrid-Reasoning Models

Lingjie Jiang, Xun Wu, Shaohan Huang et al.

NEURIPS 2025posterarXiv:2505.14631
35
citations
#782

TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

Mark YU, Wenbo Hu, Jinbo Xing et al.

ICCV 2025posterarXiv:2503.05638
35
citations
#783

Generalizing Verifiable Instruction Following

Valentina Pyatkin, Saumya Malik, Victoria Graf et al.

NEURIPS 2025posterarXiv:2507.02833
35
citations
#784

Reconstructive Visual Instruction Tuning

Haochen Wang, Anlin Zheng, Yucheng Zhao et al.

ICLR 2025posterarXiv:2410.09575
35
citations
#785

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Hao Li, Changyao TIAN, Jie Shao et al.

CVPR 2025posterarXiv:2412.09604
35
citations
#786

Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance

Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.

AAAI 2025paperarXiv:2412.12974
35
citations
#787

Preserving Diversity in Supervised Fine-Tuning of Large Language Models

Ziniu Li, Congliang Chen, Tian Xu et al.

ICLR 2025posterarXiv:2408.16673
35
citations
#788

From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu et al.

ICCV 2025posterarXiv:2503.06923
35
citations
#789

Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective

Neta Shaul, Itai Gat, Marton Havasi et al.

ICLR 2025posterarXiv:2412.03487
35
citations
#790

Sequential Controlled Langevin Diffusions

Junhua Chen, Lorenz Richter, Julius Berner et al.

ICLR 2025posterarXiv:2412.07081
35
citations
#791

PAD: Personalized Alignment of LLMs at Decoding-time

Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.

ICLR 2025posterarXiv:2410.04070
35
citations
#792

Scaling Laws of Synthetic Data for Language Model

Zeyu Qin, Qingxiu Dong, Xingxing Zhang et al.

COLM 2025paperarXiv:2503.19551
35
citations
#793

AgentStudio: A Toolkit for Building General Virtual Agents

Longtao Zheng, Zhiyuan Huang, Zhenghai Xue et al.

ICLR 2025posterarXiv:2403.17918
35
citations
#794

Align Your Flow: Scaling Continuous-Time Flow Map Distillation

Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

NEURIPS 2025posterarXiv:2506.14603
35
citations
#795

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

Ruichuan An, Sihan Yang, Renrui Zhang et al.

NEURIPS 2025posterarXiv:2505.14671
35
citations
#796

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.

ICCV 2025posterarXiv:2503.15265
35
citations
#797

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025posterarXiv:2503.17316
35
citations
#798

MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots

Tianchen Deng, Guole Shen, Chen Xun et al.

CVPR 2025poster
35
citations
#799

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

Weiji Xie, Jinrui Han, Jiakun Zheng et al.

NEURIPS 2025posterarXiv:2506.12851
35
citations
#800

ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning

Xiao Yu, Baolin Peng, Vineeth Vajipey et al.

ICLR 2025posterarXiv:2410.02052
35
citations