Most Cited 2025 "region representation" Papers

22,274 papers found • Page 4 of 112

#601

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun et al.

COLM 2025paperarXiv:2503.23157
42
citations
#602

Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization

Audrey Huang, Wenhao Zhan, Tengyang Xie et al.

ICLR 2025posterarXiv:2407.13399
42
citations
#603

Towards Realistic Data Generation for Real-World Super-Resolution

Long Peng, Wenbo Li, Renjing Pei et al.

ICLR 2025posterarXiv:2406.07255
42
citations
#604

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Yubo Wang, Xiang Yue, Wenhu Chen

COLM 2025paperarXiv:2501.17703
42
citations
#605

Frame-Voyager: Learning to Query Frames for Video Large Language Models

Sicheng Yu, CHENGKAI JIN, Huanyu Wang et al.

ICLR 2025posterarXiv:2410.03226
42
citations
#606

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.

CVPR 2025posterarXiv:2412.16334
42
citations
#607

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Zhibo Yang, Jun Tang, Zhaohai Li et al.

ICCV 2025posterarXiv:2412.02210
42
citations
#608

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Yibin Wang, li zhimin, Yuhang Zang et al.

NEURIPS 2025posterarXiv:2505.03318
42
citations
#609

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Yi Ding, Bolian Li, Ruqi Zhang

ICLR 2025posterarXiv:2410.06625
42
citations
#610

Agents' Room: Narrative Generation through Multi-step Collaboration

Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki et al.

ICLR 2025posterarXiv:2410.02603
41
citations
#611

WritingBench: A Comprehensive Benchmark for Generative Writing

Yuning Wu, Jiahao Mei, Ming Yan et al.

NEURIPS 2025posterarXiv:2503.05244
41
citations
#612

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.

ICLR 2025posterarXiv:2410.18252
41
citations
#613

Learning to Prompt with Text Only Supervision for Vision-Language Models

Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.

AAAI 2025paperarXiv:2401.02418
41
citations
#614

Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition

Kun Li, Dan Guo, Guoliang Chen et al.

AAAI 2025paperarXiv:2412.14719
41
citations
#615

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Yuqian Yuan, Hang Zhang, Wentong Li et al.

CVPR 2025posterarXiv:2501.00599
41
citations
#616

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Guo Chen, Yicheng Liu, Yifei Huang et al.

ICLR 2025posterarXiv:2412.12075
41
citations
#617

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier et al.

ICLR 2025posterarXiv:2407.01509
41
citations
#618

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Marcus Williams, Micah Carroll, Adhyyan Narang et al.

ICLR 2025posterarXiv:2411.02306
41
citations
#619

On the expressiveness and spectral bias of KANs

Yixuan Wang, Jonathan Siegel, Ziming Liu et al.

ICLR 2025posterarXiv:2410.01803
41
citations
#620

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Yuchen Zhou, Jiayuan Gu, Tung Chiang et al.

ICLR 2025posterarXiv:2406.17741
41
citations
#621

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

Xiao Fu, Xian Liu, Xintao WANG et al.

ICLR 2025posterarXiv:2412.07759
41
citations
#622

Scaling RL to Long Videos

Yukang Chen, Wei Huang, Baifeng Shi et al.

NEURIPS 2025posterarXiv:2507.07966
41
citations
#623

Video World Models with Long-term Spatial Memory

Tong Wu, Shuai Yang, Ryan Po et al.

NEURIPS 2025oralarXiv:2506.05284
41
citations
#624

A Distractor-Aware Memory for Visual Object Tracking with SAM2

Alan Lukezic, Jovana Videnović, Matej Kristan

CVPR 2025posterarXiv:2411.17576
40
citations
#625

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.

CVPR 2025posterarXiv:2502.04144
40
citations
#626

Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Ziying Song, Caiyan Jia, Lin Liu et al.

CVPR 2025posterarXiv:2503.03125
40
citations
#627

Theory on Mixture-of-Experts in Continual Learning

Hongbo Li, Sen Lin, Lingjie Duan et al.

ICLR 2025posterarXiv:2406.16437
40
citations
#628

AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP

wenxin ma, Xu Zhang, Qingsong Yao et al.

CVPR 2025posterarXiv:2503.06661
40
citations
#629

Self-Evolving Multi-Agent Collaboration Networks for Software Development

Yue Hu, Yuzhu Cai, Yaxin Du et al.

ICLR 2025posterarXiv:2410.16946
40
citations
#630

SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers

Enze Xie, Junsong Chen, Junyu Chen et al.

ICLR 2025poster
40
citations
#631

Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving

Xinji Mai, Haotian Xu, Xing W et al.

NEURIPS 2025poster
40
citations
#632

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

Shangzhe Di, Zhelun Yu, Guanghao Zhang et al.

ICLR 2025posterarXiv:2503.00540
40
citations
#633

The Surprising Effectiveness of Test-Time Training for Few-Shot Learning

Ekin Akyürek, Mehul Damani, Adam Zweiger et al.

ICML 2025posterarXiv:2411.07279
40
citations
#634

PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang et al.

ICCV 2025posterarXiv:2503.17973
40
citations
#635

Multi-subject Open-set Personalization in Video Generation

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025posterarXiv:2501.06187
40
citations
#636

Human-inspired Episodic Memory for Infinite Context LLMs

Zafeirios Fountas, Martin A Benfeghoul, Adnan Oomerjee et al.

ICLR 2025oralarXiv:2407.09450
40
citations
#637

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.

CVPR 2025posterarXiv:2501.09466
40
citations
#638

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Zhen Zhang, Xuehai He, Weixiang Yan et al.

NEURIPS 2025posterarXiv:2505.15778
40
citations
#639

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

Zicheng Zhang, Haoning Wu, Chunyi Li et al.

ICLR 2025posterarXiv:2406.03070
40
citations
#640

Trajectory attention for fine-grained video motion control

Zeqi Xiao, Wenqi Ouyang, Yifan Zhou et al.

ICLR 2025oralarXiv:2411.19324
40
citations
#641

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen et al.

CVPR 2025posterarXiv:2408.17065
40
citations
#642

Looking Inward: Language Models Can Learn About Themselves by Introspection

Felix Jedidja Binder, James Chua, Tomek Korbak et al.

ICLR 2025oralarXiv:2410.13787
40
citations
#643

PartField: Learning 3D Feature Fields for Part Segmentation and Beyond

Minghua Liu, Mikaela Uy, Donglai Xiang et al.

ICCV 2025posterarXiv:2504.11451
40
citations
#644

TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark

Kush Jain, Gabriel Synnaeve, Baptiste Roziere

ICLR 2025posterarXiv:2410.00752
40
citations
#645

On the Role of Attention Heads in Large Language Model Safety

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang et al.

ICLR 2025posterarXiv:2410.13708
40
citations
#646

Scaling Speech-Text Pre-training with Synthetic Interleaved Data

Aohan Zeng, Zhengxiao Du, Mingdao Liu et al.

ICLR 2025posterarXiv:2411.17607
40
citations
#647

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Hui Li, Mingwang Xu, Qingkun Su et al.

CVPR 2025highlightarXiv:2412.00115
40
citations
#648

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

Zheyuan Zhang, Fengyuan Hu, Jayjun Lee et al.

ICLR 2025posterarXiv:2410.17385
40
citations
#649

Diffusion Feedback Helps CLIP See Better

Wenxuan Wang, Quan Sun, Fan Zhang et al.

ICLR 2025posterarXiv:2407.20171
39
citations
#650

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

Jiajun Deng, Tianyu He, Li Jiang et al.

CVPR 2025posterarXiv:2501.01163
39
citations
#651

Test-time Alignment of Diffusion Models without Reward Over-optimization

Sunwoo Kim, Minkyu Kim, Dongmin Park

ICLR 2025posterarXiv:2501.05803
39
citations
#652

An Architecture Search Framework for Inference-Time Techniques

Jon Saad-Falcon, Adrian Lafuente, Shlok Natarajan et al.

ICML 2025posterarXiv:2409.15254
39
citations
#653

Scaling Language-Free Visual Representation Learning

David Fan, Shengbang Tong, Jiachen Zhu et al.

ICCV 2025highlightarXiv:2504.01017
39
citations
#654

TabArena: A Living Benchmark for Machine Learning on Tabular Data

Nick Erickson, Lennart Purucker, Andrej Tschalzev et al.

NEURIPS 2025spotlightarXiv:2506.16791
39
citations
#655

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Core Francisco Park, Ekdeep Singh Lubana, Hidenori Tanaka

ICLR 2025posterarXiv:2412.01003
39
citations
#656

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities

Zhaofeng Wu, Xinyan Yu, Dani Yogatama et al.

ICLR 2025posterarXiv:2411.04986
39
citations
#657

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.

ICLR 2025posterarXiv:2409.04431
39
citations
#658

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Junsong Chen, Shuchen Xue, Yuyang Zhao et al.

ICCV 2025highlightarXiv:2503.09641
39
citations
#659

DrVideo: Document Retrieval Based Long Video Understanding

Ziyu Ma, Chenhui Gou, Hengcan Shi et al.

CVPR 2025posterarXiv:2406.12846
39
citations
#660

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Junbo Niu, Yifei Li, Ziyang Miao et al.

CVPR 2025posterarXiv:2501.05510
39
citations
#661

MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation

Zhongshen Zeng, Pengguang Chen, Shu Liu et al.

ICLR 2025posterarXiv:2312.17080
39
citations
#662

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh

AAAI 2025paperarXiv:2412.14995
39
citations
#663

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.

CVPR 2025highlightarXiv:2503.16429
39
citations
#664

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Kexun Zhang, Weiran Yao, Zuxin Liu et al.

ICLR 2025posterarXiv:2408.07060
39
citations
#665

RATT: A Thought Structure for Coherent and Correct LLM Reasoning

Jinghan Zhang, Xiting Wang, Weijieying Ren et al.

AAAI 2025paperarXiv:2406.02746
39
citations
#666

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025posterarXiv:2410.06916
39
citations
#667

EG4D: Explicit Generation of 4D Object without Score Distillation

Qi Sun, Zhiyang Guo, Ziyu Wan et al.

ICLR 2025oralarXiv:2405.18132
39
citations
#668

Attention with Markov: A Curious Case of Single-layer Transformers

Ashok Makkuva, Marco Bondaschi, Adway Girish et al.

ICLR 2025posterarXiv:2402.04161
39
citations
#669

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.

NEURIPS 2025spotlightarXiv:2505.24760
39
citations
#670

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting

Yu Liu, Baoxiong Jia, Ruijie Lu et al.

ICLR 2025posterarXiv:2502.19459
39
citations
#671

Combining Induction and Transduction for Abstract Reasoning

Wen-Ding Li, Keya Hu, Carter Larsen et al.

ICLR 2025posterarXiv:2411.02272
38
citations
#672

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

Lianghui Zhu, Zilong Huang, Bencheng Liao et al.

CVPR 2025posterarXiv:2405.18428
38
citations
#673

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Wenxuan Huang, Zijie Zhai, Yunhang Shen et al.

ICLR 2025posterarXiv:2412.00876
38
citations
#674

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

Yiming Wang, Pei Zhang, Siyuan Huang et al.

NEURIPS 2025spotlightarXiv:2503.01422
38
citations
#675

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

Li Hu, wang yuan, Zhen Shen et al.

ICCV 2025posterarXiv:2502.06145
38
citations
#676

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Chenrui Fan, Ming Li, Lichao Sun et al.

COLM 2025paperarXiv:2504.06514
38
citations
#677

5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

Dongshuo Yin, Leiyi Hu, Bin Li et al.

CVPR 2025posterarXiv:2408.08345
38
citations
#678

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

Koichi Namekata, Sherwin Bahmani, Ziyi Wu et al.

ICLR 2025posterarXiv:2411.04989
38
citations
#679

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Zehuan Huang, Yuanchen Guo, Xingqiao An et al.

CVPR 2025posterarXiv:2412.03558
38
citations
#680

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025paperarXiv:2412.12444
38
citations
#681

Uni-Sign: Toward Unified Sign Language Understanding at Scale

Zecheng Li, Wengang Zhou, Weichao Zhao et al.

ICLR 2025posterarXiv:2501.15187
38
citations
#682

Watermark Anything With Localized Messages

Tom Sander, Pierre Fernandez, Alain Oliviero Durmus et al.

ICLR 2025posterarXiv:2411.07231
38
citations
#683

Strong Model Collapse

Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian et al.

ICLR 2025posterarXiv:2410.04840
38
citations
#684

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Clément Chadebec, Onur Tasar, Eyal Benaroche et al.

AAAI 2025paperarXiv:2406.02347
38
citations
#685

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Wanshui Gan, Fang Liu, Hongbin Xu et al.

ICCV 2025posterarXiv:2408.11447
38
citations
#686

FastVLM: Efficient Vision Encoding for Vision Language Models

Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li et al.

CVPR 2025posterarXiv:2412.13303
38
citations
#687

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

Xiantao Hu, Ying Tai, Xu Zhao et al.

AAAI 2025paperarXiv:2412.15691
38
citations
#688

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Bojia Zi, Shihao Zhao, Xianbiao Qi et al.

AAAI 2025paperarXiv:2403.12035
38
citations
#689

EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis

Alexander Mai, Peter Hedman, George Kopanas et al.

ICCV 2025posterarXiv:2410.01804
38
citations
#690

GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng et al.

CVPR 2025posterarXiv:2412.04384
38
citations
#691

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Yufan He, Pengfei Guo, Yucheng Tang et al.

CVPR 2025posterarXiv:2406.05285
38
citations
#692

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.

NEURIPS 2025posterarXiv:2506.14965
38
citations
#693

PaPaGei: Open Foundation Models for Optical Physiological Signals

Arvind Pillai, Dimitris Spathis, Fahim Kawsar et al.

ICLR 2025posterarXiv:2410.20542
38
citations
#694

Video-Guided Foley Sound Generation with Multimodal Controls

Ziyang Chen, Prem Seetharaman, Bryan Russell et al.

CVPR 2025posterarXiv:2411.17698
38
citations
#695

DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

Chenguo Lin, Panwang Pan, Bangbang Yang et al.

ICLR 2025posterarXiv:2501.16764
38
citations
#696

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Zhi Gao, Bofei Zhang, Pengxiang Li et al.

ICLR 2025posterarXiv:2412.15606
38
citations
#697

Training-Free Activation Sparsity in Large Language Models

James Liu, Pragaash Ponnusamy, Tianle Cai et al.

ICLR 2025posterarXiv:2408.14690
37
citations
#698

Synthetic continued pretraining

Zitong Yang, Neil Band, Shuangping Li et al.

ICLR 2025posterarXiv:2409.07431
37
citations
#699

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.

AAAI 2025paperarXiv:2407.19548
37
citations
#700

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Size Wu, Wenwei Zhang, Lumin Xu et al.

ICCV 2025posterarXiv:2503.21979
37
citations
#701

Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression

Zichong Meng, Yiming Xie, Xiaogang Peng et al.

CVPR 2025posterarXiv:2411.16575
37
citations
#702

No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models

Seyedmorteza Sadat, Manuel Kansy, Otmar Hilliges et al.

ICLR 2025posterarXiv:2407.02687
37
citations
#703

MoH: Multi-Head Attention as Mixture-of-Head Attention

Peng Jin, Bo Zhu, Li Yuan et al.

ICML 2025posterarXiv:2410.11842
37
citations
#704

Looped Transformers for Length Generalization

Ying Fan, Yilun Du, Kannan Ramchandran et al.

ICLR 2025posterarXiv:2409.15647
37
citations
#705

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

Zhaorun Chen, Mintong Kang, Bo Li

ICML 2025posterarXiv:2503.22738
37
citations
#706

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He et al.

NEURIPS 2025posterarXiv:2506.09350
37
citations
#707

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao et al.

NEURIPS 2025spotlightarXiv:2505.18875
37
citations
#708

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Han Shu, Wenshuo Li, Yehui Tang et al.

AAAI 2025paperarXiv:2312.13789
37
citations
#709

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

linwei dong, Qingnan Fan, Yihong Guo et al.

CVPR 2025posterarXiv:2411.18263
37
citations
#710

Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions

Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.

AAAI 2025paperarXiv:2408.08781
37
citations
#711

SUTrack: Towards Simple and Unified Single Object Tracking

Xin Chen, Ben Kang, Wanting Geng et al.

AAAI 2025paperarXiv:2412.19138
37
citations
#712

Large Language Models Assume People are More Rational than We Really are

Ryan Liu, Jiayi Geng, Joshua Peterson et al.

ICLR 2025posterarXiv:2406.17055
37
citations
#713

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

Rui Pan, Yinwei Dai, Zhihao Zhang et al.

NEURIPS 2025posterarXiv:2504.07891
37
citations
#714

On Scaling Up 3D Gaussian Splatting Training

Hexu Zhao, Haoyang Weng, Daohan Lu et al.

ICLR 2025posterarXiv:2406.18533
37
citations
#715

SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models

Daniel Levy, Siba Smarak Panigrahi, Sékou-Oumar Kaba et al.

ICLR 2025posterarXiv:2502.03638
37
citations
#716

Variational Best-of-N Alignment

Afra Amini, Tim Vieira, Elliott Ash et al.

ICLR 2025posterarXiv:2407.06057
37
citations
#717

Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions

Michael Zhang, W. Bradley Knox, Eunsol Choi

ICLR 2025posterarXiv:2410.13788
37
citations
#718

SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

Georg Hess, Carl Lindström, Maryam Fatemi et al.

CVPR 2025posterarXiv:2411.16816
37
citations
#719

Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations

Yuhao Yang, ZhI JI, Zhaopeng Li et al.

NEURIPS 2025posterarXiv:2503.02453
37
citations
#720

Real-Time Execution of Action Chunking Flow Policies

Kevin Black, Manuel Galliker, Sergey Levine

NEURIPS 2025oralarXiv:2506.07339
37
citations
#721

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Van Yang, Xiang Yue, Vipin Chaudhary et al.

COLM 2025paperarXiv:2504.12329
37
citations
#722

Sparse Autoencoders Do Not Find Canonical Units of Analysis

Patrick Leask, Bart Bussmann, Michael Pearce et al.

ICLR 2025posterarXiv:2502.04878
37
citations
#723

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

Yong Liu, Zirui Zhu, Chaoyu Gong et al.

NEURIPS 2025posterarXiv:2402.15751
37
citations
#724

VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation

Wei Zhao, Pengxiang Ding, Zhang Min et al.

ICLR 2025posterarXiv:2502.13508
37
citations
#725

How Does Critical Batch Size Scale in Pre-training?

Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.

ICLR 2025posterarXiv:2410.21676
37
citations
#726

Deconstructing What Makes a Good Optimizer for Autoregressive Language Models

Rosie Zhao, Depen Morwani, David Brandfonbrener et al.

ICLR 2025poster
37
citations
#727

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Siyuan Huang, Liliang Chen, Pengfei Zhou et al.

NEURIPS 2025posterarXiv:2501.01895
37
citations
#728

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Ziyao Shangguan, Chuhan Li, Yuxuan Ding et al.

ICLR 2025oralarXiv:2410.23266
36
citations
#729

Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives

Alex Hanson, Allen Tu, Geng Lin et al.

CVPR 2025posterarXiv:2412.00578
36
citations
#730

Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

Bowen Jiang, Zhuoqun Hao, Young Min Cho et al.

COLM 2025paperarXiv:2504.14225
36
citations
#731

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

Yiyu Zhuang, Jiaxi Lv, Hao Wen et al.

CVPR 2025posterarXiv:2412.14963
36
citations
#732

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NEURIPS 2025posterarXiv:2503.09501
36
citations
#733

Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective

Sifan Wang, Ananyae bhartari, Bowen Li et al.

NEURIPS 2025posterarXiv:2502.00604
36
citations
#734

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.

CVPR 2025posterarXiv:2411.15466
36
citations
#735

Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye, Zihan Wang, Haosen Sun et al.

CVPR 2025posterarXiv:2504.02259
36
citations
#736

FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs

Zhiting Fan, Ruizhe Chen, Tianxiang Hu et al.

ICLR 2025posterarXiv:2410.19317
36
citations
#737

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

Weikang Meng, Yadan Luo, Xin Li et al.

ICLR 2025posterarXiv:2501.15061
36
citations
#738

OpenCUA: Open Foundations for Computer-Use Agents

Xinyuan Wang, Bowen Wang, Dunjie Lu et al.

NEURIPS 2025spotlightarXiv:2508.09123
36
citations
#739

Human-Object Interaction from Human-Level Instructions

Zhen Wu, Jiaman Li, Pei Xu et al.

ICCV 2025posterarXiv:2406.17840
36
citations
#740

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Qianhui Wu, Kanzhi Cheng, Rui Yang et al.

NEURIPS 2025posterarXiv:2506.03143
36
citations
#741

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Yun Li, Yiming Zhang, Tao Lin et al.

ICCV 2025posterarXiv:2503.23765
36
citations
#742

Vision-Language Models Do Not Understand Negation

Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.

CVPR 2025posterarXiv:2501.09425
36
citations
#743

ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning

Ruchika Chavhan, Da Li, Timothy Hospedales

ICLR 2025posterarXiv:2405.19237
36
citations
#744

DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control

Kaifeng Zhao, Gen Li, Siyu Tang

ICLR 2025posterarXiv:2410.05260
36
citations
#745

FreeVS: Generative View Synthesis on Free Driving Trajectory

Qitai Wang, Lue Fan, Yuqi Wang et al.

ICLR 2025posterarXiv:2410.18079
36
citations
#746

FaceXFormer: A Unified Transformer for Facial Analysis

Kartik Narayan, Vibashan VS, Rama Chellappa et al.

ICCV 2025posterarXiv:2403.12960
36
citations
#747

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Zhi Hou, Tianyi Zhang, Yuwen Xiong et al.

ICCV 2025posterarXiv:2503.19757
36
citations
#748

Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment

Congzhi Zhang, Linhai Zhang, Jialong Wu et al.

AAAI 2025paperarXiv:2403.02738
36
citations
#749

Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting

Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.

ICCV 2025posterarXiv:2507.02939
36
citations
#750

DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal et al.

ICLR 2025posterarXiv:2407.01725
36
citations
#751

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

Dongya Jia, Zhuo Chen, Jiawei Chen et al.

ICML 2025posterarXiv:2502.03930
36
citations
#752

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Yunzhi Yan, Zhen Xu, Haotong Lin et al.

CVPR 2025posterarXiv:2412.13188
35
citations
#753

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

Shuang Wu, Youtian Lin, Feihu Zhang et al.

NEURIPS 2025posterarXiv:2505.17412
35
citations
#754

Towards General Visual-Linguistic Face Forgery Detection

Ke Sun, Shen Chen, Taiping Yao et al.

CVPR 2025posterarXiv:2307.16545
35
citations
#755

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

Yiding Jiang, Allan Zhou, Zhili Feng et al.

ICLR 2025posterarXiv:2410.11820
35
citations
#756

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.

ICLR 2025posterarXiv:2408.11811
35
citations
#757

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.

COLM 2025paperarXiv:2504.04823
35
citations
#758

Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Robert Hönig, Javier Rando, Nicholas Carlini et al.

ICLR 2025posterarXiv:2406.12027
35
citations
#759

Think Only When You Need with Large Hybrid-Reasoning Models

Lingjie Jiang, Xun Wu, Shaohan Huang et al.

NEURIPS 2025posterarXiv:2505.14631
35
citations
#760

Generalizing Verifiable Instruction Following

Valentina Pyatkin, Saumya Malik, Victoria Graf et al.

NEURIPS 2025posterarXiv:2507.02833
35
citations
#761

TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

Mark YU, Wenbo Hu, Jinbo Xing et al.

ICCV 2025posterarXiv:2503.05638
35
citations
#762

Preserving Diversity in Supervised Fine-Tuning of Large Language Models

Ziniu Li, Congliang Chen, Tian Xu et al.

ICLR 2025posterarXiv:2408.16673
35
citations
#763

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Hao Li, Changyao TIAN, Jie Shao et al.

CVPR 2025posterarXiv:2412.09604
35
citations
#764

Reconstructive Visual Instruction Tuning

Haochen Wang, Anlin Zheng, Yucheng Zhao et al.

ICLR 2025posterarXiv:2410.09575
35
citations
#765

Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective

Neta Shaul, Itai Gat, Marton Havasi et al.

ICLR 2025posterarXiv:2412.03487
35
citations
#766

From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu et al.

ICCV 2025posterarXiv:2503.06923
35
citations
#767

Align Your Flow: Scaling Continuous-Time Flow Map Distillation

Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

NEURIPS 2025posterarXiv:2506.14603
35
citations
#768

PAD: Personalized Alignment of LLMs at Decoding-time

Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.

ICLR 2025posterarXiv:2410.04070
35
citations
#769

AgentStudio: A Toolkit for Building General Virtual Agents

Longtao Zheng, Zhiyuan Huang, Zhenghai Xue et al.

ICLR 2025posterarXiv:2403.17918
35
citations
#770

Sequential Controlled Langevin Diffusions

Junhua Chen, Lorenz Richter, Julius Berner et al.

ICLR 2025posterarXiv:2412.07081
35
citations
#771

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

Ruichuan An, Sihan Yang, Renrui Zhang et al.

NEURIPS 2025posterarXiv:2505.14671
35
citations
#772

Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance

Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.

AAAI 2025paperarXiv:2412.12974
35
citations
#773

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025posterarXiv:2503.17316
35
citations
#774

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.

ICCV 2025posterarXiv:2503.15265
35
citations
#775

MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots

Tianchen Deng, Guole Shen, Chen Xun et al.

CVPR 2025poster
35
citations
#776

ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning

Xiao Yu, Baolin Peng, Vineeth Vajipey et al.

ICLR 2025posterarXiv:2410.02052
35
citations
#777

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

Weiji Xie, Jinrui Han, Jiakun Zheng et al.

NEURIPS 2025posterarXiv:2506.12851
35
citations
#778

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Shenghao Fu, Qize Yang, Qijie Mo et al.

CVPR 2025highlightarXiv:2501.18954
34
citations
#779

Preference Optimization for Reasoning with Pseudo Feedback

Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.

ICLR 2025posterarXiv:2411.16345
34
citations
#780

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo et al.

ICLR 2025posterarXiv:2406.03136
34
citations
#781

Compositional Entailment Learning for Hyperbolic Vision-Language Models

Avik Pal, Max van Spengler, Guido D'Amely di Melendugno et al.

ICLR 2025posterarXiv:2410.06912
34
citations
#782

Dynamic Diffusion Transformer

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

ICLR 2025posterarXiv:2410.03456
34
citations
#783

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

Qijian Tian, Xin Tan, Yuan Xie et al.

AAAI 2025paperarXiv:2409.12753
34
citations
#784

Think while You Generate: Discrete Diffusion with Planned Denoising

Sulin Liu, Juno Nam, Andrew Campbell et al.

ICLR 2025posterarXiv:2410.06264
34
citations
#785

Persistent Pre-training Poisoning of LLMs

Yiming Zhang, Javier Rando, Ivan Evtimov et al.

ICLR 2025posterarXiv:2410.13722
34
citations
#786

ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions

Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.

CVPR 2025posterarXiv:2401.10232
34
citations
#787

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Yoad Tewel, Rinon Gal, Dvir Samuel et al.

ICLR 2025posterarXiv:2411.07232
34
citations
#788

Improving the Diffusability of Autoencoders

Ivan Skorokhodov, Sharath Girish, Benran Hu et al.

ICML 2025posterarXiv:2502.14831
34
citations
#789

ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities

Ezra Karger, Houtan Bastani, Chen Yueh-Han et al.

ICLR 2025posterarXiv:2409.19839
34
citations
#790

One Diffusion to Generate Them All

Duong H. Le, Tuan Pham, Sangho Lee et al.

CVPR 2025posterarXiv:2411.16318
34
citations
#791

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Xuehai He, Weixi Feng, Kaizhi Zheng et al.

ICLR 2025posterarXiv:2406.08407
34
citations
#792

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

Ziyu Zhao, tao shen, Didi Zhu et al.

ICLR 2025posterarXiv:2409.16167
34
citations
#793

Robust Autonomy Emerges from Self-Play

Marco Cusumano-Towner, David Hafner, Alexander Hertzberg et al.

ICML 2025posterarXiv:2502.03349
34
citations
#794

WISA: World simulator assistant for physics-aware text-to-video generation

Jing Wang, Ao Ma, Ke Cao et al.

NEURIPS 2025spotlightarXiv:2503.08153
34
citations
#795

xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition

Artyom Stitsyuk, Jaesik Choi

AAAI 2025paperarXiv:2412.17323
34
citations
#796

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Daniel Israel, Guy Van den Broeck, Aditya Grover

NEURIPS 2025spotlightarXiv:2506.00413
34
citations
#797

LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs

Yuhao Wu, Ming Shan Hee, Zhiqiang Hu et al.

ICLR 2025posterarXiv:2409.02076
34
citations
#798

Efficient Evolutionary Search Over Chemical Space with Large Language Models

Haorui Wang, Marta Skreta, Cher-Tian Ser et al.

ICLR 2025posterarXiv:2406.16976
34
citations
#799

A Closer Look at Machine Unlearning for Large Language Models

Xiaojian Yuan, Tianyu Pang, Chao Du et al.

ICLR 2025posterarXiv:2410.08109
34
citations
#800

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li, Yufei Wang, Heliang Zheng et al.

NEURIPS 2025posterarXiv:2505.14521
34
citations