Most Cited 2025 "cross-domain segmentation" Papers

22,274 papers found • Page 5 of 112

#801

OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

Maxence Faldor, Jenny Zhang, Antoine Cully et al.

ICLR 2025 • arXiv:2405.15568
48 citations
#802

Visual Agents as Fast and Slow Thinkers

Guangyan Sun, Mingyu Jin, Zhenting Wang et al.

ICLR 2025 • arXiv:2408.08862
48 citations
#803

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

Egor Zverev, Sahar Abdelnabi, Soroush Tabesh et al.

ICLR 2025 • arXiv:2403.06833
48 citations
#804

Data Shapley in One Training Run

Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.

ICLR 2025 • arXiv:2406.11011
48 citations
#805

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Deepak Nathani, Lovish Madaan, Nicholas Roberts et al.

COLM 2025
48 citations
#806

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Keisuke Kamahori, Tian Tang, Yile Gu et al.

ICLR 2025 • arXiv:2402.07033
48 citations
#807

The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

Ruili Feng, Han Zhang, Zhilei Shu et al.

NEURIPS 2025 • arXiv:2412.03568
48 citations
#808

STAIR: Improving Safety Alignment with Introspective Reasoning

Yichi Zhang, Siyuan Zhang, Yao Huang et al.

ICML 2025 • oral • arXiv:2502.02384
48 citations
#809

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

Dominic Maggio, Hyungtae Lim, Luca Carlone

NEURIPS 2025 • arXiv:2505.12549
48 citations
#810

Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference

Aniket Vashishtha, Abbavaram Gowtham Reddy, Abhinav Kumar et al.

ICLR 2025 • arXiv:2310.15117
48 citations
#811

MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses

Zonglin Yang, Wanhao Liu, Ben Gao et al.

ICLR 2025 • arXiv:2410.07076
47 citations
#812

SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning

Hojoon Lee, Dongyoon Hwang, Donghu Kim et al.

ICLR 2025 • arXiv:2410.09754
47 citations
#813

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Xiangyu Zhao, Peiyuan Zhang, Kexian Tang et al.

NEURIPS 2025 • oral • arXiv:2504.02826
47 citations
#814

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi et al.

NEURIPS 2025 • oral • arXiv:2504.13180
47 citations
#815

Scaling FP8 training to trillion-token LLMs

Maxim Fishman, Brian Chmiel, Ron Banner et al.

ICLR 2025 • arXiv:2409.12517
47 citations
#816

FlipAttack: Jailbreak LLMs via Flipping

Yue Liu, Xiaoxin He, Miao Xiong et al.

ICML 2025 • arXiv:2410.02832
47 citations
#817

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Peixian Ma, Xialie Zhuang, Chengjin Xu et al.

NEURIPS 2025 • arXiv:2504.08600
47 citations
#818

FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching

Sucheng Ren, Qihang Yu, Ju He et al.

ICML 2025 • arXiv:2412.15205
47 citations
#819

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Linke Ouyang, Yuan Qu, Hongbin Zhou et al.

CVPR 2025 • arXiv:2412.07626
47 citations
#820

AnyEdit: Edit Any Knowledge Encoded in Language Models

Houcheng Jiang, Junfeng Fang, Ningyu Zhang et al.

ICML 2025 • arXiv:2502.05628
47 citations
#821

Test-time Alignment of Diffusion Models without Reward Over-optimization

Sunwoo Kim, Minkyu Kim, Dongmin Park

ICLR 2025 • arXiv:2501.05803
47 citations
#822

OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting

Xing Hu, Yuan Cheng, Dawei Yang et al.

ICLR 2025 • arXiv:2501.13987
47 citations
#823

Learn Beneficial Noise as Graph Augmentation

Siqi Huang, Yanchen Xu, Hongyuan Zhang et al.

ICML 2025 • arXiv:2505.19024
47 citations
#824

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Mingjie Pan, Jiyao Zhang, Tianshu Wu et al.

CVPR 2025 • highlight • arXiv:2501.03841
47 citations
#825

RMB: Comprehensively benchmarking reward models in LLM alignment

Enyu Zhou, Guodong Zheng, Binghai Wang et al.

ICLR 2025 • arXiv:2410.09893
47 citations
#826

DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis

Pan Wang, Qiang Zhou, Yawen Wu et al.

AAAI 2025 • arXiv:2412.12225
47 citations
#827

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

Andy (DiJia) Su, Sainbayar Sukhbaatar, Michael Rabbat et al.

ICLR 2025 • arXiv:2410.09918
47 citations
#828

Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization

Siyan Dong, Shuzhe Wang, Shaohui Liu et al.

CVPR 2025 • arXiv:2412.08376
47 citations
#829

Selective Aggregation for Low-Rank Adaptation in Federated Learning

Pengxin Guo, Shuang Zeng, Yanran Wang et al.

ICLR 2025 • arXiv:2410.01463
47 citations
#830

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Mehul Damani, Idan Shenfeld, Andi Peng et al.

ICLR 2025 • arXiv:2410.04707
47 citations
#831

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.

NEURIPS 2025 • spotlight • arXiv:2505.24760
47 citations
#832

4K4DGen: Panoramic 4D Generation at 4K Resolution

Renjie Li, Panwang Pan, Bangbang Yang et al.

ICLR 2025 • oral • arXiv:2406.13527
47 citations
#833

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

Orion Weller, Ben Van Durme, Dawn Lawrie et al.

ICLR 2025 • arXiv:2409.11136
47 citations
#834

Rank1: Test-Time Compute for Reranking in Information Retrieval

Orion Weller, Kathryn Ricci, Eugene Yang et al.

COLM 2025 • arXiv:2502.18418
47 citations
#835

Generator Matching: Generative modeling with arbitrary Markov processes

Peter Holderrieth, Marton Havasi, Jason Yim et al.

ICLR 2025 • arXiv:2410.20587
46 citations
#836

Empirical Design in Reinforcement Learning

Andrew Patterson, Samuel F Neumann, Martha White et al.

ICML 2025 • arXiv:2304.01315
46 citations
#837

Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas

Shiqi Chen, Tongyao Zhu, Ruochen Zhou et al.

ICML 2025 • arXiv:2503.01773
46 citations
#838

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan et al.

CVPR 2025 • arXiv:2403.08764
46 citations
#839

HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection

Zican Shi, Jing Hu, Jie Ren et al.

AAAI 2025 • arXiv:2412.10116
46 citations
#840

Real-Time Execution of Action Chunking Flow Policies

Kevin Black, Manuel Galliker, Sergey Levine

NEURIPS 2025 • oral • arXiv:2506.07339
46 citations
#841

WritingBench: A Comprehensive Benchmark for Generative Writing

Yuning Wu, Jiahao Mei, Ming Yan et al.

NEURIPS 2025 • arXiv:2503.05244
46 citations
#842

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Kaixuan Huang, Xudong Guo, Mengdi Wang

COLM 2025
46 citations
#843

Universal Actions for Enhanced Embodied Foundation Models

Jinliang Zheng, Jianxiong Li, Dongxiu Liu et al.

CVPR 2025 • arXiv:2501.10105
46 citations
#844

Preble: Efficient Distributed Prompt Scheduling for LLM Serving

Vikranth Srivatsa, Zijian He, Reyna Abhyankar et al.

ICLR 2025 • arXiv:2407.00023
46 citations
#845

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.

CVPR 2025 • arXiv:2412.16334
46 citations
#846

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang et al.

CVPR 2025 • arXiv:2412.10373
46 citations
#847

Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist

Zihao Zhou, Shudong Liu, Maizhen Ning et al.

ICLR 2025 • arXiv:2407.08733
46 citations
#848

R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Naman Jain, Jaskirat Singh, Manish Shetty et al.

COLM 2025
46 citations
#849

Image Conductor: Precision Control for Interactive Video Synthesis

Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.

AAAI 2025 • arXiv:2406.15339
46 citations
#850

WorldScore: Unified Evaluation Benchmark for World Generation

Haoyi Duan, Hong-Xing Yu, Sirui Chen et al.

ICCV 2025
46 citations
#851

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.

CVPR 2025 • arXiv:2412.18597
46 citations
#852

Depth Any Video with Scalable Synthetic Data

Honghui Yang, Di Huang, Wei Yin et al.

ICLR 2025 • oral • arXiv:2410.10815
46 citations
#853

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Chenyang Zhu, Kai Li, Yue Ma et al.

AAAI 2025 • arXiv:2404.14239
46 citations
#854

M-LLM Based Video Frame Selection for Efficient Video Understanding

Kai Hu, Feng Gao, Xiaohan Nie et al.

CVPR 2025 • arXiv:2502.19680
46 citations
#855

Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems

Shaokun Zhang, Ming Yin, Jieyu Zhang et al.

ICML 2025 • spotlight • arXiv:2505.00212
46 citations
#856

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif et al.

ICLR 2025 • arXiv:2405.18540
46 citations
#857

End-to-End Autonomous Driving Through V2X Cooperation

Haibao Yu, Wenxian Yang, Jiaru Zhong et al.

AAAI 2025 • arXiv:2404.00717
46 citations
#858

St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World

Haiwen Feng, Junyi Zhang, Qianqian Wang et al.

ICCV 2025 • arXiv:2504.13152
46 citations
#859

What Can RL Bring to VLA Generalization? An Empirical Study

Jijia Liu, Feng Gao, Bingwen Wei et al.

NEURIPS 2025 • arXiv:2505.19789
46 citations
#860

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh

AAAI 2025 • arXiv:2412.14995
46 citations
#861

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Zhangheng Li, Keen You, Haotian Zhang et al.

ICLR 2025 • arXiv:2410.18967
45 citations
#862

Robust LLM safeguarding via refusal feature adversarial training

Lei Yu, Virginie Do, Karen Hambardzumyan et al.

ICLR 2025 • arXiv:2409.20089
45 citations
#863

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

Qizhe Zhang, Aosong Cheng, Ming Lu et al.

ICCV 2025 • arXiv:2412.01818
45 citations
#864

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Haoyang He, Jiangning Zhang, Yuxuan Cai et al.

CVPR 2025 • arXiv:2411.15941
45 citations
#865

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Andrew Williams, Arjun Ashok, Étienne Marcotte et al.

ICML 2025 • arXiv:2410.18959
45 citations
#866

Taming Overconfidence in LLMs: Reward Calibration in RLHF

Jixuan Leng, Chengsong Huang, Banghua Zhu et al.

ICLR 2025 • arXiv:2410.09724
45 citations
#867

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Hao Gao, Shaoyu Chen, Bo Jiang et al.

NEURIPS 2025 • arXiv:2502.13144
45 citations
#868

Learning 4D Embodied World Models

Haoyu Zhen, Qiao Sun, Hongxin Zhang et al.

ICCV 2025 • arXiv:2504.20995
45 citations
#869

UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent

Jianke Zhang, Yanjiang Guo, Yucheng Hu et al.

ICML 2025 • arXiv:2501.18867
45 citations
#870

The Surprising Effectiveness of Test-Time Training for Few-Shot Learning

Ekin Akyürek, Mehul Damani, Adam Zweiger et al.

ICML 2025 • arXiv:2411.07279
45 citations
#871

TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data

Jeremy Irvin, Emily Liu, Joyce Chen et al.

ICLR 2025 • oral • arXiv:2410.06234
45 citations
#872

LLMs as Research Tools: A Large Scale Survey of Researchers’ Usage and Perceptions

Zhehui Liao, Maria Antoniak, Inyoung Cheong et al.

COLM 2025 • arXiv:2411.05025
45 citations
#873

PEARL: Parallel Speculative Decoding with Adaptive Draft Length

Tianyu Liu, Yun Li, Qitan Lv et al.

ICLR 2025 • arXiv:2408.11850
45 citations
#874

Adaptive Caching for Faster Video Generation with Diffusion Transformers

Kumara Kahatapitiya, Haozhe Liu, Sen He et al.

ICCV 2025 • arXiv:2411.02397
45 citations
#875

How efficient is LLM-generated code? A rigorous & high-standard benchmark

Ruizhong Qiu, Weiliang Zeng, James Ezick et al.

ICLR 2025 • arXiv:2406.06647
45 citations
#876

Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design

Chenyu Wang, Masatoshi Uehara, Yichun He et al.

ICLR 2025 • arXiv:2410.13643
45 citations
#877

FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

Yingying Deng, Xiangyu He, Changwang Mei et al.

ICML 2025 • arXiv:2412.07517
45 citations
#878

On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents

Jen-Tse Huang, Jiaxu Zhou, Tailin Jin et al.

ICML 2025 • arXiv:2408.00989
45 citations
#879

FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"

Yifei Ming, Senthil Purushwalkam, Shrey Pandit et al.

ICLR 2025
45 citations
#880

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

Jaehun Jung, Faeze Brahman, Yejin Choi

ICLR 2025 • arXiv:2407.18370
45 citations
#881

Towards Practical Real-Time Neural Video Compression

Zhaoyang Jia, Bin Li, Jiahao Li et al.

CVPR 2025 • arXiv:2502.20762
45 citations
#882

Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Ziying Song, Caiyan Jia, Lin Liu et al.

CVPR 2025 • arXiv:2503.03125
44 citations
#883

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He et al.

NEURIPS 2025 • arXiv:2506.09350
44 citations
#884

To Code or Not To Code? Exploring Impact of Code in Pre-training

Viraat Aryabumi, Yixuan Su, Raymond Ma et al.

ICLR 2025 • arXiv:2408.10914
44 citations
#885

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Rui Qiao, Varsha Kishore et al.

COLM 2025 • arXiv:2504.20595
44 citations
#886

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

Yao Teng, Han Shi, Xian Liu et al.

ICLR 2025 • arXiv:2410.01699
44 citations
#887

ThinK: Thinner Key Cache by Query-Driven Pruning

Yuhui Xu, Zhanming Jie, Hanze Dong et al.

ICLR 2025 • arXiv:2407.21018
44 citations
#888

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

Yiming Wang, Pei Zhang, Siyuan Huang et al.

NEURIPS 2025 • spotlight • arXiv:2503.01422
44 citations
#889

Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model

Long Le, Jason Xie, William Liang et al.

ICLR 2025 • arXiv:2410.13882
44 citations
#890

Scaling RL to Long Videos

Yukang Chen, Wei Huang, Baifeng Shi et al.

NEURIPS 2025 • arXiv:2507.07966
44 citations
#891

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Jinbin Bai, Tian Ye, Wei Chow et al.

ICLR 2025 • arXiv:2410.08261
44 citations
#892

MET3R: Measuring Multi-View Consistency in Generated Images

Mohammad Asim, Christopher Wewer, Thomas Wimmer et al.

CVPR 2025 • arXiv:2501.06336
44 citations
#893

Video World Models with Long-term Spatial Memory

Tong Wu, Shuai Yang, Ryan Po et al.

NEURIPS 2025 • oral • arXiv:2506.05284
44 citations
#894

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Daniel Israel, Guy Van den Broeck, Aditya Grover

NEURIPS 2025 • spotlight • arXiv:2506.00413
44 citations
#895

InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

Sirui Xu, Hung Yu Ling, Yu-Xiong Wang et al.

CVPR 2025 • highlight • arXiv:2502.20390
44 citations
#896

Thinking LLMs: General Instruction Following with Thought Generation

Tianhao Wu, Janice Lan, Weizhe Yuan et al.

ICML 2025 • arXiv:2410.10630
44 citations
#897

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

Haoxuan Wang, Yuzhang Shang, Zhihang Yuan et al.

ICCV 2025 • arXiv:2402.03666
44 citations
#898

RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers

Min Zhao, Guande He, Yixiao Chen et al.

ICML 2025 • oral • arXiv:2502.15894
44 citations
#899

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Chengqi Lyu, Songyang Gao, Yuzhe Gu et al.

COLM 2025
44 citations
#900

Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition

Kun Li, Dan Guo, Guoliang Chen et al.

AAAI 2025 • arXiv:2412.14719
44 citations
#901

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.

CVPR 2025 • highlight • arXiv:2503.16429
44 citations
#902

CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair

Mingjie Liu, Yun-Da Tsai, Wenfei Zhou et al.

ICLR 2025 • arXiv:2409.12993
44 citations
#903

One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation

Zhendong Wang, Max Li, Ajay Mandlekar et al.

ICML 2025 • arXiv:2410.21257
44 citations
#904

AdaWorld: Learning Adaptable World Models with Latent Actions

Shenyuan Gao, Siyuan Zhou, Yilun Du et al.

ICML 2025 • arXiv:2503.18938
44 citations
#905

CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models

Hao He, Ceyuan Yang, Shanchuan Lin et al.

ICCV 2025 • arXiv:2503.10592
44 citations
#906

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

Junjie He, Yifeng Geng, Liefeng Bo

ICCV 2025 • arXiv:2408.05939
44 citations
#907

Looking Inward: Language Models Can Learn About Themselves by Introspection

Felix Jedidja Binder, James Chua, Tomek Korbak et al.

ICLR 2025 • oral • arXiv:2410.13787
44 citations
#908

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts

Haizhong Zheng, Yang Zhou, Brian Bartoldson et al.

NEURIPS 2025 • oral • arXiv:2506.02177
44 citations
#909

AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP

Wenxin Ma, Xu Zhang, Qingsong Yao et al.

CVPR 2025 • arXiv:2503.06661
44 citations
#910

TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark

Kush Jain, Gabriel Synnaeve, Baptiste Roziere

ICLR 2025 • arXiv:2410.00752
44 citations
#911

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Chunwei Wang, Guansong Lu, Junwei Yang et al.

ICCV 2025 • arXiv:2412.06673
44 citations
#912

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana et al.

CVPR 2025 • highlight • arXiv:2411.16508
44 citations
#913

Theory on Mixture-of-Experts in Continual Learning

Hongbo Li, Sen Lin, Lingjie Duan et al.

ICLR 2025 • arXiv:2406.16437
44 citations
#914

On the Optimization and Generalization of Multi-head Attention

Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.

ICLR 2025 • arXiv:2310.12680
44 citations
#915

An Architecture Search Framework for Inference-Time Techniques

Jon Saad-Falcon, Adrian Lafuente, Shlok Natarajan et al.

ICML 2025 • arXiv:2409.15254
43 citations
#916

Detecting Data Deviations in Electronic Health Records

Kaiping Zheng, Horng-Ruey Chua, Beng Chin Ooi

NEURIPS 2025
43 citations
#917

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities

Zhaofeng Wu, Xinyan Yu, Dani Yogatama et al.

ICLR 2025 • arXiv:2411.04986
43 citations
#918

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Zhibo Yang, Jun Tang, Zhaohai Li et al.

ICCV 2025 • arXiv:2412.02210
43 citations
#919

On the Role of Attention Heads in Large Language Model Safety

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang et al.

ICLR 2025 • arXiv:2410.13708
43 citations
#920

Large Language Models Assume People are More Rational than We Really are

Ryan Liu, Jiayi Geng, Joshua Peterson et al.

ICLR 2025 • arXiv:2406.17055
43 citations
#921

PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang et al.

ICCV 2025 • arXiv:2503.17973
43 citations
#922

ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning

Jingyang Yi, Jiazheng Wang, Sida Li

NEURIPS 2025 • arXiv:2504.21370
43 citations
#923

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

Shangzhe Di, Zhelun Yu, Guanghao Zhang et al.

ICLR 2025 • arXiv:2503.00540
43 citations
#924

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Qianhui Wu, Kanzhi Cheng, Rui Yang et al.

NEURIPS 2025 • arXiv:2506.03143
43 citations
#925

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Guo Chen, Yicheng Liu, Yifei Huang et al.

ICLR 2025 • arXiv:2412.12075
43 citations
#926

Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment

Audrey Huang, Adam Block, Qinghua Liu et al.

ICML 2025 • arXiv:2503.21878
43 citations
#927

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Yufan He, Pengfei Guo, Yucheng Tang et al.

CVPR 2025 • arXiv:2406.05285
43 citations
#928

CollabLLM: From Passive Responders to Active Collaborators

Shirley Wu, Michel Galley, Baolin Peng et al.

ICML 2025 • oral • arXiv:2502.00640
43 citations
#929

Arctic-Embed 2.0: Multilingual Retrieval Without Compromise

Puxuan Yu, Luke Merrick, Gaurav Nuti et al.

COLM 2025 • arXiv:2412.04506
43 citations
#930

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Hongyu Li, Jinyu Chen, Ziyu Wei et al.

CVPR 2025 • arXiv:2501.08282
43 citations
#931

SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference

Jintao Zhang, Chendong Xiang, Haofeng Huang et al.

ICML 2025 • arXiv:2502.18137
43 citations
#932

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.

ICLR 2025 • arXiv:2410.18252
43 citations
#933

5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

Dongshuo Yin, Leiyi Hu, Bin Li et al.

CVPR 2025 • arXiv:2408.08345
43 citations
#934

TabArena: A Living Benchmark for Machine Learning on Tabular Data

Nick Erickson, Lennart Purucker, Andrej Tschalzev et al.

NEURIPS 2025 • spotlight • arXiv:2506.16791
43 citations
#935

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier et al.

ICLR 2025 • arXiv:2407.01509
43 citations
#936

Towards Realistic Data Generation for Real-World Super-Resolution

Long Peng, Wenbo Li, Renjing Pei et al.

ICLR 2025 • arXiv:2406.07255
43 citations
#937

STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving

Kefan Dong, Tengyu Ma

ICML 2025 • arXiv:2502.00212
43 citations
#938

EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis

Alexander Mai, Peter Hedman, George Kopanas et al.

ICCV 2025 • arXiv:2410.01804
43 citations
#939

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Wanggui He, Siming Fu, Mushui Liu et al.

AAAI 2025 • arXiv:2407.07614
43 citations
#940

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

Zhaorun Chen, Mintong Kang, Bo Li

ICML 2025 • arXiv:2503.22738
43 citations
#941

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Yi Ding, Bolian Li, Ruqi Zhang

ICLR 2025 • arXiv:2410.06625
43 citations
#942

How Does Critical Batch Size Scale in Pre-training?

Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.

ICLR 2025 • arXiv:2410.21676
43 citations
#943

Self-Evolving Multi-Agent Collaboration Networks for Software Development

Yue Hu, Yuzhu Cai, Yaxin Du et al.

ICLR 2025 • arXiv:2410.16946
43 citations
#944

Learning Harmonized Representations for Speculative Sampling

Lefan Zhang, Xiaodan Wang, Yanhua Huang et al.

ICLR 2025 • arXiv:2408.15766
43 citations
#945

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer

Zhen Han, Zeyinzi Jiang, Yulin Pan et al.

ICLR 2025 • arXiv:2410.00086
43 citations
#946

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Yuchen Zhou, Jiayuan Gu, Tung Chiang et al.

ICLR 2025 • arXiv:2406.17741
43 citations
#947

Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection

Zhiyuan Yan, Jiangming Wang, Peng Jin et al.

ICML 2025 • oral • arXiv:2411.15633
43 citations
#948

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Marcus Williams, Micah Carroll, Adhyyan Narang et al.

ICLR 2025 • arXiv:2411.02306
43 citations
#949

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Yuqian Yuan, Hang Zhang, Wentong Li et al.

CVPR 2025 • arXiv:2501.00599
43 citations
#950

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Hui Li, Mingwang Xu, Qingkun Su et al.

CVPR 2025 • highlight • arXiv:2412.00115
43 citations
#951

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh et al.

CVPR 2025 • arXiv:2412.03548
43 citations
#952

Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

Yu Yang, Jianbiao Mei, Yukai Ma et al.

AAAI 2025 • arXiv:2408.14197
43 citations
#953

YOLOE: Real-Time Seeing Anything

Ao Wang, Lihao Liu, Hui Chen et al.

ICCV 2025 • arXiv:2503.07465
43 citations
#954

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025 • arXiv:2412.04383
43 citations
#955

VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation

Wei Zhao, Pengxiang Ding, Zhang Min et al.

ICLR 2025 • arXiv:2502.13508
43 citations
#956

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Core Francisco Park, Ekdeep Singh Lubana, Hidenori Tanaka

ICLR 2025 • arXiv:2412.01003
43 citations
#957

MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation

Zhongshen Zeng, Pengguang Chen, Shu Liu et al.

ICLR 2025 • arXiv:2312.17080
43 citations
#958

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun et al.

COLM 2025 • arXiv:2503.23157
42 citations
#959

A Distractor-Aware Memory for Visual Object Tracking with SAM2

Alan Lukezic, Jovana Videnović, Matej Kristan

CVPR 2025 • arXiv:2411.17576
42 citations
#960

Aligning Language Models with Demonstrated Feedback

Omar Shaikh, Michelle Lam, Joey Hejna et al.

ICLR 2025 • arXiv:2406.00888
42 citations
#961

Robust Function-Calling for On-Device Language Model via Function Masking

Qiqiang Lin, Muning Wen, Qiuying Peng et al.

ICLR 2025 • arXiv:2410.04587
42 citations
#962

Real2Code: Reconstruct Articulated Objects via Code Generation

Mandi Zhao, Yijia Weng, Dominik Bauer et al.

ICLR 2025 • arXiv:2406.08474
42 citations
#963

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Yubo Wang, Xiang Yue, Wenhu Chen

COLM 2025 • arXiv:2501.17703
42 citations
#964

Why Does the Effective Context Length of LLMs Fall Short?

Chenxin An, Jun Zhang, Ming Zhong et al.

ICLR 2025 • arXiv:2410.18745
42 citations
#965

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

Mariam Hassan, Sebastian Stapf, Ahmad Rahimi et al.

CVPR 2025 • arXiv:2412.11198
42 citations
#966

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Wenxuan Huang, Zijie Zhai, Yunhang Shen et al.

ICLR 2025 • arXiv:2412.00876
42 citations
#967

Frame-Voyager: Learning to Query Frames for Video Large Language Models

Sicheng Yu, Chengkai Jin, Huanyu Wang et al.

ICLR 2025 • arXiv:2410.03226
42 citations
#968

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation

Qiyao Xue, Xiangyu Yin, Boyuan Yang et al.

CVPR 2025 • arXiv:2412.00596
42 citations
#969

Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

Zhanhao Liang, Yuhui Yuan, Shuyang Gu et al.

CVPR 2025 • arXiv:2406.04314
42 citations
#970

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

Linwei Dong, Qingnan Fan, Yihong Guo et al.

CVPR 2025 • arXiv:2411.18263
42 citations
#971

Learning to Prompt with Text Only Supervision for Vision-Language Models

Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.

AAAI 2025 • arXiv:2401.02418
42 citations
#972

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

Xiantao Hu, Ying Tai, Xu Zhao et al.

AAAI 2025 • arXiv:2412.15691
42 citations
#973

Faster Video Diffusion with Trainable Sparse Attention

Peiyuan Zhang, Yongqi Chen, Haofeng Huang et al.

NEURIPS 2025 • arXiv:2505.13389
42 citations
#974

GenXD: Generating Any 3D and 4D Scenes

Yuyang Zhao, Chung-Ching Lin, Kevin Lin et al.

ICLR 2025 • oral • arXiv:2411.02319
42 citations
#975

Parallelized Autoregressive Visual Generation

Yuqing Wang, Shuhuai Ren, Zhijie Lin et al.

CVPR 2025 • highlight • arXiv:2412.15119
42 citations
#976

Agents' Room: Narrative Generation through Multi-step Collaboration

Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki et al.

ICLR 2025 • arXiv:2410.02603
42 citations
#977

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

Weikang Meng, Yadan Luo, Xin Li et al.

ICLR 2025 • arXiv:2501.15061
42 citations
#978

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

David Junhao Zhang, Roni Paiss, Shiran Zada et al.

CVPR 2025 • arXiv:2411.05003
42 citations
#979

ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval

Zixu Li, Zhiwei Chen, Haokun Wen et al.

AAAI 2025
42 citations
#980

Transformer Layers as Painters

Qi Sun, Marc Pickett, Aakash Kumar Nain et al.

AAAI 2025 • arXiv:2407.09298
42 citations
#981

Diffusion Feedback Helps CLIP See Better

Wenxuan Wang, Quan Sun, Fan Zhang et al.

ICLR 2025 • arXiv:2407.20171
42 citations
#982

SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models

Daniel Levy, Siba Smarak Panigrahi, Sékou-Oumar Kaba et al.

ICLR 2025 • arXiv:2502.03638
41 citations
#983

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025 • arXiv:2410.06916
41 citations
#984

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang et al.

CVPR 2025 • arXiv:2411.18616
41 citations
#985

On the expressiveness and spectral bias of KANs

Yixuan Wang, Jonathan Siegel, Ziming Liu et al.

ICLR 2025 • arXiv:2410.01803
41 citations
#986

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

Yaxi Lu, Shenzhi Yang, Cheng Qian et al.

ICLR 2025 • arXiv:2410.12361
41 citations
#987

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

Koichi Namekata, Sherwin Bahmani, Ziyi Wu et al.

ICLR 2025 • arXiv:2411.04989
41 citations
#988

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method

Xinshuai Song, Weixing Chen, Yang Liu et al.

CVPR 2025 • arXiv:2412.09082
41 citations
#989

OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

Meng Lou, Yizhou Yu

CVPR 2025 • arXiv:2502.20087
41 citations
#990

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Han Shu, Wenshuo Li, Yehui Tang et al.

AAAI 2025 • arXiv:2312.13789
41 citations
#991

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Hang Yin, Xiuwei Xu, Linqing Zhao et al.

CVPR 2025 • arXiv:2503.10630
41 citations
#992

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Junsong Chen, Shuchen Xue, Yuyang Zhao et al.

ICCV 2025 • highlight • arXiv:2503.09641
41 citations
#993

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

Dongya Jia, Zhuo Chen, Jiawei Chen et al.

ICML 2025 • arXiv:2502.03930
41 citations
#994

Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching

Aaron Havens, Benjamin Kurt Miller, Bing Yan et al.

ICML 2025 • arXiv:2504.11713
41 citations
#995

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen et al.

CVPR 2025 • arXiv:2408.17065
41 citations
#996

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Zehuan Huang, Yuanchen Guo, Xingqiao An et al.

CVPR 2025 • arXiv:2412.03558
41 citations
#997

DELTA: Dense Efficient Long-Range 3D Tracking for Any Video

Tuan Ngo, Peiye Zhuang, Evangelos Kalogerakis et al.

ICLR 2025 • arXiv:2410.24211
41 citations
#998

Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond

Chongyu Fan, Jinghan Jia, Yihua Zhang et al.

ICML 2025 • arXiv:2502.05374
41 citations
#999

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

Weiji Xie, Jinrui Han, Jiakun Zheng et al.

NEURIPS 2025 • arXiv:2506.12851
41 citations
#1000

Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye, Zihan Wang, Haosen Sun et al.

CVPR 2025 • arXiv:2504.02259
41 citations