Most Cited 2025 "cross-domain segmentation" Papers

22,274 papers found • Page 5 of 112

Filters:Most Cited 2025 cross-domain segmentation Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#801

OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

Maxence Faldor, Jenny Zhang, Antoine Cully et al.

ICLR 2025arXiv:2405.15568

citations

#802

Visual Agents as Fast and Slow Thinkers

Guangyan Sun, Mingyu Jin, Zhenting Wang et al.

ICLR 2025arXiv:2408.08862

citations

#803

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

Egor Zverev, Sahar Abdelnabi, Soroush Tabesh et al.

ICLR 2025arXiv:2403.06833

citations

#804

Data Shapley in One Training Run

Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.

ICLR 2025arXiv:2406.11011

citations

#805

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Deepak Nathani, Lovish Madaan, Nicholas Roberts et al.

COLM 2025paper

citations

#806

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Keisuke Kamahori, Tian Tang, Yile Gu et al.

ICLR 2025arXiv:2402.07033

citations

#807

The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

Ruili Feng, Han Zhang, Zhilei Shu et al.

NEURIPS 2025arXiv:2412.03568

citations

#808

STAIR: Improving Safety Alignment with Introspective Reasoning

Yichi Zhang, Siyuan Zhang, Yao Huang et al.

ICML 2025oralarXiv:2502.02384

citations

#809

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

Dominic Maggio, Hyungtae Lim, Luca Carlone

NEURIPS 2025arXiv:2505.12549

citations

#810

Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference

Aniket Vashishtha, Abbavaram Gowtham Reddy, Abhinav Kumar et al.

ICLR 2025arXiv:2310.15117

citations

#811

MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses

Zonglin Yang, Wanhao Liu, Ben Gao et al.

ICLR 2025arXiv:2410.07076

citations

#812

SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning

Hojoon Lee, Dongyoon Hwang, Donghu Kim et al.

ICLR 2025arXiv:2410.09754

citations

#813

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Xiangyu Zhao, Peiyuan Zhang, Kexian Tang et al.

NEURIPS 2025oralarXiv:2504.02826

citations

#814

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi et al.

NEURIPS 2025oralarXiv:2504.13180

citations

#815

Scaling FP8 training to trillion-token LLMs

Maxim Fishman, Brian Chmiel, Ron Banner et al.

ICLR 2025arXiv:2409.12517

citations

#816

FlipAttack: Jailbreak LLMs via Flipping

Yue Liu, Xiaoxin He, Miao Xiong et al.

ICML 2025arXiv:2410.02832

citations

#817

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Peixian Ma, Xialie Zhuang, Chengjin Xu et al.

NEURIPS 2025arXiv:2504.08600

citations

#818

FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching

Sucheng Ren, Qihang Yu, Ju He et al.

ICML 2025arXiv:2412.15205

citations

#819

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Linke Ouyang, Yuan Qu, Hongbin Zhou et al.

CVPR 2025arXiv:2412.07626

citations

#820

AnyEdit: Edit Any Knowledge Encoded in Language Models

Houcheng Jiang, Junfeng Fang, Ningyu Zhang et al.

ICML 2025arXiv:2502.05628

citations

#821

Test-time Alignment of Diffusion Models without Reward Over-optimization

Sunwoo Kim, Minkyu Kim, Dongmin Park

ICLR 2025arXiv:2501.05803

citations

#822

OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting

Xing Hu, Yuan Cheng, Dawei Yang et al.

ICLR 2025arXiv:2501.13987

citations

#823

Learn Beneficial Noise as Graph Augmentation

Siqi Huang, Yanchen Xu, Hongyuan Zhang et al.

ICML 2025arXiv:2505.19024

citations

#824

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Mingjie Pan, Jiyao Zhang, Tianshu Wu et al.

CVPR 2025highlightarXiv:2501.03841

citations

#825

RMB: Comprehensively benchmarking reward models in LLM alignment

Enyu Zhou, Guodong Zheng, Binghai Wang et al.

ICLR 2025arXiv:2410.09893

citations

#826

DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis

Pan Wang, Qiang Zhou, Yawen Wu et al.

AAAI 2025paperarXiv:2412.12225

citations

#827

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

Andy (DiJia) Su, Sainbayar Sukhbaatar, Michael Rabbat et al.

ICLR 2025arXiv:2410.09918

citations

#828

Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization

Siyan Dong, Shuzhe Wang, Shaohui Liu et al.

CVPR 2025arXiv:2412.08376

citations

#829

Selective Aggregation for Low-Rank Adaptation in Federated Learning

Pengxin Guo, Shuang Zeng, Yanran Wang et al.

ICLR 2025arXiv:2410.01463

citations

#830

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Mehul Damani, Idan Shenfeld, Andi Peng et al.

ICLR 2025arXiv:2410.04707

citations

#831

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.

NEURIPS 2025spotlightarXiv:2505.24760

citations

#832

4K4DGen: Panoramic 4D Generation at 4K Resolution

Renjie Li, Panwang Pan, Bangbang Yang et al.

ICLR 2025oralarXiv:2406.13527

citations

#833

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

Orion Weller, Ben Van Durme, Dawn Lawrie et al.

ICLR 2025arXiv:2409.11136

citations

#834

Rank1: Test-Time Compute for Reranking in Information Retrieval

Orion Weller, Kathryn Ricci, Eugene Yang et al.

COLM 2025paperarXiv:2502.18418

citations

#835

Generator Matching: Generative modeling with arbitrary Markov processes

Peter Holderrieth, Marton Havasi, Jason Yim et al.

ICLR 2025arXiv:2410.20587

citations

#836

Empirical Design in Reinforcement Learning

Andrew Patterson, Samuel F Neumann, Martha White et al.

ICML 2025arXiv:2304.01315

citations

#837

Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas

Shiqi Chen, Tongyao Zhu, Ruochen Zhou et al.

ICML 2025arXiv:2503.01773

citations

#838

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan et al.

CVPR 2025arXiv:2403.08764

citations

#839

HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection

Zican Shi, Jing Hu, Jie Ren et al.

AAAI 2025paperarXiv:2412.10116

citations

#840

Real-Time Execution of Action Chunking Flow Policies

Kevin Black, Manuel Galliker, Sergey Levine

NEURIPS 2025oralarXiv:2506.07339

citations

#841

WritingBench: A Comprehensive Benchmark for Generative Writing

Yuning Wu, Jiahao Mei, Ming Yan et al.

NEURIPS 2025arXiv:2503.05244

citations

#842

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Kaixuan Huang, Xudong Guo, Mengdi Wang

COLM 2025paper

citations

#843

Universal Actions for Enhanced Embodied Foundation Models

Jinliang Zheng, Jianxiong Li, Dongxiu Liu et al.

CVPR 2025arXiv:2501.10105

citations

#844

Preble: Efficient Distributed Prompt Scheduling for LLM Serving

Vikranth Srivatsa, Zijian He, Reyna Abhyankar et al.

ICLR 2025arXiv:2407.00023

citations

#845

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.

CVPR 2025arXiv:2412.16334

citations

#846

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang et al.

CVPR 2025arXiv:2412.10373

citations

#847

Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist

Zihao Zhou, Shudong Liu, Maizhen Ning et al.

ICLR 2025arXiv:2407.08733

citations

#848

R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Naman Jain, Jaskirat Singh, Manish Shetty et al.

COLM 2025paper

citations

#849

Image Conductor: Precision Control for Interactive Video Synthesis

Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.

AAAI 2025paperarXiv:2406.15339

citations

#850

WorldScore: Unified Evaluation Benchmark for World Generation

Haoyi Duan, Hong-Xing Yu, Sirui Chen et al.

ICCV 2025

citations

#851

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.

CVPR 2025arXiv:2412.18597

citations

#852

Depth Any Video with Scalable Synthetic Data

Honghui Yang, Di Huang, Wei Yin et al.

ICLR 2025oralarXiv:2410.10815

citations

#853

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Chenyang Zhu, Kai Li, Yue Ma et al.

AAAI 2025paperarXiv:2404.14239

citations

#854

M-LLM Based Video Frame Selection for Efficient Video Understanding

Kai Hu, Feng Gao, Xiaohan Nie et al.

CVPR 2025arXiv:2502.19680

citations

#855

Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems

Shaokun Zhang, Ming Yin, Jieyu Zhang et al.

ICML 2025spotlightarXiv:2505.00212

citations

#856

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif et al.

ICLR 2025arXiv:2405.18540

citations

#857

End-to-End Autonomous Driving Through V2X Cooperation

Haibao Yu, Wenxian Yang, Jiaru Zhong et al.

AAAI 2025paperarXiv:2404.00717

citations

#858

St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World

Haiwen Feng, Junyi Zhang, Qianqian Wang et al.

ICCV 2025arXiv:2504.13152

citations

#859

What Can RL Bring to VLA Generalization? An Empirical Study

Jijia Liu, Feng Gao, Bingwen Wei et al.

NEURIPS 2025arXiv:2505.19789

citations

#860

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh

AAAI 2025paperarXiv:2412.14995

citations

#861

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Zhangheng LI, Keen You, Haotian Zhang et al.

ICLR 2025arXiv:2410.18967

citations

#862

Robust LLM safeguarding via refusal feature adversarial training

Lei Yu, Virginie Do, Karen Hambardzumyan et al.

ICLR 2025arXiv:2409.20089

citations

#863

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

Qizhe Zhang, Aosong Cheng, Ming Lu et al.

ICCV 2025arXiv:2412.01818

citations

#864

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Haoyang He, Jiangning Zhang, Yuxuan Cai et al.

CVPR 2025arXiv:2411.15941

citations

#865

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Andrew Williams, Arjun Ashok, Étienne Marcotte et al.

ICML 2025arXiv:2410.18959

citations

#866

Taming Overconfidence in LLMs: Reward Calibration in RLHF

Jixuan Leng, Chengsong Huang, Banghua Zhu et al.

ICLR 2025arXiv:2410.09724

citations

#867

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Hao Gao, Shaoyu Chen, Bo Jiang et al.

NEURIPS 2025arXiv:2502.13144

citations

#868

Learning 4D Embodied World Models

Haoyu Zhen, Qiao Sun, Hongxin Zhang et al.

ICCV 2025arXiv:2504.20995

citations

#869

UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent

Jianke Zhang, Yanjiang Guo, Yucheng Hu et al.

ICML 2025arXiv:2501.18867

citations

#870

The Surprising Effectiveness of Test-Time Training for Few-Shot Learning

Ekin Akyürek, Mehul Damani, Adam Zweiger et al.

ICML 2025arXiv:2411.07279

citations

#871

TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data

Jeremy Irvin, Emily Liu, Joyce Chen et al.

ICLR 2025oralarXiv:2410.06234

citations

#872

LLMs as Research Tools: A Large Scale Survey of Researchers’ Usage and Perceptions

Zhehui Liao, Maria Antoniak, Inyoung Cheong et al.

COLM 2025paperarXiv:2411.05025

citations

#873

PEARL: Parallel Speculative Decoding with Adaptive Draft Length

Tianyu Liu, Yun Li, Qitan Lv et al.

ICLR 2025arXiv:2408.11850

citations

#874

Adaptive Caching for Faster Video Generation with Diffusion Transformers

Kumara Kahatapitiya, Haozhe Liu, Sen He et al.

ICCV 2025arXiv:2411.02397

citations

#875

How efficient is LLM-generated code? A rigorous & high-standard benchmark

Ruizhong Qiu, Weiliang Zeng, James Ezick et al.

ICLR 2025arXiv:2406.06647

citations

#876

Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design

Chenyu Wang, Masatoshi Uehara, Yichun He et al.

ICLR 2025arXiv:2410.13643

citations

#877

FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

Yingying Deng, Xiangyu He, Changwang Mei et al.

ICML 2025arXiv:2412.07517

citations

#878

On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents

Jen-Tse Huang, Jiaxu Zhou, Tailin Jin et al.

ICML 2025arXiv:2408.00989

citations

#879

FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"

Yifei Ming, Senthil Purushwalkam, Shrey Pandit et al.

ICLR 2025

citations

#880

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

Jaehun Jung, Faeze Brahman, Yejin Choi

ICLR 2025arXiv:2407.18370

citations

#881

Towards Practical Real-Time Neural Video Compression

Zhaoyang Jia, Bin Li, Jiahao Li et al.

CVPR 2025arXiv:2502.20762

citations

#882

Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Ziying Song, Caiyan Jia, Lin Liu et al.

CVPR 2025arXiv:2503.03125

citations

#883

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He et al.

NEURIPS 2025arXiv:2506.09350

citations

#884

To Code or Not To Code? Exploring Impact of Code in Pre-training

Viraat Aryabumi, Yixuan Su, Raymond Ma et al.

ICLR 2025arXiv:2408.10914

citations

#885

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Rui Qiao, Varsha Kishore et al.

COLM 2025paperarXiv:2504.20595

citations

#886

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

Yao Teng, Han Shi, Xian Liu et al.

ICLR 2025arXiv:2410.01699

citations

#887

ThinK: Thinner Key Cache by Query-Driven Pruning

Yuhui Xu, Zhanming Jie, Hanze Dong et al.

ICLR 2025arXiv:2407.21018

citations

#888

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

Yiming Wang, Pei Zhang, Siyuan Huang et al.

NEURIPS 2025spotlightarXiv:2503.01422

citations

#889

Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model

Long Le, Jason Xie, William Liang et al.

ICLR 2025arXiv:2410.13882

citations

#890

Scaling RL to Long Videos

Yukang Chen, Wei Huang, Baifeng Shi et al.

NEURIPS 2025arXiv:2507.07966

citations

#891

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Jinbin Bai, Tian Ye, Wei Chow et al.

ICLR 2025arXiv:2410.08261

citations

#892

MET3R: Measuring Multi-View Consistency in Generated Images

Mohammad Asim, Christopher Wewer, Thomas Wimmer et al.

CVPR 2025arXiv:2501.06336

citations

#893

Video World Models with Long-term Spatial Memory

Tong Wu, Shuai Yang, Ryan Po et al.

NEURIPS 2025oralarXiv:2506.05284

citations

#894

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Daniel Israel, Guy Van den Broeck, Aditya Grover

NEURIPS 2025spotlightarXiv:2506.00413

citations

#895

InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

Sirui Xu, Hung Yu Ling, Yu-Xiong Wang et al.

CVPR 2025highlightarXiv:2502.20390

citations

#896

Thinking LLMs: General Instruction Following with Thought Generation

Tianhao Wu, Janice Lan, Weizhe Yuan et al.

ICML 2025arXiv:2410.10630

citations

#897

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

Haoxuan Wang, Yuzhang Shang, Zhihang Yuan et al.

ICCV 2025arXiv:2402.03666

citations

#898

RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers

Min Zhao, Guande He, Yixiao Chen et al.

ICML 2025oralarXiv:2502.15894

citations

#899

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Chengqi Lyu, Songyang Gao, Yuzhe Gu et al.

COLM 2025paper

citations

#900

Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition

Kun Li, Dan Guo, Guoliang Chen et al.

AAAI 2025paperarXiv:2412.14719

citations

#901

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.

CVPR 2025highlightarXiv:2503.16429

citations

#902

CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair

Mingjie Liu, Yun-Da Tsai, Wenfei Zhou et al.

ICLR 2025arXiv:2409.12993

citations

#903

One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation

Zhendong Wang, Max Li, Ajay Mandlekar et al.

ICML 2025arXiv:2410.21257

citations

#904

AdaWorld: Learning Adaptable World Models with Latent Actions

Shenyuan Gao, Siyuan Zhou, Yilun Du et al.

ICML 2025arXiv:2503.18938

citations

#905

CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models

Hao He, Ceyuan Yang, Shanchuan Lin et al.

ICCV 2025arXiv:2503.10592

citations

#906

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

Junjie He, Yifeng Geng, Liefeng Bo

ICCV 2025arXiv:2408.05939

citations

#907

Looking Inward: Language Models Can Learn About Themselves by Introspection

Felix Jedidja Binder, James Chua, Tomek Korbak et al.

ICLR 2025oralarXiv:2410.13787

citations

#908

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts

Haizhong Zheng, Yang Zhou, Brian Bartoldson et al.

NEURIPS 2025oralarXiv:2506.02177

citations

#909

AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP

wenxin ma, Xu Zhang, Qingsong Yao et al.

CVPR 2025arXiv:2503.06661

citations

#910

TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark

Kush Jain, Gabriel Synnaeve, Baptiste Roziere

ICLR 2025arXiv:2410.00752

citations

#911

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Chunwei Wang, Guansong Lu, Junwei Yang et al.

ICCV 2025arXiv:2412.06673

citations

#912

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana et al.

CVPR 2025highlightarXiv:2411.16508

citations

#913

Theory on Mixture-of-Experts in Continual Learning

Hongbo Li, Sen Lin, Lingjie Duan et al.

ICLR 2025arXiv:2406.16437

citations

#914

On the Optimization and Generalization of Multi-head Attention

Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.

ICLR 2025arXiv:2310.12680

citations

#915

An Architecture Search Framework for Inference-Time Techniques

Jon Saad-Falcon, Adrian Lafuente, Shlok Natarajan et al.

ICML 2025arXiv:2409.15254

citations

#916

Detecting Data Deviations in Electronic Health Records

Kaiping Zheng, Horng-Ruey Chua, Beng Chin Ooi

NEURIPS 2025

citations

#917

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities

Zhaofeng Wu, Xinyan Yu, Dani Yogatama et al.

ICLR 2025arXiv:2411.04986

citations

#918

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Zhibo Yang, Jun Tang, Zhaohai Li et al.

ICCV 2025arXiv:2412.02210

citations

#919

On the Role of Attention Heads in Large Language Model Safety

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang et al.

ICLR 2025arXiv:2410.13708

citations

#920

Large Language Models Assume People are More Rational than We Really are

Ryan Liu, Jiayi Geng, Joshua Peterson et al.

ICLR 2025arXiv:2406.17055

citations

#921

PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang et al.

ICCV 2025arXiv:2503.17973

citations

#922

ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning

Jingyang Yi, Jiazheng Wang, Sida Li

NEURIPS 2025arXiv:2504.21370

citations

#923

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

Shangzhe Di, Zhelun Yu, Guanghao Zhang et al.

ICLR 2025arXiv:2503.00540

citations

#924

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Qianhui Wu, Kanzhi Cheng, Rui Yang et al.

NEURIPS 2025arXiv:2506.03143

citations

#925

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Guo Chen, Yicheng Liu, Yifei Huang et al.

ICLR 2025arXiv:2412.12075

citations

#926

Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment

Audrey Huang, Adam Block, Qinghua Liu et al.

ICML 2025arXiv:2503.21878

citations

#927

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Yufan He, Pengfei Guo, Yucheng Tang et al.

CVPR 2025arXiv:2406.05285

citations

#928

CollabLLM: From Passive Responders to Active Collaborators

Shirley Wu, Michel Galley, Baolin Peng et al.

ICML 2025oralarXiv:2502.00640

citations

#929

Arctic-Embed 2.0: Multilingual Retrieval Without Compromise

Puxuan Yu, Luke Merrick, Gaurav Nuti et al.

COLM 2025paperarXiv:2412.04506

citations

#930

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Hongyu Li, Jinyu Chen, Ziyu Wei et al.

CVPR 2025arXiv:2501.08282

citations

#931

SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference

Jintao Zhang, Chendong Xiang, Haofeng Huang et al.

ICML 2025arXiv:2502.18137

citations

#932

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.

ICLR 2025arXiv:2410.18252

citations

#933

5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

Dongshuo Yin, Leiyi Hu, Bin Li et al.

CVPR 2025arXiv:2408.08345

citations

#934

TabArena: A Living Benchmark for Machine Learning on Tabular Data

Nick Erickson, Lennart Purucker, Andrej Tschalzev et al.

NEURIPS 2025spotlightarXiv:2506.16791

citations

#935

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier et al.

ICLR 2025arXiv:2407.01509

citations

#936

Towards Realistic Data Generation for Real-World Super-Resolution

Long Peng, Wenbo Li, Renjing Pei et al.

ICLR 2025arXiv:2406.07255

citations

#937

STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving

Kefan Dong, Tengyu Ma

ICML 2025arXiv:2502.00212

citations

#938

EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis

Alexander Mai, Peter Hedman, George Kopanas et al.

ICCV 2025arXiv:2410.01804

citations

#939

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Wanggui He, Siming Fu, Mushui Liu et al.

AAAI 2025paperarXiv:2407.07614

citations

#940

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

Zhaorun Chen, Mintong Kang, Bo Li

ICML 2025arXiv:2503.22738

citations

#941

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Yi Ding, Bolian Li, Ruqi Zhang

ICLR 2025arXiv:2410.06625

citations

#942

How Does Critical Batch Size Scale in Pre-training?

Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.

ICLR 2025arXiv:2410.21676

citations

#943

Self-Evolving Multi-Agent Collaboration Networks for Software Development

Yue Hu, Yuzhu Cai, Yaxin Du et al.

ICLR 2025arXiv:2410.16946

citations

#944

Learning Harmonized Representations for Speculative Sampling

Lefan Zhang, Xiaodan Wang, Yanhua Huang et al.

ICLR 2025arXiv:2408.15766

citations

#945

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer

Zhen Han, Zeyinzi Jiang, Yulin Pan et al.

ICLR 2025arXiv:2410.00086

citations

#946

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Yuchen Zhou, Jiayuan Gu, Tung Chiang et al.

ICLR 2025arXiv:2406.17741

citations

#947

Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection

Zhiyuan Yan, Jiangming Wang, Peng Jin et al.

ICML 2025oralarXiv:2411.15633

citations

#948

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Marcus Williams, Micah Carroll, Adhyyan Narang et al.

ICLR 2025arXiv:2411.02306

citations

#949

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Yuqian Yuan, Hang Zhang, Wentong Li et al.

CVPR 2025arXiv:2501.00599

citations

#950

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Hui Li, Mingwang Xu, Qingkun Su et al.

CVPR 2025highlightarXiv:2412.00115

citations

#951

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh et al.

CVPR 2025arXiv:2412.03548

citations

#952

Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

Yu Yang, Jianbiao Mei, Yukai Ma et al.

AAAI 2025paperarXiv:2408.14197

citations

#953

YOLOE: Real-Time Seeing Anything

Ao Wang, Lihao Liu, Hui Chen et al.

ICCV 2025arXiv:2503.07465

citations

#954

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025arXiv:2412.04383

citations

#955

VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation

Wei Zhao, Pengxiang Ding, Zhang Min et al.

ICLR 2025arXiv:2502.13508

citations

#956

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Core Francisco Park, Ekdeep Singh Lubana, Hidenori Tanaka

ICLR 2025arXiv:2412.01003

citations

#957

MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation

Zhongshen Zeng, Pengguang Chen, Shu Liu et al.

ICLR 2025arXiv:2312.17080

citations

#958

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun et al.

COLM 2025paperarXiv:2503.23157

citations

#959

A Distractor-Aware Memory for Visual Object Tracking with SAM2

Alan Lukezic, Jovana Videnović, Matej Kristan

CVPR 2025arXiv:2411.17576

citations

#960

Aligning Language Models with Demonstrated Feedback

Omar Shaikh, Michelle Lam, Joey Hejna et al.

ICLR 2025arXiv:2406.00888

citations

#961

Robust Function-Calling for On-Device Language Model via Function Masking

Qiqiang Lin, Muning Wen, Qiuying Peng et al.

ICLR 2025arXiv:2410.04587

citations

#962

Real2Code: Reconstruct Articulated Objects via Code Generation

Mandi Zhao, Yijia Weng, Dominik Bauer et al.

ICLR 2025arXiv:2406.08474

citations

#963

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Yubo Wang, Xiang Yue, Wenhu Chen

COLM 2025paperarXiv:2501.17703

citations

#964

Why Does the Effective Context Length of LLMs Fall Short?

Chenxin An, Jun Zhang, Ming Zhong et al.

ICLR 2025arXiv:2410.18745

citations

#965

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

Mariam Hassan, Sebastian Stapf, Ahmad Rahimi et al.

CVPR 2025arXiv:2412.11198

citations

#966

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Wenxuan Huang, Zijie Zhai, Yunhang Shen et al.

ICLR 2025arXiv:2412.00876

citations

#967

Frame-Voyager: Learning to Query Frames for Video Large Language Models

Sicheng Yu, CHENGKAI JIN, Huanyu Wang et al.

ICLR 2025arXiv:2410.03226

citations

#968

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation

Qiyao Xue, Xiangyu Yin, Boyuan Yang et al.

CVPR 2025arXiv:2412.00596

citations

#969

Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

Zhanhao Liang, Yuhui Yuan, Shuyang Gu et al.

CVPR 2025arXiv:2406.04314

citations

#970

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

linwei dong, Qingnan Fan, Yihong Guo et al.

CVPR 2025arXiv:2411.18263

citations

#971

Learning to Prompt with Text Only Supervision for Vision-Language Models

Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.

AAAI 2025paperarXiv:2401.02418

citations

#972

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

Xiantao Hu, Ying Tai, Xu Zhao et al.

AAAI 2025paperarXiv:2412.15691

citations

#973

Faster Video Diffusion with Trainable Sparse Attention

Peiyuan Zhang, Yongqi Chen, Haofeng Huang et al.

NEURIPS 2025arXiv:2505.13389

citations

#974

GenXD: Generating Any 3D and 4D Scenes

Yuyang Zhao, Chung-Ching Lin, Kevin Lin et al.

ICLR 2025oralarXiv:2411.02319

citations

#975

Parallelized Autoregressive Visual Generation

Yuqing Wang, Shuhuai Ren, Zhijie Lin et al.

CVPR 2025highlightarXiv:2412.15119

citations

#976

Agents' Room: Narrative Generation through Multi-step Collaboration

Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki et al.

ICLR 2025arXiv:2410.02603

citations

#977

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

Weikang Meng, Yadan Luo, Xin Li et al.

ICLR 2025arXiv:2501.15061

citations

#978

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

David Junhao Zhang, Roni Paiss, Shiran Zada et al.

CVPR 2025arXiv:2411.05003

citations

#979

ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval

Zixu Li, Zhiwei Chen, Haokun Wen et al.

AAAI 2025paper

citations

#980

Transformer Layers as Painters

Qi Sun, Marc Pickett, Aakash Kumar Nain et al.

AAAI 2025paperarXiv:2407.09298

citations

#981

Diffusion Feedback Helps CLIP See Better

Wenxuan Wang, Quan Sun, Fan Zhang et al.

ICLR 2025arXiv:2407.20171

citations

#982

SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models

Daniel Levy, Siba Smarak Panigrahi, Sékou-Oumar Kaba et al.

ICLR 2025arXiv:2502.03638

citations

#983

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025arXiv:2410.06916

citations

#984

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang et al.

CVPR 2025arXiv:2411.18616

citations

#985

On the expressiveness and spectral bias of KANs

Yixuan Wang, Jonathan Siegel, Ziming Liu et al.

ICLR 2025arXiv:2410.01803

citations

#986

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

Yaxi Lu, Shenzhi Yang, Cheng Qian et al.

ICLR 2025arXiv:2410.12361

citations

#987

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

Koichi Namekata, Sherwin Bahmani, Ziyi Wu et al.

ICLR 2025arXiv:2411.04989

citations

#988

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method

Xinshuai Song, weixing chen, Yang Liu et al.

CVPR 2025arXiv:2412.09082

citations

#989

OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

Meng Lou, Yizhou Yu

CVPR 2025arXiv:2502.20087

citations

#990

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Han Shu, Wenshuo Li, Yehui Tang et al.

AAAI 2025paperarXiv:2312.13789

citations

#991

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Hang Yin, Xiuwei Xu, Linqing Zhao et al.

CVPR 2025arXiv:2503.10630

citations

#992

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Junsong Chen, Shuchen Xue, Yuyang Zhao et al.

ICCV 2025highlightarXiv:2503.09641

citations

#993

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

Dongya Jia, Zhuo Chen, Jiawei Chen et al.

ICML 2025arXiv:2502.03930

citations

#994

Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching

Aaron Havens, Benjamin Kurt Miller, Bing Yan et al.

ICML 2025arXiv:2504.11713

citations

#995

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen et al.

CVPR 2025arXiv:2408.17065

citations

#996

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Zehuan Huang, Yuanchen Guo, Xingqiao An et al.

CVPR 2025arXiv:2412.03558

citations

#997

DELTA: DENSE EFFICIENT LONG-RANGE 3D TRACKING FOR ANY VIDEO

Tuan Ngo, Peiye Zhuang, Evangelos Kalogerakis et al.

ICLR 2025arXiv:2410.24211

citations

#998

Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond

Chongyu Fan, jinghan jia, Yihua Zhang et al.

ICML 2025arXiv:2502.05374

citations

#999

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

Weiji Xie, Jinrui Han, Jiakun Zheng et al.

NEURIPS 2025arXiv:2506.12851

citations

#1000

Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye, Zihan Wang, Haosen Sun et al.

CVPR 2025arXiv:2504.02259

citations

← Previous

1...3 4 5 6 7...112