Most Cited 2025 "stochastic perturbation" Papers

22,274 papers found • Page 4 of 112

Filters:Most Cited 2025 stochastic perturbation Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#601

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Tianyu Yu, Haoye Zhang, Qiming Li et al.

CVPR 2025highlightarXiv:2405.17220

citations

#602

Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation

Tiansheng Huang, Sihao Hu, Fatih Ilhan et al.

ICLR 2025arXiv:2409.01586

citations

#603

Self-Improvement in Language Models: The Sharpening Mechanism

Audrey Huang, Adam Block, Dylan Foster et al.

ICLR 2025arXiv:2412.01951

citations

#604

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Xubing Ye, Yukang Gan, Xiaoke Huang et al.

CVPR 2025arXiv:2406.12275

citations

#605

Stable Flow: Vital Layers for Training-Free Image Editing

Omri Avrahami, Or Patashnik, Ohad Fried et al.

CVPR 2025arXiv:2411.14430

citations

#606

Repetition Improves Language Model Embeddings

Jacob Springer, Suhas Kotha, Daniel Fried et al.

ICLR 2025arXiv:2402.15449

citations

#607

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

Patara Trirat, Wonyong Jeong, Sung Ju Hwang

ICML 2025arXiv:2410.02958

citations

#608

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning

Yu Fu, Zefan Cai, Abedelkadir Asi et al.

ICLR 2025arXiv:2410.19258

citations

#609

Long Context Tuning for Video Generation

Yuwei Guo, Ceyuan Yang, Ziyan Yang et al.

ICCV 2025arXiv:2503.10589

citations

#610

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

Xuemeng Yang, Licheng Wen, Tiantian Wei et al.

ICCV 2025arXiv:2408.00415

citations

#611

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Tianbao Xie, Jiaqi Deng, Xiaochuan Li et al.

NEURIPS 2025spotlightarXiv:2505.13227

citations

#612

Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning

Wenwen Zhuang, Xin Huang, Xiantao Zhang et al.

AAAI 2025paperarXiv:2408.08640

citations

#613

Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens

Zhangqi Jiang, Junkai Chen, Beier Zhu et al.

CVPR 2025arXiv:2411.16724

citations

#614

Tell me about yourself: LLMs are aware of their learned behaviors

Jan Betley, Xuchan Bao, Martín Soto et al.

ICLR 2025oralarXiv:2501.11120

citations

#615

Unlocking the Power of LSTM for Long Term Time Series Forecasting

Yaxuan Kong, Zepu Wang, Yuqi Nie et al.

AAAI 2025paperarXiv:2408.10006

citations

#616

Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models

Lvmin Zhang, Shengqu Cai, Muyang Li et al.

NEURIPS 2025spotlightarXiv:2504.12626

citations

#617

Golden Noise for Diffusion Models: A Learning Framework

zikai zhou, Shitong Shao, Lichen Bai et al.

ICCV 2025arXiv:2411.09502

citations

#618

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen et al.

NEURIPS 2025arXiv:2507.16815

citations

#619

Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment

Lijie Liu, Tianxiang Ma, Bingchuan Li et al.

ICCV 2025highlightarXiv:2502.11079

citations

#620

Align3R: Aligned Monocular Depth Estimation for Dynamic Videos

Edward LOO, Tianyu HUANG, Peng Li et al.

CVPR 2025highlightarXiv:2412.03079

citations

#621

GRIT: Teaching MLLMs to Think with Images

Yue Fan, Xuehai He, Diji Yang et al.

NEURIPS 2025arXiv:2505.15879

citations

#622

Normalizing Flows are Capable Generative Models

Shuangfei Zhai, Ruixiang Zhang, Preetum Nakkiran et al.

ICML 2025oralarXiv:2412.06329

citations

#623

Learning Multi-Level Features with Matryoshka Sparse Autoencoders

Bart Bussmann, Noa Nabeshima, Adam Karvonen et al.

ICML 2025arXiv:2503.17547

citations

#624

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NEURIPS 2025arXiv:2506.04308

citations

#625

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Zhengyao Lyu, Chenyang Si, Junhao Song et al.

ICLR 2025oralarXiv:2410.19355

citations

#626

NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals

Wei-Bang Jiang, Yansen Wang, Bao-liang Lu et al.

ICLR 2025oralarXiv:2409.00101

citations

#627

Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation

Mufei Li, Siqi Miao, Pan Li

ICLR 2025arXiv:2410.20724

citations

#628

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

Chaojun Ni, Guosheng Zhao, Xiaofeng Wang et al.

CVPR 2025arXiv:2411.19548

citations

#629

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Adam Karvonen, Can Rager, Johnny Lin et al.

ICML 2025arXiv:2503.09532

citations

#630

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

Chen Ziwen, Hao Tan, Kai Zhang et al.

ICCV 2025highlightarXiv:2410.12781

citations

#631

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Qi Qin, Le Zhuo, Yi Xin et al.

ICCV 2025arXiv:2503.21758

citations

#632

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Xiang Li, Cristina Mata, Jongwoo Park et al.

ICLR 2025arXiv:2406.20095

citations

#633

EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation

Jiaxiang Tang, Max Li, Zekun Hao et al.

ICLR 2025arXiv:2409.18114

citations

#634

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Shengqiong Wu, Hao Fei, Xiangtai Li et al.

ICLR 2025arXiv:2406.05127

citations

#635

How to Evaluate Reward Models for RLHF

Evan Frick, Tianle Li, Connor Chen et al.

ICLR 2025arXiv:2410.14872

citations

#636

WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild

Rolandos Alexandros Potamias, Jinglei Zhang, Jiankang Deng et al.

CVPR 2025arXiv:2409.12259

citations

#637

What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

Guangkai Xu, yongtao ge, Mingyu Liu et al.

ICLR 2025arXiv:2403.06090

citations

#638

Hymba: A Hybrid-head Architecture for Small Language Models

Xin Dong, Yonggan Fu, Shizhe Diao et al.

ICLR 2025arXiv:2411.13676

citations

#639

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

Junxian Li, Di Zhang, Xunzhi Wang et al.

AAAI 2025paperarXiv:2408.07246

citations

#640

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.

NEURIPS 2025arXiv:2503.19470

citations

#641

VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

Yongxin Guo, Jingyu Liu, Mingda Li et al.

AAAI 2025paperarXiv:2405.13382

citations

#642

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Songhao Han, Wei Huang, Hairong Shi et al.

CVPR 2025arXiv:2411.14794

citations

#643

Physics-Informed Diffusion Models

Jan-Hendrik Bastek, WaiChing Sun, Dennis Kochmann

ICLR 2025arXiv:2403.14404

citations

#644

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Ziyang Ma, Yinghao Ma, Yanqiao Zhu et al.

NEURIPS 2025arXiv:2505.13032

citations

#645

SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos

Yuzheng Liu, Siyan Dong, Shuzhe Wang et al.

CVPR 2025highlightarXiv:2412.09401

citations

#646

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

Xiangyan Liu, Jinjie Ni, Zijian Wu et al.

NEURIPS 2025arXiv:2504.13055

citations

#647

Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

Lei Chen, Yuan Meng, Chen Tang et al.

CVPR 2025arXiv:2406.17343

citations

#648

NoLiMa: Long-Context Evaluation Beyond Literal Matching

Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt et al.

ICML 2025arXiv:2502.05167

citations

#649

Controlling Space and Time with Diffusion Models

Daniel Watson, Saurabh Saxena, Lala Li et al.

ICLR 2025arXiv:2407.07860

citations

#650

DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.

CVPR 2025arXiv:2503.02175

citations

#651

LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

Parshin Shojaee, Kazem Meidani, Shashank Gupta et al.

ICLR 2025arXiv:2404.18400

citations

#652

LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch

caigao jiang, Xiang Shu, Hong Qian et al.

ICLR 2025arXiv:2410.13213

citations

#653

Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

Yangning Li, Yinghui Li, Xinyu Wang et al.

ICLR 2025arXiv:2411.02937

citations

#654

DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving

Wencheng Han, Dongqian Guo, Cheng-Zhong Xu et al.

AAAI 2025paperarXiv:2401.03641

citations

#655

Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection

Jia Guo, Shuai Lu, Weihang Zhang et al.

CVPR 2025arXiv:2405.14325

citations

#656

RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph

Siru Ouyang, Wenhao Yu, Kaixin Ma et al.

ICLR 2025arXiv:2410.14684

citations

#657

Are Sparse Autoencoders Useful? A Case Study in Sparse Probing

Subhash Kantamneni, Josh Engels, Senthooran Rajamanoharan et al.

ICML 2025arXiv:2502.16681

citations

#658

Towards Interpreting Visual Information Processing in Vision-Language Models

Clement Neo, Luke Ong, Philip Torr et al.

ICLR 2025arXiv:2410.07149

citations

#659

An Illusion of Progress? Assessing the Current State of Web Agents

Tianci Xue, Weijian Qi, Tianneng Shi et al.

COLM 2025paperarXiv:2504.01382

citations

#660

Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Seyedmorteza Sadat, Otmar Hilliges, Romann Weber

ICLR 2025arXiv:2410.02416

citations

#661

VinePPO: Refining Credit Assignment in RL Training of LLMs

Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.

ICML 2025arXiv:2410.01679

citations

#662

CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control

Guy Tevet, Sigal Raab, Setareh Cohan et al.

ICLR 2025arXiv:2410.03441

citations

#663

3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei et al.

CVPR 2025arXiv:2412.12507

citations

#664

HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos

Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon et al.

CVPR 2025highlightarXiv:2411.19167

citations

#665

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Shuangrui Ding, Rui Qian, Xiaoyi Dong et al.

ICCV 2025arXiv:2410.16268

citations

#666

FlowPolicy: Enabling Fast and Robust 3D Flow-Based Policy via Consistency Flow Matching for Robot Manipulation

Qinglun Zhang, Zhen Liu, Haoqiang Fan et al.

AAAI 2025paperarXiv:2412.04987

citations

#667

Simplifying Deep Temporal Difference Learning

Matteo Gallici, Mattie Fellows, Benjamin Ellis et al.

ICLR 2025oralarXiv:2407.04811

citations

#668

VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters

Mouxiang Chen, Lefei Shen, Zhuo Li et al.

ICML 2025arXiv:2408.17253

citations

#669

TabM: Advancing tabular deep learning with parameter-efficient ensembling

Yury Gorishniy, Akim Kotelnikov, Artem Babenko

ICLR 2025arXiv:2410.24210

citations

#670

CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

Zhihang Lin, Mingbao Lin, Yuan Xie et al.

NEURIPS 2025arXiv:2503.22342

citations

#671

Wonderland: Navigating 3D Scenes from a Single Image

Hanwen Liang, Junli Cao, Vidit Goel et al.

CVPR 2025arXiv:2412.12091

citations

#672

Model merging with SVD to tie the Knots

George Stoica, Pratik Ramesh, Boglarka Ecsedi et al.

ICLR 2025arXiv:2410.19735

citations

#673

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Rang Meng, Xingyu Zhang, Yuming Li et al.

CVPR 2025arXiv:2411.10061

citations

#674

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

Tao Wu, Yong Zhang, Xintao Wang et al.

AAAI 2025paperarXiv:2408.13239

citations

#675

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Jianhong Bai, Menghan Xia, Xintao WANG et al.

ICLR 2025arXiv:2412.07760

citations

#676

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Orr Zohar, Xiaohan Wang, Yann Dubois et al.

CVPR 2025arXiv:2412.10360

citations

#677

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Zebin Xing, Xingyu Zhang, Yang Hu et al.

CVPR 2025arXiv:2503.05689

citations

#678

Multiple Object Tracking as ID Prediction

Ruopeng Gao, Ji Qi, Limin Wang

CVPR 2025arXiv:2403.16848

citations

#679

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks

Kaijing Ma, Xeron Du, Yunran Wang et al.

ICLR 2025arXiv:2410.06526

citations

#680

Energy-Based Diffusion Language Models for Text Generation

Minkai Xu, Tomas Geffner, Karsten Kreis et al.

ICLR 2025arXiv:2410.21357

citations

#681

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

Xiangyu Wang, Donglin Yang, ziqin wang et al.

ICLR 2025arXiv:2410.07087

citations

#682

End-to-End Driving with Online Trajectory Evaluation via BEV World Model

Yingyan Li, Yuqi Wang, Yang Liu et al.

ICCV 2025arXiv:2504.01941

citations

#683

LLM Unlearning via Loss Adjustment with Only Forget Data

Yaxuan Wang, Jiaheng Wei, Yuhao Liu et al.

ICLR 2025arXiv:2410.11143

citations

#684

Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization

Audrey Huang, Wenhao Zhan, Tengyang Xie et al.

ICLR 2025arXiv:2407.13399

citations

#685

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Yiheng Xu, Dunjie Lu, Zhennan Shen et al.

ICLR 2025arXiv:2412.09605

citations

#686

EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images

Wangbo Yu, Chaoran Feng, Jianing Li et al.

ICCV 2025arXiv:2405.20224

citations

#687

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agent

Taiyi Wang, Zhihao Wu, Jianheng Liu et al.

ICLR 2025arXiv:2410.14803

citations

#688

WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks

Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori et al.

NEURIPS 2025arXiv:2504.18575

citations

#689

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

Zhengbo Wang, Jian Liang, Ran He et al.

ICLR 2025arXiv:2407.18242

citations

#690

VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

Chao Pang, Xingxing Weng, Jiang Wu et al.

AAAI 2025paperarXiv:2403.20213

citations

#691

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

Han Shen, Pin-Yu Chen, Payel Das et al.

ICLR 2025arXiv:2410.07471

citations

#692

TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Yongxin Guo, Jingyu Liu, Mingda Li et al.

ICLR 2025oralarXiv:2410.05643

citations

#693

Goku: Flow Based Video Generative Foundation Models

Shoufa Chen, Chongjian GE, Yuqi Zhang et al.

CVPR 2025highlightarXiv:2502.04896

citations

#694

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Zehui Chen, Kuikun Liu, Qiuchen Wang et al.

ICLR 2025arXiv:2407.20183

citations

#695

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

Junfei Wu, Jian Guan, Kaituo Feng et al.

NEURIPS 2025arXiv:2506.09965

citations

#696

Inference Scaling for Long-Context Retrieval Augmented Generation

Zhenrui Yue, Honglei Zhuang, Aijun Bai et al.

ICLR 2025arXiv:2410.04343

citations

#697

MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Chenxi Wang, Xiang Chen, Ningyu Zhang et al.

ICLR 2025arXiv:2410.11779

citations

#698

Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo et al.

NEURIPS 2025arXiv:2505.24857

citations

#699

Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation

Shengjie Ma, Chengjin Xu, Xuhui Jiang et al.

ICLR 2025arXiv:2407.10805

citations

#700

BOND: Aligning LLMs with Best-of-N Distillation

Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot-Desenonges et al.

ICLR 2025arXiv:2407.14622

citations

#701

Organize the Web: Constructing Domains Enhances Pre-Training Data Curation

Alexander Wettig, Kyle Lo, Sewon Min et al.

ICML 2025arXiv:2502.10341

citations

#702

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Guanting Dong, Keming Lu, Chengpeng Li et al.

ICLR 2025arXiv:2406.13542

citations

#703

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Yifang Men, Yuan Yao, Miaomiao Cui et al.

CVPR 2025arXiv:2409.16160

citations

#704

Arbitrary-steps Image Super-resolution via Diffusion Inversion

Zongsheng Yue, Kang Liao, Chen Change Loy

CVPR 2025arXiv:2412.09013

citations

#705

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

Sucheng Ren, Qihang Yu, Ju He et al.

ICCV 2025arXiv:2502.20388

citations

#706

WorldMem: Long-term Consistent World Simulation with Memory

Zeqi Xiao, Yushi LAN, Yifan Zhou et al.

NEURIPS 2025oralarXiv:2504.12369

citations

#707

What Makes a Reward Model a Good Teacher? An Optimization Perspective

Noam Razin, Zixuan Wang, Hubert Strauss et al.

NEURIPS 2025spotlightarXiv:2503.15477

citations

#708

RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models

Greg Heinrich, Mike Ranzinger, Danny Yin et al.

CVPR 2025arXiv:2412.07679

citations

#709

NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

Meng YOU, Zhiyu Zhu, Hui LIU et al.

ICLR 2025arXiv:2405.15364

citations

#710

TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining

Wanchao Liang, Tianyu Liu, Less Wright et al.

ICLR 2025

citations

#711

A Decade's Battle on Dataset Bias: Are We There Yet?

Zhuang Liu, Kaiming He

ICLR 2025arXiv:2403.08632

citations

#712

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better

Danny Driess, Jost Springenberg, Brian Ichter et al.

NEURIPS 2025spotlightarXiv:2505.23705

citations

#713

DPLM-2: A Multimodal Diffusion Protein Language Model

Xinyou Wang, Zaixiang Zheng, Fei YE et al.

ICLR 2025arXiv:2410.13782

citations

#714

Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model

SHEN FEI, Cong Wang, Junyao Gao et al.

ICML 2025oralarXiv:2502.09533

citations

#715

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Yikun Liu, Yajie Zhang, jiayin cai et al.

CVPR 2025arXiv:2412.01720

citations

#716

OmniBench: Towards The Future of Universal Omni-Language Models

Yizhi Li, Ge Zhang, Yinghao Ma et al.

NEURIPS 2025arXiv:2409.15272

citations

#717

Describe Anything: Detailed Localized Image and Video Captioning

Long Lian, Yifan Ding, Yunhao Ge et al.

ICCV 2025arXiv:2504.16072

citations

#718

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

Xiang Yue, Yueqi Song, Akari Asai et al.

ICLR 2025arXiv:2410.16153

citations

#719

Dual Diffusion for Unified Image Generation and Understanding

Zijie Li, Henry Li, Yichun Shi et al.

CVPR 2025arXiv:2501.00289

citations

#720

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar et al.

COLM 2025paper

citations

#721

How Two-Layer Neural Networks Learn, One (Giant) Step at a Time

Yatin Dandi, Florent Krzakala, Bruno Loureiro et al.

ICLR 2025arXiv:2305.18270

citations

#722

An Undetectable Watermark for Generative Image Models

Samuel Gunn, Xuandong Zhao, Dawn Song

ICLR 2025arXiv:2410.07369

citations

#723

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan et al.

ICLR 2025arXiv:2312.14091

citations

#724

Calibrating Large Language Models with Sample Consistency

Qing Lyu, Kumar Shridhar, Chaitanya Malaviya et al.

AAAI 2025paperarXiv:2402.13904

citations

#725

Timer-XL: Long-Context Transformers for Unified Time Series Forecasting

Yong Liu, Guo Qin, Xiangdong Huang et al.

ICLR 2025oralarXiv:2410.04803

citations

#726

MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control

Ruiyuan Gao, Kai Chen, Bo Xiao et al.

ICCV 2025arXiv:2411.13807

citations

#727

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Zhenting Qi, Hanlin Zhang, Eric P Xing et al.

ICLR 2025arXiv:2402.17840

citations

#728

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

Andy (DiJia) Su, Hanlin Zhu, Yingchen Xu et al.

ICML 2025arXiv:2502.03275

citations

#729

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

Muzhi Dai, Chenxu Yang, Qingyi Si

NEURIPS 2025oralarXiv:2505.07686

citations

#730

Masked Autoencoders Are Effective Tokenizers for Diffusion Models

Hao Chen, Yujin Han, Fangyi Chen et al.

ICML 2025spotlightarXiv:2502.03444

citations

#731

Bootstrapping Language Models with DPO Implicit Rewards

Changyu Chen, Zichen Liu, Chao Du et al.

ICLR 2025arXiv:2406.09760

citations

#732

TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

Chuanrui Zhang, Yingshuang Zou, Zhuoling Li et al.

AAAI 2025paperarXiv:2408.13770

citations

#733

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability

Zicheng Lin, Tian Liang, Jiahao Xu et al.

ICML 2025arXiv:2411.19943

citations

#734

RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval

Kaiyue Wen, Xingyu Dang, Kaifeng Lyu

ICLR 2025arXiv:2402.18510

citations

#735

Language Model Can Listen While Speaking

Ziyang Ma, Yakun Song, Chenpeng Du et al.

AAAI 2025paperarXiv:2408.02622

citations

#736

DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers

Yuntao Chen, Yuqi Wang, Zhaoxiang Zhang

ICCV 2025arXiv:2412.18607

citations

#737

Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs

Siyan Zhao, Mingyi Hong, Yang Liu et al.

ICLR 2025arXiv:2502.09597

citations

#738

Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.

AAAI 2025paperarXiv:2405.15304

citations

#739

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Zhuoqun Li, Xuanang Chen, Haiyang Yu et al.

ICLR 2025arXiv:2410.08815

citations

#740

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain et al.

ICLR 2025arXiv:2406.09415

citations

#741

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.

ICLR 2025arXiv:2410.08847

citations

#742

FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language

Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec et al.

COLM 2025paperarXiv:2506.20920

citations

#743

EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris et al.

ICML 2025arXiv:2502.09509

citations

#744

G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks

Guibin Zhang, Yanwei Yue, Xiangguo Sun et al.

ICML 2025spotlightarXiv:2410.11782

citations

#745

NETS: A Non-equilibrium Transport Sampler

Michael Albergo, Eric Vanden-Eijnden

ICML 2025arXiv:2410.02711

citations

#746

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

Xiaozhong Ji, Xiaobin Hu, Zhihong Xu et al.

CVPR 2025arXiv:2411.16331

citations

#747

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Roman Bachmann, Jesse Allardice, David Mizrahi et al.

ICML 2025arXiv:2502.13967

citations

#748

SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment

Katrin Renz, Long Chen, Elahe Arani et al.

CVPR 2025highlightarXiv:2503.09594

citations

#749

BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments

Yusuf Roohani, Andrew Lee, Qian Huang et al.

ICLR 2025arXiv:2405.17631

citations

#750

Does Spatial Cognition Emerge in Frontier Models?

Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Krähenbühl et al.

ICLR 2025arXiv:2410.06468

citations

#751

Lean-STaR: Learning to Interleave Thinking and Proving

Haohan Lin, Zhiqing Sun, Sean Welleck et al.

ICLR 2025arXiv:2407.10040

citations

#752

Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design

Zhi Zheng, Zhuoliang Xie, Zhenkun Wang et al.

ICML 2025arXiv:2501.08603

citations

#753

Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging

Chongjie Ye, Yushuang Wu, Ziteng Lu et al.

ICCV 2025arXiv:2503.22236

citations

#754

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Tian Ye, Zicheng Xu, Yuanzhi Li et al.

ICLR 2025arXiv:2408.16293

citations

#755

RRM: Robust Reward Model Training Mitigates Reward Hacking

Tianqi Liu, Wei Xiong, Jie Ren et al.

ICLR 2025arXiv:2409.13156

citations

#756

TerraMind: Large-Scale Generative Multimodality for Earth Observation

Johannes Jakubik, Felix Yang, Benedikt Blumenstiel et al.

ICCV 2025arXiv:2504.11171

citations

#757

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Han Lin, Jaemin Cho, Abhay Zala et al.

ICLR 2025oralarXiv:2404.09967

citations

#758

Learn Your Reference Model for Real Good Alignment

Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.

ICLR 2025arXiv:2404.09656

citations

#759

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Hang Hua, Yunlong Tang, Chenliang Xu et al.

AAAI 2025paperarXiv:2404.12353

citations

#760

Aether: Geometric-Aware Unified World Modeling

Haoyi Zhu, Yifan Wang, Jianjun Zhou et al.

ICCV 2025arXiv:2503.18945

citations

#761

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Fan-Yun Sun, Weiyu Liu, Siyi Gu et al.

CVPR 2025arXiv:2412.02193

citations

#762

All-atom Diffusion Transformers: Unified generative modelling of molecules and materials

Chaitanya Joshi, Xiang Fu, Yi-Lun Liao et al.

ICML 2025arXiv:2503.03965

citations

#763

MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation

Jinfeng Xu, Zheyu Chen, Shuo Yang et al.

AAAI 2025paperarXiv:2402.19407

citations

#764

Eliminating Position Bias of Language Models: A Mechanistic Approach

Ziqi Wang, Hanlin Zhang, Xiner Li et al.

ICLR 2025arXiv:2407.01100

citations

#765

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

Fu-Yun Wang, Ling Yang, Zhaoyang Huang et al.

ICLR 2025arXiv:2410.07303

citations

#766

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

Matthew Chang, Gunjan Chhablani, Alexander Clegg et al.

ICLR 2025oralarXiv:2411.00081

citations

#767

GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning

jusheng zhang, Yijia Fan, Wenjun Lin et al.

NEURIPS 2025arXiv:2505.23399

citations

#768

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

Yang Chen, Zhuolin Yang, Zihan Liu et al.

NEURIPS 2025arXiv:2505.16400

citations

#769

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Haotong Lin, Sida Peng, Jingxiao Chen et al.

CVPR 2025arXiv:2412.14015

citations

#770

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Rui Chen, Jianfeng Zhang, Yixun Liang et al.

CVPR 2025arXiv:2412.17808

citations

#771

MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

Xuannan Liu, Zekun Li, Pei Li et al.

ICLR 2025arXiv:2406.08772

citations

#772

Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding

Ye Wang, Ziheng Wang, Boshen Xu et al.

NEURIPS 2025oralarXiv:2503.13377

citations

#773

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Qingyun Li, Zhe Chen, Weiyun Wang et al.

ICLR 2025arXiv:2406.08418

citations

#774

Catastrophic Failure of LLM Unlearning via Quantization

Zhiwei Zhang, Fali Wang, Xiaomin Li et al.

ICLR 2025arXiv:2410.16454

citations

#775

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Xingyu Fu, Minqian Liu, Zhengyuan Yang et al.

ICML 2025arXiv:2501.05452

citations

#776

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Yinlam Chow, Guy Tennenholtz, Izzeddin Gur et al.

ICLR 2025arXiv:2412.15287

citations

#777

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Yongxin Zhu, Bocheng Li, Yifei Xin et al.

ICCV 2025arXiv:2411.02038

citations

#778

LLM Generated Persona is a Promise with a Catch

Leon Li, Haozhe Chen, Hongseok Namkoong et al.

NEURIPS 2025arXiv:2503.16527

citations

#779

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

Lingchen Sun, Rongyuan Wu, Zhiyuan Ma et al.

CVPR 2025arXiv:2412.03017

citations

#780

Atom of Thoughts for Markov LLM Test-Time Scaling

Fengwei Teng, Quan Shi, Zhaoyang Yu et al.

NEURIPS 2025arXiv:2502.12018

citations

#781

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

Salman Rahman, Liwei Jiang, James Shiffer et al.

COLM 2025paperarXiv:2504.13203

citations

#782

Vision Language Models are In-Context Value Learners

Yecheng Jason Ma, Joey Hejna, Chuyuan Fu et al.

ICLR 2025oralarXiv:2411.04549

citations

#783

ALLaM: Large Language Models for Arabic and English

M Saiful Bari, Yazeed Alnumay, Norah Alzahrani et al.

ICLR 2025arXiv:2407.15390

citations

#784

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models

Arijit Ray, Jiafei Duan, Ellis L Brown II et al.

COLM 2025paperarXiv:2412.07755

citations

#785

Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation

Yuyang Ye, Zhi Zheng, Yishan Shen et al.

AAAI 2025paperarXiv:2408.09698

citations

#786

SPA-BENCH: A COMPREHENSIVE BENCHMARK FOR SMARTPHONE AGENT EVALUATION

Jingxuan Chen, Derek Yuen, Bin Xie et al.

ICLR 2025arXiv:2410.15164

citations

#787

Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues

Riccardo Grazzi, Julien Siems, Arber Zela et al.

ICLR 2025arXiv:2411.12537

citations

#788

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

Weizhe Yuan, Jane Yu, Song Jiang et al.

NEURIPS 2025arXiv:2502.13124

citations

#789

Learning Adaptive Parallel Reasoning with Language Models

Jiayi Pan, Xiuyu Li, Long Lian et al.

COLM 2025paperarXiv:2504.15466

citations

#790

EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers

Daiheng Gao, Shilin Lu, Wenbo Zhou et al.

ICML 2025arXiv:2412.20413

citations

#791

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Yibin Wang, li zhimin, Yuhang Zang et al.

NEURIPS 2025arXiv:2505.03318

citations

#792

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Baorui Ma, Huachen Gao, Haoge Deng et al.

CVPR 2025highlightarXiv:2412.06699

citations

#793

The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

Ruili Feng, Han Zhang, Zhilei Shu et al.

NEURIPS 2025arXiv:2412.03568

citations

#794

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Kai Chen, Yunhao Gou, Runhui Huang et al.

CVPR 2025arXiv:2409.18042

citations

#795

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Fanqing Meng, Jin Wang, Chuanhao Li et al.

ICLR 2025arXiv:2408.02718

citations

#796

STAIR: Improving Safety Alignment with Introspective Reasoning

Yichi Zhang, Siyuan Zhang, Yao Huang et al.

ICML 2025oralarXiv:2502.02384

citations

#797

Scaling Mesh Generation via Compressive Tokenization

Haohan Weng, Zibo Zhao, Biwen Lei et al.

CVPR 2025arXiv:2411.07025

citations

#798

Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation

Jiaqi Chen, Bingqian Lin, Xinmin Liu et al.

AAAI 2025paperarXiv:2407.05890

citations

#799

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Keisuke Kamahori, Tian Tang, Yile Gu et al.

ICLR 2025arXiv:2402.07033

citations

#800

Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Mingfei Han, Linjie Yang, Xiaojun Chang et al.

ICLR 2025arXiv:2312.10300

citations

← Previous

1 2 3 4 5 6...112