Most Cited 2025 "large audio reasoning models" Papers

22,274 papers found • Page 6 of 112

#1001

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu et al.

CVPR 2025 • poster • arXiv:2503.21483
26 citations
#1002

AutoPresent: Designing Structured Visuals from Scratch

Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.

CVPR 2025 • poster • arXiv:2501.00912
25 citations
#1003

Steering Large Language Models between Code Execution and Textual Reasoning

Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma et al.

ICLR 2025 • poster • arXiv:2410.03524
25 citations
#1004

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Zihan Zheng, Zerui Cheng, Zeyu Shen et al.

NEURIPS 2025 • poster • arXiv:2506.11928
25 citations
#1005

Can LLMs Solve Longer Math Word Problems Better?

Xin Xu, Tong Xiao, Zitong Chao et al.

ICLR 2025 • poster • arXiv:2405.14804
25 citations
#1006

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

Qirui Chen, Shangzhe Di, Weidi Xie

AAAI 2025 • paper • arXiv:2408.14469
25 citations
#1007

PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

Cong Chen, Mingyu Liu, Chenchen Jing et al.

ICLR 2025 • poster • arXiv:2503.06486
25 citations
#1008

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.

CVPR 2025 • poster • arXiv:2504.09795
25 citations
#1009

Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

Guanyao Wu, Haoyu Liu, Hongming Fu et al.

CVPR 2025 • poster • arXiv:2503.01210
25 citations
#1010

Frequency Dynamic Convolution for Dense Image Prediction

Linwei Chen, Lin Gu, Liang Li et al.

CVPR 2025 • poster • arXiv:2503.18783
25 citations
#1011

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025 • poster • arXiv:2505.17017
25 citations
#1012

AffordDP: Generalizable Diffusion Policy with Transferable Affordance

Shijie Wu, Yihang Zhu, Yunao Huang et al.

CVPR 2025 • poster • arXiv:2412.03142
25 citations
#1013

EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

CVPR 2025 • poster • arXiv:2411.15241
25 citations
#1014

Grounded Reinforcement Learning for Visual Reasoning

Gabriel Sarch, Snigdha Saha, Naitik Khandelwal et al.

NEURIPS 2025 • poster • arXiv:2505.23678
25 citations
#1015

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Lijun Li, Zhelun Shi, Xuhao Hu et al.

CVPR 2025 • poster • arXiv:2501.12612
25 citations
#1016

Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models

Lucio La Cava, Andrea Tagarelli

AAAI 2025 • paper • arXiv:2401.07115
25 citations
#1017

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

Zizheng Pan, Bohan Zhuang, De-An Huang et al.

ICLR 2025 • poster • arXiv:2402.14167
25 citations
#1018

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Xinhao Liu, Jintong Li, Yicheng Jiang et al.

CVPR 2025 • poster • arXiv:2411.17820
25 citations
#1019

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Hongyan Zhi, Peihao Chen, Junyan Li et al.

CVPR 2025 • poster • arXiv:2412.01292
25 citations
#1020

Beyond Autoregression: Fast LLMs via Self-Distillation Through Time

Justin Deschenaux, Caglar Gulcehre

ICLR 2025 • poster • arXiv:2410.21035
25 citations
#1021

XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

Bowen Chen, Brynn Zhao, Haomiao Sun et al.

NEURIPS 2025 • poster • arXiv:2506.21416
25 citations
#1022

Understanding Factual Recall in Transformers via Associative Memories

Eshaan Nichani, Jason Lee, Alberto Bietti

ICLR 2025 • poster • arXiv:2412.06538
25 citations
#1023

Interleaved-Modal Chain-of-Thought

Jun Gao, Yongqi Li, Ziqiang Cao et al.

CVPR 2025 • poster • arXiv:2411.19488
25 citations
#1024

FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes

Lue Fan, Hao Zhang, Qitai Wang et al.

CVPR 2025 • poster • arXiv:2412.03566
25 citations
#1025

A Formal Framework for Understanding Length Generalization in Transformers

Xinting Huang, Andy Yang, Satwik Bhattamishra et al.

ICLR 2025 • poster • arXiv:2410.02140
25 citations
#1026

MagicQuill: An Intelligent Interactive Image Editing System

Zichen Liu, Yue Yu, Hao Ouyang et al.

CVPR 2025 • poster • arXiv:2411.09703
25 citations
#1027

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Huiyu Duan, Qiang Hu, Wang Jiarui et al.

CVPR 2025 • highlight • arXiv:2412.19238
25 citations
#1028

Adversarial Search Engine Optimization for Large Language Models

Fredrik Nestaas, Edoardo Debenedetti, Florian Tramer

ICLR 2025 • poster • arXiv:2406.18382
25 citations
#1029

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Lawrence Jang, Yinheng Li, Dan Zhao et al.

ICLR 2025 • poster • arXiv:2410.19100
25 citations
#1030

STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes

Jiawei Yang, Jiahui Huang, Boris Ivanovic et al.

ICLR 2025 • oral • arXiv:2501.00602
25 citations
#1031

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NEURIPS 2025 • poster • arXiv:2505.19591
25 citations
#1032

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025 • poster • arXiv:2406.19353
25 citations
#1033

Moral Alignment for LLM Agents

Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

ICLR 2025 • oral • arXiv:2410.01639
25 citations
#1034

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

Zewei Zhang, Huan Liu, Jun Chen et al.

ICLR 2025 • poster • arXiv:2404.07206
25 citations
#1035

Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact case

Iskander Azangulov, Andrei Smolensky, Alexander Terenin et al.

NEURIPS 2025 • oral • arXiv:2208.14960
25 citations
#1036

UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models

Xin Xu, Qiyun Xu, Tong Xiao et al.

ICML 2025 • poster • arXiv:2502.00334
25 citations
#1037

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Shenghai Yuan, Xianyi He, Yufan Deng et al.

NEURIPS 2025 • poster • arXiv:2505.20292
25 citations
#1038

KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

Belinda Mo, Kyssen Yu, Joshua Kazdan et al.

NEURIPS 2025 • poster • arXiv:2502.09956
25 citations
#1039

ResearchTown: Simulator of Human Research Community

Haofei Yu, Zhaochen Hong, Zirui Cheng et al.

ICML 2025 • poster • arXiv:2412.17767
25 citations
#1040

Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Junlong Cheng, Bin Fu, Jin Ye et al.

CVPR 2025 • poster • arXiv:2411.12814
25 citations
#1041

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation

Xiaofeng Wang, Kang Zhao, Feng Liu et al.

NEURIPS 2025 • poster • arXiv:2411.08380
25 citations
#1042

An Intelligent Agentic System for Complex Image Restoration Problems

Kaiwen Zhu, Jinjin Gu, Zhiyuan You et al.

ICLR 2025 • poster • arXiv:2410.17809
25 citations
#1043

NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning

Xin Yi, Shunfan Zheng, Linlin Wang et al.

AAAI 2025 • paper • arXiv:2412.12497
25 citations
#1044

Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation

Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.

CVPR 2025 • poster • arXiv:2501.06693
25 citations
#1045

DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting

Hyunwoo Park, Gun Ryu, Wonjun Kim

CVPR 2025 • poster • arXiv:2504.00773
25 citations
#1046

Adversarial Diffusion Compression for Real-World Image Super-Resolution

Bin Chen, Gehui Li, Rongyuan Wu et al.

CVPR 2025 • poster • arXiv:2411.13383
25 citations
#1047

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

Siyuan Liang, Jiawei Liang, Tianyu Pang et al.

CVPR 2025 • poster • arXiv:2406.18844
25 citations
#1048

How to build a consistency model: Learning flow maps via self-distillation

Nicholas Boffi, Michael Albergo, Eric Vanden-Eijnden

NEURIPS 2025 • poster • arXiv:2505.18825
25 citations
#1049

TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation

Haiyang Liu, Xingchao Yang, Tomoya Akiyama et al.

ICLR 2025 • poster • arXiv:2410.04221
25 citations
#1050

Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning

Hyun Ryu, Gyeongman Kim, Hyemin S. Lee et al.

ICLR 2025 • poster • arXiv:2410.08047
25 citations
#1051

Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation

Chengwen Qi, Ren Ma, Bowen Li et al.

ICLR 2025 • poster • arXiv:2502.06563
25 citations
#1052

Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

Bojia Zi, Penghui Ruan, Marco Chen et al.

NEURIPS 2025 • poster • arXiv:2502.06734
25 citations
#1053

CityNav: A Large-Scale Dataset for Real-World Aerial Navigation

Jungdae Lee, Taiki Miyanishi, Shuhei Kurita et al.

ICCV 2025 • poster • arXiv:2406.14240
25 citations
#1054

MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding

Rongchang Xie, Chen Du, Ping Song et al.

ICCV 2025 • poster • arXiv:2411.17762
25 citations
#1055

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.

CVPR 2025 • highlight • arXiv:2412.15214
25 citations
#1056

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Lazar Atanackovic, Xi (Nicole) Zhang, Brandon Amos et al.

ICLR 2025 • oral • arXiv:2408.14608
24 citations
#1057

FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping

Yuzhou Ji, He Zhu, Junshu Tang et al.

AAAI 2025 • paper • arXiv:2406.01916
24 citations
#1058

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

Zhen Xing, Qi Dai, Zejia Weng et al.

ICCV 2025 • poster • arXiv:2406.06465
24 citations
#1059

Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing

Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.

AAAI 2025 • paper • arXiv:2501.04376
24 citations
#1060

Your ViT is Secretly an Image Segmentation Model

Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.

CVPR 2025 • highlight • arXiv:2503.19108
24 citations
#1061

SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks

Meng Lou, Yunxiang Fu, Yizhou Yu

AAAI 2025 • paper • arXiv:2409.09649
24 citations
#1062

SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures

Hui Liu, Chen Jia, Fan Shi et al.

CVPR 2025 • poster • arXiv:2503.01113
24 citations
#1063

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Duo Zheng, Shijia Huang, Yanyang Li et al.

NEURIPS 2025 • poster • arXiv:2505.24625
24 citations
#1064

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?

Fengxiang Wang, Hongzhen Wang, Zonghao Guo et al.

CVPR 2025 • highlight • arXiv:2503.23771
24 citations
#1065

FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model

Chongkai Gao, Haozhuo Zhang, Zhixuan Xu et al.

ICLR 2025 • poster • arXiv:2412.08261
24 citations
#1066

The Superposition of Diffusion Models Using the Itô Density Estimator

Marta Skreta, Lazar Atanackovic, Joey Bose et al.

ICLR 2025 • poster • arXiv:2412.17762
24 citations
#1067

Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination

Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.

ICLR 2025 • poster • arXiv:2412.14957
24 citations
#1068

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Rongyao Fang, Chengqi Duan, Kun Wang et al.

ICCV 2025 • poster • arXiv:2410.13861
24 citations
#1069

ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification

Xiao Li, Wenxuan Sun, Huanran Chen et al.

ICLR 2025 • poster • arXiv:2408.00315
24 citations
#1070

Energy-Weighted Flow Matching for Offline Reinforcement Learning

Shiyuan Zhang, Weitong Zhang, Quanquan Gu

ICLR 2025 • poster • arXiv:2503.04975
24 citations
#1071

Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

CVPR 2025 • poster • arXiv:2404.15611
24 citations
#1072

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

Laura Ruis, Maximilian Mozes, Juhan Bae et al.

ICLR 2025 • poster • arXiv:2411.12580
24 citations
#1073

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

Kyungmin Lee, Xiaohang Li, Qifei Wang et al.

CVPR 2025 • poster • arXiv:2502.02588
24 citations
#1074

Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons

Jianhui Chen, Xiaozhi Wang, Zijun Yao et al.

NEURIPS 2025 • poster • arXiv:2406.14144
24 citations
#1075

Efficient Online Reinforcement Learning for Diffusion Policy

Haitong Ma, Tianyi Chen, Kai Wang et al.

ICML 2025 • poster • arXiv:2502.00361
24 citations
#1076

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Haoyi Zhu, Honghui Yang, Yating Wang et al.

ICLR 2025 • poster • arXiv:2410.08208
24 citations
#1077

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang et al.

CVPR 2025 • poster • arXiv:2503.20188
24 citations
#1078

Results of the Big ANN: NeurIPS’23 competition

Harsha Vardhan Simhadri, Martin Aumüller, Matthijs Douze et al.

NEURIPS 2025 • poster • arXiv:2409.17424
24 citations
#1079

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov et al.

ICLR 2025 • poster • arXiv:2407.15018
24 citations
#1080

Diffusion Beats Autoregressive in Data-Constrained Settings

Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.

NEURIPS 2025 • poster • arXiv:2507.15857
24 citations
#1081

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

Xinpeng Wang, Chengzhi (Martin) Hu, Paul Röttger et al.

ICLR 2025 • poster • arXiv:2410.03415
24 citations
#1082

Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models

Jingyang Zhang, Jingwei Sun, Eric Yeats et al.

ICLR 2025 • poster
24 citations
#1083

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.

ICML 2025 • poster • arXiv:2502.03714
24 citations
#1084

AnimateAnything: Consistent and Controllable Animation for Video Generation

Guojun Lei, Chi Wang, Rong Zhang et al.

CVPR 2025 • poster • arXiv:2411.10836
24 citations
#1085

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Kai Wang, Mingjia Shi, YuKun Zhou et al.

CVPR 2025 • poster • arXiv:2405.17403
24 citations
#1086

Generating CAD Code with Vision-Language Models for 3D Designs

Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi et al.

ICLR 2025 • poster • arXiv:2410.05340
24 citations
#1087

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM

Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.

CVPR 2025 • poster • arXiv:2411.17221
24 citations
#1088

Inverse Constitutional AI: Compressing Preferences into Principles

Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier et al.

ICLR 2025 • poster • arXiv:2406.06560
24 citations
#1089

VSSD: Vision Mamba with Non-Causal State Space Duality

Yuheng Shi, Mingjia Li, Minjing Dong et al.

ICCV 2025 • poster • arXiv:2407.18559
24 citations
#1090

Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2025 • paper • arXiv:2507.21606
24 citations
#1091

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding

Yunlong Tang, Daiki Shimada, Jing Bi et al.

AAAI 2025 • paper • arXiv:2403.16276
24 citations
#1092

Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel

Shivam Gupta, Linda Cai, Sitan Chen

ICLR 2025 • poster • arXiv:2406.00924
24 citations
#1093

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.

ICCV 2025 • highlight • arXiv:2411.19325
24 citations
#1094

What Makes a Good Diffusion Planner for Decision Making?

Haofei Lu, Dongqi Han, Yifei Shen et al.

ICLR 2025 • poster • arXiv:2503.00535
24 citations
#1095

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation

Ziyu Zhu, Xilin Wang, Yixuan Li et al.

ICCV 2025 • highlight • arXiv:2507.04047
24 citations
#1096

RouteLLM: Learning to Route LLMs from Preference Data

Isaac Ong, Amjad Almahairi, Vincent Wu et al.

ICLR 2025 • poster
24 citations
#1097

KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

Jingbo Yang, Bairu Hou, Wei Wei et al.

NEURIPS 2025 • poster • arXiv:2502.16002
24 citations
#1098

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.

ICCV 2025 • poster • arXiv:2404.03214
24 citations
#1099

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model

Shunlin Lu, Jingbo Wang, Zeyu Lu et al.

CVPR 2025 • poster • arXiv:2412.14559
24 citations
#1100

Specialized Foundation Models Struggle to Beat Supervised Baselines

Zongzhe Xu, Ritvik Gupta, Wenduo Cheng et al.

ICLR 2025 • poster • arXiv:2411.02796
24 citations
#1101

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

Cunxiang Wang, Ruoxi Ning, Boqi Pan et al.

ICLR 2025 • poster • arXiv:2403.12766
23 citations
#1102

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NEURIPS 2025 • spotlight • arXiv:2507.18624
23 citations
#1103

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining

Ming Hu, Kun Yuan, Yaling Shen et al.

ICCV 2025 • poster • arXiv:2411.15421
23 citations
#1104

Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning

Mingyang Chen, sunhaoze, Tianpeng Li et al.

ICLR 2025 • poster • arXiv:2410.12952
23 citations
#1105

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

Yun Qu, Yuhang Jiang, Boyuan Wang et al.

AAAI 2025 • paper • arXiv:2412.11120
23 citations
#1106

Language-Guided Image Tokenization for Generation

Kaiwen Zha, Lijun Yu, Alireza Fathi et al.

CVPR 2025 • poster • arXiv:2412.05796
23 citations
#1107

DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

Julien Siems, Timur Carstensen, Arber Zela et al.

NEURIPS 2025 • poster • arXiv:2502.10297
23 citations
#1108

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Guangchen (Eric) Lan, Dong-Jun Han, Abolfazl Hashemi et al.

ICLR 2025 • poster • arXiv:2404.08003
23 citations
#1109

ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization

Zechun Liu, Changsheng Zhao, Hanxian Huang et al.

NEURIPS 2025 • poster • arXiv:2502.02631
23 citations
#1110

Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.

CVPR 2025 • poster • arXiv:2405.17811
23 citations
#1111

NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics

David Robinson, Marius Miron, Masato Hagiwara et al.

ICLR 2025 • poster • arXiv:2411.07186
23 citations
#1112

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

George Wang, Jesse Hoogland, Stan van Wingerden et al.

ICLR 2025 • poster • arXiv:2410.02984
23 citations
#1113

CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

Xin Liu, Jie Liu, Jie Tang et al.

CVPR 2025 • poster • arXiv:2503.06896
23 citations
#1114

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

Kyle Sargent, Kyle Hsu, Justin Johnson et al.

ICCV 2025 • poster • arXiv:2503.11056
23 citations
#1115

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.

CVPR 2025 • poster • arXiv:2411.17190
23 citations
#1116

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

Teng Xiao, Yige Yuan, Zhengyu Chen et al.

ICLR 2025 • poster • arXiv:2502.00883
23 citations
#1117

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Hongxiang Li, Yaowei Li, Yuhang Yang et al.

ICLR 2025 • poster • arXiv:2412.09349
23 citations
#1118

Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images

Sichen Zhu, Yuchen Zhu, Molei Tao et al.

ICLR 2025 • poster • arXiv:2501.15598
23 citations
#1119

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Yuxuan Cai, Jiangning Zhang, Haoyang He et al.

ICCV 2025 • poster • arXiv:2410.16236
23 citations
#1120

Text-to-Image Rectified Flow as Plug-and-Play Priors

Xiaofeng Yang, Cheng Chen, Xulei Yang et al.

ICLR 2025 • poster • arXiv:2406.03293
23 citations
#1121

Epona: Autoregressive Diffusion World Model for Autonomous Driving

Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu et al.

ICCV 2025 • poster • arXiv:2506.24113
23 citations
#1122

Addressing Misspecification in Simulation-based Inference through Data-driven Calibration

Antoine Wehenkel, Juan L. Gamella, Ozan Sener et al.

ICML 2025 • oral • arXiv:2405.08719
23 citations
#1123

MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions

Jian Wu, Linyi Yang, Dongyuan Li et al.

ICLR 2025 • poster
23 citations
#1124

Efficient Visual State Space Model for Image Deblurring

Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.

CVPR 2025 • poster • arXiv:2405.14343
23 citations
#1125

RadGPT: Constructing 3D Image-Text Tumor Datasets

Pedro Bassi, Mehmet Yavuz, Ibrahim Ethem Hamamci et al.

ICCV 2025 • poster • arXiv:2501.04678
23 citations
#1126

Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces

Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.

CVPR 2025 • highlight • arXiv:2503.19199
23 citations
#1127

A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language

Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert Dick et al.

ICLR 2025 • poster • arXiv:2408.12578
23 citations
#1128

Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping

Zijian Liu, Zhengyuan Zhou

ICLR 2025 • poster • arXiv:2412.19529
23 citations
#1129

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Weihao Zeng, Yuzhen Huang, Lulu Zhao et al.

ICLR 2025 • poster • arXiv:2412.17256
23 citations
#1130

miniCTX: Neural Theorem Proving with (Long-)Contexts

Jiewen Hu, Thomas Zhu, Sean Welleck

ICLR 2025 • poster • arXiv:2408.03350
23 citations
#1131

OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning

Xiaoqiang Wang, Bang Liu

ICLR 2025 • poster • arXiv:2410.18963
23 citations
#1132

MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

Zixuan Gong, Qi Zhang, Guangyin Bao et al.

AAAI 2025 • paper • arXiv:2404.12630
23 citations
#1133

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.

ICLR 2025 • poster • arXiv:2408.15313
23 citations
#1134

Teaching Language Models to Critique via Reinforcement Learning

Zhihui Xie, Jie Chen, Liyu Chen et al.

ICML 2025 • poster • arXiv:2502.03492
23 citations
#1135

Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

Peidong Li, Dixiao Cui

ICLR 2025 • oral • arXiv:2409.18341
23 citations
#1136

Language Imbalance Driven Rewarding for Multilingual Self-improving

Wen Yang, Junhong Wu, Chen Wang et al.

ICLR 2025 • poster • arXiv:2410.08964
23 citations
#1137

JetFormer: An autoregressive generative model of raw images and text

Michael Tschannen, André Susano Pinto, Alexander Kolesnikov

ICLR 2025 • poster • arXiv:2411.19722
23 citations
#1138

The AdEMAMix Optimizer: Better, Faster, Older

Matteo Pagliardini, Pierre Ablin, David Grangier

ICLR 2025 • poster • arXiv:2409.03137
23 citations
#1139

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

Yongliang Wu, Zonghui Li, Xinting Hu et al.

NEURIPS 2025 • poster • arXiv:2505.16707
23 citations
#1140

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

CVPR 2025 • poster • arXiv:2412.03324
23 citations
#1141

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Andrew Szot, Bogdan Mazoure, Omar Attia et al.

CVPR 2025 • poster • arXiv:2412.08442
23 citations
#1142

Reward Guided Latent Consistency Distillation

William Wang, Jiachen Li, Weixi Feng et al.

ICLR 2025 • poster • arXiv:2403.11027
23 citations
#1143

ICLR: In-Context Learning of Representations

Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana et al.

ICLR 2025 • poster • arXiv:2501.00070
23 citations
#1144

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Jiangjie Chen, Qianyu He, Siyu Yuan et al.

NEURIPS 2025 • spotlight • arXiv:2505.19914
23 citations
#1145

Fantastic Copyrighted Beasts and How (Not) to Generate Them

Luxi He, Yangsibo Huang, Weijia Shi et al.

ICLR 2025 • poster • arXiv:2406.14526
23 citations
#1146

CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation

Jiahao Li, Weijian Ma, Xueyang Li et al.

CVPR 2025 • poster • arXiv:2505.04481
23 citations
#1147

LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging

Ke Wang, Nikos Dimitriadis, Alessandro Favero et al.

ICLR 2025 • poster • arXiv:2410.17146
23 citations
#1148

Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data

Florian Eddie Dorner, Vivian Nastl, Moritz Hardt

ICLR 2025 • poster
23 citations
#1149

Language Representations Can be What Recommenders Need: Findings and Potentials

Leheng Sheng, An Zhang, Yi Zhang et al.

ICLR 2025 • poster • arXiv:2407.05441
23 citations
#1150

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets

Guangqi Jiang, Yifei Sun, Tao Huang et al.

ICLR 2025 • poster • arXiv:2410.22325
23 citations
#1151

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction

Jarrid Rector-Brooks, Mohsin Hasan, Zhangzhi Peng et al.

ICLR 2025 • poster • arXiv:2410.08134
23 citations
#1152

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs

Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.

ICCV 2025 • poster • arXiv:2506.22139
23 citations
#1153

Towards a Mechanistic Explanation of Diffusion Model Generalization

Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy et al.

ICML 2025 • spotlight • arXiv:2411.19339
23 citations
#1154

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

Dongping Chen, Yue Huang, Siyuan Wu et al.

ICLR 2025 • oral • arXiv:2406.10819
23 citations
#1155

EditAR: Unified Conditional Generation with Autoregressive Models

Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang

CVPR 2025 • poster • arXiv:2501.04699
23 citations
#1156

Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors

Weixuan Wang, Jingyuan Yang, Wei Peng

ICLR 2025 • poster • arXiv:2410.12299
23 citations
#1157

Instant Policy: In-Context Imitation Learning via Graph Diffusion

Vitalis Vosylius, Edward Johns

ICLR 2025 • poster • arXiv:2411.12633
23 citations
#1158

HELMET: How to Evaluate Long-context Models Effectively and Thoroughly

Howard Yen, Tianyu Gao, Minmin Hou et al.

ICLR 2025 • poster
23 citations
#1159

POSTA: A Go-to Framework for Customized Artistic Poster Generation

Haoyu Chen, Xiaojie Xu, Wenbo Li et al.

CVPR 2025 • poster • arXiv:2503.14908
23 citations
#1160

Self-Consistency Preference Optimization

Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang et al.

ICML 2025 • poster • arXiv:2411.04109
23 citations
#1161

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Jinyoung Park, Jeehye Na, Jinyoung Kim et al.

NEURIPS 2025 • poster • arXiv:2506.07464
23 citations
#1162

Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.

CVPR 2025 • poster • arXiv:2505.16652
22 citations
#1163

OSV: One Step is Enough for High-Quality Image to Video Generation

Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.

CVPR 2025 • poster • arXiv:2409.11367
22 citations
#1164

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Wei Pang, Kevin Qinghong Lin, Xiangru Jian et al.

NEURIPS 2025 • poster • arXiv:2505.21497
22 citations
#1165

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.

NEURIPS 2025 • poster • arXiv:2505.20411
22 citations
#1166

Numerical Pruning for Efficient Autoregressive Models

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025 • paper • arXiv:2412.12441
22 citations
#1167

Material Anything: Generating Materials for Any 3D Object via Diffusion

Xin Huang, Tengfei Wang, Ziwei Liu et al.

CVPR 2025 • highlight • arXiv:2411.15138
22 citations
#1168

Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.

ICLR 2025 • poster • arXiv:2406.16257
22 citations
#1169

Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning

Jinlong Pang, Na Di, Zhaowei Zhu et al.

ICML 2025 • poster • arXiv:2502.01968
22 citations
#1170

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.

ICCV 2025 • poster • arXiv:2504.01724
22 citations
#1171

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs

Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.

ICLR 2025 • poster • arXiv:2502.15938
22 citations
#1172

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

Angela Castillo, Jonas Kohler, Juan C. Pérez et al.

AAAI 2025 • paper • arXiv:2312.12487
22 citations
#1173

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025 • poster • arXiv:2412.12077
22 citations
#1174

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien et al.

ICLR 2025 • poster • arXiv:2406.17746
22 citations
#1175

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

Clementine Domine, Nicolas Anguita, Alexandra M Proca et al.

ICLR 2025 • poster
22 citations
#1176

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models

Hongbang Yuan, Zhuoran Jin, Pengfei Cao et al.

AAAI 2025 • paper • arXiv:2408.10682
22 citations
#1177

Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

Jingyu Zhang, Ahmed Elgohary Ghoneim, Ahmed Magooda et al.

ICLR 2025 • poster • arXiv:2410.08968
22 citations
#1178

Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention

Tianyun Yang, Ziniu Li, Juan Cao et al.

ICLR 2025 • poster
22 citations
#1179

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

Xiaosen Zheng, Tianyu Pang, Chao Du et al.

ICLR 2025 • poster • arXiv:2410.07137
22 citations
#1180

Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid

Mingxin Huang, Yuliang Liu, Dingkang Liang et al.

ICLR 2025 • poster • arXiv:2408.02034
22 citations
#1181

CleanDIFT: Diffusion Features without Noise

Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.

CVPR 2025 • poster • arXiv:2412.03439
22 citations
#1182

SONICS: Synthetic Or Not - Identifying Counterfeit Songs

Awsaf Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker et al.

ICLR 2025 • oral • arXiv:2408.14080
22 citations
#1183

Mixture Compressor for Mixture-of-Experts LLMs Gains More

Wei Huang, Yue Liao, Jianhui Liu et al.

ICLR 2025 • poster • arXiv:2410.06270
22 citations
#1184

Towards Foundation Models for Mixed Integer Linear Programming

Sirui Li, Janardhan Kulkarni, Ishai Menache et al.

ICLR 2025 • poster • arXiv:2410.08288
22 citations
#1185

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

Zhen Qu, Xian Tao, Xinyi Gong et al.

CVPR 2025 • poster • arXiv:2503.10080
22 citations
#1186

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Xinyan Chen, Renrui Zhang, Dongzhi Jiang et al.

NEURIPS 2025 • poster • arXiv:2506.05331
22 citations
#1187

Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs

Michael Scholkemper, Xinyi Wu, Ali Jadbabaie et al.

ICLR 2025 • poster • arXiv:2406.02997
22 citations
#1188

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

Logan Cross, Violet Xiang, Agam Bhatia et al.

ICLR 2025 • poster • arXiv:2407.07086
22 citations
#1189

Artificial Kuramoto Oscillatory Neurons

Takeru Miyato, Sindy Löwe, Andreas Geiger et al.

ICLR 2025 • oral • arXiv:2410.13821
22 citations
#1190

Do LLMs "know" internally when they follow instructions?

Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.

ICLR 2025 • poster • arXiv:2410.14516
22 citations
#1191

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Pengxiang Li, Lu Yin, Shiwei Liu

ICLR 2025 • poster • arXiv:2412.13795
22 citations
#1192

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Yudi Shi, Shangzhe Di, Qirui Chen et al.

CVPR 2025 • poster • arXiv:2412.01694
22 citations
#1193

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

Rylan Schaeffer, Dan Valentine, Luke Bailey et al.

ICLR 2025 • poster • arXiv:2407.15211
22 citations
#1194

D₂O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

Zhongwei Wan, Xinjian Wu, Yu Zhang et al.

ICLR 2025 • poster
22 citations
#1195

LICO: Large Language Models for In-Context Molecular Optimization

Tung Nguyen, Aditya Grover

ICLR 2025 • poster • arXiv:2406.18851
22 citations
#1196

Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo

João Loula, Benjamin LeBrun, Li Du et al.

ICLR 2025 • poster • arXiv:2504.13139
22 citations
#1197

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Tianwei Xiong, Jun Hao Liew, Zilong Huang et al.

ICCV 2025 • poster • arXiv:2504.08736
22 citations
#1198

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Chen Chen, Yuchen Hu, Siyin Wang et al.

ICLR 2025 • poster • arXiv:2501.17202
22 citations
#1199

Towards General-Purpose Model-Free Reinforcement Learning

Scott Fujimoto, Pierluca D'Oro, Amy Zhang et al.

ICLR 2025 • poster • arXiv:2501.16142
22 citations
#1200

UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

CVPR 2025 • poster • arXiv:2412.03342
22 citations