Most Cited 2025 "reasoning path search" Papers

21,856 papers found • Page 6 of 110

#1001

Erasing Undesirable Influence in Diffusion Models

Jing Wu, Trung Le, Munawar Hayat et al.

CVPR 2025 • poster • arXiv:2401.05779
26 citations
#1002

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Hongyan Zhi, Peihao Chen, Junyan Li et al.

CVPR 2025 • poster • arXiv:2412.01292
25 citations
#1003

EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

CVPR 2025 • poster • arXiv:2411.15241
25 citations
#1004

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.

CVPR 2025 • poster • arXiv:2504.09795
25 citations
#1005

Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation

Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.

CVPR 2025 • poster • arXiv:2501.06693
25 citations
#1006

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Huiyu Duan, Qiang Hu, Wang Jiarui et al.

CVPR 2025 • highlight • arXiv:2412.19238
25 citations
#1007

Adversarial Diffusion Compression for Real-World Image Super-Resolution

Bin Chen, Gehui Li, Rongyuan Wu et al.

CVPR 2025 • poster • arXiv:2411.13383
25 citations
#1008

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

Siyuan Liang, Jiawei Liang, Tianyu Pang et al.

CVPR 2025 • poster • arXiv:2406.18844
25 citations
#1009

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Lijun Li, Zhelun Shi, Xuhao Hu et al.

CVPR 2025 • poster • arXiv:2501.12612
25 citations
#1010

Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

Guanyao Wu, Haoyu Liu, Hongming Fu et al.

CVPR 2025 • poster • arXiv:2503.01210
25 citations
#1011

MagicQuill: An Intelligent Interactive Image Editing System

Zichen Liu, Yue Yu, Hao Ouyang et al.

CVPR 2025 • poster • arXiv:2411.09703
25 citations
#1012

Interleaved-Modal Chain-of-Thought

Jun Gao, Yongqi Li, Ziqiang Cao et al.

CVPR 2025 • poster • arXiv:2411.19488
25 citations
#1013

CityNav: A Large-Scale Dataset for Real-World Aerial Navigation

Jungdae Lee, Taiki Miyanishi, Shuhei Kurita et al.

ICCV 2025 • poster • arXiv:2406.14240
25 citations
#1014

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Zihan Zheng, Zerui Cheng, Zeyu Shen et al.

NEURIPS 2025 • poster • arXiv:2506.11928
25 citations
#1015

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Shenghai Yuan, Xianyi He, Yufan Deng et al.

NEURIPS 2025 • poster • arXiv:2505.20292
25 citations
#1016

Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact case

Iskander Azangulov, Andrei Smolensky, Alexander Terenin et al.

NEURIPS 2025 • oral • arXiv:2208.14960
25 citations
#1017

XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

Bowen Chen, Brynn Zhao, Haomiao Sun et al.

NEURIPS 2025 • poster • arXiv:2506.21416
25 citations
#1018

Grounded Reinforcement Learning for Visual Reasoning

Gabriel Sarch, Snigdha Saha, Naitik Khandelwal et al.

NEURIPS 2025 • poster • arXiv:2505.23678
25 citations
#1019

Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation

Chengwen Qi, Ren Ma, Bowen Li et al.

ICLR 2025 • poster • arXiv:2502.06563
25 citations
#1020

Adversarial Search Engine Optimization for Large Language Models

Fredrik Nestaas, Edoardo Debenedetti, Florian Tramer

ICLR 2025 • poster • arXiv:2406.18382
25 citations
#1021

Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning

Hyun Ryu, Gyeongman Kim, Hyemin S. Lee et al.

ICLR 2025 • poster • arXiv:2410.08047
25 citations
#1022

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Lawrence Jang, Yinheng Li, Dan Zhao et al.

ICLR 2025 • poster • arXiv:2410.19100
25 citations
#1023

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

Zizheng Pan, Bohan Zhuang, De-An Huang et al.

ICLR 2025 • poster • arXiv:2402.14167
25 citations
#1024

Understanding Factual Recall in Transformers via Associative Memories

Eshaan Nichani, Jason Lee, Alberto Bietti

ICLR 2025 • poster • arXiv:2412.06538
25 citations
#1025

Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models

Lucio La Cava, Andrea Tagarelli

AAAI 2025 • paper • arXiv:2401.07115
25 citations
#1026

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

Qirui Chen, Shangzhe Di, Weidi Xie

AAAI 2025 • paper • arXiv:2408.14469
25 citations
#1027

NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning

Xin Yi, Shunfan Zheng, Linlin Wang et al.

AAAI 2025 • paper • arXiv:2412.12497
25 citations
#1028

ResearchTown: Simulator of Human Research Community

Haofei Yu, Zhaochen Hong, Zirui Cheng et al.

ICML 2025 • poster • arXiv:2412.17767
25 citations
#1029

UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models

Xin Xu, Qiyun Xu, Tong Xiao et al.

ICML 2025 • poster • arXiv:2502.00334
25 citations
#1030

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Xinhao Liu, Jintong Li, Yicheng Jiang et al.

CVPR 2025 • poster • arXiv:2411.17820
25 citations
#1031

AffordDP: Generalizable Diffusion Policy with Transferable Affordance

Shijie Wu, Yihang Zhu, Yunao Huang et al.

CVPR 2025 • poster • arXiv:2412.03142
25 citations
#1032

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.

CVPR 2025 • highlight • arXiv:2412.15214
25 citations
#1033

MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding

Rongchang Xie, Chen Du, Ping Song et al.

ICCV 2025 • poster • arXiv:2411.17762
25 citations
#1034

KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

Belinda Mo, Kyssen Yu, Joshua Kazdan et al.

NEURIPS 2025 • poster • arXiv:2502.09956
25 citations
#1035

Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

Bojia Zi, Penghui Ruan, Marco Chen et al.

NEURIPS 2025 • poster • arXiv:2502.06734
25 citations
#1036

PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

Cong Chen, Mingyu Liu, Chenchen Jing et al.

ICLR 2025 • poster • arXiv:2503.06486
25 citations
#1037

TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation

Haiyang Liu, Xingchao Yang, Tomoya Akiyama et al.

ICLR 2025 • poster • arXiv:2410.04221
25 citations
#1038

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

Zewei Zhang, Huan Liu, Jun Chen et al.

ICLR 2025 • poster • arXiv:2404.07206
25 citations
#1039

A Formal Framework for Understanding Length Generalization in Transformers

Xinting Huang, Andy Yang, Satwik Bhattamishra et al.

ICLR 2025 • poster • arXiv:2410.02140
25 citations
#1040

An Intelligent Agentic System for Complex Image Restoration Problems

Kaiwen Zhu, Jinjin Gu, Zhiyuan You et al.

ICLR 2025 • poster • arXiv:2410.17809
25 citations
#1041

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NEURIPS 2025 • poster • arXiv:2505.19591
25 citations
#1042

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025 • poster • arXiv:2505.17017
25 citations
#1043

How to build a consistency model: Learning flow maps via self-distillation

Nicholas Boffi, Michael Albergo, Eric Vanden-Eijnden

NEURIPS 2025 • poster • arXiv:2505.18825
25 citations
#1044

Beyond Autoregression: Fast LLMs via Self-Distillation Through Time

Justin Deschenaux, Caglar Gulcehre

ICLR 2025 • poster • arXiv:2410.21035
25 citations
#1045

Steering Large Language Models between Code Execution and Textual Reasoning

Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma et al.

ICLR 2025 • poster • arXiv:2410.03524
25 citations
#1046

STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes

Jiawei Yang, Jiahui Huang, Boris Ivanovic et al.

ICLR 2025 • oral • arXiv:2501.00602
25 citations
#1047

Can LLMs Solve Longer Math Word Problems Better?

Xin Xu, Tong Xiao, Zitong Chao et al.

ICLR 2025 • poster • arXiv:2405.14804
25 citations
#1048

Moral Alignment for LLM Agents

Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

ICLR 2025 • oral • arXiv:2410.01639
25 citations
#1049

DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting

Hyunwoo Park, Gun Ryu, Wonjun Kim

CVPR 2025 • poster • arXiv:2504.00773
25 citations
#1050

Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Junlong Cheng, Bin Fu, Jin Ye et al.

CVPR 2025 • poster • arXiv:2411.12814
25 citations
#1051

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025 • poster • arXiv:2406.19353
25 citations
#1052

FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes

Lue Fan, Hao Zhang, Qitai Wang et al.

CVPR 2025 • poster • arXiv:2412.03566
25 citations
#1053

Frequency Dynamic Convolution for Dense Image Prediction

Linwei Chen, Lin Gu, Liang Li et al.

CVPR 2025 • poster • arXiv:2503.18783
25 citations
#1054

AutoPresent: Designing Structured Visuals from Scratch

Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.

CVPR 2025 • poster • arXiv:2501.00912
25 citations
#1055

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation

Xiaofeng Wang, Kang Zhao, Feng Liu et al.

NEURIPS 2025 • poster • arXiv:2411.08380
25 citations
#1056

Your ViT is Secretly an Image Segmentation Model

Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.

CVPR 2025 • highlight • arXiv:2503.19108
24 citations
#1057

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?

Fengxiang Wang, Hongzhen Wang, Zonghao Guo et al.

CVPR 2025 • highlight • arXiv:2503.23771
24 citations
#1058

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM

Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.

CVPR 2025 • poster • arXiv:2411.17221
24 citations
#1059

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Kai Wang, Mingjia Shi, YuKun Zhou et al.

CVPR 2025 • poster • arXiv:2405.17403
24 citations
#1060

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

Kyungmin Lee, Xiaohang Li, Qifei Wang et al.

CVPR 2025 • poster • arXiv:2502.02588
24 citations
#1061

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model

Shunlin Lu, Jingbo Wang, Zeyu Lu et al.

CVPR 2025 • poster • arXiv:2412.14559
24 citations
#1062

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.

ICCV 2025 • poster • arXiv:2404.03214
24 citations
#1063

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

Zhen Xing, Qi Dai, Zejia Weng et al.

ICCV 2025 • poster • arXiv:2406.06465
24 citations
#1064

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.

ICCV 2025 • highlight • arXiv:2411.19325
24 citations
#1065

Results of the Big ANN: NeurIPS’23 competition

Harsha Vardhan Simhadri, Martin Aumüller, Matthijs Douze et al.

NEURIPS 2025 • poster • arXiv:2409.17424
24 citations
#1066

Specialized Foundation Models Struggle to Beat Supervised Baselines

Zongzhe Xu, Ritvik Gupta, Wenduo Cheng et al.

ICLR 2025 • poster • arXiv:2411.02796
24 citations
#1067

Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel

Shivam Gupta, Linda Cai, Sitan Chen

ICLR 2025 • poster • arXiv:2406.00924
24 citations
#1068

Inverse Constitutional AI: Compressing Preferences into Principles

Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier et al.

ICLR 2025 • poster • arXiv:2406.06560
24 citations
#1069

Generating CAD Code with Vision-Language Models for 3D Designs

Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi et al.

ICLR 2025 • poster • arXiv:2410.05340
24 citations
#1070

Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models

Jingyang Zhang, Jingwei Sun, Eric Yeats et al.

ICLR 2025 • poster
24 citations
#1071

Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination

Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.

ICLR 2025 • poster • arXiv:2412.14957
24 citations
#1072

ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification

Xiao Li, Wenxuan Sun, Huanran Chen et al.

ICLR 2025 • poster • arXiv:2408.00315
24 citations
#1073

RouteLLM: Learning to Route LLMs from Preference Data

Isaac Ong, Amjad Almahairi, Vincent Wu et al.

ICLR 2025 • poster
24 citations
#1074

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov et al.

ICLR 2025 • poster • arXiv:2407.15018
24 citations
#1075

Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing

Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.

AAAI 2025 • paper • arXiv:2501.04376
24 citations
#1076

FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping

Yuzhou Ji, He Zhu, Junshu Tang et al.

AAAI 2025 • paper • arXiv:2406.01916
24 citations
#1077

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding

Yunlong Tang, Daiki Shimada, Jing Bi et al.

AAAI 2025 • paper • arXiv:2403.16276
24 citations
#1078

Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2025 • paper • arXiv:2507.21606
24 citations
#1079

SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks

Meng Lou, Yunxiang Fu, Yizhou Yu

AAAI 2025 • paper • arXiv:2409.09649
24 citations
#1080

Efficient Online Reinforcement Learning for Diffusion Policy

Haitong Ma, Tianyi Chen, Kai Wang et al.

ICML 2025 • poster • arXiv:2502.00361
24 citations
#1081

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.

ICML 2025 • poster • arXiv:2502.03714
24 citations
#1082

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang et al.

CVPR 2025 • poster • arXiv:2503.20188
24 citations
#1083

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Rongyao Fang, Chengqi Duan, Kun Wang et al.

ICCV 2025 • poster • arXiv:2410.13861
24 citations
#1084

Diffusion Beats Autoregressive in Data-Constrained Settings

Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.

NEURIPS 2025 • poster • arXiv:2507.15857
24 citations
#1085

The Superposition of Diffusion Models Using the Itô Density Estimator

Marta Skreta, Lazar Atanackovic, Joey Bose et al.

ICLR 2025 • poster • arXiv:2412.17762
24 citations
#1086

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Lazar Atanackovic, Xi (Nicole) Zhang, Brandon Amos et al.

ICLR 2025 • oral • arXiv:2408.14608
24 citations
#1087

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

Xinpeng Wang, Chengzhi (Martin) Hu, Paul Röttger et al.

ICLR 2025 • poster • arXiv:2410.03415
24 citations
#1088

Energy-Weighted Flow Matching for Offline Reinforcement Learning

Shiyuan Zhang, Weitong Zhang, Quanquan Gu

ICLR 2025 • poster • arXiv:2503.04975
24 citations
#1089

FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model

Chongkai Gao, Haozhuo Zhang, Zhixuan Xu et al.

ICLR 2025 • poster • arXiv:2412.08261
24 citations
#1090

Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons

Jianhui Chen, Xiaozhi Wang, Zijun Yao et al.

NEURIPS 2025 • poster • arXiv:2406.14144
24 citations
#1091

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Duo Zheng, Shijia Huang, Yanyang Li et al.

NEURIPS 2025 • poster • arXiv:2505.24625
24 citations
#1092

KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

Jingbo Yang, Bairu Hou, Wei Wei et al.

NEURIPS 2025 • poster • arXiv:2502.16002
24 citations
#1093

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

Laura Ruis, Maximilian Mozes, Juhan Bae et al.

ICLR 2025 • poster • arXiv:2411.12580
24 citations
#1094

What Makes a Good Diffusion Planner for Decision Making?

Haofei Lu, Dongqi Han, Yifei Shen et al.

ICLR 2025 • poster • arXiv:2503.00535
24 citations
#1095

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Haoyi Zhu, Honghui Yang, Yating Wang et al.

ICLR 2025 • poster • arXiv:2410.08208
24 citations
#1096

AnimateAnything: Consistent and Controllable Animation for Video Generation

Guojun Lei, Chi Wang, Rong Zhang et al.

CVPR 2025 • poster • arXiv:2411.10836
24 citations
#1097

Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

CVPR 2025 • poster • arXiv:2404.15611
24 citations
#1098

SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures

Hui Liu, Chen Jia, Fan Shi et al.

CVPR 2025 • poster • arXiv:2503.01113
24 citations
#1099

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation

Ziyu Zhu, Xilin Wang, Yixuan Li et al.

ICCV 2025 • highlight • arXiv:2507.04047
24 citations
#1100

VSSD: Vision Mamba with Non-Causal State Space Duality

Yuheng Shi, Mingjia Li, Minjing Dong et al.

ICCV 2025 • poster • arXiv:2407.18559
24 citations
#1101

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.

CVPR 2025 • poster • arXiv:2411.17190
23 citations
#1102

CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

Xin Liu, Jie Liu, Jie Tang et al.

CVPR 2025 • poster • arXiv:2503.06896
23 citations
#1103

CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation

Jiahao Li, Weijian Ma, Xueyang Li et al.

CVPR 2025 • poster • arXiv:2505.04481
23 citations
#1104

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Andrew Szot, Bogdan Mazoure, Omar Attia et al.

CVPR 2025 • poster • arXiv:2412.08442
23 citations
#1105

Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces

Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.

CVPR 2025 • highlight • arXiv:2503.19199
23 citations
#1106

Efficient Visual State Space Model for Image Deblurring

Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.

CVPR 2025 • poster • arXiv:2405.14343
23 citations
#1107

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining

Ming Hu, Kun Yuan, Yaling Shen et al.

ICCV 2025 • poster • arXiv:2411.15421
23 citations
#1108

RadGPT: Constructing 3D Image-Text Tumor Datasets

Pedro Bassi, Mehmet Yavuz, Ibrahim Ethem Hamamci et al.

ICCV 2025 • poster • arXiv:2501.04678
23 citations
#1109

Epona: Autoregressive Diffusion World Model for Autonomous Driving

Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu et al.

ICCV 2025 • poster • arXiv:2506.24113
23 citations
#1110

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Jiangjie Chen, Qianyu He, Siyu Yuan et al.

NEURIPS 2025 • spotlight • arXiv:2505.19914
23 citations
#1111

DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

Julien Siems, Timur Carstensen, Arber Zela et al.

NEURIPS 2025 • poster • arXiv:2502.10297
23 citations
#1112

ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization

Zechun Liu, Changsheng Zhao, Hanxian Huang et al.

NEURIPS 2025 • poster • arXiv:2502.02631
23 citations
#1113

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

Dongping Chen, Yue Huang, Siyuan Wu et al.

ICLR 2025 • oral • arXiv:2406.10819
23 citations
#1114

LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging

Ke Wang, Nikos Dimitriadis, Alessandro Favero et al.

ICLR 2025 • poster • arXiv:2410.17146
23 citations
#1115

Language Representations Can be What Recommenders Need: Findings and Potentials

Leheng Sheng, An Zhang, Yi Zhang et al.

ICLR 2025 • poster • arXiv:2407.05441
23 citations
#1116

JetFormer: An autoregressive generative model of raw images and text

Michael Tschannen, André Susano Pinto, Alexander Kolesnikov

ICLR 2025 • poster • arXiv:2411.19722
23 citations
#1117

ICLR: In-Context Learning of Representations

Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana et al.

ICLR 2025 • poster • arXiv:2501.00070
23 citations
#1118

OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning

Xiaoqiang Wang, Bang Liu

ICLR 2025 • poster • arXiv:2410.18963
23 citations
#1119

A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language

Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert Dick et al.

ICLR 2025 • poster • arXiv:2408.12578
23 citations
#1120

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Guangchen (Eric) Lan, Dong-Jun Han, Abolfazl Hashemi et al.

ICLR 2025 • poster • arXiv:2404.08003
23 citations
#1121

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

George Wang, Jesse Hoogland, Stan van Wingerden et al.

ICLR 2025 • poster • arXiv:2410.02984
23 citations
#1122

Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors

Weixuan Wang, Jingyuan Yang, Wei Peng

ICLR 2025 • poster • arXiv:2410.12299
23 citations
#1123

Instant Policy: In-Context Imitation Learning via Graph Diffusion

Vitalis Vosylius, Edward Johns

ICLR 2025 • poster • arXiv:2411.12633
23 citations
#1124

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.

ICLR 2025 • poster • arXiv:2408.15313
23 citations
#1125

Reward Guided Latent Consistency Distillation

William Wang, Jiachen Li, Weixi Feng et al.

ICLR 2025 • poster • arXiv:2403.11027
23 citations
#1126

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Weihao Zeng, Yuzhen Huang, Lulu Zhao et al.

ICLR 2025 • poster • arXiv:2412.17256
23 citations
#1127

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Hongxiang Li, Yaowei Li, Yuhang Yang et al.

ICLR 2025 • poster • arXiv:2412.09349
23 citations
#1128

MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

Zixuan Gong, Qi Zhang, Guangyin Bao et al.

AAAI 2025 • paper • arXiv:2404.12630
23 citations
#1129

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

Yun Qu, Yuhang Jiang, Boyuan Wang et al.

AAAI 2025 • paper • arXiv:2412.11120
23 citations
#1130

Teaching Language Models to Critique via Reinforcement Learning

Zhihui Xie, Jie Chen, Liyu Chen et al.

ICML 2025 • poster • arXiv:2502.03492
23 citations
#1131

Addressing Misspecification in Simulation-based Inference through Data-driven Calibration

Antoine Wehenkel, Juan L. Gamella, Ozan Sener et al.

ICML 2025 • oral • arXiv:2405.08719
23 citations
#1132

Self-Consistency Preference Optimization

Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang et al.

ICML 2025 • poster • arXiv:2411.04109
23 citations
#1133

Towards a Mechanistic Explanation of Diffusion Model Generalization

Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy et al.

ICML 2025 • spotlight • arXiv:2411.19339
23 citations
#1134

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

CVPR 2025 • poster • arXiv:2412.03324
23 citations
#1135

Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.

CVPR 2025 • poster • arXiv:2405.17811
23 citations
#1136

EditAR: Unified Conditional Generation with Autoregressive Models

Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang

CVPR 2025 • poster • arXiv:2501.04699
23 citations
#1137

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Yuxuan Cai, Jiangning Zhang, Haoyang He et al.

ICCV 2025 • poster • arXiv:2410.16236
23 citations
#1138

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

Kyle Sargent, Kyle Hsu, Justin Johnson et al.

ICCV 2025 • poster • arXiv:2503.11056
23 citations
#1139

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs

Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.

ICCV 2025 • poster • arXiv:2506.22139
23 citations
#1140

MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions

Jian Wu, Linyi Yang, Dongyuan Li et al.

ICLR 2025 • poster
23 citations
#1141

Fantastic Copyrighted Beasts and How (Not) to Generate Them

Luxi He, Yangsibo Huang, Weijia Shi et al.

ICLR 2025 • poster • arXiv:2406.14526
23 citations
#1142

Language Imbalance Driven Rewarding for Multilingual Self-improving

Wen Yang, Junhong Wu, Chen Wang et al.

ICLR 2025 • poster • arXiv:2410.08964
23 citations
#1143

Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images

Sichen Zhu, Yuchen Zhu, Molei Tao et al.

ICLR 2025 • poster • arXiv:2501.15598
23 citations
#1144

Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data

Florian Eddie Dorner, Vivian Nastl, Moritz Hardt

ICLR 2025 • poster
23 citations
#1145

Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

Peidong Li, Dixiao Cui

ICLR 2025 • oral • arXiv:2409.18341
23 citations
#1146

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets

Guangqi Jiang, Yifei Sun, Tao Huang et al.

ICLR 2025 • poster • arXiv:2410.22325
23 citations
#1147

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NEURIPS 2025 • spotlight • arXiv:2507.18624
23 citations
#1148

miniCTX: Neural Theorem Proving with (Long-)Contexts

Jiewen Hu, Thomas Zhu, Sean Welleck

ICLR 2025 • poster • arXiv:2408.03350
23 citations
#1149

Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping

Zijian Liu, Zhengyuan Zhou

ICLR 2025 • poster • arXiv:2412.19529
23 citations
#1150

The AdEMAMix Optimizer: Better, Faster, Older

Matteo Pagliardini, Pierre Ablin, David Grangier

ICLR 2025 • poster • arXiv:2409.03137
23 citations
#1151

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

Teng Xiao, Yige Yuan, Zhengyu Chen et al.

ICLR 2025 • poster • arXiv:2502.00883
23 citations
#1152

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

Cunxiang Wang, Ruoxi Ning, Boqi Pan et al.

ICLR 2025 • poster • arXiv:2403.12766
23 citations
#1153

HELMET: How to Evaluate Long-context Models Effectively and Thoroughly

Howard Yen, Tianyu Gao, Minmin Hou et al.

ICLR 2025 • poster
23 citations
#1154

NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics

David Robinson, Marius Miron, Masato Hagiwara et al.

ICLR 2025 • poster • arXiv:2411.07186
23 citations
#1155

Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning

Mingyang Chen, sunhaoze, Tianpeng Li et al.

ICLR 2025 • poster • arXiv:2410.12952
23 citations
#1156

Text-to-Image Rectified Flow as Plug-and-Play Priors

Xiaofeng Yang, Cheng Chen, Xulei Yang et al.

ICLR 2025 • poster • arXiv:2406.03293
23 citations
#1157

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction

Jarrid Rector-Brooks, Mohsin Hasan, Zhangzhi Peng et al.

ICLR 2025 • poster • arXiv:2410.08134
23 citations
#1158

POSTA: A Go-to Framework for Customized Artistic Poster Generation

Haoyu Chen, Xiaojie Xu, Wenbo Li et al.

CVPR 2025 • poster • arXiv:2503.14908
23 citations
#1159

Language-Guided Image Tokenization for Generation

Kaiwen Zha, Lijun Yu, Alireza Fathi et al.

CVPR 2025 • poster • arXiv:2412.05796
23 citations
#1160

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

Yongliang Wu, Zonghui Li, Xinting Hu et al.

NEURIPS 2025 • poster • arXiv:2505.16707
23 citations
#1161

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Jinyoung Park, Jeehye Na, Jinyoung Kim et al.

NEURIPS 2025 • poster • arXiv:2506.07464
23 citations
#1162

Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

Bozhou Zhang, Nan Song, Xin Jin et al.

CVPR 2025 • poster • arXiv:2503.14182
22 citations
#1163

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

Zhen Qu, Xian Tao, Xinyi Gong et al.

CVPR 2025 • poster • arXiv:2503.10080
22 citations
#1164

Material Anything: Generating Materials for Any 3D Object via Diffusion

Xin Huang, Tengfei Wang, Ziwei Liu et al.

CVPR 2025 • highlight • arXiv:2411.15138
22 citations
#1165

Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection

Le Yang, Ziwei Zheng, Boxu Chen et al.

CVPR 2025 • poster • arXiv:2412.13817
22 citations
#1166

OSV: One Step is Enough for High-Quality Image to Video Generation

Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.

CVPR 2025 • poster • arXiv:2409.11367
22 citations
#1167

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Yudi Shi, Shangzhe Di, Qirui Chen et al.

CVPR 2025 • poster • arXiv:2412.01694
22 citations
#1168

MotionFollower: Editing Video Motion via Score-Guided Diffusion

Shuyuan Tu, Qi Dai, Zihao Zhang et al.

ICCV 2025 • poster
22 citations
#1169

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Tianwei Xiong, Jun Hao Liew, Zilong Huang et al.

ICCV 2025 • poster • arXiv:2504.08736
22 citations
#1170

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.

ICCV 2025 • poster • arXiv:2504.01724
22 citations
#1171

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Rui Xie, Yinhong Liu, Penghao Zhou et al.

ICCV 2025 • poster • arXiv:2501.02976
22 citations
#1172

ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation

Guosheng Zhao, Xiaofeng Wang, Chaojun Ni et al.

ICCV 2025 • poster • arXiv:2503.18438
22 citations
#1173

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.

NEURIPS 2025 • poster • arXiv:2505.20411
22 citations
#1174

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Wei Pang, Kevin Qinghong Lin, Xiangru Jian et al.

NEURIPS 2025 • poster • arXiv:2505.21497
22 citations
#1175

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Guibin Zhang, Muxin Fu, Kun Wang et al.

NEURIPS 2025 • spotlight • arXiv:2506.07398
22 citations
#1176

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

Weixiang Yan, Haitian Liu, Tengxiao Wu et al.

NEURIPS 2025 • poster • arXiv:2406.13890
22 citations
#1177

Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins

Aadyot Bhatnagar, Sarthak Jain, Joel Beazer et al.

NEURIPS 2025 • spotlight
22 citations
#1178

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien et al.

ICLR 2025 • poster • arXiv:2406.17746
22 citations
#1179

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

Yuzi Yan, Yibo Miao, Jialian Li et al.

ICLR 2025 • poster • arXiv:2406.07327
22 citations
#1180

LICO: Large Language Models for In-Context Molecular Optimization

Tung Nguyen, Aditya Grover

ICLR 2025 • poster • arXiv:2406.18851
22 citations
#1181

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

Rylan Schaeffer, Dan Valentine, Luke Bailey et al.

ICLR 2025 • poster • arXiv:2407.15211
22 citations
#1182

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

Clementine Domine, Nicolas Anguita, Alexandra M Proca et al.

ICLR 2025 • poster
22 citations
#1183

Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

Jingyu Zhang, Ahmed Elgohary Ghoneim, Ahmed Magooda et al.

ICLR 2025 • poster • arXiv:2410.08968
22 citations
#1184

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Peiwen Sun, Sitong Cheng, Xiangtai Li et al.

ICLR 2025 • poster • arXiv:2410.10676
22 citations
#1185

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Pengxiang Li, Lu Yin, Shiwei Liu

ICLR 2025 • poster • arXiv:2412.13795
22 citations
#1186

Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo

João Loula, Benjamin LeBrun, Li Du et al.

ICLR 2025 • poster • arXiv:2504.13139
22 citations
#1187

Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention

Tianyun Yang, Ziniu Li, Juan Cao et al.

ICLR 2025 • poster
22 citations
#1188

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

Angela Castillo, Jonas Kohler, Juan C. Pérez et al.

AAAI 2025 • paper • arXiv:2312.12487
22 citations
#1189

NightHaze: Nighttime Image Dehazing via Self-Prior Learning

Beibei Lin, Yeying Jin, Yan Wending et al.

AAAI 2025 • paper • arXiv:2403.07408
22 citations
#1190

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

Jian Ma, Yonglin Deng, Chen Chen et al.

AAAI 2025 • paper • arXiv:2407.02252
22 citations
#1191

Robust Tracking via Mamba-based Context-aware Token Learning

Jinxia Xie, Bineng Zhong, Qihua Liang et al.

AAAI 2025 • paper • arXiv:2412.13611
22 citations
#1192

Numerical Pruning for Efficient Autoregressive Models

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025 • paper • arXiv:2412.12441
22 citations
#1193

Hierarchical Classification Auxiliary Network for Time Series Forecasting

Yanru Sun, Zongxia Xie, Dongyue Chen et al.

AAAI 2025 • paper • arXiv:2405.18975
22 citations
#1194

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models

Xiao Cui, Mo Zhu, Yulei Qin et al.

AAAI 2025 • paper • arXiv:2412.14528
22 citations
#1195

Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

Soham Deshmukh, Shuo Han, Hazim Bukhari et al.

AAAI 2025 • paper • arXiv:2407.18062
22 citations
#1196

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models

Hongbang Yuan, Zhuoran Jin, Pengfei Cao et al.

AAAI 2025 • paper • arXiv:2408.10682
22 citations
#1197

Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning

Jinlong Pang, Na Di, Zhaowei Zhu et al.

ICML 2025 • poster • arXiv:2502.01968
22 citations
#1198

CleanDIFT: Diffusion Features without Noise

Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.

CVPR 2025 • poster • arXiv:2412.03439
22 citations
#1199

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Xuanpu Zhang, Dan Song, Pengxin Zhan et al.

CVPR 2025 • poster • arXiv:2408.06047
22 citations
#1200

SWE-bench Goes Live!

Linghao Zhang, Shilin He, Chaoyun Zhang et al.

NEURIPS 2025 • poster • arXiv:2505.23419
22 citations