Most Cited 2025 "transformers with bottlenecks" Papers

22,274 papers found • Page 11 of 112

#2001

Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics

Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu et al.

ICLR 2025 • arXiv:2503.01174
23
citations
#2002

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025 • arXiv:2412.12077
23
citations
#2003

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Runjia Li, Philip Torr, Andrea Vedaldi et al.

ICCV 2025 • highlight • arXiv:2506.18903
23
citations
#2004

PromptHMR: Promptable Human Mesh Recovery

Yufu Wang, Yu Sun, Priyanka Patel et al.

CVPR 2025 • arXiv:2504.06397
23
citations
#2005

Population Transformer: Learning Population-level Representations of Neural Activity

Geeling Chau, Christopher Wang, Sabera Talukder et al.

ICLR 2025 • oral • arXiv:2406.03044
23
citations
#2006

Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.

CVPR 2025 • arXiv:2502.20056
23
citations
#2007

ThoughtTerminator: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Xiao Pu, Michael Saxon, Wenyue Hua et al.

COLM 2025 • paper • arXiv:2504.13367
23
citations
#2008

Towards Effective Evaluations and Comparisons for LLM Unlearning Methods

Qizhou Wang, Bo Han, Puning Yang et al.

ICLR 2025 • arXiv:2406.09179
23
citations
#2009

Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

Guy Ohayon, Tomer Michaeli, Michael Elad

ICLR 2025 • arXiv:2410.00418
23
citations
#2010

Unifying Causal Representation Learning with the Invariance Principle

Dingling Yao, Dario Rancati, Riccardo Cadei et al.

ICLR 2025 • arXiv:2409.02772
23
citations
#2011

HELMET: How to Evaluate Long-context Models Effectively and Thoroughly

Howard Yen, Tianyu Gao, Minmin Hou et al.

ICLR 2025
23
citations
#2012

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade et al.

COLM 2025 • paper • arXiv:2504.08942
23
citations
#2013

Language Models are Advanced Anonymizers

Robin Staab, Mark Vero, Mislav Balunovic et al.

ICLR 2025 • arXiv:2402.13846
23
citations
#2014

Numerical Pruning for Efficient Autoregressive Models

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025 • paper • arXiv:2412.12441
23
citations
#2015

Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.

ICLR 2025 • arXiv:2406.16257
23
citations
#2016

Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Ke Fan, Shunlin Lu, Minyue Dai et al.

ICCV 2025 • highlight • arXiv:2507.07095
23
citations
#2017

From Language Models over Tokens to Language Models over Characters

Tim Vieira, Benjamin LeBrun, Mario Giulianelli et al.

ICML 2025 • spotlight • arXiv:2412.03719
23
citations
#2018

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

Yongwei Chen, Yushi Lan, Shangchen Zhou et al.

CVPR 2025 • arXiv:2411.16856
23
citations
#2019

ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks

Qiang Liu, Mengyu Chu, Nils Thuerey

ICLR 2025 • arXiv:2408.11104
23
citations
#2020

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning

Wufei Ma, Yu-Cheng Chou, Qihao Liu et al.

NEURIPS 2025 • arXiv:2504.20024
23
citations
#2021

Is Your Multimodal Language Model Oversensitive to Safe Queries?

Xirui Li, Hengguang Zhou, Ruochen Wang et al.

ICLR 2025 • arXiv:2406.17806
23
citations
#2022

AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction

Lingteng Qiu, Shenhao Zhu, Qi Zuo et al.

CVPR 2025 • arXiv:2412.02684
23
citations
#2023

OSV: One Step is Enough for High-Quality Image to Video Generation

Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.

CVPR 2025 • arXiv:2409.11367
23
citations
#2024

FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis

Wonjoon Jin, Qi Dai, Chong Luo et al.

CVPR 2025 • arXiv:2502.08244
23
citations
#2025

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Yuhui Zhang, Yuchang Su, Yiming Liu et al.

CVPR 2025 • arXiv:2501.03225
23
citations
#2026

L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

Xun Huang, Ziyu Xu, Hai Wu et al.

AAAI 2025 • paper • arXiv:2408.03677
23
citations
#2027

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Junteng Liu, Yuanxiang Fan, Jiang Zhuo et al.

NEURIPS 2025 • arXiv:2505.19641
23
citations
#2028

Transformers are Universal In-context Learners

Takashi Furuya, Maarten V de Hoop, Gabriel Peyré

ICLR 2025 • arXiv:2408.01367
23
citations
#2029

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.

CVPR 2025 • arXiv:2412.00678
23
citations
#2030

TinyFusion: Diffusion Transformers Learned Shallow

Gongfan Fang, Kunjun Li, Xinyin Ma et al.

CVPR 2025 • highlight • arXiv:2412.01199
23
citations
#2031

Effective Diffusion Transformer Architecture for Image Super-Resolution

Kun Cheng, Lei Yu, Zhijun Tu et al.

AAAI 2025 • paper • arXiv:2409.19589
23
citations
#2032

UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

CVPR 2025 • arXiv:2412.03342
23
citations
#2033

Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge

Hanna Wallach, Meera Desai, A. Feder Cooper et al.

ICML 2025 • arXiv:2502.00561
23
citations
#2034

Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning

Puning Yang, Qizhou Wang, Zhuo Huang et al.

ICML 2025 • arXiv:2505.11953
23
citations
#2035

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

Angela Castillo, Jonas Kohler, Juan C. Pérez et al.

AAAI 2025 • paper • arXiv:2312.12487
23
citations
#2036

Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians

Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan

ICLR 2025 • arXiv:2501.09009
23
citations
#2037

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Jiaxiang Cheng, Pan Xie, Xin Xia et al.

AAAI 2025 • paper • arXiv:2403.02084
23
citations
#2038

Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs

Michael Scholkemper, Xinyi Wu, Ali Jadbabaie et al.

ICLR 2025 • arXiv:2406.02997
23
citations
#2039

Erasing Conceptual Knowledge from Language Models

Rohit Gandikota, Sheridan Feucht, Samuel Marks et al.

NEURIPS 2025 • arXiv:2410.02760
23
citations
#2040

Heavy-Tailed Diffusion Models

Kushagra Pandey, Jaideep Pathak, Yilun Xu et al.

ICLR 2025 • arXiv:2410.14171
23
citations
#2041

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs

Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.

ICCV 2025 • arXiv:2506.22139
23
citations
#2042

Improving Data Efficiency via Curating LLM-Driven Rating Systems

Jinlong Pang, Jiaheng Wei, Ankit Parag Shah et al.

ICLR 2025 • arXiv:2410.10877
23
citations
#2043

DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models

Zhendong Wang, Jianmin Bao, Shuyang Gu et al.

CVPR 2025 • arXiv:2503.01645
23
citations
#2044

Mastering Board Games by External and Internal Planning with Language Models

John Schultz, Jakub Adamek, Matej Jusup et al.

ICML 2025 • spotlight • arXiv:2412.12119
23
citations
#2045

SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering

Zouying Cao, Yifei Yang, Hai Zhao

AAAI 2025 • paper • arXiv:2408.11491
23
citations
#2046

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Jiaqi Chen, Bang Zhang, Ruotian Ma et al.

NEURIPS 2025 • arXiv:2504.19162
23
citations
#2047

Oscillatory State-Space Models

T. Konstantin Rusch, Daniela Rus

ICLR 2025 • arXiv:2410.03943
23
citations
#2048

Faster Cascades via Speculative Decoding

Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat et al.

ICLR 2025 • arXiv:2405.19261
23
citations
#2049

Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images

Tianhao Wu, Chuanxia Zheng, Frank Guan et al.

ICCV 2025 • arXiv:2503.13439
23
citations
#2050

Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks

Han Wang, Gang Wang, Huan Zhang

CVPR 2025 • arXiv:2411.16721
23
citations
#2051

On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality

Jerry Yao-Chieh Hu, Weimin Wu, Yi-Chen Lee et al.

ICLR 2025 • arXiv:2411.17522
23
citations
#2052

A Simple Model of Inference Scaling Laws

Noam Levi

ICML 2025 • arXiv:2410.16377
23
citations
#2053

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Yudi Shi, Shangzhe Di, Qirui Chen et al.

CVPR 2025 • arXiv:2412.01694
23
citations
#2054

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Jiaming Han, Hao Chen, Yang Zhao et al.

NEURIPS 2025 • arXiv:2506.18898
22
citations
#2055

Automated Proof Generation for Rust Code via Self-Evolution

Tianyu Chen, Shuai Lu, Shan Lu et al.

ICLR 2025 • arXiv:2410.15756
22
citations
#2056

Is In-Context Learning Sufficient for Instruction Following in LLMs?

Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.

ICLR 2025 • arXiv:2405.19874
22
citations
#2057

Is Noise Conditioning Necessary for Denoising Generative Models?

Qiao Sun, Zhicheng Jiang, Hanhong Zhao et al.

ICML 2025 • arXiv:2502.13129
22
citations
#2058

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Weiming Ren, Wentao Ma, Huan Yang et al.

ICCV 2025 • arXiv:2503.11579
22
citations
#2059

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful

Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.

NEURIPS 2025 • arXiv:2507.07101
22
citations
#2060

TIPS: Text-Image Pretraining with Spatial awareness

Kevis-Kokitsi Maninis, Kaifeng Chen, Soham Ghosh et al.

ICLR 2025 • arXiv:2410.16512
22
citations
#2061

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Qizhe Zhang, Mengzhen Liu, Lichen Li et al.

NEURIPS 2025 • arXiv:2506.10967
22
citations
#2062

Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

Bozhou Zhang, Nan Song, Xin Jin et al.

CVPR 2025 • arXiv:2503.14182
22
citations
#2063

Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models

Yuanzhao Zhai, Tingkai Yang, Kele Xu et al.

AAAI 2025 • paper • arXiv:2409.09345
22
citations
#2064

Self-Improving Robust Preference Optimization

Eugene Choi, Arash Ahmadian, Matthieu Geist et al.

ICLR 2025 • arXiv:2406.01660
22
citations
#2065

Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention

Tianyun Yang, Ziniu Li, Juan Cao et al.

ICLR 2025
22
citations
#2066

Linear Representations of Political Perspective Emerge in Large Language Models

Junsol Kim, James Evans, Aaron Schein

ICLR 2025 • arXiv:2503.02080
22
citations
#2067

MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Haoyu Wang, Zhilu Zhang, Donglin Di et al.

AAAI 2025 • paper • arXiv:2404.17364
22
citations
#2068

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Yongsen Mao, Junhao Zhong, Chuan Fang et al.

NEURIPS 2025 • arXiv:2506.07491
22
citations
#2069

Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization

Ermo Hua, Che Jiang, Xingtai Lv et al.

ICML 2025 • arXiv:2412.17739
22
citations
#2070

Fixing the Double Penalty in Data-Driven Weather Forecasting Through a Modified Spherical Harmonic Loss Function

Christopher Subich, Syed Husain, Leo Separovic et al.

ICML 2025 • arXiv:2501.19374
22
citations
#2071

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Jianhao Huang, Zixuan Wang, Jason Lee

ICLR 2025 • arXiv:2502.21212
22
citations
#2072

MM-IFEngine: Towards Multimodal Instruction Following

Shengyuan Ding, Wu Shenxi, Xiangyu Zhao et al.

ICCV 2025 • arXiv:2504.07957
22
citations
#2073

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Zaijing Li, Yuquan Xie, Rui Shao et al.

CVPR 2025 • arXiv:2502.19902
22
citations
#2074

Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment

Pritam Sarkar, Sayna Ebrahimi, Ali Etemad et al.

ICLR 2025 • arXiv:2405.18654
22
citations
#2075

MLLMs Need 3D-Aware Representation Supervision for Scene Understanding

Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.

NEURIPS 2025 • arXiv:2506.01946
22
citations
#2076

FonTS: Text Rendering With Typography and Style Controls

Wenda Shi, Yiren Song, Dengming Zhang et al.

ICCV 2025 • arXiv:2412.00136
22
citations
#2077

Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid

Mingxin Huang, Yuliang Liu, Dingkang Liang et al.

ICLR 2025 • arXiv:2408.02034
22
citations
#2078

EmbedLLM: Learning Compact Representations of Large Language Models

Richard Zhuang, Tianhao Wu, Zhaojin Wen et al.

ICLR 2025 • arXiv:2410.02223
22
citations
#2079

Do LLMs "know" internally when they follow instructions?

Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.

ICLR 2025 • arXiv:2410.14516
22
citations
#2080

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

Yifu Guo, Jiaye Lin, Huacan Wang et al.

NEURIPS 2025 • arXiv:2508.02085
22
citations
#2081

RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction

Tanqiu Jiang, Zian Wang, Jiacheng Liang et al.

ICLR 2025 • arXiv:2410.19937
22
citations
#2082

OneForecast: A Universal Framework for Global and Regional Weather Forecasting

Yuan Gao, Hao Wu, Ruiqi Shu et al.

ICML 2025 • arXiv:2502.00338
22
citations
#2083

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

Jiaqi Liao, Zhengyuan Yang, Linjie Li et al.

ICCV 2025 • arXiv:2503.19312
22
citations
#2084

TimeXL: Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop

Yushan Jiang, Wenchao Yu, Geon Lee et al.

NEURIPS 2025 • arXiv:2503.01013
22
citations
#2085

Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins

Aadyot Bhatnagar, Sarthak Jain, Joel Beazer et al.

NEURIPS 2025 • spotlight
22
citations
#2086

GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation

Linhao Luo, Zicheng Zhao, Reza Haffari et al.

NEURIPS 2025 • arXiv:2502.01113
22
citations
#2087

Controlling Language and Diffusion Models by Transporting Activations

Pau Rodriguez, Arno Blaas, Michal Klein et al.

ICLR 2025 • arXiv:2410.23054
22
citations
#2088

Distraction is All You Need for Multimodal Large Language Model Jailbreaking

Zuopeng Yang, Jiluan Fan, Anli Yan et al.

CVPR 2025 • highlight • arXiv:2502.10794
22
citations
#2089

Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

Soham Deshmukh, Shuo Han, Hazim Bukhari et al.

AAAI 2025 • paper • arXiv:2407.18062
22
citations
#2090

Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges

Nayoung Lee, Jack Cai, Avi Schwarzschild et al.

ICML 2025 • arXiv:2502.01612
22
citations
#2091

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang et al.

CVPR 2025 • arXiv:2406.05821
22
citations
#2092

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Chen Chen, Yuchen Hu, Siyin Wang et al.

ICLR 2025 • arXiv:2501.17202
22
citations
#2093

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Lang Lin, Xueyang Yu, Ziqi Pang et al.

CVPR 2025 • arXiv:2504.07962
22
citations
#2094

Unhackable Temporal Reward for Scalable Video MLLMs

En Yu, Kangheng Lin, Liang Zhao et al.

ICLR 2025 • oral • arXiv:2502.12081
22
citations
#2095

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Xuanpu Zhang, Dan Song, Pengxin Zhan et al.

CVPR 2025 • arXiv:2408.06047
22
citations
#2096

Meta CLIP 2: A Worldwide Scaling Recipe

Yung-Sung Chuang, Yang Li, Dong Wang et al.

NEURIPS 2025 • spotlight • arXiv:2507.22062
22
citations
#2097

Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models

Haohan Chi, Huan-ang Gao, Ziming Liu et al.

NEURIPS 2025 • arXiv:2505.23757
22
citations
#2098

VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information

Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das et al.

COLM 2025 • paper • arXiv:2412.00947
22
citations
#2099

DSPO: Direct Score Preference Optimization for Diffusion Model Alignment

Huaisheng Zhu, Teng Xiao, Vasant Honavar

ICLR 2025
22
citations
#2100

PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Yuxuan Sun, Yunlong Zhang, Yixuan Si et al.

ICLR 2025 • arXiv:2407.00203
22
citations
#2101

Inducing Programmatic Skills for Agentic Tasks

Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig et al.

COLM 2025 • paper • arXiv:2504.06821
22
citations
#2102

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.

COLM 2025 • paper • arXiv:2412.04403
22
citations
#2103

Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs

Sungmin Cha, Sungjun Cho, Dasol Hwang et al.

ICLR 2025 • arXiv:2408.06621
22
citations
#2104

Game4Loc: A UAV Geo-Localization Benchmark from Game Data

Yuxiang Ji, Boyong He, Zhuoyue Tan et al.

AAAI 2025 • paper • arXiv:2409.16925
22
citations
#2105

MotionFollower: Editing Video Motion via Score-Guided Diffusion

Shuyuan Tu, Qi Dai, Zihao Zhang et al.

ICCV 2025
22
citations
#2106

Diversity-Aware Policy Optimization for Large Language Model Reasoning

Jian Yao, Ran Cheng, Xingyu Wu et al.

NEURIPS 2025 • spotlight • arXiv:2505.23433
22
citations
#2107

Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

Gang Liu, Michael Sun, Wojciech Matusik et al.

ICLR 2025 • arXiv:2410.04223
22
citations
#2108

$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

Zhongwei Wan, Xinjian Wu, Yu Zhang et al.

ICLR 2025
22
citations
#2109

B2Opt: Learning to Optimize Black-box Optimization with Little Budget

Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.

AAAI 2025 • paper • arXiv:2304.11787
22
citations
#2110

Truthful Aggregation of LLMs with an Application to Online Advertising

Ermis Soumalias, Michael Curry, Sven Seuken

NEURIPS 2025 • arXiv:2405.05905
22
citations
#2111

Scaling Optimal LR Across Token Horizons

Johan Bjorck, Alon Benhaim, Vishrav Chaudhary et al.

ICLR 2025 • arXiv:2409.19913
22
citations
#2112

SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning

Wanjia Zhao, Mert Yuksekgonul, Shirley Wu et al.

NEURIPS 2025 • arXiv:2502.04780
22
citations
#2113

Flow: Modularized Agentic Workflow Automation

Boye Niu, Yiliao Song, Kai Lian et al.

ICLR 2025 • arXiv:2501.07834
22
citations
#2114

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

Clementine Domine, Nicolas Anguita, Alexandra M Proca et al.

ICLR 2025
22
citations
#2115

Multi-Agent Systems Execute Arbitrary Malicious Code

Harold Triedman, Rishi Dev Jha, Vitaly Shmatikov

COLM 2025 • paper • arXiv:2503.12188
22
citations
#2116

Reinforced Lifelong Editing for Language Models

Zherui Li, Houcheng Jiang, Hao Chen et al.

ICML 2025 • arXiv:2502.05759
22
citations
#2117

GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering

Hongze Chen, Zehong Lin, Jun Zhang

ICLR 2025 • arXiv:2410.02619
22
citations
#2118

Material Anything: Generating Materials for Any 3D Object via Diffusion

Xin Huang, Tengfei Wang, Ziwei Liu et al.

CVPR 2025 • highlight • arXiv:2411.15138
22
citations
#2119

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning

Marius Memmel, Jacob Berg, Bingqing Chen et al.

ICLR 2025 • arXiv:2412.15182
22
citations
#2120

YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus

Garrett Tanzer, Biao Zhang

ICLR 2025 • arXiv:2407.11144
22
citations
#2121

Position: AI Evaluation Should Learn from How We Test Humans

Yan Zhuang, Qi Liu, Zachary Pardos et al.

ICML 2025 • arXiv:2306.10512
22
citations
#2122

Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.

CVPR 2025 • arXiv:2405.17811
22
citations
#2123

Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars

Tobias Kirschstein, Javier Romero, Artem Sevastopolsky et al.

ICCV 2025 • arXiv:2502.20220
22
citations
#2124

Radiant Foam: Real-Time Differentiable Ray Tracing

Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi et al.

ICCV 2025 • highlight • arXiv:2502.01157
22
citations
#2125

Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection

Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.

ICCV 2025 • arXiv:2503.12271
22
citations
#2126

Towards a General Time Series Forecasting Model with Unified Representation and Adaptive Transfer

Yihang Wang, Yuying Qiu, Peng Chen et al.

ICML 2025 • arXiv:2405.17478
22
citations
#2127

Harnessing Webpage UIs for Text-Rich Visual Understanding

Junpeng Liu, Tianyue Ou, Yifan Song et al.

ICLR 2025 • arXiv:2410.13824
22
citations
#2128

Monitoring Latent World States in Language Models with Propositional Probes

Jiahai Feng, Stuart Russell, Jacob Steinhardt

ICLR 2025 • arXiv:2406.19501
22
citations
#2129

TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting

Songtao Huang, Zhen Zhao, Can Li et al.

ICLR 2025 • oral • arXiv:2502.06910
22
citations
#2130

UniGEM: A Unified Approach to Generation and Property Prediction for Molecules

Shikun Feng, Yuyan Ni, Lu Yan et al.

ICLR 2025 • arXiv:2410.10516
22
citations
#2131

DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo

Junzhe Zhu, Yuanchen Ju, Junyi Zhang et al.

ICLR 2025 • arXiv:2412.05268
22
citations
#2132

Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks

Junying Wang, Hongyuan Zhang, Yuan Yuan

CVPR 2025 • arXiv:2503.08269
22
citations
#2133

IRASim: A Fine-Grained World Model for Robot Manipulation

Fangqi Zhu, Hongtao Wu, Song Guo et al.

ICCV 2025 • arXiv:2406.14540
22
citations
#2134

One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion

Chunyang Cheng, Tianyang Xu, Zhenhua Feng et al.

CVPR 2025 • arXiv:2502.19854
22
citations
#2135

DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation

Jiazhe Guo, Yikang Ding, Xiwu Chen et al.

ICCV 2025 • arXiv:2503.15208
22
citations
#2136

Privacy Auditing of Large Language Models

Ashwinee Panda, Xinyu Tang, Christopher Choquette-Choo et al.

ICLR 2025 • arXiv:2503.06808
22
citations
#2137

Matrix3D: Large Photogrammetry Model All-in-One

Yuanxun Lu, Jingyang Zhang, Tian Fang et al.

CVPR 2025 • highlight • arXiv:2502.07685
22
citations
#2138

Scaling Law with Learning Rate Annealing

Howe Tissue, Venus Wang, Lu Wang

NEURIPS 2025 • arXiv:2408.11029
22
citations
#2139

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025 • arXiv:2503.11423
22
citations
#2140

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos

Yi Chen, Yuying Ge, Weiliang Tang et al.

ICCV 2025 • arXiv:2412.04445
22
citations
#2141

CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs

Yihan Cao, Jiazhao Zhang, Zhinan Yu et al.

ICCV 2025 • arXiv:2412.10439
22
citations
#2142

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

Yuzi Yan, Yibo Miao, Jialian Li et al.

ICLR 2025 • arXiv:2406.07327
22
citations
#2143

Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search

Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo et al.

NEURIPS 2025 • arXiv:2501.19252
22
citations
#2144

FoldToken: Learning Protein Language via Vector Quantization and Beyond

Zhangyang Gao, Cheng Tan, Jue Wang et al.

AAAI 2025 • paper • arXiv:2403.09673
22
citations
#2145

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025 • arXiv:2412.05263
22
citations
#2146

Community Forensics: Using Thousands of Generators to Train Fake Image Detectors

Jeongsoo Park, Andrew Owens

CVPR 2025 • arXiv:2411.04125
22
citations
#2147

LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement

Zhifan Ye, Kejing Xia, Yonggan Fu et al.

ICLR 2025 • arXiv:2504.16053
22
citations
#2148

Detecting Out-of-Distribution Through the Lens of Neural Collapse

Litian Liu, Yao Qin

CVPR 2025 • arXiv:2311.01479
22
citations
#2149

Compute or Load KV Cache? Why Not Both?

Shuowei Jin, Xueshen Liu, Qingzhao Zhang et al.

ICML 2025 • arXiv:2410.03065
22
citations
#2150

Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs

Youhe Jiang, Fangcheng Fu, Xiaozhe Yao et al.

ICML 2025 • arXiv:2502.00722
22
citations
#2151

Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.

ICCV 2025 • arXiv:2408.04631
22
citations
#2152

Any-Resolution AI-Generated Image Detection by Spectral Learning

Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.

CVPR 2025 • arXiv:2411.19417
22
citations
#2153

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Boyu Chen, Zhengrong Yue, Siran Chen et al.

ICCV 2025 • arXiv:2503.10200
22
citations
#2154

Generative Video Propagation

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.

CVPR 2025 • arXiv:2412.19761
22
citations
#2155

MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers

Yuechen Zhang, YaoYang Liu, Bin Xia et al.

ICCV 2025 • arXiv:2501.03931
22
citations
#2156

Nonparametric Modern Hopfield Models

Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu et al.

ICML 2025 • arXiv:2404.03900
22
citations
#2157

ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models

Jeonghoon Shim, Gyuhyeon Seo, Cheongsu Lim et al.

ICLR 2025 • arXiv:2503.00564
22
citations
#2158

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien et al.

ICLR 2025 • arXiv:2406.17746
22
citations
#2159

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.

ICLR 2025 • arXiv:2410.17637
22
citations
#2160

Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge

Aparna Elangovan, Lei Xu, Jongwoo Ko et al.

ICLR 2025 • arXiv:2410.03775
22
citations
#2161

Understanding Optimization in Deep Learning with Central Flows

Jeremy Cohen, Alex Damian, Ameet Talwalkar et al.

ICLR 2025 • arXiv:2410.24206
22
citations
#2162

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing

Pengcheng Xu, Boyuan Jiang, Xiaobin Hu et al.

CVPR 2025 • arXiv:2411.15843
22
citations
#2163

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

Weitai Kang, Haifeng Huang, Yuzhang Shang et al.

ICCV 2025 • arXiv:2410.00255
21
citations
#2164

Self-Challenging Language Model Agents

Yifei Zhou, Sergey Levine, Jason Weston et al.

NEURIPS 2025 • arXiv:2506.01716
21
citations
#2165

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals

Jaden Fiotto-Kaufman, Alexander Loftus, Eric Todd et al.

ICLR 2025 • arXiv:2407.14561
21
citations
#2166

Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud et al.

ICLR 2025 • arXiv:2406.02548
21
citations
#2167

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

CVPR 2025 • arXiv:2411.17787
21
citations
#2168

DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model

Yi Liu, Changran Xu, Yunhao Zhou et al.

ICLR 2025 • arXiv:2502.15832
21
citations
#2169

TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting

Peiyuan Liu, Beiliang Wu, Yifan Hu et al.

ICML 2025 • arXiv:2410.04442
21
citations
#2170

Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

Zhitong Xu, Haitao Wang, Jeff Phillips et al.

ICLR 2025 • arXiv:2402.02746
21
citations
#2171

Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead

Rickard Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj et al.

ICML 2025 • arXiv:2407.00066
21
citations
#2172

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

Hyojun Go, Byeongjun Park, Jiho Jang et al.

CVPR 2025 • arXiv:2411.16443
21
citations
#2173

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang, Chao Yu, Sichang Su et al.

NEURIPS 2025 • arXiv:2505.22094
21
citations
#2174

Does SGD really happen in tiny subspaces?

Minhak Song, Kwangjun Ahn, Chulhee Yun

ICLR 2025 • arXiv:2405.16002
21
citations
#2175

Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks

Shengbin Yue, Siyuan Wang, Wei Chen et al.

AAAI 2025 • paper • arXiv:2407.09893
21
citations
#2176

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Xinyu Yang, Yuwei An, Hongyi Liu et al.

NEURIPS 2025 • spotlight • arXiv:2506.09991
21
citations
#2177

SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images

Kaiyu Li, Ruixun Liu, Xiangyong Cao et al.

CVPR 2025 • arXiv:2410.01768
21
citations
#2178

REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

Yukun Chen, Shuo Shao, Enhao Huang et al.

ICLR 2025 • arXiv:2502.18508
21
citations
#2179

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.

ICCV 2025 • arXiv:2504.07960
21
citations
#2180

Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks

Mario Lino, Tobias Pfaff, Nils Thuerey

ICLR 2025 • arXiv:2504.02843
21
citations
#2181

Looking Backward: Streaming Video-to-Video Translation with Feature Banks

Feng Liang, Akio Kodaira, Chenfeng Xu et al.

ICLR 2025 • oral • arXiv:2405.15757
21
citations
#2182

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

Yang Liu, Ming Ma, Xiaomin Yu et al.

NEURIPS 2025 • arXiv:2505.12448
21
citations
#2183

Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?

Ben Yao, Yazhou Zhang, Qiuchi Li et al.

AAAI 2025 • paper • arXiv:2407.12725
21
citations
#2184

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Shuwei Shi, Wenbo Li, Yuechen Zhang et al.

AAAI 2025 • paper • arXiv:2406.16476
21
citations
#2185

Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents

Shayan Kiyani, George Pappas, Aaron Roth et al.

ICML 2025 • spotlight • arXiv:2502.02561
21
citations
#2186

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment

Youhe Jiang, Ran Yan, Binhang Yuan

ICLR 2025 • arXiv:2502.07903
21
citations
#2187

Task-driven Image Fusion with Learnable Fusion Loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

CVPR 2025 • highlight • arXiv:2412.03240
21
citations
#2188

Video Depth without Video Models

Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.

CVPR 2025 • arXiv:2411.19189
21
citations
#2189

Selective Attention Improves Transformer

Yaniv Leviathan, Matan Kalman, Yossi Matias

ICLR 2025 • arXiv:2410.02703
21
citations
#2190

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.

CVPR 2025 • arXiv:2503.00467
21
citations
#2191

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

Ce Zhang, Zifu Wan, Zhehan Kan et al.

ICLR 2025 • arXiv:2502.06130
21
citations
#2192

SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

Zhaorun Chen, Francesco Pinto, Minzhou Pan et al.

ICLR 2025 • arXiv:2412.06878
21
citations
#2193

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Xue Zhucun, Jiangning Zhang, Teng Hu et al.

NEURIPS 2025 • arXiv:2506.13691
21
citations
#2194

Training on the Benchmark Is Not All You Need

Shiwen Ni, Xiangtao Kong, Chengming Li et al.

AAAI 2025 • paper • arXiv:2409.01790
21
citations
#2195

Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.

CVPR 2025 • arXiv:2411.18625
21
citations
#2196

MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents

Kaijie Zhu, Xianjun Yang, Jindong Wang et al.

ICML 2025 • arXiv:2502.05174
21
citations
#2197

QMambaBSR: Burst Image Super-Resolution with Query State Space Model

Xin Di, Long Peng, Peizhe Xia et al.

CVPR 2025 • arXiv:2408.08665
21
citations
#2198

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu et al.

CVPR 2025 • arXiv:2505.23766
21
citations
#2199

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025 • arXiv:2412.10494
21
citations
#2200

Structure Language Models for Protein Conformation Generation

Jiarui Lu, Xiaoyin Chen, Stephen Lu et al.

ICLR 2025 • arXiv:2410.18403
21
citations