Most Cited 2025 "transformers with bottlenecks" Papers

22,274 papers found • Page 11 of 112

#2001

Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics

Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu et al.

ICLR 2025 arXiv:2503.01174

#2002

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025 arXiv:2412.12077

#2003

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Runjia Li, Philip Torr, Andrea Vedaldi et al.

ICCV 2025 highlight arXiv:2506.18903

#2004

PromptHMR: Promptable Human Mesh Recovery

Yufu Wang, Yu Sun, Priyanka Patel et al.

CVPR 2025 arXiv:2504.06397

#2005

Population Transformer: Learning Population-level Representations of Neural Activity

Geeling Chau, Christopher Wang, Sabera Talukder et al.

ICLR 2025 oral arXiv:2406.03044

#2006

Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.

CVPR 2025 arXiv:2502.20056

#2007

ThoughtTerminator: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Xiao Pu, Michael Saxon, Wenyue Hua et al.

COLM 2025 paper arXiv:2504.13367

#2008

Towards Effective Evaluations and Comparisons for LLM Unlearning Methods

Qizhou Wang, Bo Han, Puning Yang et al.

ICLR 2025 arXiv:2406.09179

#2009

Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

Guy Ohayon, Tomer Michaeli, Michael Elad

ICLR 2025 arXiv:2410.00418

#2010

Unifying Causal Representation Learning with the Invariance Principle

Dingling Yao, Dario Rancati, Riccardo Cadei et al.

ICLR 2025 arXiv:2409.02772

#2011

HELMET: How to Evaluate Long-context Models Effectively and Thoroughly

Howard Yen, Tianyu Gao, Minmin Hou et al.

ICLR 2025


#2012

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade et al.

COLM 2025 paper arXiv:2504.08942

#2013

Language Models are Advanced Anonymizers

Robin Staab, Mark Vero, Mislav Balunovic et al.

ICLR 2025 arXiv:2402.13846

#2014

Numerical Pruning for Efficient Autoregressive Models

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025 paper arXiv:2412.12441

#2015

Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.

ICLR 2025 arXiv:2406.16257

#2016

Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Ke Fan, Shunlin Lu, Minyue Dai et al.

ICCV 2025 highlight arXiv:2507.07095

#2017

From Language Models over Tokens to Language Models over Characters

Tim Vieira, Benjamin LeBrun, Mario Giulianelli et al.

ICML 2025 spotlight arXiv:2412.03719

#2018

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

Yongwei Chen, Yushi Lan, Shangchen Zhou et al.

CVPR 2025 arXiv:2411.16856

#2019

ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks

Qiang Liu, Mengyu Chu, Nils Thuerey

ICLR 2025 arXiv:2408.11104

#2020

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning

Wufei Ma, Yu-Cheng Chou, Qihao Liu et al.

NEURIPS 2025 arXiv:2504.20024

#2021

Is Your Multimodal Language Model Oversensitive to Safe Queries?

Xirui Li, Hengguang Zhou, Ruochen Wang et al.

ICLR 2025 arXiv:2406.17806

#2022

AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction

Lingteng Qiu, Shenhao Zhu, Qi Zuo et al.

CVPR 2025 arXiv:2412.02684

#2023

OSV: One Step is Enough for High-Quality Image to Video Generation

Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.

CVPR 2025 arXiv:2409.11367

#2024

FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis

Wonjoon Jin, Qi Dai, Chong Luo et al.

CVPR 2025 arXiv:2502.08244

#2025

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Yuhui Zhang, Yuchang Su, Yiming Liu et al.

CVPR 2025 arXiv:2501.03225

#2026

L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

Xun Huang, Ziyu Xu, Hai Wu et al.

AAAI 2025 paper arXiv:2408.03677

#2027

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Junteng Liu, Yuanxiang Fan, Jiang Zhuo et al.

NEURIPS 2025 arXiv:2505.19641

#2028

Transformers are Universal In-context Learners

Takashi Furuya, Maarten V de Hoop, Gabriel Peyré

ICLR 2025 arXiv:2408.01367

#2029

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.

CVPR 2025 arXiv:2412.00678

#2030

TinyFusion: Diffusion Transformers Learned Shallow

Gongfan Fang, Kunjun Li, Xinyin Ma et al.

CVPR 2025 highlight arXiv:2412.01199

#2031

Effective Diffusion Transformer Architecture for Image Super-Resolution

Kun Cheng, Lei Yu, Zhijun Tu et al.

AAAI 2025 paper arXiv:2409.19589

#2032

UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

CVPR 2025 arXiv:2412.03342

#2033

Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge

Hanna Wallach, Meera Desai, A. Feder Cooper et al.

ICML 2025 arXiv:2502.00561

#2034

Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning

Puning Yang, Qizhou Wang, Zhuo Huang et al.

ICML 2025 arXiv:2505.11953

#2035

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

Angela Castillo, Jonas Kohler, Juan C. Pérez et al.

AAAI 2025 paper arXiv:2312.12487

#2036

Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians

Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan

ICLR 2025 arXiv:2501.09009

#2037

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Jiaxiang Cheng, Pan Xie, Xin Xia et al.

AAAI 2025 paper arXiv:2403.02084

#2038

Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs

Michael Scholkemper, Xinyi Wu, Ali Jadbabaie et al.

ICLR 2025 arXiv:2406.02997

#2039

Erasing Conceptual Knowledge from Language Models

Rohit Gandikota, Sheridan Feucht, Samuel Marks et al.

NEURIPS 2025 arXiv:2410.02760

#2040

Heavy-Tailed Diffusion Models

Kushagra Pandey, Jaideep Pathak, Yilun Xu et al.

ICLR 2025 arXiv:2410.14171

#2041

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs

Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.

ICCV 2025 arXiv:2506.22139

#2042

Improving Data Efficiency via Curating LLM-Driven Rating Systems

Jinlong Pang, Jiaheng Wei, Ankit Parag Shah et al.

ICLR 2025 arXiv:2410.10877

#2043

DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models

Zhendong Wang, Jianmin Bao, Shuyang Gu et al.

CVPR 2025 arXiv:2503.01645

#2044

Mastering Board Games by External and Internal Planning with Language Models

John Schultz, Jakub Adamek, Matej Jusup et al.

ICML 2025 spotlight arXiv:2412.12119

#2045

SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering

Zouying Cao, Yifei Yang, Hai Zhao

AAAI 2025 paper arXiv:2408.11491

#2046

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Jiaqi Chen, Bang Zhang, Ruotian Ma et al.

NEURIPS 2025 arXiv:2504.19162

#2047

Oscillatory State-Space Models

T. Konstantin Rusch, Daniela Rus

ICLR 2025 arXiv:2410.03943

#2048

Faster Cascades via Speculative Decoding

Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat et al.

ICLR 2025 arXiv:2405.19261

#2049

Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images

Tianhao Wu, Chuanxia Zheng, Frank Guan et al.

ICCV 2025 arXiv:2503.13439

#2050

Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks

Han Wang, Gang Wang, Huan Zhang

CVPR 2025 arXiv:2411.16721

#2051

On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality

Jerry Yao-Chieh Hu, Weimin Wu, Yi-Chen Lee et al.

ICLR 2025 arXiv:2411.17522

#2052

A Simple Model of Inference Scaling Laws

Noam Levi

ICML 2025 arXiv:2410.16377

#2053

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Yudi Shi, Shangzhe Di, Qirui Chen et al.

CVPR 2025 arXiv:2412.01694

#2054

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Jiaming Han, Hao Chen, Yang Zhao et al.

NEURIPS 2025 arXiv:2506.18898

#2055

Automated Proof Generation for Rust Code via Self-Evolution

Tianyu Chen, Shuai Lu, Shan Lu et al.

ICLR 2025 arXiv:2410.15756

#2056

Is In-Context Learning Sufficient for Instruction Following in LLMs?

Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.

ICLR 2025 arXiv:2405.19874

#2057

Is Noise Conditioning Necessary for Denoising Generative Models?

Qiao Sun, Zhicheng Jiang, Hanhong Zhao et al.

ICML 2025 arXiv:2502.13129

#2058

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Weiming Ren, Wentao Ma, Huan Yang et al.

ICCV 2025 arXiv:2503.11579

#2059

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful

Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.

NEURIPS 2025 arXiv:2507.07101

#2060

TIPS: Text-Image Pretraining with Spatial awareness

Kevis-Kokitsi Maninis, Kaifeng Chen, Soham Ghosh et al.

ICLR 2025 arXiv:2410.16512

#2061

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Qizhe Zhang, Mengzhen Liu, Lichen Li et al.

NEURIPS 2025 arXiv:2506.10967

#2062

Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

Bozhou Zhang, Nan Song, Xin Jin et al.

CVPR 2025 arXiv:2503.14182

#2063

Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models

Yuanzhao Zhai, Tingkai Yang, Kele Xu et al.

AAAI 2025 paper arXiv:2409.09345

#2064

Self-Improving Robust Preference Optimization

Eugene Choi, Arash Ahmadian, Matthieu Geist et al.

ICLR 2025 arXiv:2406.01660

#2065

Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention

Tianyun Yang, Ziniu Li, Juan Cao et al.

ICLR 2025


#2066

Linear Representations of Political Perspective Emerge in Large Language Models

Junsol Kim, James Evans, Aaron Schein

ICLR 2025 arXiv:2503.02080

#2067

MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Haoyu Wang, Zhilu Zhang, Donglin Di et al.

AAAI 2025 paper arXiv:2404.17364

#2068

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Yongsen Mao, Junhao Zhong, Chuan Fang et al.

NEURIPS 2025 arXiv:2506.07491

#2069

Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization

Ermo Hua, Che Jiang, Xingtai Lv et al.

ICML 2025 arXiv:2412.17739

#2070

Fixing the Double Penalty in Data-Driven Weather Forecasting Through a Modified Spherical Harmonic Loss Function

Christopher Subich, Syed Husain, Leo Separovic et al.

ICML 2025 arXiv:2501.19374

#2071

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Jianhao Huang, Zixuan Wang, Jason Lee

ICLR 2025 arXiv:2502.21212

#2072

MM-IFEngine: Towards Multimodal Instruction Following

Shengyuan Ding, Wu Shenxi, Xiangyu Zhao et al.

ICCV 2025 arXiv:2504.07957

#2073

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Zaijing Li, Yuquan Xie, Rui Shao et al.

CVPR 2025 arXiv:2502.19902

#2074

Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment

Pritam Sarkar, Sayna Ebrahimi, Ali Etemad et al.

ICLR 2025 arXiv:2405.18654

#2075

MLLMs Need 3D-Aware Representation Supervision for Scene Understanding

Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.

NEURIPS 2025 arXiv:2506.01946

#2076

FonTS: Text Rendering With Typography and Style Controls

Wenda Shi, Yiren Song, Dengming Zhang et al.

ICCV 2025 arXiv:2412.00136

#2077

Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid

Mingxin Huang, Yuliang Liu, Dingkang Liang et al.

ICLR 2025 arXiv:2408.02034

#2078

EmbedLLM: Learning Compact Representations of Large Language Models

Richard Zhuang, Tianhao Wu, Zhaojin Wen et al.

ICLR 2025 arXiv:2410.02223

#2079

Do LLMs "know" internally when they follow instructions?

Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.

ICLR 2025 arXiv:2410.14516

#2080

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

Yifu Guo, Jiaye Lin, Huacan Wang et al.

NEURIPS 2025 arXiv:2508.02085

#2081

RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction

Tanqiu Jiang, Zian Wang, Jiacheng Liang et al.

ICLR 2025 arXiv:2410.19937

#2082

OneForecast: A Universal Framework for Global and Regional Weather Forecasting

Yuan Gao, Hao Wu, Ruiqi Shu et al.

ICML 2025 arXiv:2502.00338

#2083

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

Jiaqi Liao, Zhengyuan Yang, Linjie Li et al.

ICCV 2025 arXiv:2503.19312

#2084

TimeXL: Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop

Yushan Jiang, Wenchao Yu, Geon Lee et al.

NEURIPS 2025 arXiv:2503.01013

#2085

Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins

Aadyot Bhatnagar, Sarthak Jain, Joel Beazer et al.

NEURIPS 2025 spotlight

#2086

GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation

Linhao Luo, Zicheng Zhao, Reza Haffari et al.

NEURIPS 2025 arXiv:2502.01113

#2087

Controlling Language and Diffusion Models by Transporting Activations

Pau Rodriguez, Arno Blaas, Michal Klein et al.

ICLR 2025 arXiv:2410.23054

#2088

Distraction is All You Need for Multimodal Large Language Model Jailbreaking

Zuopeng Yang, Jiluan Fan, Anli Yan et al.

CVPR 2025 highlight arXiv:2502.10794

#2089

Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

Soham Deshmukh, Shuo Han, Hazim Bukhari et al.

AAAI 2025 paper arXiv:2407.18062

#2090

Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges

Nayoung Lee, Jack Cai, Avi Schwarzschild et al.

ICML 2025 arXiv:2502.01612

#2091

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang et al.

CVPR 2025 arXiv:2406.05821

#2092

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Chen Chen, Yuchen Hu, Siyin Wang et al.

ICLR 2025 arXiv:2501.17202

#2093

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Lang Lin, Xueyang Yu, Ziqi Pang et al.

CVPR 2025 arXiv:2504.07962

#2094

Unhackable Temporal Reward for Scalable Video MLLMs

En Yu, Kangheng Lin, Liang Zhao et al.

ICLR 2025 oral arXiv:2502.12081

#2095

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Xuanpu Zhang, Dan Song, Pengxin Zhan et al.

CVPR 2025 arXiv:2408.06047

#2096

Meta CLIP 2: A Worldwide Scaling Recipe

Yung-Sung Chuang, Yang Li, Dong Wang et al.

NEURIPS 2025 spotlight arXiv:2507.22062

#2097

Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models

Haohan Chi, Huan-ang Gao, Ziming Liu et al.

NEURIPS 2025 arXiv:2505.23757

#2098

VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information

Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das et al.

COLM 2025 paper arXiv:2412.00947

#2099

DSPO: Direct Score Preference Optimization for Diffusion Model Alignment

Huaisheng Zhu, Teng Xiao, Vasant Honavar

ICLR 2025


#2100

PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Yuxuan Sun, Yunlong Zhang, Yixuan Si et al.

ICLR 2025 arXiv:2407.00203

#2101

Inducing Programmatic Skills for Agentic Tasks

Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig et al.

COLM 2025 paper arXiv:2504.06821

#2102

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.

COLM 2025 paper arXiv:2412.04403

#2103

Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs

Sungmin Cha, Sungjun Cho, Dasol Hwang et al.

ICLR 2025 arXiv:2408.06621

#2104

Game4Loc: A UAV Geo-Localization Benchmark from Game Data

Yuxiang Ji, Boyong He, Zhuoyue Tan et al.

AAAI 2025 paper arXiv:2409.16925

#2105

MotionFollower: Editing Video Motion via Score-Guided Diffusion

Shuyuan Tu, Qi Dai, Zihao Zhang et al.

ICCV 2025


#2106

Diversity-Aware Policy Optimization for Large Language Model Reasoning

Jian Yao, Ran Cheng, Xingyu Wu et al.

NEURIPS 2025 spotlight arXiv:2505.23433

#2107

Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

Gang Liu, Michael Sun, Wojciech Matusik et al.

ICLR 2025 arXiv:2410.04223

#2108

D₂O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

Zhongwei Wan, Xinjian Wu, Yu Zhang et al.

ICLR 2025


#2109

B2Opt: Learning to Optimize Black-box Optimization with Little Budget

Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.

AAAI 2025 paper arXiv:2304.11787

#2110

Truthful Aggregation of LLMs with an Application to Online Advertising

Ermis Soumalias, Michael Curry, Sven Seuken

NEURIPS 2025 arXiv:2405.05905

#2111

Scaling Optimal LR Across Token Horizons

Johan Bjorck, Alon Benhaim, Vishrav Chaudhary et al.

ICLR 2025 arXiv:2409.19913

#2112

SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning

Wanjia Zhao, Mert Yuksekgonul, Shirley Wu et al.

NEURIPS 2025 arXiv:2502.04780

#2113

Flow: Modularized Agentic Workflow Automation

Boye Niu, Yiliao Song, Kai Lian et al.

ICLR 2025 arXiv:2501.07834

#2114

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

Clementine Domine, Nicolas Anguita, Alexandra M Proca et al.

ICLR 2025


#2115

Multi-Agent Systems Execute Arbitrary Malicious Code

Harold Triedman, Rishi Dev Jha, Vitaly Shmatikov

COLM 2025 paper arXiv:2503.12188

#2116

Reinforced Lifelong Editing for Language Models

Zherui Li, Houcheng Jiang, Hao Chen et al.

ICML 2025 arXiv:2502.05759

#2117

GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering

Hongze Chen, Zehong Lin, Jun Zhang

ICLR 2025 arXiv:2410.02619

#2118

Material Anything: Generating Materials for Any 3D Object via Diffusion

Xin Huang, Tengfei Wang, Ziwei Liu et al.

CVPR 2025 highlight arXiv:2411.15138

#2119

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning

Marius Memmel, Jacob Berg, Bingqing Chen et al.

ICLR 2025 arXiv:2412.15182

#2120

YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus

Garrett Tanzer, Biao Zhang

ICLR 2025 arXiv:2407.11144

#2121

Position: AI Evaluation Should Learn from How We Test Humans

Yan Zhuang, Qi Liu, Zachary Pardos et al.

ICML 2025 arXiv:2306.10512

#2122

Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.

CVPR 2025 arXiv:2405.17811

#2123

Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars

Tobias Kirschstein, Javier Romero, Artem Sevastopolsky et al.

ICCV 2025 arXiv:2502.20220

#2124

Radiant Foam: Real-Time Differentiable Ray Tracing

Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi et al.

ICCV 2025 highlight arXiv:2502.01157

#2125

Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection

Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.

ICCV 2025 arXiv:2503.12271

#2126

Towards a General Time Series Forecasting Model with Unified Representation and Adaptive Transfer

Yihang Wang, Yuying Qiu, Peng Chen et al.

ICML 2025 arXiv:2405.17478

#2127

Harnessing Webpage UIs for Text-Rich Visual Understanding

Junpeng Liu, Tianyue Ou, Yifan Song et al.

ICLR 2025 arXiv:2410.13824

#2128

Monitoring Latent World States in Language Models with Propositional Probes

Jiahai Feng, Stuart Russell, Jacob Steinhardt

ICLR 2025 arXiv:2406.19501

#2129

TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting

Songtao Huang, Zhen Zhao, Can Li et al.

ICLR 2025 oral arXiv:2502.06910

#2130

UniGEM: A Unified Approach to Generation and Property Prediction for Molecules

Shikun Feng, Yuyan Ni, Lu Yan et al.

ICLR 2025 arXiv:2410.10516

#2131

DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo

Junzhe Zhu, Yuanchen Ju, Junyi Zhang et al.

ICLR 2025 arXiv:2412.05268

#2132

Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks

Junying Wang, Hongyuan Zhang, Yuan Yuan

CVPR 2025 arXiv:2503.08269

#2133

IRASim: A Fine-Grained World Model for Robot Manipulation

Fangqi Zhu, Hongtao Wu, Song Guo et al.

ICCV 2025 arXiv:2406.14540

#2134

One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion

Chunyang Cheng, Tianyang Xu, Zhenhua Feng et al.

CVPR 2025 arXiv:2502.19854

#2135

DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation

Jiazhe Guo, Yikang Ding, Xiwu Chen et al.

ICCV 2025 arXiv:2503.15208

#2136

Privacy Auditing of Large Language Models

Ashwinee Panda, Xinyu Tang, Christopher Choquette-Choo et al.

ICLR 2025 arXiv:2503.06808

#2137

Matrix3D: Large Photogrammetry Model All-in-One

Yuanxun Lu, Jingyang Zhang, Tian Fang et al.

CVPR 2025 highlight arXiv:2502.07685

#2138

Scaling Law with Learning Rate Annealing

Howe Tissue, Venus Wang, Lu Wang

NEURIPS 2025 arXiv:2408.11029

#2139

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025 arXiv:2503.11423

#2140

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos

Yi Chen, Yuying Ge, Weiliang Tang et al.

ICCV 2025 arXiv:2412.04445

#2141

CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs

Yihan Cao, Jiazhao Zhang, Zhinan Yu et al.

ICCV 2025 arXiv:2412.10439

#2142

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

Yuzi Yan, Yibo Miao, Jialian Li et al.

ICLR 2025 arXiv:2406.07327

#2143

Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search

Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo et al.

NEURIPS 2025 arXiv:2501.19252

#2144

FoldToken: Learning Protein Language via Vector Quantization and Beyond

Zhangyang Gao, Cheng Tan, Jue Wang et al.

AAAI 2025 paper arXiv:2403.09673

#2145

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025 arXiv:2412.05263

#2146

Community Forensics: Using Thousands of Generators to Train Fake Image Detectors

Jeongsoo Park, Andrew Owens

CVPR 2025 arXiv:2411.04125

#2147

LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement

Zhifan Ye, Kejing Xia, Yonggan Fu et al.

ICLR 2025 arXiv:2504.16053

#2148

Detecting Out-of-Distribution Through the Lens of Neural Collapse

Litian Liu, Yao Qin

CVPR 2025 arXiv:2311.01479

#2149

Compute or Load KV Cache? Why Not Both?

Shuowei Jin, Xueshen Liu, Qingzhao Zhang et al.

ICML 2025 arXiv:2410.03065

#2150

Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs

Youhe Jiang, Fangcheng Fu, Xiaozhe Yao et al.

ICML 2025 arXiv:2502.00722

#2151

Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.

ICCV 2025 arXiv:2408.04631

#2152

Any-Resolution AI-Generated Image Detection by Spectral Learning

Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.

CVPR 2025 arXiv:2411.19417

#2153

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Boyu Chen, Zhengrong Yue, Siran Chen et al.

ICCV 2025 arXiv:2503.10200

#2154

Generative Video Propagation

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.

CVPR 2025 arXiv:2412.19761

#2155

MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers

Yuechen Zhang, YaoYang Liu, Bin Xia et al.

ICCV 2025 arXiv:2501.03931

#2156

Nonparametric Modern Hopfield Models

Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu et al.

ICML 2025 arXiv:2404.03900

#2157

ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models

Jeonghoon Shim, Gyuhyeon Seo, Cheongsu Lim et al.

ICLR 2025 arXiv:2503.00564

#2158

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien et al.

ICLR 2025 arXiv:2406.17746

#2159

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.

ICLR 2025 arXiv:2410.17637

#2160

Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge

Aparna Elangovan, Lei Xu, Jongwoo Ko et al.

ICLR 2025 arXiv:2410.03775

#2161

Understanding Optimization in Deep Learning with Central Flows

Jeremy Cohen, Alex Damian, Ameet Talwalkar et al.

ICLR 2025 arXiv:2410.24206

#2162

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing

Pengcheng Xu, Boyuan Jiang, Xiaobin Hu et al.

CVPR 2025 arXiv:2411.15843

#2163

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

Weitai Kang, Haifeng Huang, Yuzhang Shang et al.

ICCV 2025 arXiv:2410.00255

#2164

Self-Challenging Language Model Agents

Yifei Zhou, Sergey Levine, Jason Weston et al.

NEURIPS 2025 arXiv:2506.01716

#2165

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals

Jaden Fiotto-Kaufman, Alexander Loftus, Eric Todd et al.

ICLR 2025 arXiv:2407.14561

#2166

Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud et al.

ICLR 2025 arXiv:2406.02548

#2167

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

CVPR 2025 arXiv:2411.17787

#2168

DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model

Yi Liu, Changran Xu, Yunhao Zhou et al.

ICLR 2025 arXiv:2502.15832

#2169

TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting

Peiyuan Liu, Beiliang Wu, Yifan Hu et al.

ICML 2025 arXiv:2410.04442

#2170

Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

Zhitong Xu, Haitao Wang, Jeff Phillips et al.

ICLR 2025 arXiv:2402.02746

#2171

Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead

Rickard Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj et al.

ICML 2025 arXiv:2407.00066

#2172

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

Hyojun Go, Byeongjun Park, Jiho Jang et al.

CVPR 2025 arXiv:2411.16443

#2173

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang, Chao Yu, Sichang Su et al.

NEURIPS 2025 arXiv:2505.22094

#2174

Does SGD really happen in tiny subspaces?

Minhak Song, Kwangjun Ahn, Chulhee Yun

ICLR 2025 arXiv:2405.16002

#2175

Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks

Shengbin Yue, Siyuan Wang, Wei Chen et al.

AAAI 2025 paper arXiv:2407.09893

#2176

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Xinyu Yang, Yuwei An, Hongyi Liu et al.

NEURIPS 2025 spotlight arXiv:2506.09991

#2177

SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images

Kaiyu Li, Ruixun Liu, Xiangyong Cao et al.

CVPR 2025 arXiv:2410.01768

#2178

REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

Yukun Chen, Shuo Shao, Enhao Huang et al.

ICLR 2025 arXiv:2502.18508

#2179

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.

ICCV 2025 arXiv:2504.07960

#2180

Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks

Mario Lino, Tobias Pfaff, Nils Thuerey

ICLR 2025 arXiv:2504.02843

#2181

Looking Backward: Streaming Video-to-Video Translation with Feature Banks

Feng Liang, Akio Kodaira, Chenfeng Xu et al.

ICLR 2025 oral arXiv:2405.15757

#2182

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

Yang Liu, Ming Ma, Xiaomin Yu et al.

NEURIPS 2025 arXiv:2505.12448

#2183

Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?

Ben Yao, Yazhou Zhang, Qiuchi Li et al.

AAAI 2025 paper arXiv:2407.12725

#2184

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Shuwei Shi, Wenbo Li, Yuechen Zhang et al.

AAAI 2025 paper arXiv:2406.16476

#2185

Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents

Shayan Kiyani, George Pappas, Aaron Roth et al.

ICML 2025 spotlight arXiv:2502.02561

#2186

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment

Youhe Jiang, Ran Yan, Binhang Yuan

ICLR 2025 arXiv:2502.07903

#2187

Task-driven Image Fusion with Learnable Fusion Loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

CVPR 2025 highlight arXiv:2412.03240

#2188

Video Depth without Video Models

Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.

CVPR 2025 arXiv:2411.19189

#2189

Selective Attention Improves Transformer

Yaniv Leviathan, Matan Kalman, Yossi Matias

ICLR 2025 arXiv:2410.02703

#2190

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.

CVPR 2025 arXiv:2503.00467

#2191

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

Ce Zhang, Zifu Wan, Zhehan Kan et al.

ICLR 2025 arXiv:2502.06130

#2192

SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

Zhaorun Chen, Francesco Pinto, Minzhou Pan et al.

ICLR 2025 arXiv:2412.06878

#2193

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Xue Zhucun, Jiangning Zhang, Teng Hu et al.

NEURIPS 2025 arXiv:2506.13691

#2194

Training on the Benchmark Is Not All You Need

Shiwen Ni, Xiangtao Kong, Chengming Li et al.

AAAI 2025 paper arXiv:2409.01790

#2195

Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.

CVPR 2025 arXiv:2411.18625

#2196

MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents

Kaijie Zhu, Xianjun Yang, Jindong Wang et al.

ICML 2025 arXiv:2502.05174

#2197

QMambaBSR: Burst Image Super-Resolution with Query State Space Model

Xin Di, Long Peng, Peizhe Xia et al.

CVPR 2025 arXiv:2408.08665

#2198

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu et al.

CVPR 2025 arXiv:2505.23766

#2199

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025 arXiv:2412.10494

#2200

Structure Language Models for Protein Conformation Generation

Jiarui Lu, Xiaoyin Chen, Stephen Lu et al.

ICLR 2025 arXiv:2410.18403
