Most Cited 2025 "body pose control" Papers

22,274 papers found • Page 7 of 112

#1201

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Yoad Tewel, Rinon Gal, Dvir Samuel et al.

ICLR 2025 • arXiv:2411.07232
35
citations
#1202

Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting

Yifan Hu, Peiyuan Liu, Peng Zhu et al.

AAAI 2025 • paper • arXiv:2406.03751
35
citations
#1203

LLM-Powered User Simulator for Recommender System

Zijian Zhang, Shuchang Liu, Ziru Liu et al.

AAAI 2025 • paper • arXiv:2412.16984
35
citations
#1204

InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models

Yifan Lu, Xuanchi Ren, Jiawei Yang et al.

ICCV 2025 • arXiv:2412.03934
35
citations
#1205

A Closer Look at Machine Unlearning for Large Language Models

Xiaojian Yuan, Tianyu Pang, Chao Du et al.

ICLR 2025 • arXiv:2410.08109
35
citations
#1206

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.

ICCV 2025 • arXiv:2503.15265
35
citations
#1207

A New Perspective on Shampoo's Preconditioner

Depen Morwani, Itai Shapira, Nikhil Vyas et al.

ICLR 2025 • arXiv:2406.17748
35
citations
#1208

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li, Yufei Wang, Heliang Zheng et al.

NEURIPS 2025 • arXiv:2505.14521
35
citations
#1209

AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities

Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.

CVPR 2025 • highlight • arXiv:2412.14123
35
citations
#1210

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025 • arXiv:2503.17316
35
citations
#1211

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

Haoquan Fang, Markus Grotz, Wilbert Pumacay et al.

ICML 2025 • arXiv:2501.18564
35
citations
#1212

WISA: World simulator assistant for physics-aware text-to-video generation

Jing Wang, Ao Ma, Ke Cao et al.

NEURIPS 2025 • spotlight • arXiv:2503.08153
35
citations
#1213

SCALM: Detecting Bad Practices in Smart Contracts Through LLMs

Zongwei Li, Xiaoqi Li, Wenkai Li et al.

AAAI 2025 • paper • arXiv:2502.04347
35
citations
#1214

Harnessing the Universal Geometry of Embeddings

Rishi Jha, Collin Zhang, Vitaly Shmatikov et al.

NEURIPS 2025 • arXiv:2505.12540
35
citations
#1215

MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots

Tianchen Deng, Guole Shen, Chen Xun et al.

CVPR 2025
35
citations
#1216

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Enshen Zhou, Qi Su, Cheng Chi et al.

CVPR 2025 • arXiv:2412.04455
35
citations
#1217

Scaling Laws of Synthetic Data for Language Model

Zeyu Qin, Qingxiu Dong, Xingxing Zhang et al.

COLM 2025 • paper • arXiv:2503.19551
35
citations
#1218

Reconstructive Visual Instruction Tuning

Haochen Wang, Anlin Zheng, Yucheng Zhao et al.

ICLR 2025 • arXiv:2410.09575
35
citations
#1219

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NEURIPS 2025 • spotlight • arXiv:2505.14460
35
citations
#1220

Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking

Chaocan Xue, Bineng Zhong, Qihua Liang et al.

CVPR 2025 • arXiv:2503.06625
35
citations
#1221

Modular Duality in Deep Learning

Jeremy Bernstein, Laker Newhouse

ICML 2025 • arXiv:2410.21265
35
citations
#1222

Improving Retrieval Augmented Language Model with Self-Reasoning

Yuan Xia, Jingbo Zhou, Zhenhui Shi et al.

AAAI 2025 • paper • arXiv:2407.19813
35
citations
#1223

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Hao Li, Changyao Tian, Jie Shao et al.

CVPR 2025 • arXiv:2412.09604
35
citations
#1224

Can LLMs Understand Time Series Anomalies?

Zihao Zhou, Rose Yu

ICLR 2025 • arXiv:2410.05440
35
citations
#1225

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Tao Liu, Kai Wang, Senmao Li et al.

ICLR 2025 • arXiv:2501.13554
35
citations
#1226

Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Robert Hönig, Javier Rando, Nicholas Carlini et al.

ICLR 2025 • arXiv:2406.12027
35
citations
#1227

Preference Optimization for Reasoning with Pseudo Feedback

Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.

ICLR 2025 • arXiv:2411.16345
35
citations
#1228

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

Shengji Tang, Weicai Ye, Peng Ye et al.

ICLR 2025 • arXiv:2410.06245
35
citations
#1229

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda et al.

ICML 2025 • arXiv:2406.04391
35
citations
#1230

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

Qijian Tian, Xin Tan, Yuan Xie et al.

AAAI 2025 • paper • arXiv:2409.12753
35
citations
#1231

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.

ICLR 2025 • arXiv:2408.11811
35
citations
#1232

DeFoG: Discrete Flow Matching for Graph Generation

Yiming Qin, Manuel Madeira, Dorina Thanou et al.

ICML 2025 • oral • arXiv:2410.04263
35
citations
#1233

PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion

Sophia Tang, Yinuo Zhang, Pranam Chatterjee, PhD

ICML 2025 • arXiv:2412.17780
35
citations
#1234

nGPT: Normalized Transformer with Representation Learning on the Hypersphere

Ilya Loshchilov, Cheng-Ping Hsieh, Simeng Sun et al.

ICLR 2025 • arXiv:2410.01131
35
citations
#1235

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

Yifei Xia, Suhan Ling, Fangcheng Fu et al.

ICCV 2025 • arXiv:2502.21079
35
citations
#1236

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.

COLM 2025 • paper • arXiv:2504.04823
35
citations
#1237

RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing

Jinyao Guo, Chengpeng Wang, Xiangzhe Xu et al.

ICML 2025 • arXiv:2501.18160
35
citations
#1238

LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs

Yuhao Wu, Ming Shan Hee, Zhiqiang Hu et al.

ICLR 2025 • arXiv:2409.02076
35
citations
#1239

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Yafu Li, Xuyang Hu, Xiaoye Qu et al.

ICML 2025 • arXiv:2501.12895
35
citations
#1240

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NEURIPS 2025 • arXiv:2505.19591
35
citations
#1241

TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

Mark Yu, Wenbo Hu, Jinbo Xing et al.

ICCV 2025 • arXiv:2503.05638
35
citations
#1242

TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation

Juntong Shi, Minkai Xu, Harper Hua et al.

ICLR 2025 • arXiv:2410.20626
35
citations
#1243

Detecting Strategic Deception with Linear Probes

Nicholas Goldowsky-Dill, Bilal Chughtai, Stefan Heimersheim et al.

ICML 2025 • arXiv:2502.03407
35
citations
#1244

3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

Yuncong Yang, Han Yang, Jiachen Zhou et al.

CVPR 2025 • arXiv:2411.17735
35
citations
#1245

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

Ziyu Zhao, Tao Shen, Didi Zhu et al.

ICLR 2025 • arXiv:2409.16167
35
citations
#1246

ToolGen: Unified Tool Retrieval and Calling via Generation

Renxi Wang, Xudong Han, Lei Ji et al.

ICLR 2025 • arXiv:2410.03439
35
citations
#1247

Sufficient Context: A New Lens on Retrieval Augmented Generation Systems

Hailey Joren, Jianyi Zhang, Chun-Sung Ferng et al.

ICLR 2025 • arXiv:2411.06037
35
citations
#1248

No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces

Daniel Marczak, Simone Magistri, Sebastian Cygert et al.

ICML 2025 • arXiv:2502.04959
34
citations
#1249

StarVector: Generating Scalable Vector Graphics Code from Images and Text

Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.

CVPR 2025 • arXiv:2312.11556
34
citations
#1250

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Ahmed Nassar, Matteo Omenetti, Maksym Lysak et al.

ICCV 2025 • arXiv:2503.11576
34
citations
#1251

DEEM: Diffusion models serve as the eyes of large language models for image perception

Run Luo, Yunshui Li, Longze Chen et al.

ICLR 2025 • arXiv:2405.15232
34
citations
#1252

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

Siwei Wen, Junyan Ye, Peilin Feng et al.

NEURIPS 2025 • arXiv:2503.14905
34
citations
#1253

Text4Seg: Reimagining Image Segmentation as Text Generation

Mengcheng Lan, Chaofeng Chen, Yue Zhou et al.

ICLR 2025 • arXiv:2410.09855
34
citations
#1254

VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning

Yichao Liang, Nishanth Kumar, Hao Tang et al.

ICLR 2025 • arXiv:2410.23156
34
citations
#1255

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

Xi Chen, Kaituo Feng, Changsheng Li et al.

NEURIPS 2025 • arXiv:2410.01623
34
citations
#1256

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Zhe Kong, Feng Gao, Yong Zhang et al.

NEURIPS 2025 • arXiv:2505.22647
34
citations
#1257

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

Yuqi Zhou, Sunhao Dai, Shuai Wang et al.

NEURIPS 2025 • arXiv:2505.15810
34
citations
#1258

Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding

Mingyu Jin, Kai Mei, Wujiang Xu et al.

ICML 2025 • arXiv:2502.01563
34
citations
#1259

Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

Jonas Hübotter, Sascha Bongni, Ido Hakimi et al.

ICLR 2025 • arXiv:2410.08020
34
citations
#1260

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Zongjian Li, Bin Lin, Yang Ye et al.

CVPR 2025 • arXiv:2411.17459
34
citations
#1261

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Perampalli Shravan Nayak, Xiangru Jian, Kevin Qinghong Lin et al.

ICML 2025 • arXiv:2503.15661
34
citations
#1262

A₀: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

Rongtao Xu, Jian Zhang, Minghao Guo et al.

ICCV 2025 • arXiv:2504.12636
34
citations
#1263

ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities

Ezra Karger, Houtan Bastani, Chen Yueh-Han et al.

ICLR 2025 • arXiv:2409.19839
34
citations
#1264

Generative Gaussian Splatting for Unbounded 3D City Generation

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.

CVPR 2025 • arXiv:2406.06526
34
citations
#1265

One Diffusion to Generate Them All

Duong H. Le, Tuan Pham, Sangho Lee et al.

CVPR 2025 • arXiv:2411.16318
34
citations
#1266

EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

Yantai Yang, Yuhao Wang, Zichen Wen et al.

NEURIPS 2025 • oral • arXiv:2506.10100
34
citations
#1267

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang, Yanrui Yu, Ye Yuan et al.

NEURIPS 2025 • oral • arXiv:2505.12434
34
citations
#1268

PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis

Yan Wu, Esther Wershof, Sebastian Schmon et al.

NEURIPS 2025 • arXiv:2408.10609
34
citations
#1269

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Tiantian Geng, Jinrui Zhang, Qingni Wang et al.

CVPR 2025 • arXiv:2411.19772
34
citations
#1270

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?

Kunhao Zheng, Juliette Decugis, Jonas Gehring et al.

ICLR 2025 • arXiv:2410.08105
34
citations
#1271

O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions

Gen Li, Yuling Yan

ICLR 2025 • arXiv:2409.18959
34
citations
#1272

An Analysis of Quantile Temporal-Difference Learning

Mark Rowland, Remi Munos, Mohammad Gheshlaghi Azar et al.

ICML 2025 • oral • arXiv:2301.04462
34
citations
#1273

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

Luca Bartolomei, Fabio Tosi, Matteo Poggi et al.

CVPR 2025 • arXiv:2412.04472
34
citations
#1274

SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning

Yichen Wu, Hongming Piao, Long-Kai Huang et al.

ICLR 2025 • arXiv:2501.13198
34
citations
#1275

3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt

Lukas Höllein, Aljaz Bozic, Michael Zollhöfer et al.

ICCV 2025 • arXiv:2409.12892
34
citations
#1276

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step

Mingyuan Zhou, Huangjie Zheng, Yi Gu et al.

ICLR 2025 • arXiv:2410.14919
34
citations
#1277

Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

Maximillian Chen, Ruoxi Sun, Tomas Pfister et al.

ICLR 2025 • arXiv:2406.00222
34
citations
#1278

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Derong Xu, Xinhang Li, Ziheng Zhang et al.

AAAI 2025 • paper • arXiv:2412.18537
34
citations
#1279

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025 • arXiv:2503.02199
34
citations
#1280

SuperBPE: Space Travel for Language Models

Alisa Liu, Jonathan Hayase, Valentin Hofmann et al.

COLM 2025 • paper • arXiv:2503.13423
34
citations
#1281

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Zhongxing Xu, Chengzhi Liu, Qingyue Wei et al.

NEURIPS 2025 • arXiv:2505.21523
34
citations
#1282

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.

ICLR 2025 • arXiv:2412.13337
34
citations
#1283

Robust Autonomy Emerges from Self-Play

Marco Cusumano-Towner, David Hafner, Alexander Hertzberg et al.

ICML 2025 • arXiv:2502.03349
34
citations
#1284

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Yilun Hao, Yang Zhang, Chuchu Fan

ICLR 2025 • arXiv:2410.12112
34
citations
#1285

X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale

Haoran Xu, Kenton Murray, Philipp Koehn et al.

ICLR 2025 • arXiv:2410.03115
34
citations
#1286

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Sai Sumedh R. Hindupur, Ekdeep S Lubana, Thomas Fel et al.

NEURIPS 2025 • arXiv:2503.01822
34
citations
#1287

Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models

Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother et al.

ICLR 2025 • oral • arXiv:2504.11054
34
citations
#1288

Emergence of a High-Dimensional Abstraction Phase in Language Transformers

Emily Cheng, Diego Doimo, Corentin Kervadec et al.

ICLR 2025 • arXiv:2405.15471
34
citations
#1289

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Yuheng Zhang, Dian Yu, Baolin Peng et al.

ICLR 2025 • arXiv:2407.00617
34
citations
#1290

Tensor Product Attention Is All You Need

Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.

NEURIPS 2025 • spotlight • arXiv:2501.06425
34
citations
#1291

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

Xinze Li, Sen Mei, Zhenghao Liu et al.

ICLR 2025 • arXiv:2410.13509
34
citations
#1292

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo et al.

ICLR 2025 • arXiv:2406.03136
34
citations
#1293

On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning

Alvaro Arroyo, Alessio Gravina, Benjamin Gutteridge et al.

NEURIPS 2025 • arXiv:2502.10818
34
citations
#1294

What to align in multimodal contrastive learning?

Benoit Dufumier, Javiera Castillo Navarro, Devis Tuia et al.

ICLR 2025 • arXiv:2409.07402
34
citations
#1295

The Leaderboard Illusion

Shivalika Singh, Yiyang Nan, Alex Wang et al.

NEURIPS 2025 • arXiv:2504.20879
34
citations
#1296

On the Emergence of Position Bias in Transformers

Xinyi Wu, Yifei Wang, Stefanie Jegelka et al.

ICML 2025 • arXiv:2502.01951
34
citations
#1297

$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

Mintong Kang, Bo Li

ICLR 2025 • arXiv:2407.05557
34
citations
#1298

Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment

Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola et al.

ICLR 2025 • arXiv:2501.19309
34
citations
#1299

AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models

Zheng Lian, Haoyu Chen, Lan Chen et al.

ICML 2025 • oral • arXiv:2501.16566
34
citations
#1300

Interpreting the Second-Order Effects of Neurons in CLIP

Yossi Gandelsman, Alexei Efros, Jacob Steinhardt

ICLR 2025 • arXiv:2406.04341
33
citations
#1301

TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents

Geon Lee, Wenchao Yu, Kijung Shin et al.

AAAI 2025 • paper • arXiv:2502.11418
33
citations
#1302

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Ge Wu, Shen Zhang, Ruijing Shi et al.

NEURIPS 2025 • oral • arXiv:2507.01467
33
citations
#1303

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Wenda Xu, Rujun Han, Zifeng Wang et al.

ICLR 2025 • arXiv:2410.11325
33
citations
#1304

Guided Real Image Dehazing Using YCbCr Color Space

Wenxuan Fang, Junkai Fan, Yu Zheng et al.

AAAI 2025 • paper • arXiv:2412.17496
33
citations
#1305

Optimizing Large Language Model Training Using FP4 Quantization

Ruizhe Wang, Yeyun Gong, Xiao Liu et al.

ICML 2025 • arXiv:2501.17116
33
citations
#1306

On the Relation between Trainability and Dequantization of Variational Quantum Learning Models

Elies Gil-Fuster, Casper Gyurik, Adrian Perez-Salinas et al.

ICLR 2025 • arXiv:2406.07072
33
citations
#1307

MAT-Agent: Adaptive Multi-Agent Training Optimization

Jusheng Zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025 • arXiv:2510.17845
33
citations
#1308

AgentRefine: Enhancing Agent Generalization through Refinement Tuning

Dayuan Fu, Keqing He, Yejie Wang et al.

ICLR 2025 • arXiv:2501.01702
33
citations
#1309

World-consistent Video Diffusion with Explicit 3D Modeling

Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista et al.

CVPR 2025 • highlight • arXiv:2412.01821
33
citations
#1310

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.

CVPR 2025 • arXiv:2412.06016
33
citations
#1311

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Will Merrill, Ashish Sabharwal

NEURIPS 2025 • arXiv:2503.03961
33
citations
#1312

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025 • arXiv:2411.18203
33
citations
#1313

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Ximing Lu, Melanie Sclar, Skyler Hallinan et al.

ICLR 2025 • arXiv:2410.04265
33
citations
#1314

VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing

Xiangpeng Yang, Linchao Zhu, Hehe Fan et al.

ICLR 2025 • arXiv:2502.17258
33
citations
#1315

EgoLife: Towards Egocentric Life Assistant

Jingkang Yang, Shuai Liu, Hongming Guo et al.

CVPR 2025 • arXiv:2503.03803
33
citations
#1316

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.

NEURIPS 2025 • arXiv:2505.20411
33
citations
#1317

MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts

Peijie Wang, Zhong-Zhi Li, Fei Yin et al.

CVPR 2025 • arXiv:2502.20808
33
citations
#1318

Fair Text-to-Image Diffusion via Fair Mapping

Jia Li, Lijie Hu, Jingfeng Zhang et al.

AAAI 2025 • paper • arXiv:2311.17695
33
citations
#1319

3D-HGS: 3D Half-Gaussian Splatting

Haolin Li, Jinyang Liu, Mario Sznaier et al.

CVPR 2025 • arXiv:2406.02720
33
citations
#1320

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Wei Li, Bing Hu, Rui Shao et al.

CVPR 2025 • arXiv:2503.03663
33
citations
#1321

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Zhongwei Ren, Yunchao Wei, Xun Guo et al.

CVPR 2025 • arXiv:2501.09781
33
citations
#1322

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye et al.

AAAI 2025 • paper • arXiv:2405.20535
33
citations
#1323

gRNAde: Geometric Deep Learning for 3D RNA inverse design

Chaitanya Joshi, Arian Jamasb, Ramon Viñas et al.

ICLR 2025 • arXiv:2305.14749
33
citations
#1324

Revisiting text-to-image evaluation with Gecko: on metrics, prompts, and human rating

Olivia Wiles, Chuhan Zhang, Isabela Albuquerque et al.

ICLR 2025 • arXiv:2404.16820
33
citations
#1325

Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics

Taowen Wang, Cheng Han, James Liang et al.

ICCV 2025 • arXiv:2411.13587
33
citations
#1326

Gramian Multimodal Representation Learning and Alignment

Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo et al.

ICLR 2025 • arXiv:2412.11959
33
citations
#1327

Hyper-Connections

Defa Zhu, Hongzhi Huang, Zihao Huang et al.

ICLR 2025 • arXiv:2409.19606
33
citations
#1328

Reasoning Models Better Express Their Confidence

Dongkeun Yoon, Seungone Kim, Sohee Yang et al.

NEURIPS 2025 • arXiv:2505.14489
33
citations
#1329

Diverging Preferences: When do Annotators Disagree and do Models Know?

Michael Zhang, Zhilin Wang, Jena Hwang et al.

ICML 2025 • arXiv:2410.14632
33
citations
#1330

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Zixuan Huang, Mark Boss, Aaryaman Vasishta et al.

CVPR 2025 • arXiv:2501.04689
33
citations
#1331

DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion

Jinyuan Liu, Bowei Zhang, Qingyun Mei et al.

CVPR 2025 • arXiv:2503.17673
33
citations
#1332

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Zixuan Ye, Huijuan Huang, Xintao Wang et al.

CVPR 2025 • arXiv:2412.07744
33
citations
#1333

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Sangmin Bae, Yujin Kim, Reza Bayat et al.

NEURIPS 2025 • arXiv:2507.10524
33
citations
#1334

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.

AAAI 2025 • paper • arXiv:2409.01227
33
citations
#1335

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

Haochen Wang, Yucheng Zhao, Tiancai Wang et al.

ICCV 2025 • arXiv:2504.01901
33
citations
#1336

Scaling Wearable Foundation Models

Girish Narayanswamy, Xin Liu, Kumar Ayush et al.

ICLR 2025 • arXiv:2410.13638
33
citations
#1337

VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding

Chaoyu Li, Eun Woo Im, Pooyan Fazli

CVPR 2025 • arXiv:2412.03735
33
citations
#1338

Improved Noise Schedule for Diffusion Training

Tiankai Hang, Shuyang Gu, Jianmin Bao et al.

ICCV 2025 • arXiv:2407.03297
33
citations
#1339

Empowering LLMs to Understand and Generate Complex Vector Graphics

XiMing Xing, Juncheng Hu, Guotao Liang et al.

CVPR 2025 • arXiv:2412.11102
33
citations
#1340

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Lital Binyamin, Yoad Tewel, Hilit Segev et al.

CVPR 2025 • arXiv:2406.10210
33
citations
#1341

The dark side of the forces: assessing non-conservative force models for atomistic machine learning

Filippo Bigi, Marcel Langer, Michele Ceriotti

ICML 2025 • oral • arXiv:2412.11569
33
citations
#1342

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki et al.

COLM 2025 • paper • arXiv:2503.00177
33
citations
#1343

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.

ICLR 2025 • arXiv:2407.07880
33
citations
#1344

PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models

Minghao Chen, Roman Shapovalov, Iro Laina et al.

CVPR 2025 • highlight • arXiv:2412.18608
33
citations
#1345

Don't be lazy: CompleteP enables compute-efficient deep transformers

Nolan Dey, Bin Zhang, Lorenzo Noci et al.

NEURIPS 2025 • arXiv:2505.01618
33
citations
#1346

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Mushui Liu, Yuhang Ma, Zhen Yang et al.

AAAI 2025 • paper • arXiv:2407.00737
33
citations
#1347

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.

CVPR 2025 • highlight • arXiv:2502.20653
33
citations
#1348

DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts

Tobias Braun, Mark Rothermel, Marcus Rohrbach et al.

ICML 2025 • oral • arXiv:2412.10510
33
citations
#1349

High-Dimensional Prediction for Sequential Decision Making

Georgy Noarov, Ramya Ramalingam, Aaron Roth et al.

ICML 2025 • oral • arXiv:2310.17651
33
citations
#1350

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

Qihan Huang, Siming Fu, Jinlong Liu et al.

AAAI 2025 • paper • arXiv:2409.17920
33
citations
#1351

Stable-Hair: Real-World Hair Transfer via Diffusion Model

Yuxuan Zhang, Qing Zhang, Yiren Song et al.

AAAI 2025 • paper • arXiv:2407.14078
33
citations
#1352

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Qingdong He, Jiangning Zhang, Jinlong Peng et al.

AAAI 2025 • paper • arXiv:2405.15214
32
citations
#1353

Evolutionary Large Language Model for Automated Feature Transformation

Nanxu Gong, Chandan K Reddy, Wangyang Ying et al.

AAAI 2025 • paper • arXiv:2405.16203
32
citations
#1354

Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series

Yuxiao Hu, Qian Li, Dongxiao Zhang et al.

ICLR 2025 • arXiv:2501.03747
32
citations
#1355

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

Chi Zhang, Huaping Zhong, Kuan Zhang et al.

ICLR 2025 • arXiv:2409.16986
32
citations
#1356

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.

ICML 2025 • arXiv:2502.12892
32
citations
#1357

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Zhongwei Wan, Zhihao Dou, Che Liu et al.

NEURIPS 2025 • arXiv:2506.01713
32
citations
#1358

Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Yifei Zhou, Qianlan Yang, Kaixiang Lin et al.

ICML 2025 • arXiv:2412.13194
32
citations
#1359

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

Wenhui Tan, Jiaze Li, Jianzhong Ju et al.

NEURIPS 2025 • arXiv:2505.16552
32
citations
#1360

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models

Zhejun Zhang, Peter Karkus, Maximilian Igl et al.

CVPR 2025 • arXiv:2412.05334
32
citations
#1361

Cross-modal Information Flow in Multimodal Large Language Models

Zhi Zhang, Srishti Yadav, Fengze Han et al.

CVPR 2025 • arXiv:2411.18620
32
citations
#1362

WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments

Jianhao Zheng, Zihan Zhu, Valentin Bieri et al.

CVPR 2025 • arXiv:2504.03886
32
citations
#1363

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations

Katie Matton, Robert Ness, John Guttag et al.

ICLR 2025 • arXiv:2504.14150
32
citations
#1364

Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models

Ruofan Liang, Žan Gojčič, Huan Ling et al.

CVPR 2025
32
citations
#1365

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

Yiyou Sun, Shawn Hu, Georgia Zhou et al.

NEURIPS 2025 • arXiv:2506.18880
32
citations
#1366

Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms

Yinuo Ren, Haoxuan Chen, Yuchen Zhu et al.

NEURIPS 2025 • arXiv:2502.00234
32
citations
#1367

CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences

Ziran Qin, Yuchen Cao, Mingbao Lin et al.

ICLR 2025 • oral • arXiv:2503.12491
32
citations
#1368

TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights

Aiwei Liu, Haoping Bai, Zhiyun Lu et al.

ICLR 2025 • arXiv:2410.04350
32
citations
#1369

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Peijie Dong, Lujun Li, Yuedong Zhong et al.

ICLR 2025 • arXiv:2408.01803
32
citations
#1370

ACPBench: Reasoning About Action, Change, and Planning

Harsha Kokel, Michael Katz, Kavitha Srinivas et al.

AAAI 2025 • paper • arXiv:2410.05669
32
citations
#1371

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

Guo Chen, Zhiqi Li, Shihao Wang et al.

NEURIPS 2025 • arXiv:2504.15271
32
citations
#1372

DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming

Jiaxin Zhang, Wentao Yang, Songxuan Lai et al.

AAAI 2025 • paper • arXiv:2406.19101
32
citations
#1373

Self-Boosting Large Language Models with Synthetic Preference Data

Qingxiu Dong, Li Dong, Xingxing Zhang et al.

ICLR 2025 • arXiv:2410.06961
32
citations
#1374

ICLR: In-Context Learning of Representations

Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana et al.

ICLR 2025 • arXiv:2501.00070
32
citations
#1375

Complexity Experts are Task-Discriminative Learners for Any Image Restoration

Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.

CVPR 2025 • arXiv:2411.18466
32
citations
#1376

DarkIR: Robust Low-Light Image Restoration

Daniel Feijoo, Juan C. Benito, Alvaro Garcia et al.

CVPR 2025 • arXiv:2412.13443
32
citations
#1377

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

Hailong Guo, Bohan Zeng, Yiren Song et al.

ICCV 2025 • arXiv:2501.15891
32
citations
#1378

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

Hanlin Zhu, Shibo Hao, Zhiting Hu et al.

NEURIPS 2025 • arXiv:2505.12514
32
citations
#1379

Number Cookbook: Number Understanding of Language Models and How to Improve It

Haotong Yang, Yi Hu, Shijia Kang et al.

ICLR 2025 • arXiv:2411.03766
32
citations
#1380

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NEURIPS 2025 • spotlight • arXiv:2507.18624
32
citations
#1381

The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence

Tom Wollschläger, Jannes Elstner, Simon Geisler et al.

ICML 2025 • arXiv:2502.17420
32
citations
#1382

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Weifeng Lin, Xinyu Wei, Ruichuan An et al.

NEURIPS 2025 • arXiv:2506.05302
32
citations
#1383

InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences

Hongkai Zheng, Wenda Chu, Bingliang Zhang et al.

ICLR 2025 • arXiv:2503.11043
32
citations
#1384

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025 • arXiv:2406.04321
32
citations
#1385

Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning

Huiyi Wang, Haodong Lu, Lina Yao et al.

CVPR 2025 • arXiv:2403.18886
32
citations
#1386

CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

Yang Liu, Chuanchen Luo, Zhongkai Mao et al.

ICLR 2025 • arXiv:2411.00771
32
citations
#1387

OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation

Yuchen Lin, Chenguo Lin, Jianjin Xu et al.

ICLR 2025 • arXiv:2501.18982
32
citations
#1388

Retrieval-Augmented Generation with Conflicting Evidence

Han Wang, Archiki Prasad, Elias Stengel-Eskin et al.

COLM 2025 • paper • arXiv:2504.13079
32
citations
#1389

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Le Zhuo, Liangbing Zhao, Sayak Paul et al.

ICCV 2025 • arXiv:2504.16080
32
citations
#1390

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Yingyu Liang, Jiangxuan Long, Zhenmei Shi et al.

ICLR 2025 • arXiv:2410.11261
32
citations
#1391

Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data

Asad Aali, Giannis Daras, Brett Levac et al.

ICLR 2025 • arXiv:2403.08728
32
citations
#1392

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Xiu Yuan, Tongzhou Mu, Stone Tao et al.

ICLR 2025 • arXiv:2412.13630
32
citations
#1393

MoDeGPT: Modular Decomposition for Large Language Model Compression

Chi-Heng Lin, Shangqian Gao, James Smith et al.

ICLR 2025 • arXiv:2408.09632
32
citations
#1394

Automated Red Teaming with GOAT: the Generative Offensive Agent Tester

Maya Pavlova, Erik Brinkman, Krithika Iyer et al.

ICML 2025 • arXiv:2410.01606
32
citations
#1395

MonSter: Marry Monodepth to Stereo Unleashes Power

JunDa Cheng, Longliang Liu, Gangwei Xu et al.

CVPR 2025 • highlight
32
citations
#1396

LT3SD: Latent Trees for 3D Scene Diffusion

Quan Meng, Lei Li, Matthias Nießner et al.

CVPR 2025 • arXiv:2409.08215
31
citations
#1397

From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs

Alireza Rezazadeh, Zichao Li, Wei Wei et al.

ICLR 2025 • arXiv:2410.14052
31
citations
#1398

From Tokens to Words: On the Inner Lexicon of LLMs

Guy Kaplan, Matanel Oren, Yuval Reif et al.

ICLR 2025 • arXiv:2410.05864
31
citations
#1399

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NEURIPS 2025 • arXiv:2501.04686
31
citations
#1400

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

Jintao Zhang, Jia Wei, Haoxu Wang et al.

NEURIPS 2025 • spotlight • arXiv:2505.11594
31
citations