Most Cited 2025 "body pose control" Papers

22,274 papers found • Page 7 of 112

Filters:Most Cited 2025 body pose control Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#1201

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Yoad Tewel, Rinon Gal, Dvir Samuel et al.

ICLR 2025arXiv:2411.07232

citations

#1202

Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting

Yifan Hu, Peiyuan Liu, Peng Zhu et al.

AAAI 2025paperarXiv:2406.03751

citations

#1203

LLM-Powered User Simulator for Recommender System

Zijian Zhang, Shuchang Liu, Ziru Liu et al.

AAAI 2025paperarXiv:2412.16984

citations

#1204

InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models

Yifan Lu, Xuanchi Ren, Jiawei Yang et al.

ICCV 2025arXiv:2412.03934

citations

#1205

A Closer Look at Machine Unlearning for Large Language Models

Xiaojian Yuan, Tianyu Pang, Chao Du et al.

ICLR 2025arXiv:2410.08109

citations

#1206

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.

ICCV 2025arXiv:2503.15265

citations

#1207

A New Perspective on Shampoo's Preconditioner

Depen Morwani, Itai Shapira, Nikhil Vyas et al.

ICLR 2025arXiv:2406.17748

citations

#1208

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li, Yufei Wang, Heliang Zheng et al.

NEURIPS 2025arXiv:2505.14521

citations

#1209

AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities

Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.

CVPR 2025highlightarXiv:2412.14123

citations

#1210

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025arXiv:2503.17316

citations

#1211

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

Haoquan Fang, Markus Grotz, Wilbert Pumacay et al.

ICML 2025arXiv:2501.18564

citations

#1212

WISA: World simulator assistant for physics-aware text-to-video generation

Jing Wang, Ao Ma, Ke Cao et al.

NEURIPS 2025spotlightarXiv:2503.08153

citations

#1213

SCALM: Detecting Bad Practices in Smart Contracts Through LLMs

Zongwei Li, Xiaoqi Li, Wenkai Li et al.

AAAI 2025paperarXiv:2502.04347

citations

#1214

Harnessing the Universal Geometry of Embeddings

Rishi Jha, Collin Zhang, Vitaly Shmatikov et al.

NEURIPS 2025arXiv:2505.12540

citations

#1215

MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots

Tianchen Deng, Guole Shen, Chen Xun et al.

CVPR 2025

citations

#1216

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Enshen Zhou, Qi Su, Cheng Chi et al.

CVPR 2025arXiv:2412.04455

citations

#1217

Scaling Laws of Synthetic Data for Language Model

Zeyu Qin, Qingxiu Dong, Xingxing Zhang et al.

COLM 2025paperarXiv:2503.19551

citations

#1218

Reconstructive Visual Instruction Tuning

Haochen Wang, Anlin Zheng, Yucheng Zhao et al.

ICLR 2025arXiv:2410.09575

citations

#1219

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NEURIPS 2025spotlightarXiv:2505.14460

citations

#1220

Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking

chaocan xue, Bineng Zhong, Qihua Liang et al.

CVPR 2025arXiv:2503.06625

citations

#1221

Modular Duality in Deep Learning

Jeremy Bernstein, Laker Newhouse

ICML 2025arXiv:2410.21265

citations

#1222

Improving Retrieval Augmented Language Model with Self-Reasoning

Yuan Xia, Jingbo Zhou, Zhenhui Shi et al.

AAAI 2025paperarXiv:2407.19813

citations

#1223

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Hao Li, Changyao TIAN, Jie Shao et al.

CVPR 2025arXiv:2412.09604

citations

#1224

Can LLMs Understand Time Series Anomalies?

Zihao Zhou, Rose Yu

ICLR 2025arXiv:2410.05440

citations

#1225

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Tao Liu, Kai Wang, Senmao Li et al.

ICLR 2025arXiv:2501.13554

citations

#1226

Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Robert Hönig, Javier Rando, Nicholas Carlini et al.

ICLR 2025arXiv:2406.12027

citations

#1227

Preference Optimization for Reasoning with Pseudo Feedback

Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.

ICLR 2025arXiv:2411.16345

citations

#1228

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

Shengji Tang, Weicai Ye, Peng Ye et al.

ICLR 2025arXiv:2410.06245

citations

#1229

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda et al.

ICML 2025arXiv:2406.04391

citations

#1230

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

Qijian Tian, Xin Tan, Yuan Xie et al.

AAAI 2025paperarXiv:2409.12753

citations

#1231

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.

ICLR 2025arXiv:2408.11811

citations

#1232

DeFoG: Discrete Flow Matching for Graph Generation

Yiming Qin, Manuel Madeira, Dorina Thanou et al.

ICML 2025oralarXiv:2410.04263

citations

#1233

PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion

Sophia Tang, Yinuo Zhang, Pranam Chatterjee, PhD

ICML 2025arXiv:2412.17780

citations

#1234

nGPT: Normalized Transformer with Representation Learning on the Hypersphere

Ilya Loshchilov, Cheng-Ping Hsieh, Simeng Sun et al.

ICLR 2025arXiv:2410.01131

citations

#1235

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

yifei xia, Suhan Ling, Fangcheng Fu et al.

ICCV 2025arXiv:2502.21079

citations

#1236

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.

COLM 2025paperarXiv:2504.04823

citations

#1237

RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing

Jinyao Guo, Chengpeng Wang, Xiangzhe Xu et al.

ICML 2025arXiv:2501.18160

citations

#1238

LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs

Yuhao Wu, Ming Shan Hee, Zhiqiang Hu et al.

ICLR 2025arXiv:2409.02076

citations

#1239

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Yafu Li, Xuyang Hu, Xiaoye Qu et al.

ICML 2025arXiv:2501.12895

citations

#1240

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NEURIPS 2025arXiv:2505.19591

citations

#1241

TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

Mark YU, Wenbo Hu, Jinbo Xing et al.

ICCV 2025arXiv:2503.05638

citations

#1242

TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation

Juntong Shi, Minkai Xu, Harper Hua et al.

ICLR 2025arXiv:2410.20626

citations

#1243

Detecting Strategic Deception with Linear Probes

Nicholas Goldowsky-Dill, Bilal Chughtai, Stefan Heimersheim et al.

ICML 2025arXiv:2502.03407

citations

#1244

3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

Yuncong Yang, Han Yang, Jiachen Zhou et al.

CVPR 2025arXiv:2411.17735

citations

#1245

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

Ziyu Zhao, tao shen, Didi Zhu et al.

ICLR 2025arXiv:2409.16167

citations

#1246

ToolGen: Unified Tool Retrieval and Calling via Generation

Renxi Wang, Xudong Han, Lei Ji et al.

ICLR 2025arXiv:2410.03439

citations

#1247

Sufficient Context: A New Lens on Retrieval Augmented Generation Systems

Hailey Joren, Jianyi Zhang, Chun-Sung Ferng et al.

ICLR 2025arXiv:2411.06037

citations

#1248

No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces

Daniel Marczak, Simone Magistri, Sebastian Cygert et al.

ICML 2025arXiv:2502.04959

citations

#1249

StarVector: Generating Scalable Vector Graphics Code from Images and Text

Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.

CVPR 2025arXiv:2312.11556

citations

#1250

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Ahmed Nassar, Matteo Omenetti, Maksym Lysak et al.

ICCV 2025arXiv:2503.11576

citations

#1251

DEEM: Diffusion models serve as the eyes of large language models for image perception

Run Luo, Yunshui Li, Longze Chen et al.

ICLR 2025arXiv:2405.15232

citations

#1252

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

Siwei Wen, junyan ye, Peilin Feng et al.

NEURIPS 2025arXiv:2503.14905

citations

#1253

Text4Seg: Reimagining Image Segmentation as Text Generation

Mengcheng Lan, Chaofeng Chen, Yue Zhou et al.

ICLR 2025arXiv:2410.09855

citations

#1254

VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning

Yichao Liang, Nishanth Kumar, Hao Tang et al.

ICLR 2025arXiv:2410.23156

citations

#1255

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

Xi Chen, Kaituo Feng, Changsheng Li et al.

NEURIPS 2025arXiv:2410.01623

citations

#1256

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Zhe Kong, Feng Gao, Yong Zhang et al.

NEURIPS 2025arXiv:2505.22647

citations

#1257

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

Yuqi Zhou, Sunhao Dai, Shuai Wang et al.

NEURIPS 2025arXiv:2505.15810

citations

#1258

Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding

Mingyu Jin, Kai Mei, Wujiang Xu et al.

ICML 2025arXiv:2502.01563

citations

#1259

Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

Jonas Hübotter, Sascha Bongni, Ido Hakimi et al.

ICLR 2025arXiv:2410.08020

citations

#1260

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Zongjian Li, Bin Lin, Yang Ye et al.

CVPR 2025arXiv:2411.17459

citations

#1261

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Perampalli Shravan Nayak, Xiangru Jian, Kevin Qinghong Lin et al.

ICML 2025arXiv:2503.15661

citations

#1262

A₀ : An Affordance-Aware Hierarchical Model for General Robotic Manipulation

Rongtao Xu, Jian Zhang, Minghao Guo et al.

ICCV 2025arXiv:2504.12636

citations

#1263

ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities

Ezra Karger, Houtan Bastani, Chen Yueh-Han et al.

ICLR 2025arXiv:2409.19839

citations

#1264

Generative Gaussian Splatting for Unbounded 3D City Generation

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.

CVPR 2025arXiv:2406.06526

citations

#1265

One Diffusion to Generate Them All

Duong H. Le, Tuan Pham, Sangho Lee et al.

CVPR 2025arXiv:2411.16318

citations

#1266

EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

Yantai Yang, Yuhao Wang, Zichen Wen et al.

NEURIPS 2025oralarXiv:2506.10100

citations

#1267

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang, Yanrui Yu, Ye Yuan et al.

NEURIPS 2025oralarXiv:2505.12434

citations

#1268

PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis

Yan Wu, Esther Wershof, Sebastian Schmon et al.

NEURIPS 2025arXiv:2408.10609

citations

#1269

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Tiantian Geng, Jinrui Zhang, Qingni Wang et al.

CVPR 2025arXiv:2411.19772

citations

#1270

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?

Kunhao Zheng, Juliette Decugis, Jonas Gehring et al.

ICLR 2025arXiv:2410.08105

citations

#1271

O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions

Gen Li, Yuling Yan

ICLR 2025arXiv:2409.18959

citations

#1272

An Analysis of Quantile Temporal-Difference Learning

Mark Rowland, Remi Munos, Mohammad Gheshlaghi Azar et al.

ICML 2025oralarXiv:2301.04462

citations

#1273

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

Luca Bartolomei, Fabio Tosi, Matteo Poggi et al.

CVPR 2025arXiv:2412.04472

citations

#1274

SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning

Yichen Wu, Hongming Piao, Long-Kai Huang et al.

ICLR 2025arXiv:2501.13198

citations

#1275

3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt

Lukas Höllein, Aljaz Bozic, Michael Zollhöfer et al.

ICCV 2025arXiv:2409.12892

citations

#1276

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step

Mingyuan Zhou, Huangjie Zheng, Yi Gu et al.

ICLR 2025arXiv:2410.14919

citations

#1277

Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

Maximillian Chen, Ruoxi Sun, Tomas Pfister et al.

ICLR 2025arXiv:2406.00222

citations

#1278

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Derong Xu, Xinhang Li, Ziheng Zhang et al.

AAAI 2025paperarXiv:2412.18537

citations

#1279

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025arXiv:2503.02199

citations

#1280

SuperBPE: Space Travel for Language Models

Alisa Liu, Jonathan Hayase, Valentin Hofmann et al.

COLM 2025paperarXiv:2503.13423

citations

#1281

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Zhongxing Xu, Chengzhi Liu, Qingyue Wei et al.

NEURIPS 2025arXiv:2505.21523

citations

#1282

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.

ICLR 2025arXiv:2412.13337

citations

#1283

Robust Autonomy Emerges from Self-Play

Marco Cusumano-Towner, David Hafner, Alexander Hertzberg et al.

ICML 2025arXiv:2502.03349

citations

#1284

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Yilun Hao, Yang Zhang, Chuchu Fan

ICLR 2025arXiv:2410.12112

citations

#1285

X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale

Haoran Xu, Kenton Murray, Philipp Koehn et al.

ICLR 2025arXiv:2410.03115

citations

#1286

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Sai Sumedh R. Hindupur, Ekdeep S Lubana, Thomas Fel et al.

NEURIPS 2025arXiv:2503.01822

citations

#1287

Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models

Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother et al.

ICLR 2025oralarXiv:2504.11054

citations

#1288

Emergence of a High-Dimensional Abstraction Phase in Language Transformers

Emily Cheng, Diego Doimo, Corentin Kervadec et al.

ICLR 2025arXiv:2405.15471

citations

#1289

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Yuheng Zhang, Dian Yu, Baolin Peng et al.

ICLR 2025arXiv:2407.00617

citations

#1290

Tensor Product Attention Is All You Need

Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.

NEURIPS 2025spotlightarXiv:2501.06425

citations

#1291

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

Xinze Li, Sen Mei, Zhenghao Liu et al.

ICLR 2025arXiv:2410.13509

citations

#1292

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo et al.

ICLR 2025arXiv:2406.03136

citations

#1293

On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning

Alvaro Arroyo, Alessio Gravina, Benjamin Gutteridge et al.

NEURIPS 2025arXiv:2502.10818

citations

#1294

What to align in multimodal contrastive learning?

Benoit Dufumier, Javiera Castillo Navarro, Devis Tuia et al.

ICLR 2025arXiv:2409.07402

citations

#1295

The Leaderboard Illusion

Shivalika Singh, Yiyang Nan, Alex Wang et al.

NEURIPS 2025arXiv:2504.20879

citations

#1296

On the Emergence of Position Bias in Transformers

Xinyi Wu, Yifei Wang, Stefanie Jegelka et al.

ICML 2025arXiv:2502.01951

citations

#1297

$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

Mintong Kang, Bo Li

ICLR 2025arXiv:2407.05557

citations

#1298

Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment

Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola et al.

ICLR 2025arXiv:2501.19309

citations

#1299

AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models

Zheng Lian, Haoyu Chen, Lan Chen et al.

ICML 2025oralarXiv:2501.16566

citations

#1300

Interpreting the Second-Order Effects of Neurons in CLIP

Yossi Gandelsman, Alexei Efros, Jacob Steinhardt

ICLR 2025arXiv:2406.04341

citations

#1301

TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents

Geon Lee, Wenchao Yu, Kijung Shin et al.

AAAI 2025paperarXiv:2502.11418

citations

#1302

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Ge Wu, Shen Zhang, Ruijing Shi et al.

NEURIPS 2025oralarXiv:2507.01467

citations

#1303

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Wenda Xu, Rujun Han, Zifeng Wang et al.

ICLR 2025arXiv:2410.11325

citations

#1304

Guided Real Image Dehazing Using YCbCr Color Space

Wenxuan Fang, Junkai Fan, Yu Zheng et al.

AAAI 2025paperarXiv:2412.17496

citations

#1305

Optimizing Large Language Model Training Using FP4 Quantization

Ruizhe Wang, Yeyun Gong, Xiao Liu et al.

ICML 2025arXiv:2501.17116

citations

#1306

On the Relation between Trainability and Dequantization of Variational Quantum Learning Models

Elies Gil-Fuster, Casper Gyurik, Adrian Perez-Salinas et al.

ICLR 2025arXiv:2406.07072

citations

#1307

MAT-Agent: Adaptive Multi-Agent Training Optimization

jusheng zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025arXiv:2510.17845

citations

#1308

AgentRefine: Enhancing Agent Generalization through Refinement Tuning

Dayuan Fu, Keqing He, Yejie Wang et al.

ICLR 2025arXiv:2501.01702

citations

#1309

World-consistent Video Diffusion with Explicit 3D Modeling

Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista et al.

CVPR 2025highlightarXiv:2412.01821

citations

#1310

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.

CVPR 2025arXiv:2412.06016

citations

#1311

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Will Merrill, Ashish Sabharwal

NEURIPS 2025arXiv:2503.03961

citations

#1312

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025arXiv:2411.18203

citations

#1313

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Ximing Lu, Melanie Sclar, Skyler Hallinan et al.

ICLR 2025arXiv:2410.04265

citations

#1314

VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing

Xiangpeng Yang, Linchao Zhu, Hehe Fan et al.

ICLR 2025arXiv:2502.17258

citations

#1315

EgoLife: Towards Egocentric Life Assistant

Jingkang Yang, Shuai Liu, Hongming Guo et al.

CVPR 2025arXiv:2503.03803

citations

#1316

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.

NEURIPS 2025arXiv:2505.20411

citations

#1317

MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts

Peijie Wang, Zhong-Zhi Li, Fei Yin et al.

CVPR 2025arXiv:2502.20808

citations

#1318

Fair Text-to-Image Diffusion via Fair Mapping

Jia Li, Lijie Hu, Jingfeng Zhang et al.

AAAI 2025paperarXiv:2311.17695

citations

#1319

3D-HGS: 3D Half-Gaussian Splatting

Haolin Li, Jinyang Liu, Mario Sznaier et al.

CVPR 2025arXiv:2406.02720

citations

#1320

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Wei Li, Bing Hu, Rui Shao et al.

CVPR 2025arXiv:2503.03663

citations

#1321

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Zhongwei Ren, Yunchao Wei, Xun Guo et al.

CVPR 2025arXiv:2501.09781

citations

#1322

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye et al.

AAAI 2025paperarXiv:2405.20535

citations

#1323

gRNAde: Geometric Deep Learning for 3D RNA inverse design

Chaitanya Joshi, Arian Jamasb, Ramon Viñas et al.

ICLR 2025arXiv:2305.14749

citations

#1324

Revisiting text-to-image evaluation with Gecko: on metrics, prompts, and human rating

Olivia Wiles, Chuhan Zhang, Isabela Albuquerque et al.

ICLR 2025arXiv:2404.16820

citations

#1325

Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics

Taowen Wang, Cheng Han, James Liang et al.

ICCV 2025arXiv:2411.13587

citations

#1326

Gramian Multimodal Representation Learning and Alignment

Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo et al.

ICLR 2025arXiv:2412.11959

citations

#1327

Hyper-Connections

Defa Zhu, Hongzhi Huang, Zihao Huang et al.

ICLR 2025arXiv:2409.19606

citations

#1328

Reasoning Models Better Express Their Confidence

Dongkeun Yoon, Seungone Kim, Sohee Yang et al.

NEURIPS 2025arXiv:2505.14489

citations

#1329

Diverging Preferences: When do Annotators Disagree and do Models Know?

Michael Zhang, Zhilin Wang, Jena Hwang et al.

ICML 2025arXiv:2410.14632

citations

#1330

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Zixuan Huang, Mark Boss, Aaryaman Vasishta et al.

CVPR 2025arXiv:2501.04689

citations

#1331

DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion

Jinyuan Liu, Bowei Zhang, Qingyun Mei et al.

CVPR 2025arXiv:2503.17673

citations

#1332

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Zixuan Ye, Huijuan Huang, Xintao Wang et al.

CVPR 2025arXiv:2412.07744

citations

#1333

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Sangmin Bae, Yujin Kim, Reza Bayat et al.

NEURIPS 2025arXiv:2507.10524

citations

#1334

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.

AAAI 2025paperarXiv:2409.01227

citations

#1335

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

Haochen Wang, Yucheng Zhao, Tiancai Wang et al.

ICCV 2025arXiv:2504.01901

citations

#1336

Scaling Wearable Foundation Models

Girish Narayanswamy, Xin Liu, Kumar Ayush et al.

ICLR 2025arXiv:2410.13638

citations

#1337

VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding

Chaoyu Li, Eun Woo Im, Pooyan Fazli

CVPR 2025arXiv:2412.03735

citations

#1338

Improved Noise Schedule for Diffusion Training

Tiankai Hang, Shuyang Gu, Jianmin Bao et al.

ICCV 2025arXiv:2407.03297

citations

#1339

Empowering LLMs to Understand and Generate Complex Vector Graphics

XiMing Xing, Juncheng Hu, Guotao Liang et al.

CVPR 2025arXiv:2412.11102

citations

#1340

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Lital Binyamin, Yoad Tewel, Hilit Segev et al.

CVPR 2025arXiv:2406.10210

citations

#1341

The dark side of the forces: assessing non-conservative force models for atomistic machine learning

Filippo Bigi, Marcel Langer, Michele Ceriotti

ICML 2025oralarXiv:2412.11569

citations

#1342

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki et al.

COLM 2025paperarXiv:2503.00177

citations

#1343

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.

ICLR 2025arXiv:2407.07880

citations

#1344

PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models

Minghao Chen, Roman Shapovalov, Iro Laina et al.

CVPR 2025highlightarXiv:2412.18608

citations

#1345

Don't be lazy: CompleteP enables compute-efficient deep transformers

Nolan Dey, Bin Zhang, Lorenzo Noci et al.

NEURIPS 2025arXiv:2505.01618

citations

#1346

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Mushui Liu, Yuhang Ma, Zhen Yang et al.

AAAI 2025paperarXiv:2407.00737

citations

#1347

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.

CVPR 2025highlightarXiv:2502.20653

citations

#1348

DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts

Tobias Braun, Mark Rothermel, Marcus Rohrbach et al.

ICML 2025oralarXiv:2412.10510

citations

#1349

High-Dimensional Prediction for Sequential Decision Making

Georgy Noarov, Ramya Ramalingam, Aaron Roth et al.

ICML 2025oralarXiv:2310.17651

citations

#1350

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

Qihan Huang, Siming Fu, Jinlong Liu et al.

AAAI 2025paperarXiv:2409.17920

citations

#1351

Stable-Hair: Real-World Hair Transfer via Diffusion Model

Yuxuan Zhang, Qing Zhang, Yiren Song et al.

AAAI 2025paperarXiv:2407.14078

citations

#1352

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Qingdong He, Jiangning Zhang, Jinlong Peng et al.

AAAI 2025paperarXiv:2405.15214

citations

#1353

Evolutionary Large Language Model for Automated Feature Transformation

Nanxu Gong, Chandan K Reddy, Wangyang Ying et al.

AAAI 2025paperarXiv:2405.16203

citations

#1354

Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series

Yuxiao Hu, Qian Li, Dongxiao Zhang et al.

ICLR 2025arXiv:2501.03747

citations

#1355

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

Chi Zhang, Huaping Zhong, Kuan Zhang et al.

ICLR 2025arXiv:2409.16986

citations

#1356

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.

ICML 2025arXiv:2502.12892

citations

#1357

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Zhongwei Wan, Zhihao Dou, Che Liu et al.

NEURIPS 2025arXiv:2506.01713

citations

#1358

Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Yifei Zhou, Qianlan Yang, Kaixiang Lin et al.

ICML 2025arXiv:2412.13194

citations

#1359

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

Wenhui Tan, Jiaze Li, Jianzhong Ju et al.

NEURIPS 2025arXiv:2505.16552

citations

#1360

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models

Zhejun Zhang, Peter Karkus, Maximilian Igl et al.

CVPR 2025arXiv:2412.05334

citations

#1361

Cross-modal Information Flow in Multimodal Large Language Models

Zhi Zhang, Srishti Yadav, Fengze Han et al.

CVPR 2025arXiv:2411.18620

citations

#1362

WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments

Jianhao Zheng, Zihan Zhu, Valentin Bieri et al.

CVPR 2025arXiv:2504.03886

citations

#1363

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations

Katie Matton, Robert Ness, John Guttag et al.

ICLR 2025arXiv:2504.14150

citations

#1364

Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models

Ruofan Liang, Žan Gojčič, Huan Ling et al.

CVPR 2025

citations

#1365

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

Yiyou Sun, Shawn Hu, Georgia Zhou et al.

NEURIPS 2025arXiv:2506.18880

citations

#1366

Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms

Yinuo Ren, Haoxuan Chen, Yuchen Zhu et al.

NEURIPS 2025arXiv:2502.00234

citations

#1367

CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences

Ziran Qin, Yuchen Cao, Mingbao Lin et al.

ICLR 2025oralarXiv:2503.12491

citations

#1368

TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights

Aiwei Liu, Haoping Bai, Zhiyun Lu et al.

ICLR 2025arXiv:2410.04350

citations

#1369

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Peijie Dong, Lujun Li, Yuedong Zhong et al.

ICLR 2025arXiv:2408.01803

citations

#1370

ACPBench: Reasoning About Action, Change, and Planning

Harsha Kokel, Michael Katz, Kavitha Srinivas et al.

AAAI 2025paperarXiv:2410.05669

citations

#1371

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

Guo Chen, Zhiqi Li, Shihao Wang et al.

NEURIPS 2025arXiv:2504.15271

citations

#1372

DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming

Jiaxin Zhang, Wentao Yang, Songxuan Lai et al.

AAAI 2025paperarXiv:2406.19101

citations

#1373

Self-Boosting Large Language Models with Synthetic Preference Data

Qingxiu Dong, Li Dong, Xingxing Zhang et al.

ICLR 2025arXiv:2410.06961

citations

#1374

ICLR: In-Context Learning of Representations

Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana et al.

ICLR 2025arXiv:2501.00070

citations

#1375

Complexity Experts are Task-Discriminative Learners for Any Image Restoration

Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.

CVPR 2025arXiv:2411.18466

citations

#1376

DarkIR: Robust Low-Light Image Restoration

Daniel Feijoo, Juan C. Benito, Alvaro Garcia et al.

CVPR 2025arXiv:2412.13443

citations

#1377

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

Hailong Guo, Bohan Zeng, Yiren Song et al.

ICCV 2025arXiv:2501.15891

citations

#1378

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

Hanlin Zhu, Shibo Hao, Zhiting Hu et al.

NEURIPS 2025arXiv:2505.12514

citations

#1379

Number Cookbook: Number Understanding of Language Models and How to Improve It

Haotong Yang, Yi Hu, Shijia Kang et al.

ICLR 2025arXiv:2411.03766

citations

#1380

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NEURIPS 2025spotlightarXiv:2507.18624

citations

#1381

The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence

Tom Wollschläger, Jannes Elstner, Simon Geisler et al.

ICML 2025arXiv:2502.17420

citations

#1382

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Weifeng Lin, Xinyu Wei, Ruichuan An et al.

NEURIPS 2025arXiv:2506.05302

citations

#1383

InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences

Hongkai Zheng, Wenda Chu, Bingliang Zhang et al.

ICLR 2025arXiv:2503.11043

citations

#1384

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025arXiv:2406.04321

citations

#1385

Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning

Huiyi Wang, Haodong Lu, Lina Yao et al.

CVPR 2025arXiv:2403.18886

citations

#1386

CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

Yang Liu, Chuanchen Luo, Zhongkai Mao et al.

ICLR 2025arXiv:2411.00771

citations

#1387

OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation

Yuchen Lin, Chenguo Lin, Jianjin Xu et al.

ICLR 2025arXiv:2501.18982

citations

#1388

Retrieval-Augmented Generation with Conflicting Evidence

Han Wang, Archiki Prasad, Elias Stengel-Eskin et al.

COLM 2025paperarXiv:2504.13079

citations

#1389

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Le Zhuo, Liangbing Zhao, Sayak Paul et al.

ICCV 2025arXiv:2504.16080

citations

#1390

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Yingyu Liang, Jiangxuan Long, Zhenmei Shi et al.

ICLR 2025arXiv:2410.11261

citations

#1391

Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data

Asad Aali, Giannis Daras, Brett Levac et al.

ICLR 2025arXiv:2403.08728

citations

#1392

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Xiu Yuan, Tongzhou Mu, Stone Tao et al.

ICLR 2025arXiv:2412.13630

citations

#1393

MoDeGPT: Modular Decomposition for Large Language Model Compression

Chi-Heng Lin, Shangqian Gao, James Smith et al.

ICLR 2025arXiv:2408.09632

citations

#1394

Automated Red Teaming with GOAT: the Generative Offensive Agent Tester

Maya Pavlova, Erik Brinkman, Krithika Iyer et al.

ICML 2025arXiv:2410.01606

citations

#1395

MonSter: Marry Monodepth to Stereo Unleashes Power

JunDa Cheng, Longliang Liu, Gangwei Xu et al.

CVPR 2025highlight

citations

#1396

LT3SD: Latent Trees for 3D Scene Diffusion

Quan Meng, Lei Li, Matthias Nießner et al.

CVPR 2025arXiv:2409.08215

citations

#1397

From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs

Alireza Rezazadeh, Zichao Li, Wei Wei et al.

ICLR 2025arXiv:2410.14052

citations

#1398

From Tokens to Words: On the Inner Lexicon of LLMs

Guy Kaplan, Matanel Oren, Yuval Reif et al.

ICLR 2025arXiv:2410.05864

citations

#1399

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NEURIPS 2025arXiv:2501.04686

citations

#1400

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

Jintao Zhang, Jia wei, Haoxu Wang et al.

NEURIPS 2025spotlightarXiv:2505.11594

citations

← Previous

1...5 6 7 8 9...112