Most Cited 2025 "grasping motion generation" Papers

22,274 papers found • Page 21 of 112

#4001

VORTA: Efficient Video Diffusion via Routing Sparse Attention

Wenhao Sun, Rong-Cheng Tu, Yifu Ding et al.

NEURIPS 2025 • arXiv:2505.18809
12 citations
#4002

Strategy Coopetition Explains the Emergence and Transience of In-Context Learning

Aaditya Singh, Ted Moskovitz, Sara Dragutinović et al.

ICML 2025 • oral • arXiv:2503.05631
12 citations
#4003

MaskControl: Spatio-Temporal Control for Masked Motion Synthesis

Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul et al.

ICCV 2025 • arXiv:2410.10780
12 citations
#4004

GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting

Yangming Zhang, Wenqi Jia, Wei Niu et al.

CVPR 2025 • arXiv:2411.06019
12 citations
#4005

POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding

Alexey Skrynnik, Anton Andreychuk, Anatolii Borzilov et al.

ICLR 2025 • arXiv:2407.14931
12 citations
#4006

Learning-Order Autoregressive Models with Application to Molecular Graph Generation

Zhe Wang, Jiaxin Shi, Nicolas Heess et al.

ICML 2025 • arXiv:2503.05979
12 citations
#4007

From Commands to Prompts: LLM-based Semantic File System for AIOS

Zeru Shi, Kai Mei, Mingyu Jin et al.

ICLR 2025 • arXiv:2410.11843
12 citations
#4008

LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.

ICCV 2025 • arXiv:2504.14032
12 citations
#4009

LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content

Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.

ICLR 2025 • arXiv:2410.10783
12 citations
#4010

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Zhixuan Liang, Yao Mu, Yixiao Wang et al.

CVPR 2025 • arXiv:2411.18562
12 citations
#4011

Diffusion Models for Attribution

Xiongren Chen, Jiuyong Li, Jixue Liu et al.

AAAI 2025 • paper • arXiv:2403.14790
12 citations
#4012

Efficient 3D Recognition with Event-driven Spike Sparse Convolution

Xuerui Qiu, Man Yao, Jieyuan Zhang et al.

AAAI 2025 • paper • arXiv:2412.07360
12 citations
#4013

Proxy Denoising for Source-Free Domain Adaptation

Song Tang, Wenxin Su, Yan Gan et al.

ICLR 2025 • arXiv:2406.01658
12 citations
#4014

Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards

Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos et al.

ICML 2025 • oral • arXiv:2501.07493
12 citations
#4015

Advancing Spiking Neural Networks Towards Multiscale Spatiotemporal Interaction Learning

Yimeng Shan, Malu Zhang, Rui-jie Zhu et al.

AAAI 2025 • paper • arXiv:2405.13672
12 citations
#4016

Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models

Jinho Jeong, Sangmin Han, Jinwoo Kim et al.

CVPR 2025 • arXiv:2503.18446
12 citations
#4017

ReSi: A Comprehensive Benchmark for Representational Similarity Measures

Max Klabunde, Tassilo Wald, Tobias Schumacher et al.

ICLR 2025 • arXiv:2408.00531
12 citations
#4018

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios

Kai Wang, Zekai Li, Zhi-Qi Cheng et al.

CVPR 2025 • arXiv:2410.17193
12 citations
#4019

Conformal Thresholded Intervals for Efficient Regression

Rui Luo, Zhixin Zhou

AAAI 2025 • paper • arXiv:2407.14495
12 citations
#4020

TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics

Lu Yi, Jie Peng, Yanping Zheng et al.

ICLR 2025 • oral • arXiv:2502.02975
12 citations
#4021

Revisiting Tampered Scene Text Detection in the Era of Generative AI

Chenfan Qu, Yiwu Zhong, Fengjun Guo et al.

AAAI 2025 • paper • arXiv:2407.21422
12 citations
#4022

Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Chang-Bin Zhang, Yujie Zhong, Kai Han

CVPR 2025
12 citations
#4023

LinPrim: Linear Primitives for Differentiable Volumetric Rendering

Nicolas von Lützow, Matthias Niessner

NEURIPS 2025 • arXiv:2501.16312
12 citations
#4024

QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training

David Dai, Peilin Chen, Chanakya Ekbote et al.

NEURIPS 2025 • oral • arXiv:2506.00711
12 citations
#4025

Test-Time Backdoor Detection for Object Detection Models

Hangtao Zhang, Yichen Wang, Shihui Yan et al.

CVPR 2025 • arXiv:2503.15293
12 citations
#4026

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models

Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl H. Johansson et al.

ICLR 2025 • arXiv:2407.15589
12 citations
#4027

Accelerating Large Language Model Reasoning via Speculative Search

Zhihai Wang, Jie Wang, Jilai Pan et al.

ICML 2025 • arXiv:2505.02865
12 citations
#4028

Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection

Xiaoyu Huang, Weidong Chen, Bo Hu et al.

AAAI 2025 • paper • arXiv:2412.19108
12 citations
#4029

CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models

David Dai, Peilin Chen, Malinda Lu et al.

ICML 2025 • oral • arXiv:2503.07667
12 citations
#4030

Potemkin Understanding in Large Language Models

Marina Mancoridis, Bec Weeks, Keyon Vafa et al.

ICML 2025 • arXiv:2506.21521
12 citations
#4031

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

Shaofei Cai, Zihao Wang, Kewei Lian et al.

CVPR 2025 • arXiv:2410.17856
12 citations
#4032

Law of the Weakest Link: Cross Capabilities of Large Language Models

Ming Zhong, Aston Zhang, Xuewei Wang et al.

ICLR 2025 • arXiv:2409.19951
12 citations
#4033

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

Wanpeng Zhang, Zilong Xie, Yicheng Feng et al.

ICLR 2025 • arXiv:2410.02155
12 citations
#4034

Exploring Vacant Classes in Label-Skewed Federated Learning

Kuangpu Guo, Yuhe Ding, Jian Liang et al.

AAAI 2025 • paper • arXiv:2401.02329
12 citations
#4035

VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding

Kangsan Kim, Geon Park, Youngwan Lee et al.

CVPR 2025 • arXiv:2412.02186
12 citations
#4036

A Meta-Learning Approach to Bayesian Causal Discovery

Anish Dhir, Matthew Ashman, James Requeima et al.

ICLR 2025 • arXiv:2412.16577
12 citations
#4037

Skill Expansion and Composition in Parameter Space

Tenglong Liu, Jianxiong Li, Yinan Zheng et al.

ICLR 2025 • arXiv:2502.05932
12 citations
#4038

Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data

Phillip Si, Peng Chen

ICLR 2025 • arXiv:2409.00127
12 citations
#4039

Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks

Yu Zhou, Dian Zheng, Qijie Mo et al.

CVPR 2025 • highlight • arXiv:2503.23751
12 citations
#4040

Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

Yik Siu Chan, Narutatsu Ri, Yuxin Xiao et al.

ICML 2025 • arXiv:2502.04322
12 citations
#4041

MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA

Hanrong Ye, Haotian Zhang, Erik Daxberger et al.

ICLR 2025
12 citations
#4042

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

Fengshuo Bai, Runze Liu, Yali Du et al.

AAAI 2025 • paper • arXiv:2412.10713
12 citations
#4043

OmniCount: Multi-label Object Counting with Semantic-Geometric Priors

Anindya Mondal, Sauradip Nag, Xiatian Zhu et al.

AAAI 2025 • paper • arXiv:2403.05435
12 citations
#4044

Certified Unlearning for Neural Networks

Anastasiia Koloskova, Youssef Allouah, Animesh Jha et al.

ICML 2025 • arXiv:2506.06985
12 citations
#4045

No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets

Corinna Coupette, Jeremy Wayland, Emily Simons et al.

ICML 2025 • arXiv:2502.02379
12 citations
#4046

SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels

Xiangyu Dong, Xingyi Zhang, Lei Chen et al.

ICLR 2025 • arXiv:2502.03201
12 citations
#4047

Efficient Rectification of Neuro-Symbolic Reasoning Inconsistencies by Abductive Reflection

Wen-Chao Hu, Wang-Zhou Dai, Yuan Jiang et al.

AAAI 2025 • paper • arXiv:2412.08457
12 citations
#4048

Debiased Multimodal Understanding for Human Language Sequences

Zhi Xu, Dingkang Yang, Mingcheng Li et al.

AAAI 2025 • paper • arXiv:2403.05025
12 citations
#4049

Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation

Yu Qi, Yuanchen Ju, Tianming Wei et al.

CVPR 2025 • arXiv:2504.06961
12 citations
#4050

VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents

Kangrui Wang, Pingyue Zhang, Zihan Wang et al.

NEURIPS 2025 • arXiv:2510.16907
12 citations
#4051

VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation

Saksham Singh Kushwaha, Yapeng Tian

CVPR 2025 • arXiv:2412.10768
12 citations
#4052

LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos

Yujun Shi, Jun Hao Liew, Hanshu Yan et al.

ICML 2025 • arXiv:2405.13722
12 citations
#4053

Learning Personalized Decision Support Policies

Umang Bhatt, Valerie Chen, Katherine M. Collins et al.

AAAI 2025 • paper • arXiv:2304.06701
12 citations
#4054

GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution

Fengxiang Wang, Mingshuo Chen, Yueying Li et al.

NEURIPS 2025 • spotlight • arXiv:2505.21375
12 citations
#4055

Test-Time Scaling of Diffusion Models via Noise Trajectory Search

Vignav Ramesh, Morteza Mardani

NEURIPS 2025 • arXiv:2506.03164
12 citations
#4056

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Enis Simsar, Thomas Hofmann, Federico Tombari et al.

CVPR 2025 • arXiv:2412.09622
12 citations
#4057

MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

Mingcheng Li, Xiaolu Hou, Ziyang Liu et al.

CVPR 2025 • arXiv:2505.02648
12 citations
#4058

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Chenghao Fan, Zhenyi Lu, Sichen Liu et al.

ICML 2025 • arXiv:2502.16894
12 citations
#4059

Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization

Zhe Li, Bicheng Ying, Zidong Liu et al.

ICLR 2025 • arXiv:2405.15861
12 citations
#4060

Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency

Jiangrong Shen, Qi Xu, Gang Pan et al.

ICLR 2025 • arXiv:2502.13572
12 citations
#4061

LeanAgent: Lifelong Learning for Formal Theorem Proving

Adarsh Kumarappan, Mohit Tiwari, Peiyang Song et al.

ICLR 2025 • arXiv:2410.06209
12 citations
#4062

MaestroMotif: Skill Design from Artificial Intelligence Feedback

Martin Klissarov, Mikael Henaff, Roberta Raileanu et al.

ICLR 2025 • arXiv:2412.08542
12 citations
#4063

Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?

Maxime Méloux, Silviu Maniu, François Portet et al.

ICLR 2025 • arXiv:2502.20914
12 citations
#4064

SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation

Duc-Hai Pham, Tung Do, Phong Nguyen et al.

CVPR 2025 • arXiv:2411.18229
12 citations
#4065

Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?

Antonia Wüst, Tim Woydt, Lukas Helff et al.

ICML 2025 • arXiv:2410.19546
12 citations
#4066

Manifold Learning by Mixture Models of VAEs for Inverse Problems

Giovanni S. Alberti, Johannes Hertrich, Matteo Santacesaria et al.

ICLR 2025 • arXiv:2303.15244
12 citations
#4067

MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation

Huaize Liu, WenZhang Sun, Donglin Di et al.

CVPR 2025 • arXiv:2501.01808
12 citations
#4068

In-context Time Series Predictor

Jiecheng Lu, Yan Sun, Shihao Yang

ICLR 2025 • arXiv:2405.14982
12 citations
#4069

Generative Classifiers Avoid Shortcut Solutions

Alexander Li, Ananya Kumar, Deepak Pathak

ICLR 2025 • arXiv:2512.25034
12 citations
#4070

Flow-Based Policy for Online Reinforcement Learning

Lei Lv, Yunfei Li, Yu Luo et al.

NEURIPS 2025 • arXiv:2506.12811
12 citations
#4071

P-SpikeSSM: Harnessing Probabilistic Spiking State Space Models for Long-Range Dependency Tasks

Malyaban Bal, Abhronil Sengupta

ICLR 2025 • arXiv:2406.02923
12 citations
#4072

Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution

Qihao Liu, Xi Yin, Alan L. Yuille et al.

CVPR 2025 • highlight • arXiv:2412.15213
12 citations
#4073

SpiritSight Agent: Advanced GUI Agent with One Look

Zhiyuan Huang, Ziming Cheng, Junting Pan et al.

CVPR 2025 • arXiv:2503.03196
12 citations
#4074

HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder

Yingqi Tang, Zhuoran Xu, Zhaotie Meng et al.

ICCV 2025 • arXiv:2503.08612
12 citations
#4075

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen et al.

CVPR 2025 • arXiv:2412.04301
12 citations
#4076

Learning Adversarial MDPs with Stochastic Hard Constraints

Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi et al.

ICML 2025 • arXiv:2403.03672
12 citations
#4077

Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective

Bo Ni, Yu Wang, Lu Cheng et al.

AAAI 2025 • paper • arXiv:2410.08985
12 citations
#4078

Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction

Jixuan Fan, Wanhua Li, Yifei Han et al.

ICCV 2025 • arXiv:2412.04887
12 citations
#4079

AKiRa: Augmentation Kit on Rays for Optical Video Generation

Xi Wang, Robin Courant, Marc Christie et al.

CVPR 2025 • arXiv:2412.14158
12 citations
#4080

Solving New Tasks by Adapting Internet Video Knowledge

Calvin Luo, Zilai Zeng, Yilun Du et al.

ICLR 2025 • arXiv:2504.15369
12 citations
#4081

Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation

Yiping Wang, Xuehai He, Kuan Wang et al.

CVPR 2025 • arXiv:2412.16211
12 citations
#4082

QP-SNN: Quantized and Pruned Spiking Neural Networks

Wenjie Wei, Malu Zhang, Zijian Zhou et al.

ICLR 2025 • oral • arXiv:2502.05905
12 citations
#4083

AdvPrefix: An Objective for Nuanced LLM Jailbreaks

Sicheng Zhu, Brandon Amos, Yuandong Tian et al.

NEURIPS 2025 • arXiv:2412.10321
12 citations
#4084

STD-PLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with PLM

Yiheng Huang, Xiaowei Mao, Shengnan Guo et al.

AAAI 2025 • paper • arXiv:2407.09096
12 citations
#4085

Exploring the limits of strong membership inference attacks on large language models

Jamie Hayes, I Shumailov, Christopher A. Choquette-Choo et al.

NEURIPS 2025 • arXiv:2505.18773
12 citations
#4086

An OpenMind for 3D Medical Vision Self-supervised Learning

Tassilo Wald, Constantin Ulrich, Jonathan Suprijadi et al.

ICCV 2025 • arXiv:2412.17041
12 citations
#4087

Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages

Zui Chen, Tianqiao Liu, Tongqing et al.

ICLR 2025 • arXiv:2501.14002
12 citations
#4088

Edge Prompt Tuning for Graph Neural Networks

Xingbo Fu, Yinhan He, Jundong Li

ICLR 2025 • arXiv:2503.00750
12 citations
#4089

DiffPuter: Empowering Diffusion Models for Missing Data Imputation

Hengrui Zhang, Liancheng Fang, Qitian Wu et al.

ICLR 2025 • arXiv:2405.20690
12 citations
#4090

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.

NEURIPS 2025 • arXiv:2505.20686
12 citations
#4091

PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing

Peng Li, Wangguandong Zheng, Yuan Liu et al.

CVPR 2025 • arXiv:2409.10141
12 citations
#4092

Bag of Tricks for Inference-time Computation of LLM Reasoning

Fan Liu, Wen-Shuo Chao, Naiqiang Tan et al.

NEURIPS 2025 • arXiv:2502.07191
12 citations
#4093

InsightEdit: Towards Better Instruction Following for Image Editing

Yingjing Xu, Jie Kong, Jiazhi Wang et al.

CVPR 2025 • arXiv:2411.17323
12 citations
#4094

Black-Box Adversarial Attacks on LLM-Based Code Completion

Slobodan Jenko, Niels Mündler, Jingxuan He et al.

ICML 2025 • arXiv:2408.02509
12 citations
#4095

Detecting Backdoor Samples in Contrastive Language Image Pretraining

Hanxun Huang, Sarah Erfani, Yige Li et al.

ICLR 2025 • arXiv:2502.01385
11 citations
#4096

CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools

Chinedu Innocent Nwoye, Kareem Elgohary, Anvita A. Srinivas et al.

CVPR 2025 • arXiv:2312.07352
11 citations
#4097

Jailbreaking as a Reward Misspecification Problem

Zhihui Xie, Jiahui Gao, Lei Li et al.

ICLR 2025 • arXiv:2406.14393
11 citations
#4098

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

Fan Liu, Zherui Yang, Cancheng Liu et al.

NEURIPS 2025 • arXiv:2505.14148
11 citations
#4099

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

Suraj Anand, Michael Lepori, Jack Merullo et al.

ICLR 2025 • arXiv:2406.00053
11 citations
#4100

The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning

Youssef Allouah, Joshua Kazdan, Rachid Guerraoui et al.

ICLR 2025 • arXiv:2412.09119
11 citations
#4101

Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction

Wenke Xia, Ruoxuan Feng, Dong Wang et al.

CVPR 2025 • arXiv:2504.14588
11 citations
#4102

Understanding Virtual Nodes: Oversquashing and Node Heterogeneity

Joshua Southern, Francesco Di Giovanni, Michael Bronstein et al.

ICLR 2025 • arXiv:2405.13526
11 citations
#4103

Memory Mosaics

Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan et al.

ICLR 2025 • arXiv:2405.06394
11 citations
#4104

LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization

Alessio Spagnoletti, Jean Prost, Andres Almansa et al.

ICCV 2025 • arXiv:2503.12615
11 citations
#4105

TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction

Yunfei Liu, Lei Zhu, Lijian Lin et al.

ICLR 2025 • arXiv:2502.10982
11 citations
#4106

Improving Your Model Ranking on Chatbot Arena by Vote Rigging

Rui Min, Tianyu Pang, Chao Du et al.

ICML 2025 • arXiv:2501.17858
11 citations
#4107

AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation

Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin et al.

ICCV 2025 • arXiv:2412.15191
11 citations
#4108

CausalPFN: Amortized Causal Effect Estimation via In-Context Learning

Vahid Balazadeh, Hamidreza Kamkari, Valentin Thomas et al.

NEURIPS 2025 • spotlight • arXiv:2506.07918
11 citations
#4109

EMOE: Modality-Specific Enhanced Dynamic Emotion Experts

Yiyang Fang, Wenke Huang, Guancheng Wan et al.

CVPR 2025
11 citations
#4110

Differentiable Optimization of Similarity Scores Between Models and Brains

Nathan Cloos, Moufan Li, Markus Siegel et al.

ICLR 2025 • arXiv:2407.07059
11 citations
#4111

Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities

Michele Mazzamuto, Antonino Furnari, Yoichi Sato et al.

CVPR 2025 • arXiv:2406.08379
11 citations
#4112

MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output

Yanyuan Chen, Dexuan Xu, Yu Huang et al.

CVPR 2025 • arXiv:2510.10011
11 citations
#4113

Understanding Model Calibration - A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)

Maja Pavlovic

ICLR 2025 • arXiv:2501.19047
11 citations
#4114

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference

Zhenyu Zhang, Zechun Liu, Yuandong Tian et al.

ICLR 2025 • arXiv:2504.19449
11 citations
#4115

GraphGPT: Generative Pre-trained Graph Eulerian Transformer

Qifang Zhao, Weidong Ren, Tianyu Li et al.

ICML 2025 • arXiv:2401.00529
11 citations
#4116

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Reyhane Askari Hemmat, Mohammad Pezeshki, Elvis Dohmatob et al.

ICML 2025 • oral • arXiv:2502.15588
11 citations
#4117

Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

Peiyan Zhang, Haibo Jin, Leyang Hu et al.

ICML 2025 • arXiv:2412.03092
11 citations
#4118

Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search

Jonathan Light, Min Cai, Weiqin Chen et al.

ICLR 2025 • arXiv:2408.10635
11 citations
#4119

SMamba: Sparse Mamba for Event-based Object Detection

Nan Yang, Yang Wang, Zhanwen Liu et al.

AAAI 2025 • paper • arXiv:2501.11971
11 citations
#4120

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Zhenheng Tang, Xiang Liu, Qian Wang et al.

ICLR 2025 • arXiv:2502.17535
11 citations
#4121

Hierarchical Equivariant Policy via Frame Transfer

Haibo Zhao, Dian Wang, Yizhe Zhu et al.

ICML 2025 • arXiv:2502.05728
11 citations
#4122

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Renshan Zhang, Rui Shao, Gongwei Chen et al.

ICCV 2025 • arXiv:2501.16297
11 citations
#4123

DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers

Hanling Zhang, Rundong Su, Zhihang Yuan et al.

ICCV 2025 • arXiv:2503.22796
11 citations
#4124

GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors

An Li, Zhe Zhu, Mingqiang Wei

CVPR 2025 • arXiv:2502.19896
11 citations
#4125

Geometry Field Splatting with Gaussian Surfels

Kaiwen Jiang, Venkataram Sivaram, Cheng Peng et al.

CVPR 2025 • arXiv:2411.17067
11 citations
#4126

CellFlux: Simulating Cellular Morphology Changes via Flow Matching

Yuhui Zhang, Yuchang Su, Chenyu Wang et al.

ICML 2025 • arXiv:2502.09775
11 citations
#4127

Anyprefer: An Agentic Framework for Preference Data Synthesis

Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.

ICLR 2025 • arXiv:2504.19276
11 citations
#4128

SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction

Lu Dai, Yijie Xu, Jinhui Ye et al.

ICLR 2025 • arXiv:2503.01478
11 citations
#4129

NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction

Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou et al.

CVPR 2025 • highlight • arXiv:2503.18361
11 citations
#4130

MatAnyone: Stable Video Matting with Consistent Memory Propagation

Peiqing Yang, Shangchen Zhou, Jixin Zhao et al.

CVPR 2025 • arXiv:2501.14677
11 citations
#4131

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Hongkang Li, Songtao Lu, Pin-Yu Chen et al.

ICLR 2025 • arXiv:2410.02167
11 citations
#4132

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

Zhenxing Mi, Kuan-Chieh Wang, Guocheng Qian et al.

ICML 2025 • arXiv:2502.10458
11 citations
#4133

DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution

Zheng Chen, Zichen Zou, Kewei Zhang et al.

NEURIPS 2025 • arXiv:2505.16239
11 citations
#4134

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations

Yiyou Sun, Yu Gai, Lijie Chen et al.

NEURIPS 2025 • arXiv:2504.12691
11 citations
#4135

Efficiently Parameterized Neural Metriplectic Systems

Anthony Gruber, Kookjin Lee, Haksoo Lim et al.

ICLR 2025 • arXiv:2405.16305
11 citations
#4136

On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding

Dehong Xu, Ruiqi Gao, Wenhao Zhang et al.

ICLR 2025 • arXiv:2405.16865
11 citations
#4137

Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning

Hai-Ming Xu, Qi Chen, Lei Wang et al.

AAAI 2025 • paper • arXiv:2412.10840
11 citations
#4138

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

Andong Deng, Tongjia Chen, Shoubin Yu et al.

CVPR 2025 • arXiv:2411.09921
11 citations
#4139

GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting

Shujuan Li, Yu-Shen Liu, Zhizhong Han

CVPR 2025 • highlight • arXiv:2503.19458
11 citations
#4140

Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models

Chenrui Cao, Liangcheng Song, Zenan Li et al.

NEURIPS 2025 • arXiv:2506.11487
11 citations
#4141

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers

Xuanlei Zhao, Shenggan Cheng, Chang Chen et al.

ICML 2025 • arXiv:2403.10266
11 citations
#4142

X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing

Xinyan Chen, Jianfei Yang

ICLR 2025 • arXiv:2410.10167
11 citations
#4143

From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization

Chao Yuan, Guiwei Zhang, Changxiao Ma et al.

CVPR 2025 • arXiv:2503.00938
11 citations
#4144

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

Mohan Xu, Kai Li, Guo Chen et al.

ICLR 2025 • oral • arXiv:2410.01469
11 citations
#4145

Probing the Latent Hierarchical Structure of Data via Diffusion Models

Antonio Sclocchi, Alessandro Favero, Noam Levi et al.

ICLR 2025 • arXiv:2410.13770
11 citations
#4146

Scaling Trends in Language Model Robustness

Nikolaus Howe, Ian McKenzie, Oskar Hollinsworth et al.

ICML 2025 • spotlight • arXiv:2407.18213
11 citations
#4147

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

Feize Wu, Yun Pang, Junyi Zhang et al.

AAAI 2025 • paper • arXiv:2408.15914
11 citations
#4148

Can Transformers Reason Logically? A Study in SAT Solving

Leyan Pan, Vijay Ganesh, Jacob Abernethy et al.

ICML 2025 • arXiv:2410.07432
11 citations
#4149

Rethinking Invariance in In-context Learning

Lizhe Fang, Yifei Wang, Khashayar Gatmiry et al.

ICLR 2025 • arXiv:2505.04994
11 citations
#4150

PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems

Bocheng Zeng, Qi Wang, Mengtao Yan et al.

ICLR 2025 • oral • arXiv:2410.01337
11 citations
#4151

Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference

Zongyue Qin, Ziniu Hu, Zifan He et al.

ICLR 2025 • arXiv:2407.09722
11 citations
#4152

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Jin Wang, Chenghui Lv, Xian Li et al.

CVPR 2025 • arXiv:2503.15024
11 citations
#4153

When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning

Yang Liu, Qianqian Xu, Peisong Wen et al.

CVPR 2025 • arXiv:2503.15096
11 citations
#4154

DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation

Jing He, Haodong Li, Yongzhe Hu et al.

ICLR 2025 • arXiv:2410.02067
11 citations
#4155

LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces

Rashid Mushkani, Perampalli Shravan Nayak, Hugo Berard et al.

ICML 2025 • arXiv:2503.01894
11 citations
#4156

Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment

Cheryl Li, Tianyuan Xu, Yiwen Guo

ICML 2025 • arXiv:2502.07803
11 citations
#4157

TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks

Mathilde Papillon, Guillermo Bernardez, Claudio Battiloro et al.

ICML 2025 • arXiv:2410.06530
11 citations
#4158

Glad: A Streaming Scene Generator for Autonomous Driving

Bin Xie, Yingfei Liu, Tiancai Wang et al.

ICLR 2025 • oral • arXiv:2503.00045
11 citations
#4159

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Fating Hong, Zunnan Xu, Zixiang Zhou et al.

ICCV 2025 • arXiv:2504.02542
11 citations
#4160

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text

Tianyu Zhang, Suyuchen Wang, Lu Li et al.

ICLR 2025 • arXiv:2406.06462
11 citations
#4161

Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG

Wenbin Wang, Yongcheng Jing, Liang Ding et al.

ICML 2025 • oral • arXiv:2503.01222
11 citations
#4162

3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model

Wenbo Hu, Yining Hong, Yanjun Wang et al.

NEURIPS 2025 • oral • arXiv:2505.22657
11 citations
#4163

Reasoning as an Adaptive Defense for Safety

Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.

NEURIPS 2025 • arXiv:2507.00971
11 citations
#4164

PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes

Bin Tan, Rui Yu, Yujun Shen et al.

CVPR 2025 • highlight • arXiv:2412.03451
11 citations
#4165

Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance

Dimitris Oikonomou, Nicolas Loizou

ICLR 2025 • arXiv:2406.04142
11 citations
#4166

Synthetic Video Enhances Physical Fidelity in Video Synthesis

Qi Zhao, Xingyu Ni, Ziyu Wang et al.

ICCV 2025 • arXiv:2503.20822
11 citations
#4167

CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP

Songlong Xing, Zhengyu Zhao, Nicu Sebe

CVPR 2025 • arXiv:2503.03613
11 citations
#4168

EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis

Sheng Miao, Jiaxin Huang, Dongfeng Bai et al.

CVPR 2025 • arXiv:2503.20168
11 citations
#4169

Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods

Akira Ito, Masanori Yamada, Atsutoshi Kumagai

ICLR 2025 • arXiv:2402.04051
11 citations
#4170

How do Transformers Learn Implicit Reasoning?

Jiaran Ye, Zijun Yao, Zhidian Huang et al.

NEURIPS 2025 • oral • arXiv:2505.23653
11 citations
#4171

Adaptive Self-improvement LLM Agentic System for ML Library Development

Genghan Zhang, Weixin Liang, Olivia Hsu et al.

ICML 2025 • arXiv:2502.02534
11 citations
#4172

Reangle-A-Video: 4D Video Generation as Video-to-Video Translation

Hyeonho Jeong, Suhyeon Lee, Jong Ye

ICCV 2025 • arXiv:2503.09151
11 citations
#4173

BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation

Yulu Pan, Ce Zhang, Gedas Bertasius

CVPR 2025 • arXiv:2503.20781
11 citations
#4174

Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry

Jannis Chemseddine, Christian Wald, Richard Duong et al.

ICLR 2025 • arXiv:2410.03282
11 citations
#4175

How Expressive are Knowledge Graph Foundation Models?

Xingyue Huang, Pablo Barcelo, Michael Bronstein et al.

ICML 2025 • arXiv:2502.13339
11 citations
#4176

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Alexander Nikulin, Ilya Zisman, Alexey Zemtsov et al.

ICLR 2025 • arXiv:2406.08973
11 citations
#4177

CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance

Yongchao Chen, Yilun Hao, Yueying Liu et al.

ICML 2025 • arXiv:2502.04350
11 citations
#4178

X-Dancer: Expressive Music to Human Dance Video Generation

Zeyuan Chen, Hongyi Xu, Guoxian Song et al.

ICCV 2025 • highlight • arXiv:2502.17414
11 citations
#4179

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training

Zhanpeng Zhou, Mingze Wang, Yuchen Mao et al.

ICLR 2025 • arXiv:2410.10373
11 citations
#4180

The Computational Complexity of Circuit Discovery for Inner Interpretability

Federico Adolfi, Martina G. Vilas, Todd Wareham

ICLR 2025 • arXiv:2410.08025
11 citations
#4181

6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering

Zhongpai Gao, Benjamin Planche, Meng Zheng et al.

ICLR 2025 • arXiv:2410.04974
11 citations
#4182

xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference

Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.

ICML 2025 • arXiv:2503.13427
11 citations
#4183

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

Haoji Zhang, Yiqin Wang, Yansong Tang et al.

ICCV 2025 • arXiv:2506.23825
11 citations
#4184

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

Ruihan Yang, Fanghua Ye, Jian Li et al.

NEURIPS 2025 • arXiv:2503.16024
11 citations
#4185

MAPLE: Many-Shot Adaptive Pseudo-Labeling for In-Context Learning

Zihan Chen, Song Wang, Zhen Tan et al.

ICML 2025 • arXiv:2505.16225
11 citations
#4186

A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks

Thomas Schmied, Thomas Adler, Vihang Patil et al.

ICML 2025 • arXiv:2410.22391
11 citations
#4187

Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

Jingxi Chen, Brandon Y. Feng, Haoming Cai et al.

CVPR 2025 • arXiv:2412.07761
11 citations
#4188

ViSAGe: Video-to-Spatial Audio Generation

Jaeyeon Kim, Heeseung Yun, Gunhee Kim

ICLR 2025 • oral • arXiv:2506.12199
11 citations
#4189

One Node One Model: Featuring the Missing-Half for Graph Clustering

Xuanting Xie, Bingheng Li, Erlin Pan et al.

AAAI 2025 • paper • arXiv:2412.09902
11 citations
#4190

Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages

Matteo Farina, Massimiliano Mancini, Giovanni Iacca et al.

CVPR 2025 • arXiv:2503.11609
11 citations
#4191

STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction

Zhimin Liao, Ping Wei, Shuaijia Chen et al.

CVPR 2025 • arXiv:2504.19749
11 citations
#4192

Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation

Harold Haodong Chen, Haojian Huang, Qifeng Chen et al.

NEURIPS 2025 • oral • arXiv:2508.10858
11 citations
#4193

Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation

Shuo Wang, Yongcai Wang, Wanting Li et al.

NEURIPS 2025 • arXiv:2505.11886
11 citations
#4194

Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models

Sumeet Singh, Vikas Sindhwani, Stephen Tu

ICLR 2025 • arXiv:2309.05803
11 citations
#4195

MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios

Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.

AAAI 2025 • paper • arXiv:2409.16084
11 citations
#4196

Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts

Yun Wang, Longguang Wang, Chenghao Zhang et al.

ICCV 2025 • highlight • arXiv:2507.04631
11 citations
#4197

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

Chen Wang, Chuhao Chen, Yiming Huang et al.

NEURIPS 2025 • oral • arXiv:2509.20358
11 citations
#4198

RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

Peng Liu, Dongyang Dai, Zhiyong Wu

ICLR 2025 • arXiv:2403.05010
11 citations
#4199

TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

Gihyun Kwon, Jong Chul Ye

ICLR 2025 • arXiv:2410.05591
11 citations
#4200

Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise

Brayan Monroy, Jorge Bacca, Julián Tachella

CVPR 2025 • arXiv:2412.04648
11 citations