Most Cited 2025 "3d all-atom models" Papers

22,274 papers found • Page 16 of 112

#3001

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Yiren Song, Cheng Liu, Mike Zheng Shou

NEURIPS 2025posterarXiv:2505.18445
10
citations
#3002

Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling

Jingyun Xue, WANG HongFa, Qi Tian et al.

ICLR 2025posterarXiv:2406.03035
10
citations
#3003

ADAM: An Embodied Causal Agent in Open-World Environments

Shu Yu, Chaochao Lu

ICLR 2025posterarXiv:2410.22194
10
citations
#3004

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought'' Control

Hannah Cyberey, David Evans

COLM 2025paper
10
citations
#3005

General Scene Adaptation for Vision-and-Language Navigation

Haodong Hong, Yanyuan Qiao, Sen Wang et al.

ICLR 2025posterarXiv:2501.17403
10
citations
#3006

Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data

Seiji Maekawa, Hayate Iso, Nikita Bhutani

ICLR 2025posterarXiv:2410.11996
10
citations
#3007

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.

NEURIPS 2025posterarXiv:2504.13169
10
citations
#3008

Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models

Yingqing Guo, Yukang Yang, Hui Yuan et al.

NEURIPS 2025posterarXiv:2502.11420
10
citations
#3009

CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models

Hao He, Ceyuan Yang, Shanchuan Lin et al.

ICCV 2025posterarXiv:2503.10592
10
citations
#3010

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Avinandan Bose, Zhihan Xiong, Yuejie Chi et al.

COLM 2025paperarXiv:2504.14439
10
citations
#3011

Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems

Jie Chen

ICLR 2025posterarXiv:2406.00809
10
citations
#3012

Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Junlei Zhang, Zichen Ding, Chang Ma et al.

COLM 2025paperarXiv:2504.10127
10
citations
#3013

Sample Efficient Preference Alignment in LLMs via Active Exploration

Viraj Mehta, Syrine Belakaria, Vikramjeet Das et al.

COLM 2025paperarXiv:2312.00267
10
citations
#3014

DOTA: Distributional Test-time Adaptation of Vision-Language Models

Zongbo Han, Jialong Yang, Guangyu Wang et al.

NEURIPS 2025posterarXiv:2409.19375
10
citations
#3015

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

Suraj Anand, Michael Lepori, Jack Merullo et al.

ICLR 2025posterarXiv:2406.00053
10
citations
#3016

RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories

Huiyang Shao, Xin Xia, Yuhong Yang et al.

CVPR 2025posterarXiv:2503.07699
10
citations
#3017

Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization

Zhitao He, Zijun Liu, Peng Li et al.

COLM 2025paperarXiv:2502.14496
10
citations
#3018

EqNIO: Subequivariant Neural Inertial Odometry

Royina Karegoudra Jayanth, Yinshuang Xu, Ziyun Wang et al.

ICLR 2025posterarXiv:2408.06321
10
citations
#3019

Deep MMD Gradient Flow without adversarial training

Alexandre Galashov, Valentin De Bortoli, Arthur Gretton

ICLR 2025posterarXiv:2405.06780
10
citations
#3020

Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach

Haotian Ju, Hongyang Zhang, Dongyue Li

ICLR 2025posterarXiv:2306.08553
10
citations
#3021

Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling

Minhyuk Seo, Hyunseo Koh, Jonghyun Choi

ICLR 2025posterarXiv:2410.15143
10
citations
#3022

Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses

David Glukhov, Ziwen Han, I Shumailov et al.

ICLR 2025posterarXiv:2407.02551
10
citations
#3023

PLeaS - Merging Models with Permutations and Least Squares

Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.

CVPR 2025posterarXiv:2407.02447
10
citations
#3024

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Yuxuan Wang, Xuanyu Yi, Haohan Weng et al.

ICCV 2025posterarXiv:2501.14317
10
citations
#3025

X-Dancer: Expressive Music to Human Dance Video Generation

Zeyuan Chen, Hongyi Xu, Guoxian Song et al.

ICCV 2025highlightarXiv:2502.17414
10
citations
#3026

Label-Free Backdoor Attacks in Vertical Federated Learning

Wei Shen, Wenke Huang, Guancheng Wan et al.

AAAI 2025paper
10
citations
#3027

Generative Monoculture in Large Language Models

Fan Wu, Emily Black, Varun Chandrasekaran

ICLR 2025posterarXiv:2407.02209
10
citations
#3028

Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning

Yichi Zhang, Zhuo Chen, Lingbing Guo et al.

ICLR 2025posterarXiv:2405.16869
10
citations
#3029

Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning

Qitao Tan, Jun Liu, Zheng Zhan et al.

NEURIPS 2025posterarXiv:2502.03304
10
citations
#3030

Causal Inference over Visual-Semantic-Aligned Graph for Image Classification

Lei Meng, Xiangxian Li, Xiaoshuo Yan et al.

AAAI 2025paper
10
citations
#3031

DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

Hao Li, Xiaogeng Liu, CHIU Chun et al.

NEURIPS 2025posterarXiv:2506.12104
10
citations
#3032

Bayesian Concept Bottleneck Models with LLM Priors

Jean Feng, Avni Kothari, Lucas Zier et al.

NEURIPS 2025posterarXiv:2410.15555
10
citations
#3033

Fluid Language Model Benchmarking

Valentin Hofmann, David Heineman, Ian Magnusson et al.

COLM 2025paperarXiv:2509.11106
10
citations
#3034

FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning

Gongxi Zhu, Donghao Li, Hanlin Gu et al.

CVPR 2025poster
10
citations
#3035

Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution

Qihao Liu, Xi Yin, Alan L. Yuille et al.

CVPR 2025highlightarXiv:2412.15213
10
citations
#3036

DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation

Zhiqiang Shen, Ammar Sherif, Zeyuan Yin et al.

CVPR 2025posterarXiv:2411.19946
10
citations
#3037

Early Concept Drift Detection via Prediction Uncertainty

Pengqian Lu, Jie Lu, Anjin Liu et al.

AAAI 2025paperarXiv:2412.11158
10
citations
#3038

Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model

Dongki Kim, Wonbin Lee, Sung Ju Hwang

NEURIPS 2025posterarXiv:2502.13449
10
citations
#3039

X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing

Xinyan Chen, Jianfei Yang

ICLR 2025posterarXiv:2410.10167
10
citations
#3040

CADDreamer: CAD Object Generation from Single-view Images

Yuan Li, Cheng Lin, Yuan Liu et al.

CVPR 2025highlightarXiv:2502.20732
10
citations
#3041

Efficiently Parameterized Neural Metriplectic Systems

Anthony Gruber, Kookjin Lee, Haksoo Lim et al.

ICLR 2025posterarXiv:2405.16305
10
citations
#3042

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

Wenhao Wang, Yi Yang

NEURIPS 2025posterarXiv:2503.01739
10
citations
#3043

ObjectMover: Generative Object Movement with Video Prior

Xin Yu, Tianyu Wang, Soo Ye Kim et al.

CVPR 2025posterarXiv:2503.08037
10
citations
#3044

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem et al.

ICLR 2025posterarXiv:2410.23168
10
citations
#3045

A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities

Han-Jia Ye, Si-Yang Liu, Wei-Lun (Harry) Chao

NEURIPS 2025posterarXiv:2502.17361
10
citations
#3046

Enhancing Multilingual LLM Pretraining with Model-Based Data Selection

Bettina Messmer, Vinko Sabolčec, Martin Jaggi

NEURIPS 2025posterarXiv:2502.10361
10
citations
#3047

STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding

Zichen Liu, Kunlun Xu, Bing Su et al.

CVPR 2025posterarXiv:2503.15973
10
citations
#3048

Pareto Set Learning for Multi-Objective Reinforcement Learning

Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.

AAAI 2025paperarXiv:2501.06773
10
citations
#3049

Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions

Siqiao Mu, Diego Klabjan

NEURIPS 2025posterarXiv:2409.09778
10
citations
#3050

UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence

Jie Feng, Shengyuan Wang, Tianhui Liu et al.

ICCV 2025posterarXiv:2506.23219
10
citations
#3051

ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones

Anurag Ghosh, Shen Zheng, Robert Tamburo et al.

ICCV 2025posterarXiv:2406.07661
10
citations
#3052

RoMo: Robust Motion Segmentation Improves Structure from Motion

Lily Goli, Sara Sabour, Mark Matthews et al.

ICCV 2025posterarXiv:2411.18650
10
citations
#3053

Visual Generation Without Guidance

Huayu Chen, Kai Jiang, Kaiwen Zheng et al.

ICML 2025posterarXiv:2501.15420
10
citations
#3054

Probing the Latent Hierarchical Structure of Data via Diffusion Models

Antonio Sclocchi, Alessandro Favero, Noam Levi et al.

ICLR 2025posterarXiv:2410.13770
10
citations
#3055

Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis

Weikai Li, Ding Wang, Zijian Ding et al.

AAAI 2025paperarXiv:2410.19225
10
citations
#3056

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.

ICCV 2025posterarXiv:2501.02135
10
citations
#3057

CLEVER: A Curated Benchmark for Formally Verified Code Generation

Amitayush Thakur, Jasper Lee, George Tsoukalas et al.

NEURIPS 2025posterarXiv:2505.13938
10
citations
#3058

Reconstructing People, Places, and Cameras

Lea Müller, Hongsuk Choi, Anthony Zhang et al.

CVPR 2025highlightarXiv:2412.17806
10
citations
#3059

BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

Yuyang Peng, Shishi Xiao, Keming Wu et al.

CVPR 2025posterarXiv:2503.20672
10
citations
#3060

Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data

Phillip Si, Peng Chen

ICLR 2025posterarXiv:2409.00127
10
citations
#3061

DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding

Wenhui Liao, Jiapeng Wang, Hongliang Li et al.

CVPR 2025posterarXiv:2408.15045
10
citations
#3062

Splatter-360: Generalizable 360 Gaussian Splatting for Wide-baseline Panoramic Images

Zheng Chen, Chenming Wu, Zhelun Shen et al.

CVPR 2025poster
10
citations
#3063

InsightEdit: Towards Better Instruction Following for Image Editing

Yingjing Xu, Jie Kong, Jiazhi Wang et al.

CVPR 2025posterarXiv:2411.17323
10
citations
#3064

Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy

You Li, Fan Ma, Yi Yang

CVPR 2025posterarXiv:2411.16752
10
citations
#3065

Pre-Training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck

Van Thuy Hoang, O-Joun Lee

AAAI 2025paperarXiv:2412.15589
9
citations
#3066

CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools

Chinedu Innocent Nwoye, Kareem elgohary, Anvita A. Srinivas et al.

CVPR 2025posterarXiv:2312.07352
9
citations
#3067

HOPE for a Robust Parameterization of Long-memory State Space Models

Annan Yu, Michael W Mahoney, N. Benjamin Erichson

ICLR 2025posterarXiv:2405.13975
9
citations
#3068

LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning

Xuan Liu, Xiaobin Chang

CVPR 2025posterarXiv:2503.18985
9
citations
#3069

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

Yunlong Lin, Zixu Lin, Kunjie Lin et al.

NEURIPS 2025posterarXiv:2506.17612
9
citations
#3070

GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration

Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.

CVPR 2025posterarXiv:2411.17687
9
citations
#3071

WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning

Xiangyu Zhao, Zhiwang Zhou, Wenlong Zhang et al.

ICLR 2025oralarXiv:2411.05420
9
citations
#3072

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

Yue Cao, Yun Xing, Jie Zhang et al.

CVPR 2025posterarXiv:2412.00114
9
citations
#3073

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

Dong Zhao, Jinlong Li, Shuang Wang et al.

CVPR 2025posterarXiv:2503.17940
9
citations
#3074

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

Bingrui Li, Wei Huang, Andi Han et al.

ICLR 2025posterarXiv:2410.04870
9
citations
#3075

$\gamma-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Yaxin Luo, Gen Luo, Jiayi Ji et al.

ICLR 2025poster
9
citations
#3076

Monet: Mixture of Monosemantic Experts for Transformers

Jungwoo Park, Young Jin Ahn, Kee-Eung Kim et al.

ICLR 2025posterarXiv:2412.04139
9
citations
#3077

Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs

Bowen Tan, Zheng Xu, Eric Xing et al.

ICML 2025posterarXiv:2503.12347
9
citations
#3078

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh

NEURIPS 2025posterarXiv:2505.05758
9
citations
#3079

Scaling Laws for Differentially Private Language Models

Ryan McKenna, Yangsibo Huang, Amer Sinha et al.

ICML 2025posterarXiv:2501.18914
9
citations
#3080

I Can Hear You: Selective Robust Training for Deepfake Audio Detection

Zirui Zhang, Wei Hao, Aroon Sankoh et al.

ICLR 2025posterarXiv:2411.00121
9
citations
#3081

(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning

Margaret Li, Sneha Kudugunta, Luke Zettlemoyer

ICLR 2025poster
9
citations
#3082

Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence

Yuankai Luo, Lei Shi, Xiao-Ming Wu

ICML 2025posterarXiv:2502.09263
9
citations
#3083

When Do LLMs Help With Node Classification? A Comprehensive Analysis

Xixi Wu, Yifei Shen, Fangzhou Ge et al.

ICML 2025posterarXiv:2502.00829
9
citations
#3084

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

Arun Reddy, Alexander Martin, Eugene Yang et al.

CVPR 2025posterarXiv:2503.19009
9
citations
#3085

Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning

Hui-Yue Yang, Hui Chen, Ao Wang et al.

AAAI 2025paperarXiv:2411.17217
9
citations
#3086

Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics

Omar Chehab, Anna Korba, Austin Stromme et al.

ICLR 2025posterarXiv:2410.09697
9
citations
#3087

Repo2Run: Automated Building Executable Environment for Code Repository at Scale

Ruida Hu, Chao Peng, XinchenWang et al.

NEURIPS 2025spotlightarXiv:2502.13681
9
citations
#3088

OS-ATLAS: Foundation Action Model for Generalist GUI Agents

Zhiyong Wu, Zhenyu Wu, Fangzhi Xu et al.

ICLR 2025poster
9
citations
#3089

Constrained Fair and Efficient Allocations

Benjamin Cookson, Soroush Ebadian, Nisarg Shah

AAAI 2025paperarXiv:2411.00133
9
citations
#3090

Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?

Simon Park, Abhishek Panigrahi, Yun Cheng et al.

ICML 2025posterarXiv:2501.02669
9
citations
#3091

Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents

Haoyu Wang, Sunhao Dai, Haiyuan Zhao et al.

ICLR 2025posterarXiv:2503.08684
9
citations
#3092

EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge

Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi et al.

NEURIPS 2025posterarXiv:2505.23009
9
citations
#3093

Post-pre-training for Modality Alignment in Vision-Language Foundation Models

Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.

CVPR 2025posterarXiv:2504.12717
9
citations
#3094

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Miran Heo, Min-Hung Chen, De-An Huang et al.

CVPR 2025posterarXiv:2501.08326
9
citations
#3095

Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization

Wei Liu, Zhiying Deng, Zhongyu Niu et al.

ICLR 2025posterarXiv:2503.06202
9
citations
#3096

Group Distributionally Robust Dataset Distillation with Risk Minimization

Saeed Vahidian, Mingyu Wang, Jianyang Gu et al.

ICLR 2025posterarXiv:2402.04676
9
citations
#3097

Flow-Based Policy for Online Reinforcement Learning

Lei Lv, Yunfei Li, Yu Luo et al.

NEURIPS 2025posterarXiv:2506.12811
9
citations
#3098

Random-Set Neural Networks

Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang et al.

ICLR 2025posterarXiv:2307.05772
9
citations
#3099

Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization

Luca Masserano, Abdul Fatir Ansari, Boran Han et al.

ICML 2025oralarXiv:2412.05244
9
citations
#3100

SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization

Yi Du, Zhipeng Zhao, Shaoshu Su et al.

CVPR 2025posterarXiv:2503.14558
9
citations
#3101

From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization

Chao Yuan, Guiwei Zhang, Changxiao Ma et al.

CVPR 2025posterarXiv:2503.00938
9
citations
#3102

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

Gao Peng, Le Zhuo, Dongyang Liu et al.

ICLR 2025oral
9
citations
#3103

K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences

Zhikai Li, Xuewen Liu, Dongrong Joe Fu et al.

CVPR 2025posterarXiv:2408.14468
9
citations
#3104

PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations

Benjamin Holzschuh, Qiang Liu, Georg Kohl et al.

ICML 2025oralarXiv:2505.24717
9
citations
#3105

ScribbleLight: Single Image Indoor Relighting with Scribbles

Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.

CVPR 2025posterarXiv:2411.17696
9
citations
#3106

MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking

Haolin Qin, Tingfa Xu, Tianhao Li et al.

CVPR 2025posterarXiv:2503.17699
9
citations
#3107

HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder

Qi Yang, Le Yang, Geert Van der Auwera et al.

ICML 2025posterarXiv:2505.01938
9
citations
#3108

BrainUICL: An Unsupervised Individual Continual Learning Framework for EEG Applications

Yangxuan Zhou, Sha Zhao, Jiquan Wang et al.

ICLR 2025poster
9
citations
#3109

PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes

Bin Tan, Rui Yu, Yujun Shen et al.

CVPR 2025highlightarXiv:2412.03451
9
citations
#3110

Objective drives the consistency of representational similarity across datasets

Laure Ciernik, Lorenz Linhardt, Marco Morik et al.

ICML 2025posterarXiv:2411.05561
9
citations
#3111

Realistic Evaluation of Deep Partial-Label Learning Algorithms

Wei Wang, Dong-Dong Wu, Jindong Wang et al.

ICLR 2025posterarXiv:2502.10184
9
citations
#3112

Perspective-Invariant 3D Object Detection

Alan Liang, Lingdong Kong, Dongyue Lu et al.

ICCV 2025posterarXiv:2507.17665
9
citations
#3113

PreciseCam: Precise Camera Control for Text-to-Image Generation

Edurne Bernal-Berdun, Ana Serrano, Belen Masia et al.

CVPR 2025posterarXiv:2501.12910
9
citations
#3114

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou et al.

NEURIPS 2025posterarXiv:2511.04703
9
citations
#3115

Online Video Understanding: OVBench and VideoChat-Online

Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.

CVPR 2025posterarXiv:2501.00584
9
citations
#3116

Self-attention-based Diffusion Model for Time-series Imputation in Partial Blackout Scenarios

Mohammad Rafid Ul Islam, Prasad Tadepalli, Alan Fern

AAAI 2025paperarXiv:2503.01737
9
citations
#3117

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Shikai Qiu, Lechao Xiao, Andrew Wilson et al.

ICML 2025oralarXiv:2507.02119
9
citations
#3118

HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation

Trong-Thuan Nguyen, Pha Nguyen, Jackson Cothren et al.

CVPR 2025posterarXiv:2411.18042
9
citations
#3119

Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control

Hejia Chen, Haoxian Zhang, Shoulong Zhang et al.

ICLR 2025oralarXiv:2503.14517
9
citations
#3120

TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception

Zhiying Song, Lei Yang, Fuxi Wen et al.

CVPR 2025posterarXiv:2503.19391
9
citations
#3121

Fast Training of Sinusoidal Neural Fields via Scaling Initialization

Taesun Yeom, Sangyoon Lee, Jaeho Lee

ICLR 2025posterarXiv:2410.04779
9
citations
#3122

HyperGS: Hyperspectral 3D Gaussian Splatting

Christopher Thirgood, Oscar Mendez, Erin Chao Ling et al.

CVPR 2025posterarXiv:2412.12849
9
citations
#3123

KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation

Antoni Bigata Casademunt, Michał Stypułkowski, Rodrigo Mira et al.

CVPR 2025posterarXiv:2503.01715
9
citations
#3124

You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning

Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty

ICLR 2025posterarXiv:2501.15296
9
citations
#3125

GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost

Xinyi Shang, Peng Sun, Tao Lin

ICLR 2025posterarXiv:2405.14736
9
citations
#3126

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

Ruofan Wang, Juncheng Li, Yixu Wang et al.

ICCV 2025posterarXiv:2411.00827
9
citations
#3127

LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models

Hantao Zhang, Yuhe Liu, Jiancheng Yang et al.

ICLR 2025posterarXiv:2403.14066
9
citations
#3128

Learning Robust Spectral Dynamics for Temporal Domain Generalization

En Yu, Jie Lu, Xiaoyu Yang et al.

NEURIPS 2025oralarXiv:2505.12585
9
citations
#3129

AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws

Oren Neumann, Claudius Gros

NEURIPS 2025spotlightarXiv:2412.11979
9
citations
#3130

HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation

Yiming Liang, Tianhan Xu, Yuta Kikuchi

CVPR 2025posterarXiv:2504.06210
9
citations
#3131

ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction

Ziyu Tang, Weicai Ye, Yifan Wang et al.

ICLR 2025posterarXiv:2408.12598
9
citations
#3132

Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation

Zheng Anlin, Xin Wen, Xuanyang Zhang et al.

NEURIPS 2025poster
9
citations
#3133

MegActor-Sigma: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer

Shurong Yang, Huadong Li, Juhao Wu et al.

AAAI 2025paper
9
citations
#3134

Zero-Shot Styled Text Image Generation, but Make It Autoregressive

Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli et al.

CVPR 2025posterarXiv:2503.17074
9
citations
#3135

Multi-modal brain encoding models for multi-modal stimuli

SUBBA REDDY OOTA, Khushbu Pahwa, mounika marreddy et al.

ICLR 2025posterarXiv:2505.20027
9
citations
#3136

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Chejian Xu, Jiawei Zhang, Zhaorun Chen et al.

ICLR 2025posterarXiv:2503.14827
9
citations
#3137

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

Denis Sutter, Julian Minder, Thomas Hofmann et al.

NEURIPS 2025spotlightarXiv:2507.08802
9
citations
#3138

All-in-One: Transferring Vision Foundation Models into Stereo Matching

Jingyi Zhou, Haoyu Zhang, Jiakang Yuan et al.

AAAI 2025paperarXiv:2412.09912
9
citations
#3139

Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation

Shengqi Liu, Yuhao Cheng, Zhuo Chen et al.

ICCV 2025posterarXiv:2412.14453
9
citations
#3140

Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning

Hai-Ming Xu, Qi Chen, Lei Wang et al.

AAAI 2025paperarXiv:2412.10840
9
citations
#3141

ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation

Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.

AAAI 2025paperarXiv:2412.18216
9
citations
#3142

FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases

Shuai Tan, Bill Gong, Bin Ji et al.

ICCV 2025posterarXiv:2507.01390
9
citations
#3143

LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors

Han Zhou, Wei Dong, Jun Chen

CVPR 2025posterarXiv:2504.00219
9
citations
#3144

OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain

Wenzhen Yue, Yong Liu, Hao Wang et al.

NEURIPS 2025oralarXiv:2505.08550
9
citations
#3145

Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences

Nikos Dimitriadis, Pascal Frossard, François Fleuret

ICLR 2025posterarXiv:2407.08056
9
citations
#3146

DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation

Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu et al.

ICLR 2025posterarXiv:2312.14216
9
citations
#3147

Do Large Language Models Truly Understand Geometric Structures?

Xiaofeng Wang, Yiming Wang, Wenhong Zhu et al.

ICLR 2025posterarXiv:2501.13773
9
citations
#3148

Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition

Wen Yin, Yong Wang, Guiduo Duan et al.

CVPR 2025posterarXiv:2505.19694
9
citations
#3149

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

Xin Ding, Hao Wu, Yifan Yang et al.

ICCV 2025posterarXiv:2503.06220
9
citations
#3150

How Transformers Learn Structured Data: Insights From Hierarchical Filtering

Jerome Garnier-Brun, Marc Mezard, Emanuele Moscato et al.

ICML 2025posterarXiv:2408.15138
9
citations
#3151

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Zun Wang, Jialu Li, Yicong Hong et al.

ICLR 2025posterarXiv:2412.08467
9
citations
#3152

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization

Yamato Arai, Yuma Ichikawa

NEURIPS 2025posterarXiv:2504.09629
9
citations
#3153

Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors

Emile Pierret, Bruno Galerne

ICML 2025posterarXiv:2405.14250
9
citations
#3154

QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation

Yehui Tang, Mabiao Long, Junchi Yan

ICLR 2025poster
9
citations
#3155

LIBA: Language Instructed Multi-granularity Bridge Assistant for 3D Visual Grounding

Yuan Wang, Ya-Li Li, W U Eastman Z Y et al.

AAAI 2025paper
9
citations
#3156

ADIFF: Explaining audio difference using natural language

Soham Deshmukh, Shuo Han, Rita Singh et al.

ICLR 2025posterarXiv:2502.04476
9
citations
#3157

SILO: Solving Inverse Problems with Latent Operators

Ron Raphaeli, Sean Man, Michael Elad

ICCV 2025posterarXiv:2501.11746
9
citations
#3158

Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition

Zhong Zheng, Haochen Zhang, Lingzhou Xue

ICLR 2025posterarXiv:2410.07574
9
citations
#3159

Counterfactual Generative Modeling with Variational Causal Inference

Yulun Wu, Louis McConnell, Claudia Iriondo

ICLR 2025posterarXiv:2410.12730
9
citations
#3160

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

Arthur Jacot, Peter Súkeník, Zihan Wang et al.

ICLR 2025posterarXiv:2410.04887
9
citations
#3161

RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

Chengrui Wang, Pengfei Liu, Min Zhou et al.

AAAI 2025paperarXiv:2404.13984
9
citations
#3162

Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)

Leander Girrbach, Stephan Alaniz, Yiran Huang et al.

ICLR 2025posterarXiv:2410.19314
9
citations
#3163

SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images

Gencer Sumbul, Chang Xu, Emanuele Dalsasso et al.

ICCV 2025posterarXiv:2506.19585
9
citations
#3164

SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement

Yuqi Lin, Hengjia Li, Wenqi Shao et al.

ICLR 2025posterarXiv:2502.06756
9
citations
#3165

UnCommon Objects in 3D

Xingchen Liu, Piyush Tayal, Jianyuan Wang et al.

CVPR 2025posterarXiv:2501.07574
9
citations
#3166

Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words

Gouki Gouki, Hiroki Furuta, Yusuke Iwasawa et al.

ICLR 2025posterarXiv:2501.06254
9
citations
#3167

Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment

Hiromu Taketsugu, Takeru Oba, Takahiro Maeda et al.

CVPR 2025posterarXiv:2503.17267
9
citations
#3168

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Han Xiao, Guozhi Wang, Yuxiang Chai et al.

NEURIPS 2025posterarXiv:2505.21496
9
citations
#3169

Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration

Kang Liao, Zongsheng Yue, Zhouxia Wang et al.

ICLR 2025posterarXiv:2406.18516
9
citations
#3170

BodyGen: Advancing Towards Efficient Embodiment Co-Design

Haofei Lu, Zhe Wu, Junliang Xing et al.

ICLR 2025oralarXiv:2503.00533
9
citations
#3171

Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment

Yifan Zhang, Ge Zhang, Yue Wu et al.

ICML 2025posterarXiv:2410.02197
9
citations
#3172

GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling

Jialong Zhou, Lichao Wang, Xiao Yang

NEURIPS 2025oralarXiv:2505.19234
9
citations
#3173

Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering

Yibo Zhang, Lihong Wang, Changqing Zou et al.

ICLR 2025posterarXiv:2405.15305
9
citations
#3174

Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

Jiaming Zhang, Junhong Ye, Xingjun Ma et al.

CVPR 2025posterarXiv:2410.05346
9
citations
#3175

Relieving Universal Label Noise for Unsupervised Visible-Infrared Person Re-Identification by Inferring from Neighbors

Xiao Teng, Long Lan, Dingyao Chen et al.

AAAI 2025paperarXiv:2412.12220
9
citations
#3176

Dual Prompting Image Restoration with Diffusion Transformers

Dehong Kong, Fan Li, Zhixin Wang et al.

CVPR 2025posterarXiv:2504.17825
9
citations
#3177

Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers

Kiyoung Seong, Seonghyun Park, Seonghwan Kim et al.

ICLR 2025posterarXiv:2405.19961
9
citations
#3178

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections

Da Xiao, Qingye Meng, Shengping Li et al.

ICML 2025posterarXiv:2502.12170
9
citations
#3179

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

jingnan zheng, Xiangtian Ji, Yijun Lu et al.

NEURIPS 2025posterarXiv:2506.07736
9
citations
#3180

AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting

Xiaoyu Zhou, Jingqi Wang, Yongtao Wang et al.

ICCV 2025highlightarXiv:2502.04981
9
citations
#3181

SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation

Jihuai Zhao, Junbao Zhuo, Jiansheng Chen et al.

CVPR 2025poster
9
citations
#3182

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation

Koichi Saito, Dongjun Kim, Takashi Shibuya et al.

ICLR 2025posterarXiv:2405.18503
9
citations
#3183

Test-time Adaptation for Cross-modal Retrieval with Query Shift

Haobin Li, Peng Hu, Qianjun Zhang et al.

ICLR 2025posterarXiv:2410.15624
9
citations
#3184

Fast Summation of Radial Kernels via QMC Slicing

Johannes Hertrich, Tim Jahn, Michael Quellmalz

ICLR 2025posterarXiv:2410.01316
9
citations
#3185

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Jiashuo Yu, Yue Wu, Meng Chu et al.

ICCV 2025posterarXiv:2506.10857
9
citations
#3186

HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model

Tao Wang, Changxu Cheng, Lingfeng Wang et al.

ICCV 2025posterarXiv:2503.13026
9
citations
#3187

Multi-Focus Image Fusion via Explicit Defocus Blur Modelling

Yuhui Quan, Xi Wan, Zitao Tang et al.

AAAI 2025paper
9
citations
#3188

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation

Jiaxin Huang, Runnan Chen, Ziwen Li et al.

NEURIPS 2025posterarXiv:2503.18135
9
citations
#3189

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

Liyan Tang, Grace Kim, Xinyu Zhao et al.

NEURIPS 2025posterarXiv:2505.13444
9
citations
#3190

Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2

Ziqi Zhou, Yifan Hu, Yufei Song et al.

NEURIPS 2025spotlightarXiv:2510.24195
9
citations
#3191

UAVScenes: A Multi-Modal Dataset for UAVs

Sijie Wang, Siqi Li, Yawei Zhang et al.

ICCV 2025posterarXiv:2507.22412
9
citations
#3192

DepthCues: Evaluating Monocular Depth Perception in Large Vision Models

Duolikun Danier, Mehmet Aygun, Changjian Li et al.

CVPR 2025posterarXiv:2411.17385
9
citations
#3193

Make Me Happier: Evoking Emotions Through Image Diffusion Models

Qing Lin, Jingfeng Zhang, YEW-SOON ONG et al.

ICCV 2025posterarXiv:2403.08255
9
citations
#3194

Markov Persuasion Processes: Learning to Persuade From Scratch

Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni et al.

NEURIPS 2025posterarXiv:2402.03077
9
citations
#3195

Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints

Mihaela Stoian, Eleonora Giunchiglia

ICLR 2025posterarXiv:2502.18237
9
citations
#3196

ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation

Angxiao Yue, Zichong Wang, Hongteng Xu

ICML 2025posterarXiv:2502.14637
9
citations
#3197

HELM: Hierarchical Encoding for mRNA Language Modeling

Mehdi Yazdani-Jahromi, Mangal Prakash, Tommaso Mansi et al.

ICLR 2025posterarXiv:2410.12459
9
citations
#3198

MIB: A Mechanistic Interpretability Benchmark

Aaron Mueller, Atticus Geiger, Sarah Wiegreffe et al.

ICML 2025posterarXiv:2504.13151
9
citations
#3199

Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning

Patrick Yin, Tyler Westenbroek, Ching-An Cheng et al.

ICLR 2025posterarXiv:2502.02705
9
citations
#3200

MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context

Shuai Lyu, Rongchen Zhang, Zeqi Ma et al.

AAAI 2025paperarXiv:2412.16897
9
citations