Most Cited 2025 "scene structure preservation" Papers

22,274 papers found • Page 23 of 112

#4401

Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance

Dimitris Oikonomou, Nicolas Loizou

ICLR 2025arXiv:2406.04142
11
citations
#4402

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Yandan Yang, Baoxiong Jia, Shujie Zhang et al.

NEURIPS 2025arXiv:2509.20414
11
citations
#4403

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training

Zhanpeng Zhou, Mingze Wang, Yuchen Mao et al.

ICLR 2025arXiv:2410.10373
11
citations
#4404

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections

Da Xiao, Qingye Meng, Shengping Li et al.

ICML 2025arXiv:2502.12170
11
citations
#4405

Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees

Yannis Montreuil, Axel Carlier, Lai Xing Ng et al.

ICML 2025arXiv:2502.01027
11
citations
#4406

6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering

Zhongpai Gao, Benjamin Planche, Meng Zheng et al.

ICLR 2025arXiv:2410.04974
11
citations
#4407

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning

Jiwon Song, Dongwon Jo, Yulhwa Kim et al.

NEURIPS 2025arXiv:2505.13866
11
citations
#4408

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Penghao Wu, Shengnan Ma, Bo Wang et al.

NEURIPS 2025arXiv:2506.08012
11
citations
#4409

Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation

Yihong Luo, Tianyang Hu, Weijian Luo et al.

NEURIPS 2025arXiv:2503.13070
11
citations
#4410

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Zheng Hu, Zhe Li, Ziyun Jiao et al.

AAAI 2025paperarXiv:2412.13544
11
citations
#4411

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Hongkang Li, Songtao Lu, Pin-Yu Chen et al.

ICLR 2025arXiv:2410.02167
11
citations
#4412

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

Weiming Ren, Huan Yang, Jie Min et al.

CVPR 2025arXiv:2412.00927
11
citations
#4413

The Computational Complexity of Circuit Discovery for Inner Interpretability

Federico Adolfi, Martina G. Vilas, Todd Wareham

ICLR 2025arXiv:2410.08025
11
citations
#4414

RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

Peng Liu, Dongyang Dai, Zhiyong Wu

ICLR 2025arXiv:2403.05010
11
citations
#4415

Jailbreaking as a Reward Misspecification Problem

Zhihui Xie, Jiahui Gao, Lei Li et al.

ICLR 2025arXiv:2406.14393
11
citations
#4416

How Expressive are Knowledge Graph Foundation Models?

Xingyue Huang, Pablo Barcelo, Michael Bronstein et al.

ICML 2025arXiv:2502.13339
11
citations
#4417

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

Tatiana Zemskova, Dmitry Yudin

ICCV 2025arXiv:2412.18450
11
citations
#4418

Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models

Sumeet Singh, Vikas Sindhwani, Stephen Tu

ICLR 2025arXiv:2309.05803
11
citations
#4419

ViSAGe: Video-to-Spatial Audio Generation

Jaeyeon Kim, Heeseung Yun, Gunhee Kim

ICLR 2025oralarXiv:2506.12199
11
citations
#4420

TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction

Yunfei Liu, Lei Zhu, Lijian Lin et al.

ICLR 2025arXiv:2502.10982
11
citations
#4421

Reconstructing People, Places, and Cameras

Lea Müller, Hongsuk Choi, Anthony Zhang et al.

CVPR 2025highlightarXiv:2412.17806
11
citations
#4422

Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models

Sohyun An, Ruochen Wang, Tianyi Zhou et al.

NEURIPS 2025
11
citations
#4423

TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

Gihyun Kwon, Jong Chul YE

ICLR 2025arXiv:2410.05591
11
citations
#4424

PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems

Bocheng Zeng, Qi Wang, Mengtao Yan et al.

ICLR 2025oralarXiv:2410.01337
11
citations
#4425

Glad: A Streaming Scene Generator for Autonomous Driving

Bin Xie, Yingfei Liu, Tiancai Wang et al.

ICLR 2025oralarXiv:2503.00045
11
citations
#4426

Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding

Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.

ICCV 2025arXiv:2411.14401
11
citations
#4427

How Much Can We Forget about Data Contamination?

Sebastian Bordt, Suraj Srinivas, Valentyn Boreiko et al.

ICML 2025arXiv:2410.03249
11
citations
#4428

Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency

Yikai Wang, Chenjie Cao, Junqiu Yu et al.

CVPR 2025highlightarXiv:2312.04831
11
citations
#4429

MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

Lu Li, Tianyu Zhang, Zhiqi Bu et al.

ICLR 2025arXiv:2406.07529
10
citations
#4430

DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding

Weihao Xuan, Junjue Wang, Heli Qi et al.

NEURIPS 2025oralarXiv:2505.21076
10
citations
#4431

MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

Sheng Wang, Liheng Chen, Pengan CHEN et al.

ICLR 2025arXiv:2410.00938
10
citations
#4432

Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference

Wenhao Shen, Mingliang Zhou, Yu Chen et al.

CVPR 2025arXiv:2412.16939
10
citations
#4433

Visual Test-time Scaling for GUI Agent Grounding

Tiange Luo, Lajanugen Logeswaran, Justin Johnson et al.

ICCV 2025highlightarXiv:2505.00684
10
citations
#4434

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Yabo Zhang, Yuxiang Wei, Xianhui Lin et al.

AAAI 2025paperarXiv:2403.05438
10
citations
#4435

PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning

Angel Villar-Corrales, Sven Behnke

ICML 2025arXiv:2502.07600
10
citations
#4436

HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration

Yushi Huang, Zining Wang, Ruihao Gong et al.

ICML 2025arXiv:2410.01723
10
citations
#4437

ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation

Angxiao Yue, Zichong Wang, Hongteng Xu

ICML 2025arXiv:2502.14637
10
citations
#4438

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

Chunlin Yu, Hanqing Wang, Ye Shi et al.

CVPR 2025arXiv:2412.01550
10
citations
#4439

You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning

Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty

ICLR 2025arXiv:2501.15296
10
citations
#4440

Periodic Materials Generation using Text-Guided Joint Diffusion Model

KISHALAY DAS, Subhojyoti Khastagir, Pawan Goyal et al.

ICLR 2025arXiv:2503.00522
10
citations
#4441

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

Yue Cao, Yun Xing, Jie Zhang et al.

CVPR 2025arXiv:2412.00114
10
citations
#4442

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

Dong Zhao, Jinlong Li, Shuang Wang et al.

CVPR 2025arXiv:2503.17940
10
citations
#4443

LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression

Laurent Condat, Artavazd Maranjyan, Peter Richtarik

ICLR 2025arXiv:2403.04348
10
citations
#4444

Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs

Xander Davies, Eric Winsor, Alexandra Souly et al.

NEURIPS 2025arXiv:2502.14828
10
citations
#4445

Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner

Runa Eschenhagen, Aaron Defazio, Tsung-Hsien Lee et al.

NEURIPS 2025spotlightarXiv:2506.03595
10
citations
#4446

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

Yatai Ji, Jiacheng Zhang, Jie Wu et al.

ICCV 2025arXiv:2412.15156
10
citations
#4447

Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

zefeng zhang, Hengzhu Tang, Jiawei Sheng et al.

CVPR 2025arXiv:2503.17928
10
citations
#4448

VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving

Haiming Zhang, Wending Zhou, Shenzhen The Chinese University of Hongkong et al.

CVPR 2025arXiv:2411.14716
10
citations
#4449

Multi-modal brain encoding models for multi-modal stimuli

SUBBA REDDY OOTA, Khushbu Pahwa, mounika marreddy et al.

ICLR 2025arXiv:2505.20027
10
citations
#4450

MergeBench: A Benchmark for Merging Domain-Specialized LLMs

Yifei He, Siqi Zeng, Yuzheng Hu et al.

NEURIPS 2025arXiv:2505.10833
10
citations
#4451

CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification

Mingkun Zhang, Keping Bi, Wei Chen et al.

ICLR 2025arXiv:2502.18176
10
citations
#4452

Safety Reasoning with Guidelines

Haoyu Wang, Zeyu Qin, Li Shen et al.

ICML 2025arXiv:2502.04040
10
citations
#4453

Gaussian Mixture Flow Matching Models

Hansheng Chen, Kai Zhang, Hao Tan et al.

ICML 2025arXiv:2504.05304
10
citations
#4454

SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

Su Sun, Cheng Zhao, Zhuoyang Sun et al.

CVPR 2025highlightarXiv:2411.15482
10
citations
#4455

Flexible Frame Selection for Efficient Video Reasoning

Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.

CVPR 2025
10
citations
#4456

RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis

yifei feng, Mx Yang, Shuhui Yang et al.

ICCV 2025arXiv:2503.19011
10
citations
#4457

Spectral Image Tokenizer

Carlos Esteves, Mohammed Suhail, Ameesh Makadia

ICCV 2025arXiv:2412.09607
10
citations
#4458

PENCIL: Long Thoughts with Short Memory

Chenxiao Yang, Nati Srebro, David McAllester et al.

ICML 2025arXiv:2503.14337
10
citations
#4459

HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction

Jikai Wang, Qifan Zhang, Yu-Wei Chao et al.

NEURIPS 2025arXiv:2406.06843
10
citations
#4460

Improved Training Technique for Latent Consistency Models

Minh Quan Dao, Khanh Doan, Di Liu et al.

ICLR 2025arXiv:2502.01441
10
citations
#4461

EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis

Shengyuan Liu, Boyun Zheng, Wenting Chen et al.

NEURIPS 2025arXiv:2505.23601
10
citations
#4462

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

Yichen Xie, Runsheng Xu, Tong He et al.

CVPR 2025
10
citations
#4463

Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI

Julien Pourcel, Cédric Colas, Pierre-Yves Oudeyer

ICML 2025arXiv:2507.14172
10
citations
#4464

ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation

Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.

AAAI 2025paperarXiv:2412.18216
10
citations
#4465

Implicit In-context Learning

Zhuowei Li, Zihao Xu, Ligong Han et al.

ICLR 2025arXiv:2405.14660
10
citations
#4466

LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models

Hantao Zhang, Yuhe Liu, Jiancheng Yang et al.

ICLR 2025arXiv:2403.14066
10
citations
#4467

Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models

Zidi Xiong, Shan Chen, Zhenting Qi et al.

NEURIPS 2025arXiv:2505.13774
10
citations
#4468

Constrain Alignment with Sparse Autoencoders

Qingyu Yin, Chak Tou Leong, Hongbo Zhang et al.

ICML 2025arXiv:2411.07618
10
citations
#4469

GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians

Xiaobao Wei, Peng Chen, Ming Lu et al.

AAAI 2025paperarXiv:2412.13983
10
citations
#4470

SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning

Jinpeng Chen, Runmin Cong, Yuzhi Zhao et al.

ICML 2025arXiv:2505.02486
10
citations
#4471

Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model

Keda TAO, Jinjin Gu, Yulun Zhang et al.

ICLR 2025arXiv:2410.04161
10
citations
#4472

EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space

Jianrong Zhang, Hehe Fan, Yi Yang

CVPR 2025highlightarXiv:2412.14706
10
citations
#4473

Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers

Hang Zhou, Yuezhou Ma, Haixu Wu et al.

ICML 2025arXiv:2405.17527
10
citations
#4474

Radiology Report Generation via Multi-objective Preference Optimization

Ting Xiao, Lei Shi, Peng Liu et al.

AAAI 2025paperarXiv:2412.08901
10
citations
#4475

Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

Shengyu Feng, Xiang Kong, shuang ma et al.

ICLR 2025arXiv:2410.01920
10
citations
#4476

K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences

Zhikai Li, Xuewen Liu, Dongrong Joe Fu et al.

CVPR 2025arXiv:2408.14468
10
citations
#4477

GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance

Jinuk Kim, Marwa El Halabi, Wonpyo Park et al.

ICML 2025arXiv:2505.07004
10
citations
#4478

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Sean McLeish, John Kirchenbauer, David Miller et al.

NEURIPS 2025arXiv:2502.06857
10
citations
#4479

Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models

Hao Ren, Yiming Zeng, Zetong Bi et al.

CVPR 2025arXiv:2504.10041
10
citations
#4480

Diffusion State-Guided Projected Gradient for Inverse Problems

Rayhan Zirvi, Bahareh Tolooshams, anima anandkumar

ICLR 2025arXiv:2410.03463
10
citations
#4481

Towards Federated RLHF with Aggregated Client Preference for LLMs

Feijie Wu, Xiaoze Liu, Haoyu Wang et al.

ICLR 2025arXiv:2407.03038
10
citations
#4482

FormalAlign: Automated Alignment Evaluation for Autoformalization

Jianqiao Lu, Yingjia Wan, Yinya Huang et al.

ICLR 2025arXiv:2410.10135
10
citations
#4483

ID-Patch: Robust ID Association for Group Photo Personalization

Yimeng Zhang, Tiancheng Zhi, Jing Liu et al.

CVPR 2025arXiv:2411.13632
10
citations
#4484

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Zun Wang, Jialu Li, Yicong Hong et al.

ICLR 2025arXiv:2412.08467
10
citations
#4485

M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images

Hongyi Wang, Xiuju Du, Jing Liu et al.

AAAI 2025paperarXiv:2409.15092
10
citations
#4486

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study

Shawn Tan, Songlin Yang, Aaron Courville et al.

ICLR 2025arXiv:2410.17980
10
citations
#4487

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

Arthur Jacot, Peter Súkeník, Zihan Wang et al.

ICLR 2025arXiv:2410.04887
10
citations
#4488

RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

Chengrui Wang, Pengfei Liu, Min Zhou et al.

AAAI 2025paperarXiv:2404.13984
10
citations
#4489

$\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation

Saúl Santos, António Farinhas, Daniel McNamee et al.

ICML 2025arXiv:2501.19098
10
citations
#4490

GENMO: A GENeralist Model for Human MOtion

Jiefeng Li, Jinkun Cao, Haotian Zhang et al.

ICCV 2025highlightarXiv:2505.01425
10
citations
#4491

Selective Prompt Anchoring for Code Generation

Yuan Tian, Tianyi Zhang

ICML 2025arXiv:2408.09121
10
citations
#4492

Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?

Jonathan Roberts, Kai Han, Samuel Albanie

ICLR 2025arXiv:2411.05000
10
citations
#4493

Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation

Changshuo Wang, Shuting He, Xiang Fang et al.

AAAI 2025paperarXiv:2504.02454
10
citations
#4494

ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks

Feiyang Wang, Xingquan Zuo, Hai Huang et al.

AAAI 2025paper
10
citations
#4495

Lorentz Local Canonicalization: How to make any Network Lorentz-Equivariant

Jonas Spinner, Luigi Favaro, Peter Lippmann et al.

NEURIPS 2025arXiv:2505.20280
10
citations
#4496

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Yuxuan Wang, Xuanyu Yi, Haohan Weng et al.

ICCV 2025arXiv:2501.14317
10
citations
#4497

Interpreting the Repeated Token Phenomenon in Large Language Models

Itay Yona, Ilia Shumailov, Jamie Hayes et al.

ICML 2025arXiv:2503.08908
10
citations
#4498

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

Zhihao Sun, Haoran Jiang, Haoran Chen et al.

NEURIPS 2025arXiv:2411.19466
10
citations
#4499

A Closer Look at Multimodal Representation Collapse

Abhra Chaudhuri, Anjan Dutta, Tu Bui et al.

ICML 2025spotlightarXiv:2505.22483
10
citations
#4500

Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness

Eli Chien, Pan Li

ICLR 2025arXiv:2410.01068
10
citations
#4501

3D Gaussian Inpainting with Depth-Guided Cross-View Consistency

Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang

CVPR 2025arXiv:2502.11801
10
citations
#4502

Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration

Kang Liao, Zongsheng Yue, Zhouxia Wang et al.

ICLR 2025arXiv:2406.18516
10
citations
#4503

Node-Time Conditional Prompt Learning in Dynamic Graphs

Xingtong Yu, Zhenghao Liu, Xinming Zhang et al.

ICLR 2025oralarXiv:2405.13937
10
citations
#4504

R.I.P.: Better Models by Survival of the Fittest Prompts

Ping Yu, Weizhe Yuan, Olga Golovneva et al.

ICML 2025arXiv:2501.18578
10
citations
#4505

ThermalGaussian: Thermal 3D Gaussian Splatting

Rongfeng Lu, Hangyu Chen, Zunjie Zhu et al.

ICLR 2025arXiv:2409.07200
10
citations
#4506

MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking

Sebastian Farquhar, Vikrant Varma, David Lindner et al.

ICML 2025arXiv:2501.13011
10
citations
#4507

RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression

Payman Behnam, Yaosheng Fu, Ritchie Zhao et al.

ICML 2025arXiv:2502.14051
10
citations
#4508

TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception

Zhiying Song, Lei Yang, Fuxi Wen et al.

CVPR 2025arXiv:2503.19391
10
citations
#4509

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

Weronika Ormaniec, Felix Dangel, Sidak Pal Singh

ICLR 2025arXiv:2410.10986
10
citations
#4510

OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration

Yiming Zuo, Willow Yang, Zeyu Ma et al.

ICCV 2025arXiv:2411.19278
10
citations
#4511

HyperGS: Hyperspectral 3D Gaussian Splatting

Christopher Thirgood, Oscar Mendez, Erin Chao Ling et al.

CVPR 2025arXiv:2412.12849
10
citations
#4512

Synthesizing Software Engineering Data in a Test-Driven Manner

Lei Zhang, Jiaxi Yang, Min Yang et al.

ICML 2025arXiv:2506.09003
10
citations
#4513

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts

Suyu Ge, Xihui Lin, Yunan Zhang et al.

ICLR 2025arXiv:2410.01485
10
citations
#4514

Efficient Randomized Experiments Using Foundation Models

Piersilvio De Bartolomeis, Javier Abad, Guanbo Wang et al.

NEURIPS 2025arXiv:2502.04262
10
citations
#4515

Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation

Minghan Chen, Guikun Chen, Wenguan Wang et al.

ICLR 2025arXiv:2409.10262
10
citations
#4516

Continuous Diffusion for Mixed-Type Tabular Data

Markus Mueller, Kathrin Gruber, Dennis Fok

ICLR 2025arXiv:2312.10431
10
citations
#4517

Self-Improving Embodied Foundation Models

Seyed Kamyar Seyed Ghasemipour, Ayzaan Wahid, Jonathan Tompson et al.

NEURIPS 2025oralarXiv:2509.15155
10
citations
#4518

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

Yang Zhou, Hao Shao, Letian Wang et al.

ICLR 2025oralarXiv:2410.08669
10
citations
#4519

When Do LLMs Help With Node Classification? A Comprehensive Analysis

Xixi Wu, Yifei Shen, Fangzhou Ge et al.

ICML 2025arXiv:2502.00829
10
citations
#4520

Fundamental limits of learning in sequence multi-index models and deep attention networks: high-dimensional asymptotics and sharp thresholds

Emanuele Troiani, Hugo Cui, Yatin Dandi et al.

ICML 2025arXiv:2502.00901
10
citations
#4521

HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation

Yiming Liang, Tianhan Xu, Yuta Kikuchi

CVPR 2025arXiv:2504.06210
10
citations
#4522

Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution

Karam Park, Jae Woong Soh, Nam Ik Cho

AAAI 2025paperarXiv:2501.15774
10
citations
#4523

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NEURIPS 2025arXiv:2506.00070
10
citations
#4524

How Much Can Transfer? BRIDGE: Bounded Multi-Domain Graph Foundation Model with Generalization Guarantees

Haonan Yuan, Qingyun Sun, Junhua Shi et al.

ICML 2025
10
citations
#4525

Deep Linear Probe Generators for Weight Space Learning

Jonathan Kahana, Eliahu Horwitz, Imri Shuval et al.

ICLR 2025arXiv:2410.10811
10
citations
#4526

Efficient Model Editing with Task-Localized Sparse Fine-tuning

Leonardo Iurada, Marco Ciccone, Tatiana Tommasi

ICLR 2025arXiv:2504.02620
10
citations
#4527

TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction

Xuying Zhang, Yutong Liu, Yangguang Li et al.

ICCV 2025arXiv:2412.16919
10
citations
#4528

Data-Driven Performance Guarantees for Classical and Learned Optimizers

Rajiv Sambharya, Bartolomeo Stellato

NEURIPS 2025arXiv:2404.13831
10
citations
#4529

DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval

Yating Liu, Zimo Liu, Xiangyuan Lan et al.

AAAI 2025paperarXiv:2503.04144
10
citations
#4530

Disco4D: Disentangled 4D Human Generation and Animation from a Single Image

Hui En Pang, Shuai Liu, Zhongang Cai et al.

CVPR 2025arXiv:2409.17280
10
citations
#4531

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Pei Wang, Yanan Wu, Zekun Wang et al.

ICLR 2025arXiv:2410.11710
10
citations
#4532

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation

Jiaxin Huang, Runnan Chen, Ziwen Li et al.

NEURIPS 2025arXiv:2503.18135
10
citations
#4533

Improving Transformer World Models for Data-Efficient RL

Antoine Dedieu, Joseph Ortiz, Xinghua Lou et al.

ICML 2025arXiv:2502.01591
10
citations
#4534

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

Peiye Zhuang, Songfang Han, Chaoyang Wang et al.

ICLR 2025arXiv:2406.05649
10
citations
#4535

Large Convolutional Model Tuning via Filter Subspace

Wei Chen, Zichen Miao, Qiang Qiu

ICLR 2025arXiv:2403.00269
10
citations
#4536

Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice

Jian-Qiao Zhu, Haijiang Yan, Thomas L. Griffiths

ICLR 2025oralarXiv:2405.19313
10
citations
#4537

On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures

Wei Shen, Ruida Zhou, Jing Yang et al.

ICML 2025arXiv:2410.11778
10
citations
#4538

Gaussian Eigen Models for Human Heads

Wojciech Zielonka, Timo Bolkart, Thabo Beeler et al.

CVPR 2025arXiv:2407.04545
10
citations
#4539

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

Denis Sutter, Julian Minder, Thomas Hofmann et al.

NEURIPS 2025spotlightarXiv:2507.08802
10
citations
#4540

RoMo: Robust Motion Segmentation Improves Structure from Motion

Lily Goli, Sara Sabour, Mark Matthews et al.

ICCV 2025arXiv:2411.18650
10
citations
#4541

Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence

Yinbin Han, Meisam Razaviyayn, Renyuan Xu

ICML 2025arXiv:2412.18164
10
citations
#4542

IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking

Shubham Dipak Ugare, Rohan Gumaste, Tarun Suresh et al.

ICLR 2025arXiv:2410.07295
10
citations
#4543

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Zihao Zhang, Haoran Chen, Haoyu Zhao et al.

CVPR 2025arXiv:2503.15831
10
citations
#4544

FilterTS: Comprehensive Frequency Filtering for Multivariate Time Series Forecasting

Yulong Wang, Yushuo Liu, Xiaoyi Duan et al.

AAAI 2025paperarXiv:2505.04158
10
citations
#4545

Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints

Mihaela Stoian, Eleonora Giunchiglia

ICLR 2025arXiv:2502.18237
10
citations
#4546

Atlas Gaussians Diffusion for 3D Generation

Haitao Yang, Yuan Dong, Hanwen Jiang et al.

ICLR 2025arXiv:2408.13055
10
citations
#4547

R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization

Xudong Jiang, Fangjinhua Wang, Silvano Galliani et al.

CVPR 2025arXiv:2501.01421
10
citations
#4548

Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference

Qining Zhang, Lei Ying

ICLR 2025arXiv:2409.17401
10
citations
#4549

Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model

Yingying Fan, Quanwei Yang, Kaisiyuan Wang et al.

CVPR 2025arXiv:2503.16942
10
citations
#4550

ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

Bencheng Liao, Xinggang Wang, Lianghui Zhu et al.

AAAI 2025paperarXiv:2405.18425
10
citations
#4551

Intrinsic User-Centric Interpretability through Global Mixture of Experts

Vinitra Swamy, Syrielle Montariol, Julian Blackwell et al.

ICLR 2025arXiv:2402.02933
10
citations
#4552

Multi-Session Budget Optimization for Forward Auction-based Federated Learning

Xiaoli Tang, Han Yu, Zengxiang Li et al.

ICML 2025arXiv:2311.12548
10
citations
#4553

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

Marlon Becker, Frederick Altrock, Benjamin Risse

NEURIPS 2025arXiv:2401.12033
10
citations
#4554

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.

ICCV 2025arXiv:2501.02135
10
citations
#4555

Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis

Xu Wang, Yan Hu, Wenyu Du et al.

ICML 2025arXiv:2502.11812
10
citations
#4556

Unleashing Hour-Scale Video Training for Long Video-Language Understanding

Jingyang Lin, Jialian Wu, Ximeng Sun et al.

NEURIPS 2025oralarXiv:2506.05332
10
citations
#4557

Identifying Macro Conditional Independencies and Macro Total Effects in Summary Causal Graphs with Latent Confounding

Simon Ferreira, Charles K. Assaad

AAAI 2025paperarXiv:2407.07934
10
citations
#4558

LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors

Han Zhou, Wei Dong, Jun Chen

CVPR 2025arXiv:2504.00219
10
citations
#4559

FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion

Akide Liu, Zeyu Zhang, Zhexin Li et al.

NEURIPS 2025spotlightarXiv:2506.04648
10
citations
#4560

AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation

zijie wu, Chaohui Yu, Fan Wang et al.

ICCV 2025arXiv:2506.09982
10
citations
#4561

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

Liyan Tang, Grace Kim, Xinyu Zhao et al.

NEURIPS 2025arXiv:2505.13444
10
citations
#4562

BodyGen: Advancing Towards Efficient Embodiment Co-Design

Haofei Lu, Zhe Wu, Junliang Xing et al.

ICLR 2025oralarXiv:2503.00533
10
citations
#4563

Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces

Anjiang Wei, Allen Nie, Thiago Teixeira et al.

ICML 2025arXiv:2410.15625
10
citations
#4564

Tool Unlearning for Tool-Augmented LLMs

Jiali Cheng, Hadi Amiri

ICML 2025arXiv:2502.01083
10
citations
#4565

Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation

Wenhui Tan, Boyuan Li, Chuhao Jin et al.

ICLR 2025
10
citations
#4566

Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition

Wen Yin, Yong Wang, Guiduo Duan et al.

CVPR 2025arXiv:2505.19694
10
citations
#4567

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

Binghui Li, Yuanzhi Li

ICLR 2025arXiv:2410.08503
10
citations
#4568

Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks

Kairong Yu, Chengting Yu, Tianqing Zhang et al.

CVPR 2025arXiv:2503.03144
10
citations
#4569

Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models

Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao et al.

ICLR 2025arXiv:2406.12042
10
citations
#4570

Consistency Checks for Language Model Forecasters

Daniel Paleka, Abhimanyu Pallavi Sudhir, Alejandro Alvarez et al.

ICLR 2025arXiv:2412.18544
10
citations
#4571

Objective drives the consistency of representational similarity across datasets

Laure Ciernik, Lorenz Linhardt, Marco Morik et al.

ICML 2025arXiv:2411.05561
10
citations
#4572

V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection

Xun Huang, Jinlong Wang, Qiming Xia et al.

CVPR 2025arXiv:2411.08402
10
citations
#4573

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Yumeng Li, William H Beluch, Margret Keuper et al.

ICLR 2025oralarXiv:2403.13501
10
citations
#4574

TACO: Taming Diffusion for in-the-wild Video Amodal Completion

Ruijie Lu, Yixin Chen, Yu Liu et al.

ICCV 2025arXiv:2503.12049
10
citations
#4575

Towards Hierarchical Rectified Flow

Yichi Zhang, Yici Yan, Alex Schwing et al.

ICLR 2025arXiv:2502.17436
10
citations
#4576

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation

Koichi Saito, Dongjun Kim, Takashi Shibuya et al.

ICLR 2025arXiv:2405.18503
10
citations
#4577

Improved Off-policy Reinforcement Learning in Biological Sequence Design

Hyeonah Kim, Minsu Kim, Taeyoung Yun et al.

ICML 2025arXiv:2410.04461
10
citations
#4578

Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization

Zeyuan Ma, Jiacheng Chen, Hongshu Guo et al.

ICLR 2025arXiv:2408.10672
10
citations
#4579

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes

Tony Alex, Sara Atito, Armin Mustafa et al.

ICLR 2025arXiv:2506.12222
10
citations
#4580

Antidistillation Sampling

Yash Savani, Asher Trockman, Zhili Feng et al.

NEURIPS 2025arXiv:2504.13146
10
citations
#4581

MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge

yuntao du, Kailin Jiang, Zhi Gao et al.

ICLR 2025arXiv:2502.19870
10
citations
#4582

PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors

Guangshun Wei, Yuan Feng, Long Ma et al.

CVPR 2025arXiv:2411.19036
10
citations
#4583

Boosting Latent Diffusion with Perceptual Objectives

Tariq Berrada, Pietro Astolfi, Melissa Hall et al.

ICLR 2025arXiv:2411.04873
10
citations
#4584

RANKCLIP: Ranking-Consistent Language-Image Pretraining

Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.

ICCV 2025arXiv:2404.09387
10
citations
#4585

Quartet: Native FP4 Training Can Be Optimal for Large Language Models

Roberto Castro, Andrei Panferov, Rush Tabesh et al.

NEURIPS 2025arXiv:2505.14669
10
citations
#4586

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Kaichen Zhang, Yifei Shen, Bo Li et al.

ICCV 2025arXiv:2411.14982
10
citations
#4587

Disentangled Motion Modeling for Video Frame Interpolation

Jaihyun Lew, Jooyoung Choi, Chaehun Shin et al.

AAAI 2025paperarXiv:2406.17256
10
citations
#4588

On Measuring Long-Range Interactions in Graph Neural Networks

Jacob Bamberger, Benjamin Gutteridge, Scott le Roux et al.

ICML 2025arXiv:2506.05971
10
citations
#4589

ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder

Jungho Kim, Changwon Kang, Dongyoung Lee et al.

AAAI 2025paperarXiv:2412.08774
10
citations
#4590

A Recipe for Generating 3D Worlds from a Single Image

Katja Schwarz, Denis Rozumny, Samuel Rota Bulò et al.

ICCV 2025arXiv:2503.16611
10
citations
#4591

SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars

Jaeseong Lee, Taewoong Kang, Marcel Buehler et al.

ICLR 2025arXiv:2410.11682
10
citations
#4592

Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search

Shuyu Yang, Yaxiong Wang, Li Zhu et al.

ICCV 2025highlightarXiv:2411.17776
10
citations
#4593

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Hou-I Liu, Christine Wu, Jen-Hao Cheng et al.

CVPR 2025arXiv:2404.04910
10
citations
#4594

ConfTuner: Training Large Language Models to Express Their Confidence Verbally

Yibo Li, Miao Xiong, Jiaying Wu et al.

NEURIPS 2025arXiv:2508.18847
10
citations
#4595

Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds

Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.

ICLR 2025arXiv:2408.08994
10
citations
#4596

MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents

Lukas Aichberger, Alasdair Paren, Guohao Li et al.

NEURIPS 2025arXiv:2503.10809
10
citations
#4597

MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving

Ruida Wang, Rui Pan, Yuxin Li et al.

ICML 2025arXiv:2503.03205
10
citations
#4598

VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction

Zijian He, Yuwei Ning, Yipeng Qin et al.

CVPR 2025arXiv:2503.12165
10
citations
#4599

Sensor-Invariant Tactile Representation

Harsh Gupta, Yuchen Mo, Shengmiao Jin et al.

ICLR 2025arXiv:2502.19638
10
citations
#4600

HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation

Hermann Kumbong, Xian Liu, Tsung-Yi Lin et al.

CVPR 2025arXiv:2506.04421
10
citations