Most Cited 2025 "time-dependent attention" Papers

22,274 papers found • Page 20 of 112

#3801

Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions

Siqiao Mu, Diego Klabjan

NEURIPS 2025arXiv:2409.09778
12
citations
#3802

Learning-Order Autoregressive Models with Application to Molecular Graph Generation

Zhe Wang, Jiaxin Shi, Nicolas Heess et al.

ICML 2025arXiv:2503.05979
12
citations
#3803

AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning

Yehonathan Refael, Jonathan Svirsky, Boris Shustin et al.

ICLR 2025arXiv:2410.17881
12
citations
#3804

AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference

Zhuomin He, Yizhen Yao, Pengfei Zuo et al.

AAAI 2025paperarXiv:2501.02336
12
citations
#3805

Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors

Haiyu Wu, Jaskirat Singh, Sicong Tian et al.

ICLR 2025arXiv:2409.02979
12
citations
#3806

Strategy Coopetition Explains the Emergence and Transience of In-Context Learning

Aaditya Singh, Ted Moskovitz, Sara Dragutinović et al.

ICML 2025oralarXiv:2503.05631
12
citations
#3807

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

Haoran You, Connelly Barnes, Yuqian Zhou et al.

CVPR 2025arXiv:2412.16822
12
citations
#3808

TopoNets: High performing vision and language models with brain-like topography

Mayukh Deb, Mainak Deb, Apurva Murty

ICLR 2025arXiv:2501.16396
12
citations
#3809

Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery

Nicholas Clark, Hua Shen, Bill Howe et al.

COLM 2025paperarXiv:2504.01205
12
citations
#3810

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

Yabo Zhang, xinpeng zhou, Yihan Zeng et al.

ICCV 2025arXiv:2501.08225
12
citations
#3811

Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model

Dongki Kim, Wonbin Lee, Sung Ju Hwang

NEURIPS 2025arXiv:2502.13449
12
citations
#3812

Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation

Byung Hyun Lee, Sungjin Lim, Se Young Chun

CVPR 2025arXiv:2503.12356
12
citations
#3813

Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations

Shaocong Ma, Heng Huang

ICLR 2025arXiv:2510.19975
12
citations
#3814

Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection

Yue Zhou, Xinan He, Kaiqing Lin et al.

NEURIPS 2025arXiv:2506.00874
12
citations
#3815

Towards Training-free Anomaly Detection with Vision and Language Foundation Models

Jinjin Zhang, Guodong Wang, yizhou jin et al.

CVPR 2025arXiv:2503.18325
12
citations
#3816

LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling

Yang Xiao, Jiashuo WANG, Ruifeng Yuan et al.

NEURIPS 2025arXiv:2505.19187
12
citations
#3817

CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation

Lin Sun, Jiale Cao, Jin Xie et al.

ICCV 2025arXiv:2411.13836
12
citations
#3818

Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding

Xianqiang Gao, Pingrui Zhang, Delin Qu et al.

AAAI 2025paperarXiv:2408.13024
12
citations
#3819

Enhancing Multilingual LLM Pretraining with Model-Based Data Selection

Bettina Messmer, Vinko Sabolčec, Martin Jaggi

NEURIPS 2025arXiv:2502.10361
12
citations
#3820

BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis

Weiguang Zhao, Rui Zhang, Qiufeng Wang et al.

CVPR 2025arXiv:2503.12539
12
citations
#3821

Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers

Andrew Luo, Jacob Yeung, Rushikesh Zawar et al.

ICLR 2025arXiv:2410.05266
12
citations
#3822

ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing

Yulin Pan, Xiangteng He, Chaojie Mao et al.

ICCV 2025arXiv:2503.14482
12
citations
#3823

Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards

Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos et al.

ICML 2025oralarXiv:2501.07493
12
citations
#3824

Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning

Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto

CVPR 2025arXiv:2412.18219
12
citations
#3825

CharaConsist: Fine-Grained Consistent Character Generation

Mengyu Wang, Henghui Ding, Jianing Peng et al.

ICCV 2025arXiv:2507.11533
12
citations
#3826

Post-pre-training for Modality Alignment in Vision-Language Foundation Models

Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.

CVPR 2025arXiv:2504.12717
12
citations
#3827

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?

Tianyuan Qu, Longxiang Tang, Bohao PENG et al.

ICCV 2025arXiv:2503.12496
12
citations
#3828

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Sixiang Chen, Jiaming Liu, Siyuan Qian et al.

NEURIPS 2025arXiv:2507.01961
12
citations
#3829

Long-Sequence Recommendation Models Need Decoupled Embeddings

Ningya Feng, Junwei Pan, Jialong Wu et al.

ICLR 2025arXiv:2410.02604
12
citations
#3830

KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning

Wei Sun, Wen Yang, Pu Jian et al.

NEURIPS 2025arXiv:2505.16826
12
citations
#3831

Diffusion Models for Attribution

Xiongren Chen, Jiuyong Li, Jixue Liu et al.

AAAI 2025paperarXiv:2403.14790
12
citations
#3832

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses

Yatian Pang, Bin Zhu, Bin Lin et al.

ICCV 2025arXiv:2412.00397
12
citations
#3833

CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models

David Dai, Peilin Chen, Malinda Lu et al.

ICML 2025oralarXiv:2503.07667
12
citations
#3834

Neuro-3D: Towards 3D Visual Decoding from EEG Signals

Zhanqiang Guo, Jiamin Wu, Yonghao Song et al.

CVPR 2025arXiv:2411.12248
12
citations
#3835

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection

Bokai Lin, Zihao Zeng, Zipeng Xiao et al.

ICLR 2025arXiv:2410.14731
12
citations
#3836

Potemkin Understanding in Large Language Models

Marina Mancoridis, Bec Weeks, Keyon Vafa et al.

ICML 2025arXiv:2506.21521
12
citations
#3837

What's the Move? Hybrid Imitation Learning via Salient Points

Priya Sundaresan, Hengyuan Hu, Quan Vuong et al.

ICLR 2025arXiv:2412.05426
12
citations
#3838

Accelerating Large Language Model Reasoning via Speculative Search

Zhihai Wang, Jie Wang, Jilai Pan et al.

ICML 2025arXiv:2505.02865
12
citations
#3839

SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

Ming Li, Xin Gu, Fan Chen et al.

ICCV 2025arXiv:2505.02370
12
citations
#3840

Plastic Learning with Deep Fourier Features

Alex Lewandowski, Dale Schuurmans, Marlos C. Machado

ICLR 2025arXiv:2410.20634
12
citations
#3841

TopoLM: brain-like spatio-functional organization in a topographic language model

Neil Rathi, Johannes Mehrer, Badr AlKhamissi et al.

ICLR 2025arXiv:2410.11516
12
citations
#3842

ScribbleLight: Single Image Indoor Relighting with Scribbles

Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.

CVPR 2025arXiv:2411.17696
12
citations
#3843

Efficient 3D Recognition with Event-driven Spike Sparse Convolution

Xuerui Qiu, Man Yao, Jieyuan Zhang et al.

AAAI 2025paperarXiv:2412.07360
12
citations
#3844

DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation

Hongbin Lin, Zilu Guo, Yifan Zhang et al.

CVPR 2025arXiv:2503.11122
12
citations
#3845

Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling

Michal Balcerak, Tamaz Amiranashvili, Antonio Terpin et al.

NEURIPS 2025arXiv:2504.10612
12
citations
#3846

EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision

Yiming Zhao, Taein Kwon, Paul Streli et al.

CVPR 2025highlightarXiv:2409.02224
12
citations
#3847

On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization

wenlong deng, Yi Ren, Muchen Li et al.

NEURIPS 2025arXiv:2505.18830
12
citations
#3848

OmniCount: Multi-label Object Counting with Semantic-Geometric Priors

Anindya Mondal, Sauradip Nag, Xiatian Zhu et al.

AAAI 2025paperarXiv:2403.05435
12
citations
#3849

No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets

Corinna Coupette, Jeremy Wayland, Emily Simons et al.

ICML 2025arXiv:2502.02379
12
citations
#3850

3D Mesh Editing using Masked LRMs

William Gao, Dilin Wang, Yuchen Fan et al.

ICCV 2025arXiv:2412.08641
12
citations
#3851

Transformer-Squared: Self-adaptive LLMs

Qi Sun, Edoardo Cetin, Yujin Tang

ICLR 2025arXiv:2501.06252
12
citations
#3852

Advancing Spiking Neural Networks Towards Multiscale Spatiotemporal Interaction Learning

Yimeng Shan, Malu Zhang, Rui-jie Zhu et al.

AAAI 2025paperarXiv:2405.13672
12
citations
#3853

Training LLMs over Neurally Compressed Text

Brian Lester, Jaehoon Lee, Jeffrey Pennington et al.

ICLR 2025arXiv:2404.03626
12
citations
#3854

Online Video Understanding: OVBench and VideoChat-Online

Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.

CVPR 2025arXiv:2501.00584
12
citations
#3855

Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models

Can Demircan, Tankred Saanum, Akshay Jagadish et al.

ICLR 2025oralarXiv:2410.01280
12
citations
#3856

Conformal Thresholded Intervals for Efficient Regression

Rui Luo, Zhixin Zhou

AAAI 2025paperarXiv:2407.14495
12
citations
#3857

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Wei Chen, Lin Li, Yongqi Yang et al.

CVPR 2025highlightarXiv:2406.10462
12
citations
#3858

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Chenghao Fan, zhenyi lu, Sichen Liu et al.

ICML 2025arXiv:2502.16894
12
citations
#3859

Certified Unlearning for Neural Networks

Anastasiia Koloskova, Youssef Allouah, Animesh Jha et al.

ICML 2025arXiv:2506.06985
12
citations
#3860

BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers

Hui Zhang, Tingwei Gao, Jie Shao et al.

CVPR 2025arXiv:2503.15927
12
citations
#3861

Coreset Selection via Reducible Loss in Continual Learning

Ruilin Tong, Yuhang Liu, Javen Qinfeng Shi et al.

ICLR 2025
12
citations
#3862

RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation

Mingfei Han, Liang Ma, Kamila Zhumakhanova et al.

CVPR 2025arXiv:2412.08591
12
citations
#3863

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

Baoqi Pei, Yifei Huang, Jilan Xu et al.

ICLR 2025arXiv:2503.00986
12
citations
#3864

Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?

Antonia Wüst, Tim Woydt, Lukas Helff et al.

ICML 2025arXiv:2410.19546
12
citations
#3865

TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics

Lu Yi, Jie Peng, Yanping Zheng et al.

ICLR 2025oralarXiv:2502.02975
12
citations
#3866

Revisiting Tampered Scene Text Detection in the Era of Generative AI

Chenfan Qu, Yiwu Zhong, Fengjun Guo et al.

AAAI 2025paperarXiv:2407.21422
12
citations
#3867

DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs

Dongyuan Li, Shiyin Tan, Ying Zhang et al.

NEURIPS 2025arXiv:2408.06966
12
citations
#3868

NoT: Federated Unlearning via Weight Negation

Yasser Khalil, Leo Maxime Brunswic, Soufiane Lamghari et al.

CVPR 2025arXiv:2503.05657
12
citations
#3869

Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf’s Law

Frederik Kunstner, Francis Bach

NEURIPS 2025arXiv:2505.19227
12
citations
#3870

CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction

Yuanyuan Gao, Hao Li, Jiaqi Chen et al.

ICCV 2025arXiv:2503.23044
12
citations
#3871

SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding

Jian Chen, Ruiyi Zhang, Yufan Zhou et al.

ICLR 2025arXiv:2411.01106
12
citations
#3872

POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding

Alexey Skrynnik, Anton Andreychuk, Anatolii Borzilov et al.

ICLR 2025arXiv:2407.14931
12
citations
#3873

Geometry of Lightning Self-Attention: Identifiability and Dimension

Nathan Henry, Giovanni Luca Marchetti, Kathlén Kohn

ICLR 2025arXiv:2408.17221
12
citations
#3874

UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation

Rui Tian, Mingfei Gao, Mingze Xu et al.

NEURIPS 2025arXiv:2505.14682
12
citations
#3875

Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning

Tianci Liu, Ruirui Li, Yunzhe Qi et al.

ICLR 2025arXiv:2503.00306
12
citations
#3876

Efficient Rectification of Neuro-Symbolic Reasoning Inconsistencies by Abductive Reflection

Wen-Chao Hu, Wang-Zhou Dai, Yuan Jiang et al.

AAAI 2025paperarXiv:2412.08457
12
citations
#3877

Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection

Xiaoyu Huang, Weidong Chen, Bo Hu et al.

AAAI 2025paperarXiv:2412.19108
12
citations
#3878

Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

Lihan Jiang, Kerui Ren, Mulin Yu et al.

CVPR 2025arXiv:2412.01745
12
citations
#3879

Formation of Representations in Neural Networks

Liu Ziyin, Isaac Chuang, Tomer Galanti et al.

ICLR 2025arXiv:2410.03006
12
citations
#3880

MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild

Xi Fang, Jiankun Wang, Xiaochen Cai et al.

ICCV 2025arXiv:2411.11098
12
citations
#3881

Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

Yik Siu Chan, Narutatsu Ri, Yuxin Xiao et al.

ICML 2025arXiv:2502.04322
12
citations
#3882

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering

Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.

CVPR 2025arXiv:2503.09590
12
citations
#3883

Black-Box Adversarial Attacks on LLM-Based Code Completion

Slobodan Jenko, Niels Mündler, Jingxuan He et al.

ICML 2025arXiv:2408.02509
12
citations
#3884

RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving

Zhijian Huang, Chengjian Feng, Baihui Xiao et al.

ICCV 2025arXiv:2412.07689
12
citations
#3885

Exploring Vacant Classes in Label-Skewed Federated Learning

Kuangpu Guo, Yuhe Ding, Jian Liang et al.

AAAI 2025paperarXiv:2401.02329
12
citations
#3886

KBLaM: Knowledge Base augmented Language Model

Xi Wang, Taketomo Isazawa, Liana Mikaelyan et al.

ICLR 2025arXiv:2410.10450
12
citations
#3887

Learning Adversarial MDPs with Stochastic Hard Constraints

Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi et al.

ICML 2025arXiv:2403.03672
12
citations
#3888

RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics

Jie Zhang, Cezara Petrui, Kristina Nikolić et al.

NEURIPS 2025arXiv:2505.12575
12
citations
#3889

From Commands to Prompts: LLM-based Semantic File System for AIOS

Zeru Shi, Kai Mei, Mingyu Jin et al.

ICLR 2025arXiv:2410.11843
12
citations
#3890

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding

Yue Fan, Xiaojian Ma, Rongpeng Su et al.

ICCV 2025highlightarXiv:2501.00358
12
citations
#3891

LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content

Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.

ICLR 2025arXiv:2410.10783
12
citations
#3892

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

Fengshuo Bai, Runze Liu, Yali Du et al.

AAAI 2025paperarXiv:2412.10713
12
citations
#3893

Scaling Laws for Differentially Private Language Models

Ryan McKenna, Yangsibo Huang, Amer Sinha et al.

ICML 2025arXiv:2501.18914
12
citations
#3894

Proxy Denoising for Source-Free Domain Adaptation

Song Tang, Wenxin Su, Yan Gan et al.

ICLR 2025arXiv:2406.01658
12
citations
#3895

MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities

Bizhu Wu, Jinheng Xie, Keming Shen et al.

CVPR 2025arXiv:2504.02478
12
citations
#3896

ReSi: A Comprehensive Benchmark for Representational Similarity Measures

Max Klabunde, Tassilo Wald, Tobias Schumacher et al.

ICLR 2025arXiv:2408.00531
12
citations
#3897

Debiased Multimodal Understanding for Human Language Sequences

Zhi Xu, Dingkang Yang, Mingcheng Li et al.

AAAI 2025paperarXiv:2403.05025
12
citations
#3898

Skill Expansion and Composition in Parameter Space

Tenglong Liu, Jianxiong Li, Yinan Zheng et al.

ICLR 2025arXiv:2502.05932
12
citations
#3899

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Han Xiao, Guozhi Wang, Yuxiang Chai et al.

NEURIPS 2025arXiv:2505.21496
12
citations
#3900

EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model

Shengqi Dang, Yi He, Long Ling et al.

ICCV 2025arXiv:2501.05710
12
citations
#3901

How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach

Chirui CHANG, Jiahui Liu, Zhengzhe Liu et al.

ICCV 2025arXiv:2406.19568
12
citations
#3902

Law of the Weakest Link: Cross Capabilities of Large Language Models

Ming Zhong, Aston Zhang, Xuewei Wang et al.

ICLR 2025arXiv:2409.19951
12
citations
#3903

VORTA: Efficient Video Diffusion via Routing Sparse Attention

Wenhao Sun, Rong-Cheng Tu, Yifu Ding et al.

NEURIPS 2025arXiv:2505.18809
12
citations
#3904

Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data

Phillip Si, Peng Chen

ICLR 2025arXiv:2409.00127
12
citations
#3905

A Meta-Learning Approach to Bayesian Causal Discovery

Anish Dhir, Matthew Ashman, James Requeima et al.

ICLR 2025arXiv:2412.16577
12
citations
#3906

Learning Personalized Decision Support Policies

Umang Bhatt, Valerie Chen, Katherine M. Collins et al.

AAAI 2025paperarXiv:2304.06701
12
citations
#3907

Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees

Zehong Wang, Zheyuan Zhang, Tianyi MA et al.

ICML 2025arXiv:2412.16441
12
citations
#3908

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models

Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl H. Johansson et al.

ICLR 2025arXiv:2407.15589
12
citations
#3909

Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency

Jiangrong Shen, Qi Xu, Gang Pan et al.

ICLR 2025arXiv:2502.13572
12
citations
#3910

LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models

Jian Liang, Wenke Huang, Guancheng Wan et al.

CVPR 2025arXiv:2503.16843
12
citations
#3911

Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages

Zui Chen, Tianqiao Liu, Tongqing et al.

ICLR 2025arXiv:2501.14002
12
citations
#3912

DocVLM: Make Your VLM an Efficient Reader

Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.

CVPR 2025arXiv:2412.08746
12
citations
#3913

Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Henghui Du, Guangyao Li, Chang Zhou et al.

CVPR 2025arXiv:2503.13068
12
citations
#3914

P-SPIKESSM: HARNESSING PROBABILISTIC SPIKING STATE SPACE MODELS FOR LONG-RANGE DEPENDENCY TASKS

Malyaban Bal, Abhronil Sengupta

ICLR 2025arXiv:2406.02923
12
citations
#3915

SplatTalk: 3D VQA with Gaussian Splatting

Anh Thai, Kyle Genova, Songyou Peng et al.

ICCV 2025arXiv:2503.06271
12
citations
#3916

Text-to-LoRA: Instant Transformer Adaption

Rujikorn Charakorn, Edoardo Cetin, Yujin Tang et al.

ICML 2025arXiv:2506.06105
12
citations
#3917

Solving New Tasks by Adapting Internet Video Knowledge

Calvin Luo, Zilai Zeng, Yilun Du et al.

ICLR 2025arXiv:2504.15369
12
citations
#3918

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

Wanpeng Zhang, Zilong Xie, Yicheng Feng et al.

ICLR 2025arXiv:2410.02155
12
citations
#3919

QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training

David Dai, Peilin Chen, Chanakya Ekbote et al.

NEURIPS 2025oralarXiv:2506.00711
12
citations
#3920

LinPrim: Linear Primitives for Differentiable Volumetric Rendering

Nicolas von Lützow, Matthias Niessner

NEURIPS 2025arXiv:2501.16312
12
citations
#3921

MaskControl: Spatio-Temporal Control for Masked Motion Synthesis

Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul et al.

ICCV 2025arXiv:2410.10780
12
citations
#3922

EgoLM: Multi-Modal Language Model of Egocentric Motions

Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim et al.

CVPR 2025arXiv:2409.18127
12
citations
#3923

LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.

ICCV 2025arXiv:2504.14032
12
citations
#3924

Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective

Bo Ni, Yu Wang, Lu Cheng et al.

AAAI 2025paperarXiv:2410.08985
12
citations
#3925

LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos

Yujun Shi, Jun Hao Liew, Hanshu Yan et al.

ICML 2025arXiv:2405.13722
12
citations
#3926

One-for-More: Continual Diffusion Model for Anomaly Detection

Xiaofan Li, Xin Tan, Zhuo Chen et al.

CVPR 2025arXiv:2502.19848
12
citations
#3927

STD-PLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with PLM

Yiheng Huang, Xiaowei Mao, Shengnan Guo et al.

AAAI 2025paperarXiv:2407.09096
12
citations
#3928

LeanAgent: Lifelong Learning for Formal Theorem Proving

Adarsh Kumarappan, Mohit Tiwari, Peiyang Song et al.

ICLR 2025arXiv:2410.06209
12
citations
#3929

GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution

Fengxiang Wang, Mingshuo Chen, Yueying Li et al.

NEURIPS 2025spotlightarXiv:2505.21375
12
citations
#3930

MaestroMotif: Skill Design from Artificial Intelligence Feedback

Martin Klissarov, Mikael Henaff, Roberta Raileanu et al.

ICLR 2025arXiv:2412.08542
12
citations
#3931

MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA

Hanrong Ye, Haotian Zhang, Erik Daxberger et al.

ICLR 2025
12
citations
#3932

TLDR: Token-Level Detective Reward Model for Large Vision Language Models

Deqing Fu, Tong Xiao, Rui Wang et al.

ICLR 2025arXiv:2410.04734
12
citations
#3933

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion

Xiaomeng Chu, Jiajun Deng, Guoliang You et al.

CVPR 2025arXiv:2412.12725
12
citations
#3934

TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation

Daoyu Wang, Mingyue Cheng, Zhiding Liu et al.

ICML 2025arXiv:2410.05711
12
citations
#3935

In-context Time Series Predictor

Jiecheng Lu, Yan Sun, Shihao Yang

ICLR 2025arXiv:2405.14982
12
citations
#3936

Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting

Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen

CVPR 2025arXiv:2504.01957
12
citations
#3937

Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation

Ling-An Zeng, Guohong Huang, Gaojie Wu et al.

AAAI 2025paperarXiv:2412.11193
12
citations
#3938

ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation

Daniel Winter, Asaf Shul, Matan Cohen et al.

ICCV 2025highlightarXiv:2412.08645
12
citations
#3939

Flow-Based Policy for Online Reinforcement Learning

Lei Lv, Yunfei Li, Yu Luo et al.

NEURIPS 2025arXiv:2506.12811
12
citations
#3940

VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents

Kangrui Wang, Pingyue Zhang, Zihan Wang et al.

NEURIPS 2025arXiv:2510.16907
12
citations
#3941

Effective Training Data Synthesis for Improving MLLM Chart Understanding

Yuwei Yang, Zeyu Zhang, Yunzhong Hou et al.

ICCV 2025arXiv:2508.06492
12
citations
#3942

Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization

Zhe Li, Bicheng Ying, Zidong Liu et al.

ICLR 2025arXiv:2405.15861
12
citations
#3943

Backdoor Attacks Against No-Reference Image Quality Assessment Models via a Scalable Trigger

Yi Yu, Song Xia, Xun Lin et al.

AAAI 2025paperarXiv:2412.07277
12
citations
#3944

Yuan: Yielding Unblemished Aesthetics Through a Unified Network for Visual Imperfections Removal in Generated Images

Zhenyu Yu, Chee Seng Chan

AAAI 2025paperarXiv:2501.08505
12
citations
#3945

NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction

Qichao Wang, Ziqiao Meng, Wenqian Cui et al.

ICML 2025arXiv:2506.00975
12
citations
#3946

Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection

Jikang Cheng, Zhiyuan Yan, Ying Zhang et al.

CVPR 2025arXiv:2411.11396
12
citations
#3947

Predictive Data Selection: The Data That Predicts Is the Data That Teaches

KaShun SHUM, Yuzhen Huang, Hongjian Zou et al.

ICML 2025arXiv:2503.00808
12
citations
#3948

Test-Time Scaling of Diffusion Models via Noise Trajectory Search

Vignav Ramesh, Morteza Mardani

NEURIPS 2025arXiv:2506.03164
12
citations
#3949

Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?

Maxime Méloux, Silviu Maniu, François Portet et al.

ICLR 2025arXiv:2502.20914
12
citations
#3950

Manifold Learning by Mixture Models of VAEs for Inverse Problems

Giovanni S. Alberti, Johannes Hertrich, Matteo Santacesaria et al.

ICLR 2025arXiv:2303.15244
12
citations
#3951

Generative Classifiers Avoid Shortcut Solutions

Alexander Li, Ananya Kumar, Deepak Pathak

ICLR 2025arXiv:2512.25034
12
citations
#3952

Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs

Haowen Pan, Xiaozhi Wang, Yixin Cao et al.

ICLR 2025arXiv:2503.01090
12
citations
#3953

FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video

Yue Gao, Hong-Xing Yu, Bo Zhu et al.

CVPR 2025arXiv:2503.04720
12
citations
#3954

Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach

Haotian Ju, Hongyang Zhang, Dongyue Li

ICLR 2025arXiv:2306.08553
12
citations
#3955

The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models

Naveen George, Karthik Nandan Dasaraju, Rutheesh Reddy Chittepu et al.

CVPR 2025
12
citations
#3956

DiffPuter: Empowering Diffusion Models for Missing Data Imputation

Hengrui Zhang, Liancheng Fang, Qitian Wu et al.

ICLR 2025arXiv:2405.20690
12
citations
#3957

Unbounded: A Generative Infinite Game of Character Life Simulation

Jialu Li, Yuanzhen Li, Neal Wadhwa et al.

ICLR 2025arXiv:2410.18975
12
citations
#3958

SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance

Hongyu Yan, Zijun Li, Kunming Luo et al.

AAAI 2025paperarXiv:2503.18007
12
citations
#3959

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.

NEURIPS 2025arXiv:2505.20686
12
citations
#3960

AdaManip: Adaptive Articulated Object Manipulation Environments and Policy Learning

Yuanfei Wang, Xiaojie Zhang, Ruihai Wu et al.

ICLR 2025arXiv:2502.11124
12
citations
#3961

Edge Prompt Tuning for Graph Neural Networks

Xingbo Fu, Yinhan He, Jundong Li

ICLR 2025arXiv:2503.00750
12
citations
#3962

SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models

Jiawei Zhang, Xuan Yang, Taiqi Wang et al.

ICML 2025arXiv:2503.00211
12
citations
#3963

MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation

Yukang Lin, Hokit Fung, Jianjin Xu et al.

CVPR 2025arXiv:2503.19383
12
citations
#3964

FLAME: Learning to Navigate with Multimodal LLM in Urban Environments

Yunzhe Xu, Yiyuan Pan, Zhe Liu et al.

AAAI 2025paperarXiv:2408.11051
12
citations
#3965

HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder

Yingqi Tang, Zhuoran Xu, Zhaotie Meng et al.

ICCV 2025arXiv:2503.08612
12
citations
#3966

Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation

Anqi Li, Feng Li, Yuxi Liu et al.

ICLR 2025arXiv:2406.00758
12
citations
#3967

MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model

Chenjie Cao, Chaohui Yu, Shang Liu et al.

CVPR 2025arXiv:2411.16157
12
citations
#3968

Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior

Haitao Wu, Qing Li, Changqing Zhang et al.

CVPR 2025arXiv:2503.04207
12
citations
#3969

Generative Monoculture in Large Language Models

Fan Wu, Emily Black, Varun Chandrasekaran

ICLR 2025arXiv:2407.02209
12
citations
#3970

OmniSR: Shadow Removal Under Direct and Indirect Lighting

Jiamin Xu, Zelong Li, Yuxin Zheng et al.

AAAI 2025paperarXiv:2410.01719
12
citations
#3971

DASK: Distribution Rehearsing via Adaptive Style Kernel Learning for Exemplar-Free Lifelong Person Re-Identification

Kunlun Xu, Chenghao Jiang, Peixi Xiong et al.

AAAI 2025paperarXiv:2412.09224
12
citations
#3972

Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models

Yifang Xu, Yunzhuo Sun, Benxiang Zhai et al.

AAAI 2025paperarXiv:2501.07972
12
citations
#3973

HashAttention: Semantic Sparsity for Faster Inference

Aditya Desai, Shuo Yang, Alejandro Cuadron et al.

ICML 2025arXiv:2412.14468
12
citations
#3974

OmniStyle: Filtering High Quality Style Transfer Data at Scale

Ye Wang, Ruiqi Liu, Jiang Lin et al.

CVPR 2025arXiv:2505.14028
12
citations
#3975

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

Xiangchen Yin, Donglin Di, Lei Fan et al.

AAAI 2025paperarXiv:2408.16540
12
citations
#3976

Debiased All-in-one Image Restoration with Task Uncertainty Regularization

Gang Wu, Junjun Jiang, Yijun Wang et al.

AAAI 2025paper
12
citations
#3977

NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting

Yulong Zheng, Zicheng Jiang, Shengfeng He et al.

CVPR 2025highlightarXiv:2503.18794
12
citations
#3978

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

Jiange Yang, Haoyi Zhu, Yating Wang et al.

CVPR 2025arXiv:2411.14519
12
citations
#3979

Bag of Tricks for Inference-time Computation of LLM Reasoning

Fan LIU, Wen-Shuo Chao, Naiqiang Tan et al.

NEURIPS 2025arXiv:2502.07191
12
citations
#3980

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Chenyu Yang, Xuan Dong, Xizhou Zhu et al.

CVPR 2025arXiv:2412.09613
12
citations
#3981

Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection

Jiahao Xu, Zikai Zhang, Rui Hu

CVPR 2025highlightarXiv:2503.07978
12
citations
#3982

The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking

Yuchun Miao, Sen Zhang, Liang Ding et al.

ICML 2025arXiv:2501.19358
12
citations
#3983

HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models

Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.

CVPR 2025arXiv:2503.19157
12
citations
#3984

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing

Tsu-Jui Fu, Yusu Qian, Chen Chen et al.

ICCV 2025arXiv:2503.12652
12
citations
#3985

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration

Yuchen Sun, Shanhui Zhao, Tao Yu et al.

CVPR 2025arXiv:2503.17709
12
citations
#3986

Scaling Inference-Efficient Language Models

Song Bian, Minghao Yan, Shivaram Venkataraman

ICML 2025arXiv:2501.18107
12
citations
#3987

Taming Knowledge Conflicts in Language Models

Gaotang Li, Yuzhong Chen, Hanghang Tong

ICML 2025spotlightarXiv:2503.10996
12
citations
#3988

Exploring the limits of strong membership inference attacks on large language models

Jamie Hayes, I Shumailov, Christopher A. Choquette-Choo et al.

NEURIPS 2025arXiv:2505.18773
12
citations
#3989

GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting

Yangming Zhang, Wenqi Jia, Wei Niu et al.

CVPR 2025arXiv:2411.06019
12
citations
#3990

AdvPrefix: An Objective for Nuanced LLM Jailbreaks

Sicheng Zhu, Brandon Amos, Yuandong Tian et al.

NEURIPS 2025arXiv:2412.10321
12
citations
#3991

SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback

Jingsheng Gao, Linxu Li, Ke Ji et al.

ICLR 2025arXiv:2410.18141
12
citations
#3992

From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach

Xilin Wang, Jia Zheng, Yuanchao Hu et al.

AAAI 2025paperarXiv:2412.11892
12
citations
#3993

Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

Bowen Zhang, Sicheng Xu, Chuxin Wang et al.

ICCV 2025arXiv:2507.23785
12
citations
#3994

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Zhixuan Liang, Yao Mu, Yixiao Wang et al.

CVPR 2025arXiv:2411.18562
12
citations
#3995

Trusted Multi-View Classification via Evolutionary Multi-View Fusion

Xinyan Liang, Pinhan Fu, Yuhua Qian et al.

ICLR 2025
12
citations
#3996

MADGEN: Mass-Spec attends to De Novo Molecular generation

Yinkai Wang, Xiaohui Chen, Liping Liu et al.

ICLR 2025arXiv:2501.01950
12
citations
#3997

Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection

Peipeng Yu, Jianwei Fei, Hui Gao et al.

ICML 2025arXiv:2503.14853
12
citations
#3998

MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance

Zhixuan Chen, Xing Hu, Dawei Yang et al.

ICML 2025arXiv:2505.03804
12
citations
#3999

Conditioning Diffusions Using Malliavin Calculus

Jakiw Pidstrigach, Elizabeth Baker, Carles Domingo i Enrich et al.

ICML 2025arXiv:2504.03461
12
citations
#4000

WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs

Lukas Thede, Karsten Roth, Matthias Bethge et al.

ICML 2025arXiv:2503.05683
12
citations