Most Cited 2025 "temporal perturbations" Papers

22,274 papers found • Page 9 of 112

#1601

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Ziang Yan, Zhilin Li, Yinan He et al.

CVPR 2025 • poster • arXiv:2412.19326
19 citations
#1602

TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba

Xiaowen Ma, Zhen-Liang Ni, Xinghao Chen

ICCV 2025 • poster • arXiv:2411.17473
19 citations
#1603

Cut Your Losses in Large-Vocabulary Language Models

Erik Wijmans, Brody Huval, Alexander Hertzberg et al.

ICLR 2025 • poster • arXiv:2411.09009
19 citations
#1604

CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG

Boyi Deng, Wenjie Wang, Fengbin Zhu et al.

AAAI 2025 • paper • arXiv:2406.11497
19 citations
#1605

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

Quanhao Li, Zhen Xing, Rui Wang et al.

ICCV 2025 • poster • arXiv:2503.16421
19 citations
#1606

NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments

Xuan Yao, Junyu Gao, Changsheng Xu

ICCV 2025 • poster • arXiv:2506.23468
19 citations
#1607

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng et al.

CVPR 2025 • poster • arXiv:2411.02336
19 citations
#1608

KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA

Xiaorui Su, Yibo Wang, Shanghua Gao et al.

ICLR 2025 • poster • arXiv:2410.04660
19 citations
#1609

Design Principle Transfer in Neural Architecture Search via Large Language Models

Xun Zhou, Xingyu Wu, Liang Feng et al.

AAAI 2025 • paper • arXiv:2408.11330
19 citations
#1610

On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity

Quentin Bertrand, Anne Gagneux, Mathurin Massias et al.

NEURIPS 2025 • oral • arXiv:2506.03719
19 citations
#1611

Zero-shot forecasting of chaotic systems

Yuanzhao Zhang, William Gilpin

ICLR 2025 • poster • arXiv:2409.15771
19 citations
#1612

Towards a Unified Copernicus Foundation Model for Earth Vision

Yi Wang, Zhitong Xiong, Chenying Liu et al.

ICCV 2025 • poster • arXiv:2503.11849
19 citations
#1613

EmoEdit: Evoking Emotions through Image Manipulation

Jingyuan Yang, Jiawei Feng, Weibin Luo et al.

CVPR 2025 • poster • arXiv:2405.12661
19 citations
#1614

Design Principles and Challenges for Gaze + Pinch Interaction in XR

Ken Pfeuffer, Hans Gellersen, Mar Gonzalez-Franco

ISMAR 2025 • paper
19 citations
#1615

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Junmo Kang, Leonid Karlinsky, Hongyin Luo et al.

ICLR 2025 • poster • arXiv:2406.12034
19 citations
#1616

Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation

Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.

CVPR 2025 • poster • arXiv:2412.01243
19 citations
#1617

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

Div Garg, Diego Caples, Andis Draguns et al.

NEURIPS 2025 • poster • arXiv:2504.11543
19 citations
#1618

TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets

Yuzhe Yang, Yifei Zhang, Minghao Wu et al.

NEURIPS 2025 • oral • arXiv:2502.01506
19 citations
#1619

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025 • poster • arXiv:2503.11423
19 citations
#1620

GameArena: Evaluating LLM Reasoning through Live Computer Games

Lanxiang Hu, Qiyu Li, Anze Xie et al.

ICLR 2025 • poster • arXiv:2412.06394
19 citations
#1621

Reducing Tool Hallucination via Reliability Alignment

Hongshen Xu, Zichen Zhu, Lei Pan et al.

ICML 2025 • poster • arXiv:2412.04141
19 citations
#1622

Segmenting Maxillofacial Structures in CBCT Volumes

Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.

CVPR 2025 • poster
19 citations
#1623

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving

Zhenhua Xu, Yan Bai, Yujia Zhang et al.

CVPR 2025 • highlight
19 citations
#1624

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Xiao Wang, Fuling Wang, Yuehang Li et al.

CVPR 2025 • poster • arXiv:2410.00379
19 citations
#1625

DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models

Zhendong Wang, Jianmin Bao, Shuyang Gu et al.

CVPR 2025 • poster • arXiv:2503.01645
19 citations
#1626

UniK3D: Universal Camera Monocular 3D Estimation

Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.

CVPR 2025 • poster • arXiv:2503.16591
19 citations
#1627

CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.

ICCV 2025 • poster • arXiv:2504.15485
19 citations
#1628

Perturbation-Restrained Sequential Model Editing

Jun-Yu Ma, Hong Wang, Hao-Xiang Xu et al.

ICLR 2025 • poster • arXiv:2405.16821
19 citations
#1629

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NEURIPS 2025 • poster • arXiv:2505.20347
19 citations
#1630

Forking Paths in Neural Text Generation

Eric Bigelow, Ari Holtzman, Hidenori Tanaka et al.

ICLR 2025 • poster • arXiv:2412.07961
18 citations
#1631

Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Zhenwei Wang, Tengfei Wang, Zexin He et al.

ICLR 2025 • poster • arXiv:2409.11406
18 citations
#1632

Do as We Do, Not as You Think: the Conformity of Large Language Models

Zhiyuan Weng, Guikun Chen, Wenguan Wang

ICLR 2025 • poster • arXiv:2501.13381
18 citations
#1633

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Kaifeng Gao, Jiaxin Shi, Hanwang Zhang et al.

ICML 2025 • poster • arXiv:2411.16375
18 citations
#1634

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Zhiyuan Zeng, Yizhong Wang, Hannaneh Hajishirzi et al.

COLM 2025 • paper • arXiv:2503.08893
18 citations
#1635

Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

Zhitong Xu, Haitao Wang, Jeff Phillips et al.

ICLR 2025 • poster • arXiv:2402.02746
18 citations
#1636

TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting

Bojun Xiong, Jialun Liu, JiaKui Hu et al.

CVPR 2025 • poster • arXiv:2411.19654
18 citations
#1637

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

Rui Ye, Jingyi Chai, Xiangrui Liu et al.

ICLR 2025 • poster • arXiv:2406.10630
18 citations
#1638

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.

CVPR 2025 • poster • arXiv:2412.18108
18 citations
#1639

EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition

Issar Tzachor, Boaz Lerner, Matan Levy et al.

ICLR 2025 • poster • arXiv:2405.18065
18 citations
#1640

MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo

Zhenlong Yuan, Cong Liu, Fei Shen et al.

AAAI 2025 • paper • arXiv:2407.19323
18 citations
#1641

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

Junfeng Ni, Yu Liu, Ruijie Lu et al.

CVPR 2025 • poster • arXiv:2503.14830
18 citations
#1642

SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints

Miruna Cretu, Charles Harris, Ilia Igashov et al.

ICLR 2025 • poster • arXiv:2405.01155
18 citations
#1643

Cubify Anything: Scaling Indoor 3D Object Detection

Justin Lazarow, David Griffiths, Gefen Kohavi et al.

CVPR 2025 • highlight • arXiv:2412.04458
18 citations
#1644

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NEURIPS 2025 • poster • arXiv:2505.16394
18 citations
#1645

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

Yukang Cao, Liang Pan, Kai Han et al.

ICLR 2025 • poster • arXiv:2410.07164
18 citations
#1646

Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

Simran Kaur, Simon Park, Anirudh Goyal et al.

ICLR 2025 • poster • arXiv:2408.14774
18 citations
#1647

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.

CVPR 2025 • poster • arXiv:2503.00467
18 citations
#1648

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.

CVPR 2025 • poster • arXiv:2412.07776
18 citations
#1649

SLMRec: Distilling Large Language Models into Small for Sequential Recommendation

Wujiang Xu, Qitian Wu, Zujie Liang et al.

ICLR 2025 • oral • arXiv:2405.17890
18 citations
#1650

VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs

Qiucheng Wu, Handong Zhao, Michael Saxon et al.

ICCV 2025 • poster
18 citations
#1651

Generalization through variance: how noise shapes inductive biases in diffusion models

John Vastola

ICLR 2025 • poster • arXiv:2504.12532
18 citations
#1652

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment

Youhe Jiang, Ran Yan, Binhang Yuan

ICLR 2025 • poster • arXiv:2502.07903
18 citations
#1653

A Controlled Study on Long Context Extension and Generalization in LLMs

Yi Lu, Jing Nathan Yan, Songlin Yang et al.

COLM 2025 • paper • arXiv:2409.12181
18 citations
#1654

Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training

Zhenxin Li, Shihao Wang, Shiyi Lan et al.

ICCV 2025 • poster • arXiv:2503.12030
18 citations
#1655

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

Yihong Luo, Xiaolong Chen, Xinghua Qu et al.

ICLR 2025 • poster • arXiv:2403.12931
18 citations
#1656

Commit0: Library Generation from Scratch

Wenting Zhao, Nan Jiang, Celine Lee et al.

ICLR 2025 • poster • arXiv:2412.01769
18 citations
#1657

Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models

Hulingxiao He, Geng Li, Zijun Geng et al.

ICLR 2025 • poster • arXiv:2501.15140
18 citations
#1658

Diversity-Aware Policy Optimization for Large Language Model Reasoning

Jian Yao, Ran Cheng, Xingyu Wu et al.

NEURIPS 2025 • spotlight • arXiv:2505.23433
18 citations
#1659

DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs

Jongwoo Ko, Tianyi Chen, Sungnyun Kim et al.

ICML 2025 • oral • arXiv:2503.07067
18 citations
#1660

Towards Adversarially Robust Dataset Distillation by Curvature Regularization

Eric Xue, Yijiang Li, Haoyang Liu et al.

AAAI 2025 • paper • arXiv:2403.10045
18 citations
#1661

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Nikola Zubic, Federico Soldà, Aurelio Sulser et al.

ICLR 2025 • poster • arXiv:2405.16674
18 citations
#1662

Ref-GS: Directional Factorization for 2D Gaussian Splatting

Youjia Zhang, Anpei Chen, Yumin Wan et al.

CVPR 2025 • poster • arXiv:2412.00905
18 citations
#1663

MoonCast: High-Quality Zero-Shot Podcast Generation

Zeqian Ju, Dongchao Yang, Shen Kai et al.

NEURIPS 2025 • oral • arXiv:2503.14345
18 citations
#1664

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Jiangyong Huang, Baoxiong Jia, Yan Wang et al.

CVPR 2025 • poster • arXiv:2503.22420
18 citations
#1665

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li, RuiBing Hou, Hong Chang et al.

CVPR 2025 • highlight • arXiv:2411.16781
18 citations
#1666

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025 • poster • arXiv:2503.01610
18 citations
#1667

Perm: A Parametric Representation for Multi-Style 3D Hair Modeling

Chengan He, Xin Sun, Zhixin Shu et al.

ICLR 2025 • poster • arXiv:2407.19451
18 citations
#1668

Delta Decompression for MoE-based LLMs Compression

Hao Gu, Wei Li, Lujun Li et al.

ICML 2025 • poster • arXiv:2502.17298
18 citations
#1669

Block Verification Accelerates Speculative Decoding

Ziteng Sun, Uri Mendlovic, Yaniv Leviathan et al.

ICLR 2025 • poster • arXiv:2403.10444
18 citations
#1670

SketchAgent: Language-Driven Sequential Sketch Generation

Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.

CVPR 2025 • poster • arXiv:2411.17673
18 citations
#1671

Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Seth Aycock, David Stap, Di Wu et al.

ICLR 2025 • poster • arXiv:2409.19151
18 citations
#1672

Encryption-Friendly LLM Architecture

Donghwan Rho, Taeseong Kim, Minje Park et al.

ICLR 2025 • poster • arXiv:2410.02486
18 citations
#1673

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Fan Lu, Wei Wu, Kecheng Zheng et al.

CVPR 2025 • poster • arXiv:2412.08614
18 citations
#1674

Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

Anke Tang, Enneng Yang, Li Shen et al.

NEURIPS 2025 • poster
18 citations
#1675

FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency

Han Huang, Yulun Wu, Chao Deng et al.

AAAI 2025 • paper • arXiv:2501.04628
18 citations
#1676

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

Li Hao, He Cao, Bin Feng et al.

NEURIPS 2025 • poster • arXiv:2505.21318
18 citations
#1677

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Kaihang Pan, Wang Lin, Zhongqi Yue et al.

CVPR 2025 • poster • arXiv:2504.14666
18 citations
#1678

Palu: KV-Cache Compression with Low-Rank Projection

Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin et al.

ICLR 2025 • poster
18 citations
#1679

BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics

Lukas Rauch, Raphael Schwinger, Moritz Wirth et al.

ICLR 2025 • poster • arXiv:2403.10380
18 citations
#1680

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NEURIPS 2025 • poster • arXiv:2210.14051
18 citations
#1681

Scaling Optimal LR Across Token Horizons

Johan Bjorck, Alon Benhaim, Vishrav Chaudhary et al.

ICLR 2025 • poster • arXiv:2409.19913
18 citations
#1682

Controlling Language and Diffusion Models by Transporting Activations

Pau Rodriguez, Arno Blaas, Michal Klein et al.

ICLR 2025 • poster • arXiv:2410.23054
18 citations
#1683

CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes

Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim et al.

ICLR 2025 • poster • arXiv:2405.01033
18 citations
#1684

Cross-Embodiment Dexterous Grasping with Reinforcement Learning

Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.

ICLR 2025 • poster • arXiv:2410.02479
18 citations
#1685

VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Chaoya Jiang, Yongrui Heng, Wei Ye et al.

NEURIPS 2025 • poster
18 citations
#1686

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.

CVPR 2025 • poster • arXiv:2503.24026
18 citations
#1687

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang, Chao Yu, Sichang Su et al.

NEURIPS 2025 • poster • arXiv:2505.22094
18 citations
#1688

Benchmarking Predictive Coding Networks -- Made Simple

Luca Pinchetti, Chang Qi, Oleh Lokshyn et al.

ICLR 2025 • poster • arXiv:2407.01163
18 citations
#1689

TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing

Stefan Lionar, Jiabin Liang, Gim Hee Lee

CVPR 2025 • poster • arXiv:2503.11629
18 citations
#1690

CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

ZeMing Gong, Austin Wang, Xiaoliang Huo et al.

ICLR 2025 • poster • arXiv:2405.17537
18 citations
#1691

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.

CVPR 2025 • highlight • arXiv:2412.04462
18 citations
#1692

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Sibo Wu, Congrong Xu, Binbin Huang et al.

CVPR 2025 • poster • arXiv:2503.21219
18 citations
#1693

HoliTom: Holistic Token Merging for Fast Video Large Language Models

Kele Shao, Keda Tao, Can Qin et al.

NEURIPS 2025 • oral • arXiv:2505.21334
18 citations
#1694

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos

Yi Chen, Yuying Ge, Weiliang Tang et al.

ICCV 2025 • poster • arXiv:2412.04445
18 citations
#1695

Discretization-invariance? On the Discretization Mismatch Errors in Neural Operators

Wenhan Gao, Ruichen Xu, Yuefan Deng et al.

ICLR 2025 • poster
18 citations
#1696

SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

Shuaijie Shen, Chao Wang, Renzhuo Huang et al.

AAAI 2025 • paper • arXiv:2408.14909
18 citations
#1697

Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

Yuichi Inoue, Kou Misaki, Yuki Imajuku et al.

NEURIPS 2025 • spotlight • arXiv:2503.04412
18 citations
#1698

Structure-Adaptive Multi-View Graph Clustering for Remote Sensing Data

Renxiang Guan, Wenxuan Tu, Siwei Wang et al.

AAAI 2025 • paper
18 citations
#1699

Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models

Zeyu Yang, Zijie Pan, Chun Gu et al.

ICLR 2025 • oral • arXiv:2404.02148
18 citations
#1700

Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

Xiao Wang, Yu Jin, Wentao Wu et al.

CVPR 2025 • poster • arXiv:2412.06647
18 citations
#1701

TIME-FS: Joint Learning of Tensorial Incomplete Multi-View Unsupervised Feature Selection and Missing-View Imputation

Yanyong Huang, Minghui Lu, Wei Huang et al.

AAAI 2025 • paper
18 citations
#1702

Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems

Fu Luo, Xi Lin, Yaoxin Wu et al.

ICLR 2025 • poster
18 citations
#1703

X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention

XiaoChen Zhao, Hongyi Xu, Guoxian Song et al.

ICLR 2025 • poster • arXiv:2507.23143
18 citations
#1704

Gradient-Free Generation for Hard-Constrained Systems

Chaoran Cheng, Boran Han, Danielle Maddix et al.

ICLR 2025 • poster • arXiv:2412.01786
18 citations
#1705

MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model

Junjie Li, Yang Liu, Weiqing Liu et al.

ICLR 2025 • poster • arXiv:2409.07486
18 citations
#1706

The Illusion of Empathy: How AI Chatbots Shape Conversation Perception

Tingting Liu, Salvatore Giorgi, Ankit Aich et al.

AAAI 2025 • paper • arXiv:2411.12877
18 citations
#1707

A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection

Junjun Pan, Yixin Liu, Xin Zheng et al.

AAAI 2025 • paper • arXiv:2502.13308
18 citations
#1708

Hash3D: Training-free Acceleration for 3D Generation

Xingyi Yang, Songhua Liu, Xinchao Wang

CVPR 2025 • poster • arXiv:2404.06091
18 citations
#1709

EuroBERT: Scaling Multilingual Encoders for European Languages

Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte Miguel Alves et al.

COLM 2025 • paper • arXiv:2503.05500
18 citations
#1710

Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding

Weiyu Guo, Ziyang Chen, Shaoguang Wang et al.

NEURIPS 2025 • oral • arXiv:2503.13139
18 citations
#1711

Learning to Discretize Denoising Diffusion ODEs

Vinh Tong, Trung-Dung Hoang, Anji Liu et al.

ICLR 2025 • poster • arXiv:2405.15506
18 citations
#1712

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Bhavya, Stelian Coros, Andreas Krause et al.

ICLR 2025 • poster • arXiv:2412.12098
18 citations
#1713

Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems

Jindong Tian, Yuxuan Liang, Ronghui Xu et al.

ICLR 2025 • oral • arXiv:2410.19892
18 citations
#1714

Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration

Ziheng Zhou, Jinxing Zhou, Wei Qian et al.

AAAI 2025 • paper • arXiv:2412.12628
18 citations
#1715

OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking

Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.

CVPR 2025 • poster • arXiv:2412.01615
18 citations
#1716

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.

CVPR 2025 • poster • arXiv:2503.20823
18 citations
#1717

CAD-Recode: Reverse Engineering CAD Code from Point Clouds

Danila Rukhovich, Elona Dupont, Dimitrios Mallis et al.

ICCV 2025 • poster • arXiv:2412.14042
18 citations
#1718

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani et al.

ICLR 2025 • poster • arXiv:2411.16525
18 citations
#1719

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Jiale Xu, Shenghua Gao, Ying Shan

ICCV 2025 • poster • arXiv:2412.09573
18 citations
#1720

Non-myopic Generation of Language Models for Reasoning and Planning

Chang Ma, Haiteng Zhao, Junlei Zhang et al.

ICLR 2025 • poster • arXiv:2410.17195
18 citations
#1721

A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning

Chen-Yu Liu, Chao-Han Huck Yang, Hsi-Sheng Goan et al.

ICLR 2025 • poster • arXiv:2410.09846
17 citations
#1722

Graph Sparsification via Mixture of Graphs

Guibin Zhang, Xiangguo Sun, Yanwei Yue et al.

ICLR 2025 • poster • arXiv:2405.14260
17 citations
#1723

MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences

Genta Winata, David Anugraha, Lucky Susanto et al.

ICLR 2025 • poster • arXiv:2410.02381
17 citations
#1724

Learning 3D Persistent Embodied World Models

Siyuan Zhou, Yilun Du, Yuncong Yang et al.

NEURIPS 2025 • poster • arXiv:2505.05495
17 citations
#1725

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia, Sensen Gao, Simeng Qin et al.

NEURIPS 2025 • poster • arXiv:2505.21494
17 citations
#1726

ViLLa: Video Reasoning Segmentation with Large Language Model

Rongkun Zheng, Lu Qi, Xi Chen et al.

ICCV 2025 • poster • arXiv:2407.14500
17 citations
#1727

DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds

Youyu Chen, Junjun Jiang, Kui Jiang et al.

CVPR 2025 • highlight • arXiv:2503.18402
17 citations
#1728

Optimal Transport for Time Series Imputation

Hao Wang, Zhengnan Li, Haoxuan Li et al.

ICLR 2025 • oral
17 citations
#1729

SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs

Aashiq Muhamed, Jacopo Bonato, Mona T. Diab et al.

COLM 2025 • paper
17 citations
#1730

No Preference Left Behind: Group Distributional Preference Optimization

Binwei Yao, Zefan Cai, Yun-Shiuan Chuang et al.

ICLR 2025 • poster • arXiv:2412.20299
17 citations
#1731

DarkBench: Benchmarking Dark Patterns in Large Language Models

Esben Kran, Hieu Minh Nguyen, Akash Kundu et al.

ICLR 2025 • poster • arXiv:2503.10728
17 citations
#1732

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Xin Zhang, Robby T. Tan

CVPR 2025 • highlight • arXiv:2504.03193
17 citations
#1733

VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model

Zuwei Long, Yunhang Shen, Chaoyou Fu et al.

NEURIPS 2025 • poster
17 citations
#1734

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

Léo Boisvert, Abhay Puri, Gabriel Huang et al.

COLM 2025 • paper • arXiv:2504.14064
17 citations
#1735

Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise

Enea Monzio Compagnoni, Tianlin Liu, Rustem Islamov et al.

ICLR 2025 • poster • arXiv:2411.15958
17 citations
#1736

Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.

CVPR 2025 • poster • arXiv:2501.02464
17 citations
#1737

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.

CVPR 2025 • poster • arXiv:2406.14235
17 citations
#1738

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Junqi Ge, Ziyi Chen, Jintao Lin et al.

ICCV 2025 • poster • arXiv:2412.09616
17 citations
#1739

USP: Unified Self-Supervised Pretraining for Image Generation and Understanding

Xiangxiang Chu, Renda Li, Yong Wang

ICCV 2025 • poster • arXiv:2503.06132
17 citations
#1740

Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling

Junha Hyung, Kinam Kim, Susung Hong et al.

CVPR 2025 • poster • arXiv:2411.18664
17 citations
#1741

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Hanyang Zhao, Genta Winata, Anirban Das et al.

ICLR 2025 • poster • arXiv:2410.04203
17 citations
#1742

QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation

Yaoyu Zhu, Di Huang, Hanqi Lyu et al.

NEURIPS 2025 • poster • arXiv:2505.24183
17 citations
#1743

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping

Yue Yang, Shuibo Zhang, Kaipeng Zhang et al.

ICLR 2025 • poster • arXiv:2410.08695
17 citations
#1744

From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation

Nikita Kotelevskii, Vladimir Kondratyev, Martin Takáč et al.

ICLR 2025 • poster • arXiv:2402.10727
17 citations
#1745

Neighboring Autoregressive Modeling for Efficient Visual Generation

Yefei He, Yuanyu He, Shaoxuan He et al.

ICCV 2025 • poster • arXiv:2503.10696
17 citations
#1746

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations

Yongming Zhu, Longhao Zhang, Zhengkun Rong et al.

CVPR 2025 • poster • arXiv:2412.04037
17 citations
#1747

Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures

Junxuan Wang, Xuyang Ge, Wentao Shu et al.

ICLR 2025 • poster • arXiv:2410.06672
17 citations
#1748

Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data

Manuel Brenner, Elias Weber, Georgia Koppe et al.

ICLR 2025 • poster • arXiv:2410.04814
17 citations
#1749

Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

Yili Wang, Yixin Liu, Xu Shen et al.

ICLR 2025 • poster • arXiv:2406.15523
17 citations
#1750

Spiking Transformer with Spatial-Temporal Attention

Donghyun Lee, Yuhang Li, Youngeun Kim et al.

CVPR 2025 • poster • arXiv:2409.19764
17 citations
#1751

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jue Zhang, Xiaoting Qin et al.

CVPR 2025 • highlight • arXiv:2412.11077
17 citations
#1752

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Yuqing Wang, Zhijie Lin, Yao Teng et al.

ICCV 2025 • poster • arXiv:2503.16430
17 citations
#1753

Learning Clustering-based Prototypes for Compositional Zero-Shot Learning

Hongyu Qu, Jianan Wei, Xiangbo Shu et al.

ICLR 2025 • poster • arXiv:2502.06501
17 citations
#1754

Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving

Yuhang Lu, Yichen Yao, Jiadong Tu et al.

AAAI 2025 • paper • arXiv:2409.02914
17 citations
#1755

PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection

Jianan Ye, Weiguang Zhao, Xi Yang et al.

CVPR 2025 • poster • arXiv:2412.12617
17 citations
#1756

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

Yue Liu, Shengfang Zhai, Mingzhe Du et al.

NEURIPS 2025 • poster • arXiv:2505.11049
17 citations
#1757

Generative Flows on Synthetic Pathway for Drug Design

Seonghwan Seo, Minsu Kim, Tony Shen et al.

ICLR 2025 • poster • arXiv:2410.04542
17 citations
#1758

CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation

Gaojie Lin, Jianwen Jiang, Chao Liang et al.

ICLR 2025 • poster
17 citations
#1759

Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models

Matvei Popov, Peter Robicheaux, Anish Madan et al.

NEURIPS 2025 • poster • arXiv:2505.20612
17 citations
#1760

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Mingze Xu, Mingfei Gao, Shiyu Li et al.

COLM 2025 • paper • arXiv:2503.18943
17 citations
#1761

OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs

Jitai Hao, Yuke Zhu, Tian Wang et al.

ICLR 2025 • poster
17 citations
#1762

Reasoning of Large Language Models over Knowledge Graphs with Super-Relations

Song Wang, Junhong Lin, Xiaojie Guo et al.

ICLR 2025 • poster • arXiv:2503.22166
17 citations
#1763

LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement

Jieming Bian, Lei Wang, Letian Zhang et al.

ICCV 2025 • poster • arXiv:2411.14961
17 citations
#1764

A Probabilistic Perspective on Unlearning and Alignment for Large Language Models

Yan Scholten, Stephan Günnemann, Leo Schwinn

ICLR 2025 • poster • arXiv:2410.03523
17 citations
#1765

TULIP: Token-length Upgraded CLIP

Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki Asano et al.

ICLR 2025 • poster • arXiv:2410.10034
17 citations
#1766

VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning

Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales

ICLR 2025 • poster • arXiv:2403.13164
17 citations
#1767

Mellow: a small audio language model for reasoning

Soham Deshmukh, Satvik Dixit, Rita Singh et al.

NEURIPS 2025 • poster • arXiv:2503.08540
17 citations
#1768

GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation

Hongyin Zhang, Pengxiang Ding, Shangke Lyu et al.

ICLR 2025 • poster • arXiv:2502.09268
17 citations
#1769

Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization

Jiaming Zhou, Ke Ye, Jiayi Liu et al.

NEURIPS 2025 • poster • arXiv:2505.15660
17 citations
#1770

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Yining Hong, Beide Liu, Maxine Wu et al.

ICLR 2025 • oral • arXiv:2410.23277
17 citations
#1771

Accelerating neural network training: An analysis of the AlgoPerf competition

Priya Kasimbeg, Frank Schneider, Runa Eschenhagen et al.

ICLR 2025 • poster • arXiv:2502.15015
17 citations
#1772

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Yunhan Zhao, Xiang Zheng, Lin Luo et al.

ICLR 2025 • poster • arXiv:2410.20971
17 citations
#1773

Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods

Oussama Zekri, Nicolas Boulle

NEURIPS 2025 • poster • arXiv:2502.01384
17 citations
#1774

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models

Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.

CVPR 2025 • highlight
17 citations
#1775

M²IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering

Yanshu Li, Yi Cao, Hongyang He et al.

COLM 2025 • paper
17 citations
#1776

Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration

Ran Xu, Wenqi Shi, Yuchen Zhuang et al.

COLM 2025 • paper • arXiv:2504.04915
17 citations
#1777

Video-T1: Test-time Scaling for Video Generation

Fangfu Liu, Hanyang Wang, Yimo Cai et al.

ICCV 2025 • poster • arXiv:2503.18942
17 citations
#1778

Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families

Felipe Maia Polo, Seamus Somerstep, Leshem Choshen et al.

NEURIPS 2025 • poster • arXiv:2412.06540
17 citations
#1779

DeLLMa: Decision Making Under Uncertainty with Large Language Models

Ollie Liu, Deqing Fu, Dani Yogatama et al.

ICLR 2025 • poster • arXiv:2402.02392
17 citations
#1780

UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior

I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.

CVPR 2025 • highlight • arXiv:2501.13134
17 citations
#1781

Detecting Out-of-Distribution Through the Lens of Neural Collapse

Litian Liu, Yao Qin

CVPR 2025 • poster • arXiv:2311.01479
17 citations
#1782

Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts

Hongcheng Gao, Tianyu Pang, Chao Du et al.

ICCV 2025 • poster • arXiv:2410.12777
17 citations
#1783

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Kim Sung-Bin, Oh Hyun-Bin, Lee Jung-Mok et al.

ICLR 2025 • poster • arXiv:2410.18325
17 citations
#1784

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo et al.

NEURIPS 2025 • oral • arXiv:2505.18079
17 citations
#1785

Improving Reasoning Performance in Large Language Models via Representation Engineering

Bertram Højer, Oliver Jarvis, Stefan Heinrich

ICLR 2025 • poster • arXiv:2504.19483
17 citations
#1786

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Xindi Yang, Baolu Li, Yiming Zhang et al.

ICCV 2025 • poster • arXiv:2503.23368
17 citations
#1787

3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Xiaobiao Du, Yida Wang, Haiyang Sun et al.

ICCV 2025 • poster • arXiv:2406.04875
17 citations
#1788

u-$\mu$P: The Unit-Scaled Maximal Update Parametrization

Charles Blake, Constantin Eichenberg, Josef Dean et al.

ICLR 2025 • poster
17 citations
#1789

Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene

Jiahao Wu, Rui Peng, Zhiyan Wang et al.

ICLR 2025 • poster
17 citations
#1790

Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling

Wei Guo, Molei Tao, Yongxin Chen

ICLR 2025 • poster • arXiv:2407.16936
17 citations
#1791

On the Guidance of Flow Matching

Ruiqi Feng, Chenglei Yu, Wenhao Deng et al.

ICML 2025 • spotlight • arXiv:2502.02150
17 citations
#1792

Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks

Lehan Wang, Haonan Wang, Honglong Yang et al.

ICLR 2025 • poster • arXiv:2410.18387
17 citations
#1793

Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Li, Nikolaos Tsagkas, Jifei Song et al.

ICCV 2025 • poster • arXiv:2408.10123
17 citations
#1794

Controllable Context Sensitivity and the Knob Behind It

Julian Minder, Kevin Du, Niklas Stoehr et al.

ICLR 2025 • poster • arXiv:2411.07404
17 citations
#1795

PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection

Botao Ren, Xue Yang, Yi Yu et al.

ICLR 2025 • poster • arXiv:2410.08210
17 citations
#1796

Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach

Zhiwei Li, Guodong Long, Tianyi Zhou et al.

AAAI 2025 • paper • arXiv:2408.08931
17 citations
#1797

Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs

Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.

ICLR 2025 • poster • arXiv:2405.15683
17 citations
#1798

Grokking at the Edge of Numerical Stability

Lucas Prieto, Melih Barsbey, Pedro Mediano et al.

ICLR 2025 • poster • arXiv:2501.04697
17 citations
#1799

GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images

Xiang Lan, Feng Wu, Kai He et al.

NEURIPS 2025 • poster • arXiv:2503.06073
17 citations
#1800

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Orr Zohar, Xiaohan Wang, Yonatan Bitton et al.

ICLR 2025 • poster • arXiv:2407.06189
17 citations