ICLR 2025 Papers

3,827 papers found • Page 75 of 77

ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

Serin Yang, Taesung Kwon, Jong Chul YE

ICLR 2025oral
9
citations

VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh et al.

ICLR 2025poster
2
citations

Video Action Differencing

James Burgess, Xiaohan Wang, Yuhui Zhang et al.

ICLR 2025poster

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Boqing Gong, Yin Cui, Long Zhao et al.

ICLR 2025oral

VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing

Xiangpeng Yang, Linchao Zhu, Hehe Fan et al.

ICLR 2025posterarXiv:2502.17258
31
citations

Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators

Wentao Zhang, Junliang Guo, Tianyu He et al.

ICLR 2025poster

VideoPhy: Evaluating Physical Commonsense for Video Generation

Hritik Bansal, Zongyu Lin, Tianyi Xie et al.

ICLR 2025poster
99
citations

VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking

Runyi Hu, Jie Zhang, Yiming Li et al.

ICLR 2025oral

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Orr Zohar, Xiaohan Wang, Yonatan Bitton et al.

ICLR 2025poster
17
citations

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Lawrence Jang, Yinheng Li, Dan Zhao et al.

ICLR 2025poster
25
citations

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Tianchen Zhao, Tongcheng Fang, Haofeng Huang et al.

ICLR 2025poster

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Yecheng Wu, Zhuoyang Zhang, Junyu Chen et al.

ICLR 2025poster

ViSAGe: Video-to-Spatial Audio Generation

Jaeyeon Kim, Heeseung Yun, Gunhee Kim

ICLR 2025oral

Vision and Language Synergy for Rehearsal Free Continual Learning

Muhammad Anwar Masum, Mahardhika Pratama, Savitha Ramasamy et al.

ICLR 2025poster

Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations

Yudi Xie, Weichen Huang, Esther Alter et al.

ICLR 2025poster

Vision Language Models are In-Context Value Learners

Yecheng Jason Ma, Joey Hejna, Chuyuan Fu et al.

ICLR 2025oral

Vision-LSTM: xLSTM as Generic Vision Backbone

Benedikt Alkin, Maximilian Beck, Korbinian Pöppel et al.

ICLR 2025poster
85
citations

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Yuchen Duan, Weiyun Wang, Zhe Chen et al.

ICLR 2025poster

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Shi Yu, Chaoyue Tang, Bokai Xu et al.

ICLR 2025posterarXiv:2410.10594
121
citations

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Xiao Liu, Tianjie Zhang, Yu Gu et al.

ICLR 2025poster
67
citations

Visual Agents as Fast and Slow Thinkers

Guangyan Sun, Mingyu Jin, Zhenting Wang et al.

ICLR 2025poster
44
citations

Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs

Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.

ICLR 2025poster

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

Tsung-Han Wu, Giscard Biamby, Jerome Quenum et al.

ICLR 2025poster
30
citations

Visually Consistent Hierarchical Image Classification

Seulki Park, Youren Zhang, Stella Yu et al.

ICLR 2025poster

Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models

Donghoon Kim, Minji Bae, Kyuhong Shim et al.

ICLR 2025poster

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

Minheng Ni, YuTao Fan, Lei Zhang et al.

ICLR 2025poster

VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning

Yichao Liang, Nishanth Kumar, Hao Tang et al.

ICLR 2025poster

VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation

Wei Zhao, Pengxiang Ding, Zhang Min et al.

ICLR 2025poster
37
citations

VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu et al.

ICLR 2025poster

VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning

Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales

ICLR 2025poster

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks

Ziyan Jiang, Rui Meng, Xinyi Yang et al.

ICLR 2025poster

VLMaterial: Procedural Material Generation with Large Vision-Language Models

Beichen Li, Rundi Wu, Armando Solar-Lezama et al.

ICLR 2025poster

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning

Nilay Yilmaz, Maitreya Patel, Lawrence Luo et al.

ICLR 2025poster

VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?

Xize Cheng, Ruofan Hu, Xiaoda Yang et al.

ICLR 2025poster

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Yumeng Li, William H Beluch, Margret Keuper et al.

ICLR 2025oral
10
citations

VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning

Qingtao Liu, Yu Cui, Zhengnan Sun et al.

ICLR 2025poster

VVC-Gym: A Fixed-Wing UAV Reinforcement Learning Environment for Multi-Goal Long-Horizon Problems

Xudong Gong, Feng Dawei, Kele Xu et al.

ICLR 2025oral

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations

Katie Matton, Robert Ness, John Guttag et al.

ICLR 2025poster

Ward: Provable RAG Dataset Inference via LLM Watermarks

Nikola Jovanović, Robin Staab, Maximilian Baader et al.

ICLR 2025poster
13
citations

WardropNet: Traffic Flow Predictions via Equilibrium-Augmented Learning

Kai Jungel, Dario Paccagnan, Axel Parmentier et al.

ICLR 2025poster

Warm Diffusion: Recipe for Blur-Noise Mixture Diffusion Models

Hao-Chien Hsueh, Wen-Hsiao Peng, Ching-Chun Huang

ICLR 2025poster
2
citations

Wasserstein Distances, Neuronal Entanglement, and Sparsity

Shashata Sawmya, Linghao Kong, Ilia Markov et al.

ICLR 2025poster

Wasserstein-Regularized Conformal Prediction under General Distribution Shift

Rui Xu, Chao Chen, Yue Sun et al.

ICLR 2025poster
5
citations

Watch Less, Do More: Implicit Skill Discovery for Video-Conditioned Policy

Wang, Zongqing Lu

ICLR 2025poster

Watermark Anything With Localized Messages

Tom Sander, Pierre Fernandez, Alain Oliviero Durmus et al.

ICLR 2025posterarXiv:2411.07231
38
citations

Wavelet-based Positional Representation for Long Context

Yui Oka, Taku Hasegawa, Kyosuke Nishida et al.

ICLR 2025posterarXiv:2502.02004
2
citations

Wavelet Diffusion Neural Operator

Peiyan Hu, Rui Wang, Xiang Zheng et al.

ICLR 2025poster

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Shengpeng Ji, Ziyue Jiang, Wen Wang et al.

ICLR 2025oralarXiv:2408.16532
125
citations

Wayward Concepts In Multimodal Models

Brandon Trabucco, Max Gurinas, Kyle Doherty et al.

ICLR 2025poster

Weakly-Supervised Affordance Grounding Guided by Part-Level Semantic Priors

Peiran Xu, Yadong MU

ICLR 2025poster