All Papers

34,180 papers found • Page 677 of 684

Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation

Kimia Hamidieh, Haoran Zhang, Swami Sankaranarayanan et al.

ICLR 2024 • Spotlight

View Selection for 3D Captioning via Diffusion Ranking

Tiange Luo, Justin Johnson, Honglak Lee

ECCV 2024 • Poster
29 citations

ViG-Bias: Visually Grounded Bias Discovery and Mitigation

Badr-Eddine Marani, Mohamed Hanini, Nihitha Malayarukil et al.

ECCV 2024 • Poster
2 citations

VIGC: Visual Instruction Generation and Correction

Bin Wang, Fan Zhang, Xiaoyi Dong et al.

AAAI 2024 • Paper • arXiv:2308.12714
84 citations

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

Siming Yan, Min Bai, Weifeng Chen et al.

ECCV 2024 • Poster

ViLA: Efficient Video-Language Alignment for Video Question Answering

Xijun Wang, Junbang Liang, Chun-Kai Wang et al.

ECCV 2024 • Poster
22 citations

ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification

Jiangbo Shi, Chen Li, Tieliang Gong et al.

CVPR 2024 • Poster

VILA: On Pre-training for Visual Language Models

Ji Lin, Hongxu Yin, Wei Ping et al.

CVPR 2024 • Poster
685 citations

ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

İlker Kesen, Andrea Pedrotti, Mustafa Dogan et al.

ICLR 2024 • Oral

ViLT-CLIP: Video and Language Tuning CLIP with Multimodal Prompt Learning and Scenario-guided Optimization

Hao Wang, Fang Liu, Licheng Jiao et al.

AAAI 2024 • Paper

VINECS: Video-based Neural Character Skinning

Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann et al.

CVPR 2024 • Poster

VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception

Zhaoliang Wan, Yonggen Ling, Senlin Yi et al.

ICML 2024 • Poster

ViP: A Differentially Private Foundation Model for Computer Vision

Yaodong Yu, Maziar Sanjabi, Yi Ma et al.

ICML 2024 • Poster

ViPer: Visual Personalization of Generative Models via Individual Preference Learning

Sogand Salehi, Mahdi Shafiei, Roman Bachmann et al.

ECCV 2024 • Poster • arXiv:2407.17365
10 citations

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Mu Cai, Haotian Liu, Siva Mustikovela et al.

CVPR 2024 • Poster
153 citations

V-IRL: Grounding Virtual Intelligence in Real Life

Jihan Yang, Runyu Ding, Ellis L Brown et al.

ECCV 2024 • Poster • arXiv:2402.03310
35 citations

Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning

Jiahan Li, Jiuyang Dong, Shenjin Huang et al.

CVPR 2024 • Poster

VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement

Hanjung Kim, Jaehyun Kang, Miran Heo et al.

ECCV 2024 • Poster • arXiv:2312.04885
7 citations

VISA: Reasoning Video Object Segmentation via Large Language Model

Cilin Yan, Haochen Wang, Shilin Yan et al.

ECCV 2024 • Poster
95 citations

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Ofir Abramovich, Niv Nayman, Sharon Fogel et al.

ECCV 2024 • Poster
6 citations

Visible and Clear: Finding Tiny Objects in Difference Map

Bing Cao, Haiyu Yao, Pengfei Zhu et al.

ECCV 2024 • Poster • arXiv:2405.11276
19 citations

Vision-and-Language Navigation via Causal Learning

Liuyi Wang, Zongtao He, Ronghao Dang et al.

CVPR 2024 • Poster
41 citations

Vision-by-Language for Training-Free Compositional Image Retrieval

Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini et al.

ICLR 2024 • Poster

VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

Yunxin Li, Baotian Hu, Haoyuan Shi et al.

ICML 2024 • Poster

Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment

Huangbiao Xu, Xiao Ke, Yuezhou Li et al.

ECCV 2024 • Poster

Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection

Zihan Zhang, Zhuo Xu, Xiang Xiang

ECCV 2024 • Poster

Vision-Language Foundation Models as Effective Robot Imitators

Xinghang Li, Minghuan Liu, Hanbo Zhang et al.

ICLR 2024 • Spotlight
310 citations

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Juan Rocamonde, Victoriano Montesinos, Elvis Nava et al.

ICLR 2024 • Poster
133 citations

Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding

Taolin Zhang, Sunan He, Tao Dai et al.

AAAI 2024 • Paper

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Xiangxiang Chu, Jianlin Su, Bo Zhang et al.

ECCV 2024 • Poster

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Lianghui Zhu, Bencheng Liao, Qian Zhang et al.

ICML 2024 • Poster

Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting

Zhicheng Wang, Liwen Xiao, Zhiguo Cao et al.

AAAI 2024 • Paper • arXiv:2305.04440
28 citations

Vision Transformers as Probabilistic Expansion from Learngene

Qiufeng Wang, Xu Yang, Haokun Chen et al.

ICML 2024 • Poster

Vision Transformers Need Registers

Timothée Darcet, Maxime Oquab, Julien Mairal et al.

ICLR 2024 • Poster

VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

Seokha Moon, Hyun Woo, Hongbeen Park et al.

ECCV 2024 • Poster

Vista3D: Unravel the 3D Darkside of a Single Image

Qiuhong Shen, Xingyi Yang, Michael Bi Mi et al.

ECCV 2024 • Poster
3 citations

Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

Fan Ma, Xiaojie Jin, Heng Wang et al.

CVPR 2024 • Poster

ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis

Yuchen He, Zeqing Yuan, Yihong Wu et al.

AAAI 2024 • Paper

Visual Alignment Pre-training for Sign Language Translation

Peiqi Jiao, Yuecong Min, Xilin Chen

ECCV 2024 • Poster

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

Daniel Geng, Inbum Park, Andrew Owens

CVPR 2024 • Poster

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

Wenjin Hou, Shiming Chen, Shuhuang Chen et al.

CVPR 2024 • Poster

Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning

AAAI 2024 • Paper

Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models

Matthew Kowal, Richard P. Wildes, Kosta Derpanis

CVPR 2024 • Highlight
16 citations

Visual Data-Type Understanding does not emerge from scaling Vision-Language Models

Vishaal Udandarao, Max F. Burg, Samuel Albanie et al.

ICLR 2024 • Poster

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Young Kyun Jang, Donghyun Kim, Zihang Meng et al.

CVPR 2024 • Poster
21 citations

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Yunhao Ge, Xiaohui Zeng, Jacob Huffman et al.

CVPR 2024 • Poster
33 citations

Visual Grounding for Object-Level Generalization in Reinforcement Learning

Haobin Jiang, Zongqing Lu

ECCV 2024 • Poster • arXiv:2408.01942
3 citations

Visual Hallucination Elevates Speech Recognition

Fang Zhang, Yongxin Zhu, Xiangxiang Wang et al.

AAAI 2024 • Paper

Visual In-Context Prompting

Feng Li, Qing Jiang, Hao Zhang et al.

CVPR 2024 • Poster
52 citations

Visual Instruction Tuning with Polite Flamingo

Delong Chen, Jianfeng Liu, Wenliang Dai et al.

AAAI 2024 • Paper