All Papers
Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation
Kimia Hamidieh, Haoran Zhang, Swami Sankaranarayanan et al.
View Selection for 3D Captioning via Diffusion Ranking
Tiange Luo, Justin Johnson, Honglak Lee
ViG-Bias: Visually Grounded Bias Discovery and Mitigation
Badr-Eddine Marani, Mohamed Hanini, Nihitha Malayarukil et al.
VIGC: Visual Instruction Generation and Correction
Bin Wang, Fan Wu, Xiao Han et al.
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan, Min Bai, Weifeng Chen et al.
ViLA: Efficient Video-Language Alignment for Video Question Answering
Xijun Wang, Junbang Liang, Chun-Kai Wang et al.
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Jiangbo Shi, Chen Li, Tieliang Gong et al.
VILA: On Pre-training for Visual Language Models
Ji Lin, Danny Yin, Wei Ping et al.
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models
İlker Kesen, Andrea Pedrotti, Mustafa Dogan et al.
ViLT-CLIP: Video and Language Tuning CLIP with Multimodal Prompt Learning and Scenario-guided Optimization
Hao Wang, Fang Liu, Licheng Jiao et al.
VINECS: Video-based Neural Character Skinning
Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann et al.
VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception
Zhaoliang Wan, Yonggen Ling, Senlin Yi et al.
ViP: A Differentially Private Foundation Model for Computer Vision
Yaodong Yu, Maziar Sanjabi, Yi Ma et al.
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Sogand Salehi, Mahdi Shafiei, Roman Bachmann et al.
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai, Haotian Liu, Siva Mustikovela et al.
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang, Runyu Ding, Ellis L. Brown et al.
Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning
Jiahan Li, Jiuyang Dong, Shenjin Huang et al.
VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
Hanjung Kim, Jaehyun Kang, Miran Heo et al.
VISA: Reasoning Video Object Segmentation via Large Language Model
Cilin Yan, Haochen Wang, Shilin Yan et al.
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich, Niv Nayman, Sharon Fogel et al.
Visible and Clear: Finding Tiny Objects in Difference Map
Bing Cao, Haiyu Yao, Pengfei Zhu et al.
Vision-and-Language Navigation via Causal Learning
Liuyi Wang, Zongtao He, Ronghao Dang et al.
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini et al.
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
Yunxin Li, Baotian Hu, Haoyuan Shi et al.
Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment
Huangbiao Xu, Xiao Ke, Yuezhou Li et al.
Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection
Zihan Zhang, Zhuo Xu, Xiang Xiang
Vision-Language Foundation Models as Effective Robot Imitators
Xinghang Li, Minghuan Liu, Hanbo Zhang et al.
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde, Victoriano Montesinos, Elvis Nava et al.
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
Taolin Zhang, Sunan He, Tao Dai et al.
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu, Jianlin Su, Bo Zhang et al.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu, Bencheng Liao, Qian Zhang et al.
Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting
Zhicheng Wang, Liwen Xiao, Zhiguo Cao et al.
Vision Transformers as Probabilistic Expansion from Learngene
Qiufeng Wang, Xu Yang, Haokun Chen et al.
Vision Transformers Need Registers
Timothée Darcet, Maxime Oquab, Julien Mairal et al.
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Seokha Moon, Hyun Woo, Hongbeen Park et al.
Vista3D: Unravel the 3D Darkside of a Single Image
Qiuhong Shen, Xingyi Yang, Michael Bi Mi et al.
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Fan Ma, Xiaojie Jin, Heng Wang et al.
ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis
Yuchen He, Zeqing Yuan, Yihong Wu et al.
Visual Alignment Pre-training for Sign Language Translation
Peiqi Jiao, Yuecong Min, Xilin Chen
Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
Daniel Geng, Inbum Park, Andrew Owens
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou, Shiming Chen, Shuhuang Chen et al.
Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning
Zhenfang Chen, Qinhong Zhou, Yikang Shen et al.
Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models
Matthew Kowal, Richard P. Wildes, Kosta Derpanis
Visual Data-Type Understanding does not emerge from scaling Vision-Language Models
Vishaal Udandarao, Max F. Burg, Samuel Albanie et al.
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Young Kyun Jang, Donghyun Kim, Zihang Meng et al.
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge, Xiaohui Zeng, Jacob Huffman et al.
Visual Grounding for Object-Level Generalization in Reinforcement Learning
Haobin Jiang, Zongqing Lu
Visual Hallucination Elevates Speech Recognition
Fang Zhang, Yongxin Zhu, Xiangxiang Wang et al.
Visual In-Context Prompting
Feng Li, Qing Jiang, Hao Zhang et al.
Visual Instruction Tuning with Polite Flamingo
Delong Chen, Jianfeng Liu, Wenliang Dai et al.