All Papers

34,598 papers found • Page 686 of 692

Vision Transformers as Probabilistic Expansion from Learngene

Qiufeng Wang, Xu Yang, Haokun Chen et al.

ICML 2024poster

Vision Transformers Need Registers

Timothée Darcet, Maxime Oquab, Julien Mairal et al.

ICLR 2024posterarXiv:2309.16588

VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

Seokha Moon, Hyun Woo, Hongbeen Park et al.

ECCV 2024posterarXiv:2407.12345
21
citations

Vista3D: unravel the 3d darkside of a single image

Qiuhong Shen, Xingyi Yang, Michael Bi Mi et al.

ECCV 2024posterarXiv:2409.12193
3
citations

VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

Fan Ma, Xiaojie Jin, Heng Wang et al.

CVPR 2024posterarXiv:2312.08870

ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis

Yuchen He, Zeqing Yuan, Yihong Wu et al.

AAAI 2024paperarXiv:2402.15952

Visual Alignment Pre-training for Sign Language Translation

Peiqi Jiao, Yuecong Min, Xilin CHEN

ECCV 2024poster
17
citations

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

Daniel Geng, Inbum Park, Andrew Owens

CVPR 2024posterarXiv:2311.17919

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

Wenjin Hou, Shiming Chen, Shuhuang Chen et al.

CVPR 2024posterarXiv:2404.14808

Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning

AAAI 2024paper

Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models

Matthew Kowal, Richard P. Wildes, Kosta Derpanis

CVPR 2024highlightarXiv:2404.02233
16
citations

Visual Data-Type Understanding does not emerge from scaling Vision-Language Models

Vishaal Udandarao, Max F. Burg, Samuel Albanie et al.

ICLR 2024posterarXiv:2310.08577

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Young Kyun Jang, Donghyun Kim, Zihang Meng et al.

CVPR 2024posterarXiv:2404.15516
21
citations

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Yunhao Ge, Xiaohui Zeng, Jacob Huffman et al.

CVPR 2024posterarXiv:2404.19752
33
citations

Visual Grounding for Object-Level Generalization in Reinforcement Learning

Haobin Jiang, Zongqing Lu

ECCV 2024posterarXiv:2408.01942
3
citations

Visual Hallucination Elevates Speech Recognition

Fang Zhang, Yongxin Zhu, Xiangxiang Wang et al.

AAAI 2024paper

Visual In-Context Prompting

Feng Li, Qing Jiang, Hao Zhang et al.

CVPR 2024posterarXiv:2311.13601
52
citations

Visual Instruction Tuning with Polite Flamingo

Delong Chen, Jianfeng Liu, Wenliang Dai et al.

AAAI 2024paperarXiv:2307.01003

Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation

Mohammad Amin Shabani, Zhaowen Wang, Difan Liu et al.

CVPR 2024poster

Visual Objectification in Films: Towards a New AI Task for Video Interpretation

Julie Tores, Lucile Sassatelli, Hui-Yin Wu et al.

CVPR 2024highlightarXiv:2401.13296
5
citations

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang, Li Chen, Yanan Sun et al.

CVPR 2024highlightarXiv:2312.17655

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.

CVPR 2024posterarXiv:2312.03052

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.

CVPR 2024posterarXiv:2311.15383

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.

CVPR 2024posterarXiv:2404.11732
21
citations

Visual Prompting via Partial Optimal Transport

MENGYU ZHENG, Zhiwei Hao, Yehui Tang et al.

ECCV 2024poster

Visual Redundancy Removal for Composite Images: A Benchmark Dataset and a Multi-Visual-Effects Driven Incremental Method

Miaohui Wang, Rong Zhang, Lirong Huang et al.

AAAI 2024paper

Visual Relationship Transformation

Xiaoyu Xu, Jiayan Qiu, Baosheng Yu et al.

ECCV 2024poster

Visual Representation Learning with Stochastic Frame Prediction

Huiwon Jang, Dongyoung Kim, Junsu Kim et al.

ICML 2024oralarXiv:2406.07398

Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

Jinhao Li, Haopeng Li, Sarah Erfani et al.

ICML 2024posterarXiv:2406.02915

Visual Text Generation in the Wild

Yuanzhi Zhu, Jiawei Liu, Feiyu Gao et al.

ECCV 2024posterarXiv:2407.14138

Visual Transformer with Differentiable Channel Selection: An Information Bottleneck Inspired Approach

Yancheng Wang, Ping Li, Yingzhen Yang

ICML 2024poster

VITA: ‘Carefully Chosen and Weighted Less’ Is Better in Medication Recommendation

AAAI 2024paperarXiv:2312.12100

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Jieneng Chen, Qihang Yu, Xiaohui Shen et al.

CVPR 2024posterarXiv:2404.02132

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Shicheng Li, Lei Li, Yi Liu et al.

ECCV 2024posterarXiv:2311.17404
49
citations

ViT-Calibrator: Decision Stream Calibration for Vision Transformer

Lin Chen, Zhijie Jia, Lechao Cheng et al.

AAAI 2024paperarXiv:2304.04354
3
citations

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

Chunlong Xia, Xinliang Wang, Feng Lv et al.

CVPR 2024highlightarXiv:2403.07392
131
citations

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

Dezhi Peng, Chongyu Liu, Yuliang Liu et al.

AAAI 2024paperarXiv:2306.12106
18
citations

ViT-Lens: Towards Omni-modal Representations

Stan Weixian Lei, Yixiao Ge, Kun Yi et al.

CVPR 2024posterarXiv:2311.16081

ViTree: Single-Path Neural Tree for Step-Wise Interpretable Fine-Grained Visual Categorization

Danning Lao, Qi Liu, Jiazi Bu et al.

AAAI 2024paperarXiv:2401.17050

ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.

CVPR 2024highlightarXiv:2312.01305

VividDreamer: Invariant Score Distillation for Hyper-Realistic Text-to-3D Generation

Wenjie Zhuo, Fan Ma, Hehe Fan et al.

ECCV 2024posterarXiv:2407.09822
16
citations

VIXEN: Visual Text Comparison Network for Image Difference Captioning

Alexander Black, Jing Shi, Yifei Fan et al.

AAAI 2024paperarXiv:2402.19119
9
citations

VkD: Improving Knowledge Distillation using Orthogonal Projections

Roy Miles, Ismail Elezi, Jiankang Deng

CVPR 2024poster
24
citations

VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition

Ahmad Khaliq, Ming Xu, Stephen Hausler et al.

ECCV 2024posterarXiv:2409.19293
6
citations

VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting

Seunggu Kang, WonJun Moon, Euiyeon Kim et al.

AAAI 2024paperarXiv:2312.16580
54
citations

VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding

Guibiao Liao, Jiankun Li, Xiaoqing Ye

AAAI 2024paper
36
citations

VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation

Jialu Li, Aishwarya Padmakumar, Gaurav Sukhatme et al.

AAAI 2024paperarXiv:2402.03561
10
citations

Vlogger: Make Your Dream A Vlog

Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.

CVPR 2024posterarXiv:2401.09414

VLP: Vision Language Planning for Autonomous Driving

Chenbin Pan, Burhan Yaman, Tommaso Nesti et al.

CVPR 2024posterarXiv:2401.05577
127
citations

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye

CVPR 2024posterarXiv:2312.00845