All Papers

34,180 papers found • Page 678 of 684

Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation

Mohammad Amin Shabani, Zhaowen Wang, Difan Liu et al.

CVPR 2024poster

Visual Objectification in Films: Towards a New AI Task for Video Interpretation

Julie Tores, Lucile Sassatelli, Hui-Yin Wu et al.

CVPR 2024highlight
5
citations

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang, Li Chen, Yanan Sun et al.

CVPR 2024highlight

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.

CVPR 2024poster

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.

CVPR 2024poster

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.

CVPR 2024posterarXiv:2404.11732
21
citations

Visual Prompting via Partial Optimal Transport

MENGYU ZHENG, Zhiwei Hao, Yehui Tang et al.

ECCV 2024poster

Visual Redundancy Removal for Composite Images: A Benchmark Dataset and a Multi-Visual-Effects Driven Incremental Method

Miaohui Wang, Rong Zhang, Lirong Huang et al.

AAAI 2024paper

Visual Relationship Transformation

Xiaoyu Xu, Jiayan Qiu, Baosheng Yu et al.

ECCV 2024poster

Visual Representation Learning with Stochastic Frame Prediction

Huiwon Jang, Dongyoung Kim, Junsu Kim et al.

ICML 2024oral

Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

Jinhao Li, Haopeng Li, Sarah Erfani et al.

ICML 2024poster

Visual Text Generation in the Wild

Yuanzhi Zhu, Jiawei Liu, Feiyu Gao et al.

ECCV 2024poster

Visual Transformer with Differentiable Channel Selection: An Information Bottleneck Inspired Approach

Yancheng Wang, Ping Li, Yingzhen Yang

ICML 2024poster

VITA: ‘Carefully Chosen and Weighted Less’ Is Better in Medication Recommendation

AAAI 2024paperarXiv:2312.12100

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Jieneng Chen, Qihang Yu, Xiaohui Shen et al.

CVPR 2024poster

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Shicheng Li, Lei Li, Yi Liu et al.

ECCV 2024posterarXiv:2311.17404
49
citations

ViT-Calibrator: Decision Stream Calibration for Vision Transformer

Lin Chen, Zhijie Jia, Lechao Cheng et al.

AAAI 2024paperarXiv:2304.04354
3
citations

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

Chunlong Xia, Xinliang Wang, Feng Lv et al.

CVPR 2024highlightarXiv:2403.07392
131
citations

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

Dezhi Peng, Chongyu Liu, Yuliang Liu et al.

AAAI 2024paperarXiv:2306.12106
18
citations

ViT-Lens: Towards Omni-modal Representations

Stan Weixian Lei, Yixiao Ge, Kun Yi et al.

CVPR 2024poster

ViTree: Single-Path Neural Tree for Step-Wise Interpretable Fine-Grained Visual Categorization

Danning Lao, Qi Liu, Jiazi Bu et al.

AAAI 2024paper

ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.

CVPR 2024highlight

VividDreamer: Invariant Score Distillation for Hyper-Realistic Text-to-3D Generation

Wenjie Zhuo, Fan Ma, Hehe Fan et al.

ECCV 2024poster
16
citations

VIXEN: Visual Text Comparison Network for Image Difference Captioning

Alexander Black, Jing Shi, Yifei Fan et al.

AAAI 2024paperarXiv:2402.19119
9
citations

VkD: Improving Knowledge Distillation using Orthogonal Projections

Roy Miles, Ismail Elezi, Jiankang Deng

CVPR 2024poster
24
citations

VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition

Ahmad Khaliq, Ming Xu, Stephen Hausler et al.

ECCV 2024posterarXiv:2409.19293
6
citations

VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting

Seunggu Kang, WonJun Moon, Euiyeon Kim et al.

AAAI 2024paperarXiv:2312.16580
54
citations

VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding

Guibiao Liao, Jiankun Li, Xiaoqing Ye

AAAI 2024paper

VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation

Jialu Li, Aishwarya Padmakumar, Gaurav Sukhatme et al.

AAAI 2024paperarXiv:2402.03561
10
citations

Vlogger: Make Your Dream A Vlog

Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.

CVPR 2024poster

VLP: Vision Language Planning for Autonomous Driving

Chenbin Pan, Burhan Yaman, Tommaso Nesti et al.

CVPR 2024poster
127
citations

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye

CVPR 2024poster

VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources

Fan Fei, Jiajun Tang, Ping Tan et al.

CVPR 2024highlight

VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Yi Xin, Junlong Du, Qiang Wang et al.

AAAI 2024paperarXiv:2312.08733
82
citations

VNN: Verification-Friendly Neural Networks with Hard Robustness Guarantees

Anahita Baninajjar, Ahmed Rezine, Amir Aminifar

ICML 2024poster

Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions

Yongqiang Cai

ICML 2024spotlight

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Linshan Wu, Jia-Xin Zhuang, Hao Chen

CVPR 2024posterarXiv:2402.17300
70
citations

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Hubert Siuzdak

ICLR 2024poster

Volumetric Environment Representation for Vision-Language Navigation

Liu, Wenguan Wang, Yi Yang

CVPR 2024highlight

Volumetric Rendering with Baked Quadrature Fields

Gopal Sharma, Daniel Rebain, Kwang Moo Yi et al.

ECCV 2024posterarXiv:2312.02202
10
citations

VONet: Unsupervised Video Object Learning With Parallel U-Net Attention and Object-wise Sequential VAE

Haonan Yu, Wei Xu

ICLR 2024oral

VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment

Phong Tran, Egor Zakharov, Long Nhat Ho et al.

CVPR 2024poster
29
citations

VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

Pengying Wu, Yao Mu, Bingxian Wu et al.

ICML 2024poster

Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D Object Detection

Yuhao Huang, Sanping Zhou, Junjie Zhang et al.

AAAI 2024paper

VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

Yang Chen, Yingwei Pan, haibo yang et al.

CVPR 2024poster
30
citations

VPDETR: End-to-End Vanishing Point DEtection TRansformers

Taiyan Chen, Xianghua Ying, Jinfa Yang et al.

AAAI 2024paper
4
citations

VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network

Zhixue Fang, Yuzhi Liu, Huisi Wu et al.

ECCV 2024poster
2
citations

VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving

Yibo Liu, Zheyuan Yang, Guile Wu et al.

ECCV 2024posterarXiv:2407.06516

VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models

Ziyi Yin, Muchao Ye, Tianrong Zhang et al.

AAAI 2024paper

VQCNIR: Clearer Night Image Restoration with Vector-Quantized Codebook

Wenbin Zou, Hongxia Gao, Tian Ye et al.

AAAI 2024paper