Paper "vision-language models" Papers

38 papers found

A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter

Zirun Guo, Xize Cheng, Yangyang Wu et al.

AAAI 2025paperarXiv:2412.08979
3
citations

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

Zihui Cheng, Qiguang Chen, Jin Zhang et al.

AAAI 2025paperarXiv:2412.12932
30
citations

Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective

Xinmiao Yu, Xiaocheng Feng, Yun Li et al.

AAAI 2025paperarXiv:2412.17787

CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

Yuxuan Wang, Yijun Liu, Fei Yu et al.

AAAI 2025paperarXiv:2407.01081
7
citations

Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach

Yanming Xiu, Maria Gorlatova

ISMAR 2025paperarXiv:2507.20356
7
citations

Extract Free Dense Misalignment from CLIP

JeongYeon Nam, Jinbae Im, Wonjae Kim et al.

AAAI 2025paperarXiv:2412.18404
2
citations

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

Yichen Gong, Delong Ran, Jinyuan Liu et al.

AAAI 2025paperarXiv:2311.05608
302
citations

Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

Youjun Zhao, Jiaying Lin, Rynson W. H. Lau

AAAI 2025paperarXiv:2503.07593
1
citations

Interpreting the linear structure of vision-language model embedding spaces

Isabel Papadimitriou, Huangyuan Su, Thomas Fel et al.

COLM 2025paperarXiv:2504.11695
14
citations

Is Your Image a Good Storyteller?

Xiujie Song, Xiaoyi Pang, Haifeng Tang et al.

AAAI 2025paperarXiv:2501.01982
2
citations

Language Prompt for Autonomous Driving

Dongming Wu, Wencheng Han, Yingfei Liu et al.

AAAI 2025paperarXiv:2309.04379
138
citations

Learning to Prompt with Text Only Supervision for Vision-Language Models

Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.

AAAI 2025paperarXiv:2401.02418
42
citations

LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba

Yubo Cui, Zhiheng Li, Jiaqiang Wang et al.

AAAI 2025paperarXiv:2412.08388
9
citations

LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception

Yuan-Hong Liao, Sven Elflein, Liu He et al.

COLM 2025paperarXiv:2504.15362
9
citations

Low-Light Image Enhancement via Generative Perceptual Priors

Han Zhou, Wei Dong, Xiaohong Liu et al.

AAAI 2025paperarXiv:2412.20916
15
citations

Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies Between Model Predictions and Human Responses in VQA

Jian Lan, Diego Frassinelli, Barbara Plank

AAAI 2025paperarXiv:2410.02773
3
citations

MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context

Shuai Lyu, Rongchen Zhang, Zeqi Ma et al.

AAAI 2025paperarXiv:2412.16897
9
citations

Position-Aware Guided Point Cloud Completion with CLIP Model

Feng Zhou, Qi Zhang, Ju Dai et al.

AAAI 2025paperarXiv:2412.08271
2
citations

Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models

Quang-Hung Le, Long Hoang Dang, Ngan Hoang Le et al.

AAAI 2025paperarXiv:2412.08125
3
citations

Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification

Jiaxiang Gou, Luping Ji, Pei Liu et al.

AAAI 2025paperarXiv:2410.10573
9
citations

SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living

Arkaprava Sinha, Dominick Reilly, Francois Bremond et al.

AAAI 2025paperarXiv:2502.03459
2
citations

Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

Kaiqing Lin, Yuzhen Lin, Weixiang Li et al.

AAAI 2025paperarXiv:2409.02664
19
citations

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Hang Hua, Yunlong Tang, Chenliang Xu et al.

AAAI 2025paperarXiv:2404.12353
50
citations

Vision-aware Multimodal Prompt Tuning for Uploadable Multi-source Few-shot Domain Adaptation

Kuanghong Liu, Jin Wang, Kangjian He et al.

AAAI 2025paperarXiv:2503.06106
2
citations

VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information

Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das et al.

COLM 2025paperarXiv:2412.00947
22
citations

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

AAAI 2024paperarXiv:2308.15366
252
citations

CLIM: Contrastive Language-Image Mosaic for Region Representation

Size Wu, Wenwei Zhang, Lumin XU et al.

AAAI 2024paperarXiv:2312.11376
25
citations

COMMA: Co-articulated Multi-Modal Learning

Authors: Lianyu Hu, Liqing Gao, Zekang Liu et al.

AAAI 2024paperarXiv:2401.00268
7
citations

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

Hao Tan, Jun Li, Yizhuang Zhou et al.

AAAI 2024paperarXiv:2312.06401
14
citations

Delving into Multimodal Prompting for Fine-Grained Visual Classification

Xin Jiang, Hao Tang, Junyao Gao et al.

AAAI 2024paperarXiv:2309.08912
59
citations

Domain-Controlled Prompt Learning

Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.

AAAI 2024paperarXiv:2310.07730
32
citations

Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

Yubin Wang, Xinyang Jiang, De Cheng et al.

AAAI 2024paperarXiv:2312.06323
42
citations

Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification

Yajing Zhai, Yawen Zeng, Zhiyong Huang et al.

AAAI 2024paperarXiv:2312.16797
33
citations

p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models

Haoyuan Wu, Xinyun Zhang, Peng Xu et al.

AAAI 2024paperarXiv:2312.10613

Semantic-Aware Data Augmentation for Text-to-Image Synthesis

Zhaorui Tan, Xi Yang, Kaizhu Huang

AAAI 2024paperarXiv:2312.07951
4
citations

Simple Image-Level Classification Improves Open-Vocabulary Object Detection

Ruohuan Fang, Guansong Pang, Xiao Bai

AAAI 2024paperarXiv:2312.10439
23
citations

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

AAAI 2024paperarXiv:2308.11681
163
citations

Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning

Kun Ding, Haojian Zhang, Qiang Yu et al.

AAAI 2024paperarXiv:2404.00603
7
citations