ICCV "large language models" Papers
16 papers found
AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts
Yufan Liu, Wanqian Zhang, Huashan Chen et al.
ICCV 2025 · poster · arXiv:2510.24034 · 1 citation
CAD-Recode: Reverse Engineering CAD Code from Point Clouds
Danila Rukhovich, Elona Dupont, Dimitrios Mallis et al.
ICCV 2025 · poster · arXiv:2412.14042 · 18 citations
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke, Vijay Kumar B G, Xingjian Leng et al.
ICCV 2025 · poster · arXiv:2503.19263 · 6 citations
Evidential Knowledge Distillation
Liangyu Xiang, Junyu Gao, Changsheng Xu
ICCV 2025 · poster · arXiv:2507.18366
LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching
Meng Tian, Shuo Yang, Xinxiao Wu
ICCV 2025 · poster · arXiv:2506.23502 · 1 citation
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Shuangkang Fang, I-Chao Shen, Yufeng Wang et al.
ICCV 2025 · highlight · arXiv:2508.01242 · 4 citations
Multimodal Prompt Alignment for Facial Expression Recognition
Fuyan Ma, Yiran He, Bin Sun et al.
ICCV 2025 · poster · arXiv:2506.21017 · 2 citations
Passing the Driving Knowledge Test
Maolin Wei, Wanzhou Liu, Eshed Ohn-Bar
ICCV 2025 · poster · arXiv:2508.21824 · 1 citation
RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction
Yufeng Zhong, Chengjian Feng, Feng Yan et al.
ICCV 2025 · poster · arXiv:2503.18525 · 3 citations
ScanEdit: Hierarchically-Guided Functional 3D Scan Editing
Mohamed El Amine Boudjoghra, Ivan Laptev, Angela Dai
ICCV 2025 · poster · arXiv:2504.15049
SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation
Wenjia Wang, Liang Pan, Zhiyang Dou et al.
ICCV 2025 · poster · arXiv:2411.19921 · 4 citations
Towards Higher Effective Rank in Parameter-Efficient Fine-tuning using Khatri-Rao Product
Paul Albert, Frederic Zhang, Hemanth Saratchandran et al.
ICCV 2025 · poster · arXiv:2508.00230 · 4 citations
VALLR: Visual ASR Language Model for Lip Reading
Marshall Thomas, Edward Fish, Richard Bowden
ICCV 2025 · poster · arXiv:2503.21408 · 6 citations
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng, Lu Qi, Xi Chen et al.
ICCV 2025 · poster · arXiv:2407.14500 · 16 citations
Where, What, Why: Towards Explainable Driver Attention Prediction
Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao et al.
ICCV 2025 · highlight · arXiv:2506.23088 · 6 citations
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.
ICCV 2025 · poster · arXiv:2503.06273 · 5 citations