ICCV 2025 "large language models" Papers

16 papers found

AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts

Yufan Liu, Wanqian Zhang, Huashan Chen et al.

ICCV 2025 · poster · arXiv:2510.24034 · 1 citation

CAD-Recode: Reverse Engineering CAD Code from Point Clouds

Danila Rukhovich, Elona Dupont, Dimitrios Mallis et al.

ICCV 2025 · poster · arXiv:2412.14042 · 18 citations

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning

Fucai Ke, Vijay Kumar B G, Xingjian Leng et al.

ICCV 2025 · poster · arXiv:2503.19263 · 6 citations

Evidential Knowledge Distillation

Liangyu Xiang, Junyu Gao, Changsheng Xu

ICCV 2025 · poster · arXiv:2507.18366

LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching

Meng Tian, Shuo Yang, Xinxiao Wu

ICCV 2025 · poster · arXiv:2506.23502 · 1 citation

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Shuangkang Fang, I-Chao Shen, Yufeng Wang et al.

ICCV 2025 · highlight · arXiv:2508.01242 · 4 citations

Multimodal Prompt Alignment for Facial Expression Recognition

Fuyan Ma, Yiran He, Bin Sun et al.

ICCV 2025 · poster · arXiv:2506.21017 · 2 citations

Passing the Driving Knowledge Test

Maolin Wei, Wanzhou Liu, Eshed Ohn-Bar

ICCV 2025 · poster · arXiv:2508.21824 · 1 citation

RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction

Yufeng Zhong, Chengjian Feng, Feng Yan et al.

ICCV 2025 · poster · arXiv:2503.18525 · 3 citations

ScanEdit: Hierarchically-Guided Functional 3D Scan Editing

Mohamed El Amine Boudjoghra, Ivan Laptev, Angela Dai

ICCV 2025 · poster · arXiv:2504.15049

SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation

Wenjia Wang, Liang Pan, Zhiyang Dou et al.

ICCV 2025 · poster · arXiv:2411.19921 · 4 citations

Towards Higher Effective Rank in Parameter-Efficient Fine-tuning using Khatri-Rao Product

Paul Albert, Frederic Zhang, Hemanth Saratchandran et al.

ICCV 2025 · poster · arXiv:2508.00230 · 4 citations

VALLR: Visual ASR Language Model for Lip Reading

Marshall Thomas, Edward Fish, Richard Bowden

ICCV 2025 · poster · arXiv:2503.21408 · 6 citations

ViLLa: Video Reasoning Segmentation with Large Language Model

Rongkun Zheng, Lu Qi, Xi Chen et al.

ICCV 2025 · poster · arXiv:2407.14500 · 16 citations

Where, What, Why: Towards Explainable Driver Attention Prediction

Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao et al.

ICCV 2025 · highlight · arXiv:2506.23088 · 6 citations

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.

ICCV 2025 · poster · arXiv:2503.06273 · 5 citations