Jiaqi Wang

27
Papers
599
Total Citations

Papers (27)

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

CVPR 2024
365
citations

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

CVPR 2024
62
citations

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

CVPR 2025
37
citations

Adversarial Prompt Tuning for Vision-Language Models

ECCV 2024
33
citations

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

CVPR 2025
31
citations

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

ICML 2025
21
citations

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

ICLR 2025
19
citations

IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations

ICLR 2025
15
citations

HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

NeurIPS 2025
6
citations

Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning

AAAI 2025
6
citations

PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning

ICML 2025
2
citations

Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data

ICCV 2025
2
citations

SS-GEN: A Social Story Generation Framework with Large Language Models

AAAI 2025
0
citations

Enhancing Evolving Domain Generalization through Dynamic Latent Representations

AAAI 2024
0
citations

VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models

AAAI 2024
0
citations

Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate

ICCV 2025
0
citations

OneLLM: One Framework to Align All Modalities with Language

CVPR 2024
0
citations

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

CVPR 2024
0
citations

ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way

CVPR 2025
0
citations

Conical Visual Concentration for Efficient Large Vision-Language Models

CVPR 2025
0
citations

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers

ICML 2024
0
citations

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

ICCV 2025
0
citations

Bridging Model Heterogeneity in Federated Learning via Uncertainty-based Asymmetrical Reciprocity Learning

ICML 2024
0
citations

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

ICCV 2025
0
citations

X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting

ICCV 2025
0
citations

Visual-RFT: Visual Reinforcement Fine-Tuning

ICCV 2025
0
citations

MM-IFEngine: Towards Multimodal Instruction Following

ICCV 2025
0
citations