Yueting Zhuang

37
Papers
135
Total Citations

Papers (37)

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

ICML 2025
63
citations

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

CVPR 2025
40
citations

Let LRMs Break Free from Overthinking via Self-Braking Tuning

NeurIPS 2025
13
citations

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program

ICCV 2025
10
citations

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

NeurIPS 2025
6
citations

Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models

AAAI 2025
3
citations

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

CVPR 2024
0
citations

Auto-Encoding Morph-Tokens for Multimodal LLM

ICML 2024
0
citations

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

ICML 2024
0
citations

Hierarchical Recurrent Neural Encoder for Video Representation With Application to Captioning

CVPR 2016
0
citations

Zero-Shot Recognition Using Dual Visual-Semantic Mapping Paths

CVPR 2017
0
citations

Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction

CVPR 2019
0
citations

Counterfactual Samples Synthesizing for Robust Visual Question Answering

CVPR 2020
0
citations

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

CVPR 2020
0
citations

Label Matching Semi-Supervised Object Detection

CVPR 2022
0
citations

Slimmable Domain Adaptation

CVPR 2022
0
citations

Learning To Learn by Jointly Optimizing Neural Architecture and Weights

CVPR 2022
0
citations

Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning

CVPR 2022
0
citations

Deeply-Learned Part-Aligned Representations for Person Re-Identification

ICCV 2017
0
citations

Semi-Supervised Active Learning for Semi-Supervised Models: Exploit Adversarial Examples With Graph-Based Virtual Labels

ICCV 2021
0
citations

Adaptive Hierarchical Graph Reasoning With Semantic Coherence for Video-and-Language Inference

ICCV 2021
0
citations

Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels

ICCV 2023
0
citations

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

ICCV 2023
0
citations

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

ICCV 2023
0
citations

Unsupervised Prompt Tuning for Text-Driven Object Detection

ICCV 2023
0
citations

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

CVPR 2025
0
citations

STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

CVPR 2025
0
citations

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

ICCV 2025
0
citations

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

ICCV 2025
0
citations

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness

ICCV 2025
0
citations

Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance

AAAI 2024
0
citations

MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models

NeurIPS 2018
0
citations

Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation

NeurIPS 2020
0
citations

Learning to Generate Visual Questions with Noisy Supervision

NeurIPS 2021
0
citations

Fine-Grained Semantically Aligned Vision-Language Pre-Training

NeurIPS 2022
0
citations

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

NeurIPS 2023
0
citations

Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models

NeurIPS 2023
0
citations