Yueting Zhuang
37 Papers
135 Total Citations
Papers (37)
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
ICML 2025
63
citations
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
CVPR 2025
40
citations
Let LRMs Break Free from Overthinking via Self-Braking Tuning
NeurIPS 2025
13
citations
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
ICCV 2025
10
citations
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
NeurIPS 2025
6
citations
Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models
AAAI 2025
3
citations
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
CVPR 2024
0
citations
Auto-Encoding Morph-Tokens for Multimodal LLM
ICML 2024
0
citations
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
ICML 2024
0
citations
Hierarchical Recurrent Neural Encoder for Video Representation With Application to Captioning
CVPR 2016
0
citations
Zero-Shot Recognition Using Dual Visual-Semantic Mapping Paths
CVPR 2017
0
citations
Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction
CVPR 2019
0
citations
Counterfactual Samples Synthesizing for Robust Visual Question Answering
CVPR 2020
0
citations
Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation
CVPR 2020
0
citations
Label Matching Semi-Supervised Object Detection
CVPR 2022
0
citations
Slimmable Domain Adaptation
CVPR 2022
0
citations
Learning To Learn by Jointly Optimizing Neural Architecture and Weights
CVPR 2022
0
citations
Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning
CVPR 2022
0
citations
Deeply-Learned Part-Aligned Representations for Person Re-Identification
ICCV 2017
0
citations
Semi-Supervised Active Learning for Semi-Supervised Models: Exploit Adversarial Examples With Graph-Based Virtual Labels
ICCV 2021
0
citations
Adaptive Hierarchical Graph Reasoning With Semantic Coherence for Video-and-Language Inference
ICCV 2021
0
citations
Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels
ICCV 2023
0
citations
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
ICCV 2023
0
citations
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
ICCV 2023
0
citations
Unsupervised Prompt Tuning for Text-Driven Object Detection
ICCV 2023
0
citations
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
CVPR 2025
0
citations
STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
CVPR 2025
0
citations
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining
ICCV 2025
0
citations
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
ICCV 2025
0
citations
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
ICCV 2025
0
citations
Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance
AAAI 2024
0
citations
MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models
NeurIPS 2018
0
citations
Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation
NeurIPS 2020
0
citations
Learning to Generate Visual Questions with Noisy Supervision
NeurIPS 2021
0
citations
Fine-Grained Semantically Aligned Vision-Language Pre-Training
NeurIPS 2022
0
citations
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
NeurIPS 2023
0
citations
Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models
NeurIPS 2023
0
citations