Yong Jae Lee
45 Papers
180 Total Citations
Papers (45)
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
CVPR 2024
153
citations
X-Fusion: Introducing New Modality to Frozen Large Language Models
ICCV 2025
8
citations
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
CVPR 2025 (arXiv)
7
citations
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
ECCV 2024
7
citations
Edit One for All: Interactive Batch Image Editing
CVPR 2024
5
citations
Yo’Chameleon: Personalized Vision and Language Generation
CVPR 2025
0
citations
Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals
CVPR 2016
0
citations
Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection
CVPR 2016
0
citations
Identifying First-Person Camera Wearers in Third-Person Videos
CVPR 2017 (arXiv)
0
citations
Weakly-Supervised Visual Grounding of Phrases With Linguistic Structures
CVPR 2017 (arXiv)
0
citations
Interspecies Knowledge Transfer for Facial Keypoint Detection
CVPR 2017 (arXiv)
0
citations
Cross-Domain Self-Supervised Multi-Task Feature Learning Using Synthetic Imagery
CVPR 2018 (arXiv)
0
citations
HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds
CVPR 2019
0
citations
FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery
CVPR 2019
0
citations
You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection
CVPR 2019
0
citations
MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation
CVPR 2020 (arXiv)
0
citations
Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias
CVPR 2020
0
citations
Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection
CVPR 2020 (arXiv)
0
citations
Progressive Temporal Feature Alignment Network for Video Inpainting
CVPR 2021 (arXiv)
0
citations
Few-Shot Image Generation via Cross-Domain Correspondence
CVPR 2021 (arXiv)
0
citations
The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization
CVPR 2022
0
citations
GIRAFFE HD: A High-Resolution 3D-Aware Generative Model
CVPR 2022 (arXiv)
0
citations
Learning Customized Visual Models With Retrieval-Augmented Knowledge
CVPR 2023 (arXiv)
0
citations
GLIGEN: Open-Set Grounded Text-to-Image Generation
CVPR 2023 (arXiv)
0
citations
Generalized Decoding for Pixel, Image, and Language
CVPR 2023 (arXiv)
0
citations
Towards Universal Fake Image Detectors That Generalize Across Generative Models
CVPR 2023 (arXiv)
0
citations
Discovering the Spatial Extent of Relative Attributes
ICCV 2015
0
citations
Hide-And-Seek: Forcing a Network to Be Meticulous for Weakly-Supervised Object and Action Localization
ICCV 2017
0
citations
Identity From Here, Pose From There: Self-Supervised Disentanglement and Generation of Objects Using Unlabeled Videos
ICCV 2019
0
citations
YOLACT: Real-Time Instance Segmentation
ICCV 2019
0
citations
Collaging Class-Specific GANs for Semantic Image Synthesis
ICCV 2021 (arXiv)
0
citations
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance
ICCV 2023
0
citations
Masked Discrimination for Self-Supervised Learning on Point Clouds
ECCV 2022
0
citations
Contrastive Learning for Diverse Disentangled Foreground Generation
ECCV 2022
0
citations
FlowWeb: Joint Image Set Alignment by Weaving Consistent, Pixel-Wise Correspondences
CVPR 2015
0
citations
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
ICCV 2025
0
citations
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
ICCV 2025
0
citations
Customizing Domain Adapters for Domain Generalization
ICCV 2025
0
citations
Improved Baselines with Visual Instruction Tuning
CVPR 2024
0
citations
Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data
NeurIPS 2020
0
citations
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
NeurIPS 2022
0
citations
Visual Instruction Inversion: Image Editing via Image Prompting
NeurIPS 2023
0
citations
What Knowledge Gets Distilled in Knowledge Distillation?
NeurIPS 2023
0
citations
Segment Everything Everywhere All at Once
NeurIPS 2023
0
citations
Visual Instruction Tuning
NeurIPS 2023
0
citations