Yong Jae Lee

45 Papers
180 Total Citations

Papers (45)

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

CVPR 2024
153 citations

X-Fusion: Introducing New Modality to Frozen Large Language Models

ICCV 2025
8 citations

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

CVPR 2025 · arXiv
7 citations

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment

ECCV 2024
7 citations

Edit One for All: Interactive Batch Image Editing

CVPR 2024
5 citations

Yo’Chameleon: Personalized Vision and Language Generation

CVPR 2025
0 citations

Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals

CVPR 2016
0 citations

Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection

CVPR 2016
0 citations

Identifying First-Person Camera Wearers in Third-Person Videos

CVPR 2017 · arXiv
0 citations

Weakly-Supervised Visual Grounding of Phrases With Linguistic Structures

CVPR 2017 · arXiv
0 citations

Interspecies Knowledge Transfer for Facial Keypoint Detection

CVPR 2017 · arXiv
0 citations

Cross-Domain Self-Supervised Multi-Task Feature Learning Using Synthetic Imagery

CVPR 2018 · arXiv
0 citations

HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds

CVPR 2019
0 citations

FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery

CVPR 2019
0 citations

You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection

CVPR 2019
0 citations

MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

CVPR 2020 · arXiv
0 citations

Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

CVPR 2020
0 citations

Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection

CVPR 2020 · arXiv
0 citations

Progressive Temporal Feature Alignment Network for Video Inpainting

CVPR 2021 · arXiv
0 citations

Few-Shot Image Generation via Cross-Domain Correspondence

CVPR 2021 · arXiv
0 citations

The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization

CVPR 2022
0 citations

GIRAFFE HD: A High-Resolution 3D-Aware Generative Model

CVPR 2022 · arXiv
0 citations

Learning Customized Visual Models With Retrieval-Augmented Knowledge

CVPR 2023 · arXiv
0 citations

GLIGEN: Open-Set Grounded Text-to-Image Generation

CVPR 2023 · arXiv
0 citations

Generalized Decoding for Pixel, Image, and Language

CVPR 2023 · arXiv
0 citations

Towards Universal Fake Image Detectors That Generalize Across Generative Models

CVPR 2023 · arXiv
0 citations

Discovering the Spatial Extent of Relative Attributes

ICCV 2015
0 citations

Hide-And-Seek: Forcing a Network to Be Meticulous for Weakly-Supervised Object and Action Localization

ICCV 2017
0 citations

Identity From Here, Pose From There: Self-Supervised Disentanglement and Generation of Objects Using Unlabeled Videos

ICCV 2019
0 citations

YOLACT: Real-Time Instance Segmentation

ICCV 2019
0 citations

Collaging Class-Specific GANs for Semantic Image Synthesis

ICCV 2021 · arXiv
0 citations

A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance

ICCV 2023
0 citations

Masked Discrimination for Self-Supervised Learning on Point Clouds

ECCV 2022
0 citations

Contrastive Learning for Diverse Disentangled Foreground Generation

ECCV 2022
0 citations

FlowWeb: Joint Image Set Alignment by Weaving Consistent, Pixel-Wise Correspondences

CVPR 2015
0 citations

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems

ICCV 2025
0 citations

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

ICCV 2025
0 citations

Customizing Domain Adapters for Domain Generalization

ICCV 2025
0 citations

Improved Baselines with Visual Instruction Tuning

CVPR 2024
0 citations

Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data

NeurIPS 2020
0 citations

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

NeurIPS 2022
0 citations

Visual Instruction Inversion: Image Editing via Image Prompting

NeurIPS 2023
0 citations

What Knowledge Gets Distilled in Knowledge Distillation?

NeurIPS 2023
0 citations

Segment Everything Everywhere All at Once

NeurIPS 2023
0 citations

Visual Instruction Tuning

NeurIPS 2023
0 citations