Yong Jae Lee

10 Papers

180 Total Citations

Papers (10)

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

X-Fusion: Introducing New Modality to Frozen Large Language Models

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment

Edit One for All: Interactive Batch Image Editing

Improved Baselines with Visual Instruction Tuning

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Customizing Domain Adapters for Domain Generalization

Yo’Chameleon: Personalized Vision and Language Generation