Licheng Yu

26 Papers
77 Total Citations

Papers (26)

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

CVPR 2024
63 citations

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

CVPR 2025 · arXiv
7 citations

ROICtrl: Boosting Instance Control for Visual Generation

CVPR 2025
7 citations

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

CVPR 2024
0 citations

Layout-Agnostic Scene Text Image Synthesis with Diffusion Models

CVPR 2024
0 citations

Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis

CVPR 2024
0 citations

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

CVPR 2017 · arXiv
0 citations

MAttNet: Modular Attention Network for Referring Expression Comprehension

CVPR 2018 · arXiv
0 citations

Multi-Target Embodied Question Answering

CVPR 2019
0 citations

BachGAN: High-Resolution Image Synthesis From Salient Object Layout

CVPR 2020 · arXiv
0 citations

Violin: A Large-Scale Dataset for Video-and-Language Inference

CVPR 2020 · arXiv
0 citations

Connecting What To Say With Where To Look by Modeling Human Attention Traces

CVPR 2021 · arXiv
0 citations

Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment

CVPR 2022 · arXiv
0 citations

Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations

CVPR 2023 · arXiv
0 citations

Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation

CVPR 2023 · arXiv
0 citations

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

CVPR 2023
0 citations

Visual Madlibs: Fill in the Blank Description Generation and Question Answering

ICCV 2015
0 citations

CiT: Curation in Training for Effective Vision-Language Data

ICCV 2023 · arXiv
0 citations

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

ECCV 2020
0 citations

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

ECCV 2020
0 citations

UNITER: UNiversal Image-TExt Representation Learning

ECCV 2020
0 citations

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

ECCV 2022
0 citations

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction

CVPR 2025
0 citations

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval

ECCV 2022
0 citations

Apollo: An Exploration of Video Understanding in Large Multimodal Models

CVPR 2025
0 citations

AVID: Any-Length Video Inpainting with Diffusion Model

CVPR 2024
0 citations