Kevin Lin

17 Papers · 57 Total Citations

Papers (17)

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

CVPR 2024 · arXiv
49 citations

BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

CVPR 2025 · arXiv
8 citations

LiVOS: Light Video Object Segmentation with Gated Linear Matching

CVPR 2025
0 citations

Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension

ICCV 2025
0 citations

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

ICCV 2025 · arXiv
0 citations

DisCo: Disentangled Control for Realistic Human Dance Generation

CVPR 2024
0 citations

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

ICML 2024 · arXiv
0 citations

End-to-End Human Pose and Mesh Reconstruction with Transformers

CVPR 2021 · arXiv
0 citations

Cross-Modal Representation Learning for Zero-Shot Action Recognition

CVPR 2022 · arXiv
0 citations

SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning

CVPR 2022 · arXiv
0 citations

Adaptive Human Matting for Dynamic Videos

CVPR 2023 · arXiv
0 citations

An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling

CVPR 2023 · arXiv
0 citations

ReCo: Region-Controlled Text-to-Image Generation

CVPR 2023 · arXiv
0 citations

LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling

CVPR 2023 · arXiv
0 citations

Neural Voting Field for Camera-Space 3D Hand Pose Estimation

CVPR 2023 · arXiv
0 citations

Mesh Graphormer

ICCV 2021 · arXiv
0 citations

Equivariant Similarity for Vision-Language Foundation Models

ICCV 2023 · arXiv
0 citations