Kevin Lin
17 Papers
57 Total Citations
Papers (17)
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
CVPR 2024, arXiv
49 citations
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
CVPR 2025, arXiv
8 citations
LiVOS: Light Video Object Segmentation with Gated Linear Matching
CVPR 2025
0 citations
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
ICCV 2025
0 citations
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
ICCV 2025, arXiv
0 citations
DisCo: Disentangled Control for Realistic Human Dance Generation
CVPR 2024
0 citations
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
ICML 2024, arXiv
0 citations
End-to-End Human Pose and Mesh Reconstruction with Transformers
CVPR 2021, arXiv
0 citations
Cross-Modal Representation Learning for Zero-Shot Action Recognition
CVPR 2022, arXiv
0 citations
SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning
CVPR 2022, arXiv
0 citations
Adaptive Human Matting for Dynamic Videos
CVPR 2023, arXiv
0 citations
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling
CVPR 2023, arXiv
0 citations
ReCo: Region-Controlled Text-to-Image Generation
CVPR 2023, arXiv
0 citations
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling
CVPR 2023, arXiv
0 citations
Neural Voting Field for Camera-Space 3D Hand Pose Estimation
CVPR 2023, arXiv
0 citations
Mesh Graphormer
ICCV 2021, arXiv
0 citations
Equivariant Similarity for Vision-Language Foundation Models
ICCV 2023, arXiv
0 citations