Sifei Liu
15
Papers
74
Total Citations
Papers (15)
Describe Anything: Detailed Localized Image and Video Captioning
ICCV 2025
49
citations
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
CVPR 2025
11
citations
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
CVPR 2025
9
citations
Parallel Sequence Modeling via Generalized Spatial Propagation Network
CVPR 2025
3
citations
3D-Spatial Multimodal Memory
ICLR 2025
2
citations
NVILA: Efficient Frontier Visual Language Models
CVPR 2025
0
citations
Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal
ICCV 2025
0
citations
Scaling Vision Pre-Training to 4K Resolution
CVPR 2025
0
citations
COLMAP-Free 3D Gaussian Splatting
CVPR 2024
0
citations
RegionGPT: Towards Region Understanding Vision Language Model
CVPR 2024
0
citations
A Unified Approach for Text- and Image-guided 4D Scene Generation
CVPR 2024
0
citations
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
CVPR 2024
0
citations
Communication-Efficient Collaborative Perception via Information Filling with Codebook
CVPR 2024
0
citations
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
CVPR 2024
0
citations
Compositional Text-to-Image Generation with Dense Blob Representations
ICML 2024
0
citations