Yi Zhu
29 Papers
66 Total Citations
Papers (29)
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
44 citations
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset
NeurIPS 2025 (arXiv)
22 citations
Weakly Supervised Instance Segmentation Using Class Peak Response
CVPR 2018 (arXiv)
0 citations
Towards Universal Representation for Unseen Action Recognition
CVPR 2018 (arXiv)
0 citations
Learning Instance Activation Maps for Weakly Supervised Instance Segmentation
CVPR 2019
0 citations
Improving Semantic Segmentation via Video Propagation and Label Relaxation
CVPR 2019
0 citations
Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks
CVPR 2020 (arXiv)
0 citations
Vision-Dialog Navigation by Exploring Cross-Modal Memory
CVPR 2020 (arXiv)
0 citations
Domain Consensus Clustering for Universal Domain Adaptation
CVPR 2021
0 citations
SOON: Scenario Oriented Object Navigation With Graph-Based Exploration
CVPR 2021 (arXiv)
0 citations
Learning Canonical F-Correlation Projection for Compact Multiview Representation
CVPR 2022
0 citations
ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts
CVPR 2022 (arXiv)
0 citations
Soft Proposal Networks for Weakly Supervised Object Localization
ICCV 2017 (arXiv)
0 citations
CrossCLR: Cross-Modal Contrastive Learning for Multi-Modal Video Representations
ICCV 2021 (arXiv)
0 citations
VidTr: Video Transformer Without Convolutions
ICCV 2021 (arXiv)
0 citations
Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation
ICCV 2021
0 citations
CrossNorm and SelfNorm for Generalization Under Distribution Shifts
ICCV 2021 (arXiv)
0 citations
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
ICCV 2023 (arXiv)
0 citations
Motion-Guided Masking for Spatiotemporal Representation Learning
ICCV 2023 (arXiv)
0 citations
Towards Geospatial Foundation Models via Continual Pretraining
ICCV 2023 (arXiv)
0 citations
Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior
ECCV 2020
0 citations
Selective Sparse Sampling for Fine-Grained Image Recognition
ICCV 2019
0 citations
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
CVPR 2025
0 citations
Blending Anti-Aliasing into Vision Transformer
NeurIPS 2021
0 citations
Progressive Coordinate Transforms for Monocular 3D Object Detection
NeurIPS 2021
0 citations
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation
NeurIPS 2022
0 citations
Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
NeurIPS 2022
0 citations
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
NeurIPS 2023
0 citations
PreDiff: Precipitation Nowcasting with Latent Diffusion Models
NeurIPS 2023
0 citations