Yixiao Ge
Papers: 43
Total Citations: 317
Papers (43)
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
CVPR 2024
139
citations
ST-LLM: Large Language Models Are Effective Temporal Learners
ECCV 2024
124
citations
Scalable Image Tokenization with Index Backpropagation Quantization
ICCV 2025
16
citations
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
CVPR 2024
11
citations
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
ICCV 2025
9
citations
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
CVPR 2025
7
citations
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
ICML 2025
6
citations
Cached Transformers: Improving Transformers with Differentiable Memory Cached
AAAI 2024
5
citations
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
CVPR 2024
0
citations
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition
CVPR 2024
0
citations
ViT-Lens: Towards Omni-modal Representations
CVPR 2024
0
citations
Mutual CRF-GNN for Few-Shot Learning
CVPR 2021
0
citations
Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification
CVPR 2021
0
citations
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
CVPR 2021
0
citations
Bridging Video-Text Retrieval With Multiple Choice Questions
CVPR 2022
0
citations
Object-Aware Video-Language Pre-Training for Retrieval
CVPR 2022
0
citations
Accelerating Vision-Language Pretraining With Free Language Modeling
CVPR 2023
0
citations
All in One: Exploring Unified Video-Language Pre-Training
CVPR 2023
0
citations
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge
CVPR 2023
0
citations
RILS: Masked Visual Reconstruction in Language Semantic Space
CVPR 2023
0
citations
Progressive Correspondence Pruning by Consensus Learning
ICCV 2021
0
citations
Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-Identification
ICCV 2021
0
citations
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
ICCV 2023
0
citations
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
ICCV 2023
0
citations
VoCo-LLaMA: Towards Vision Compression with Large Language Models
CVPR 2025
0
citations
Exploring Model Transferability through the Lens of Potential Energy
ICCV 2023
0
citations
Self-supervising Fine-grained Region Similarities for Large-scale Image Localization
ECCV 2020
0
citations
Mc-BEiT: Multi-Choice Discretization for Image BERT Pre-training
ECCV 2022
0
citations
Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space
ECCV 2022
0
citations
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
ECCV 2022
0
citations
BoxSnake: Polygonal Instance Segmentation with Box Supervision
ICCV 2023
0
citations
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
CVPR 2025
0
citations
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
0
citations
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
ICCV 2025
0
citations
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
CVPR 2024
0
citations
YOLO-World: Real-Time Open-Vocabulary Object Detection
CVPR 2024
0
citations
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
CVPR 2024
0
citations
SEED-Bench: Benchmarking Multimodal Large Language Models
CVPR 2024
0
citations
FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
NeurIPS 2018
0
citations
Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
NeurIPS 2020
0
citations
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
NeurIPS 2023
0
citations
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
NeurIPS 2023
0
citations
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
NeurIPS 2023
0
citations