Yixiao Ge
19
Papers
317
Total Citations
Papers (19)
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
CVPR 2024
139
citations
ST-LLM: Large Language Models Are Effective Temporal Learners
ECCV 2024
124
citations
Scalable Image Tokenization with Index Backpropagation Quantization
ICCV 2025
16
citations
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
CVPR 2024
11
citations
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
ICCV 2025
9
citations
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
CVPR 2025
7
citations
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
ICML 2025
6
citations
Cached Transformers: Improving Transformers with Differentiable Memory Cache
AAAI 2024
5
citations
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
0
citations
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
CVPR 2024
0
citations
YOLO-World: Real-Time Open-Vocabulary Object Detection
CVPR 2024
0
citations
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
CVPR 2025
0
citations
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
CVPR 2024
0
citations
SEED-Bench: Benchmarking Multimodal Large Language Models
CVPR 2024
0
citations
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
CVPR 2024
0
citations
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
CVPR 2024
0
citations
ViT-Lens: Towards Omni-modal Representations
CVPR 2024
0
citations
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
ICCV 2025
0
citations
VoCo-LLaMA: Towards Vision Compression with Large Language Models
CVPR 2025
0
citations