Yixiao Ge

19
Papers
317
Total Citations

Papers (19)

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

CVPR 2024
139
citations

ST-LLM: Large Language Models Are Effective Temporal Learners

ECCV 2024
124
citations

Scalable Image Tokenization with Index Backpropagation Quantization

ICCV 2025
16
citations

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

CVPR 2024
11
citations

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

ICCV 2025
9
citations

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

CVPR 2025
7
citations

HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding

ICML 2025
6
citations

Cached Transformers: Improving Transformers with Differentiable Memory Cache

AAAI 2024
5
citations

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos

ICCV 2025
0
citations

BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning

CVPR 2024
0
citations

YOLO-World: Real-Time Open-Vocabulary Object Detection

CVPR 2024
0
citations

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

CVPR 2025
0
citations

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

CVPR 2024
0
citations

SEED-Bench: Benchmarking Multimodal Large Language Models

CVPR 2024
0
citations

Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

CVPR 2024
0
citations

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

CVPR 2024
0
citations

ViT-Lens: Towards Omni-modal Representations

CVPR 2024
0
citations

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

ICCV 2025
0
citations

VoCo-LLaMA: Towards Vision Compression with Large Language Models

CVPR 2025
0
citations