Xia Hu

8 Papers
21 Total Citations

Papers (8)

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Flexible Group Count Enables Hassle-Free Structured Pruning

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

TVE: Learning Meta-attribution for Transferable Vision Explainer

Soft Prompt Recovers Compressed LLMs, Transferably

GNNs Also Deserve Editing, and They Need It More Than Once

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts