Xuan Shen

Papers: 10

Total Citations: 30

Papers (10)

Numerical Pruning for Efficient Autoregressive Models

Sparse Learning for State Space Models on Mobile

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge

NPAS: A Compiler-Aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment

SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning

Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot?