Shang Yang
4
Papers
181
Total Citations
Papers (4)
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
ICLR 2025 (arXiv)
165
citations
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
NeurIPS 2025 (arXiv)
15
citations
SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference
ICCV 2025
1
citation
NVILA: Efficient Frontier Visual Language Models
CVPR 2025
0
citations