2025 "long-context generation" Papers
5 papers found
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
Xinyu Yang, Tianqi Chen, Beidi Chen
ICLR 2025posterarXiv:2502.05431
16
citations
A Training-Free Sub-quadratic Cost Transformer Model Serving Framework with Hierarchically Pruned Attention
Heejun Lee, Geon Park, Youngwan Lee et al.
ICLR 2025posterarXiv:2406.09827
8
citations
LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits
Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin et al.
NEURIPS 2025posterarXiv:2410.01735
5
citations
PolarQuant: Leveraging Polar Transformation for Key Cache Quantization and Decoding Acceleration
Songhao Wu, Ang Lv, xiao feng et al.
NEURIPS 2025poster
Streaming Attention Approximation via Discrepancy Theory
Ekaterina Kochetkova, Kshiteej Jitesh Sheth, Insu Han et al.
NEURIPS 2025spotlightarXiv:2502.07861
2
citations