Aviral Kumar
Papers: 6
Total Citations: 117
Papers (6)
Scaling Test-Time Compute Without Verification or RL is Suboptimal
ICML 2025, 68 citations
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
ICLR 2025, 43 citations
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
NeurIPS 2025, 6 citations
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
ICML 2024, 0 citations
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
ICML 2024, 0 citations
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
ICML 2024, 0 citations