Aviral Kumar
22 papers · 117 total citations

Papers (22)
Scaling Test-Time Compute Without Verification or RL is Suboptimal (ICML 2025), 68 citations
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models (ICLR 2025), 43 citations
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners (NeurIPS 2025), 6 citations
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL (ICML 2024), 0 citations
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL (ICML 2024), 0 citations
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data (ICML 2024), 0 citations
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction (NeurIPS 2020), 0 citations
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning (NeurIPS 2021), 0 citations
Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability (NeurIPS 2021), 0 citations
COMBO: Conservative Offline Model-Based Policy Optimization (NeurIPS 2021), 0 citations
Data-Driven Offline Decision-Making via Invariant Representation Learning (NeurIPS 2022), 0 citations
DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning (NeurIPS 2022), 0 citations
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets (NeurIPS 2023), 0 citations
ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints (NeurIPS 2023), 0 citations
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (NeurIPS 2023), 0 citations
Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings (ICML 2018), 0 citations
Diagnosing Bottlenecks in Deep Q-learning Algorithms (ICML 2019), 0 citations
Graph Normalizing Flows (NeurIPS 2019), 0 citations
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction (NeurIPS 2019), 0 citations
Conservative Q-Learning for Offline Reinforcement Learning (NeurIPS 2020), 0 citations
Model Inversion Networks for Model-Based Optimization (NeurIPS 2020), 0 citations
One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL (NeurIPS 2020), 0 citations