Aviral Kumar
22 Papers, 117 Total Citations

Papers (22)

Scaling Test-Time Compute Without Verification or RL is Suboptimal
ICML 2025, 68 citations

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
ICLR 2025, 43 citations

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
NeurIPS 2025, 6 citations

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
ICML 2024, 0 citations

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
ICML 2024, 0 citations

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
ICML 2024, 0 citations

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
NeurIPS 2020, 0 citations

Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
NeurIPS 2021, 0 citations

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability
NeurIPS 2021, 0 citations

COMBO: Conservative Offline Model-Based Policy Optimization
NeurIPS 2021, 0 citations

Data-Driven Offline Decision-Making via Invariant Representation Learning
NeurIPS 2022, 0 citations

DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning
NeurIPS 2022, 0 citations

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
NeurIPS 2023, 0 citations

ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints
NeurIPS 2023, 0 citations

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
NeurIPS 2023, 0 citations

Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings
ICML 2018, 0 citations

Diagnosing Bottlenecks in Deep Q-learning Algorithms
ICML 2019, 0 citations

Graph Normalizing Flows
NeurIPS 2019, 0 citations

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
NeurIPS 2019, 0 citations

Conservative Q-Learning for Offline Reinforcement Learning
NeurIPS 2020, 0 citations

Model Inversion Networks for Model-Based Optimization
NeurIPS 2020, 0 citations

One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL
NeurIPS 2020, 0 citations