Nathan Kallus

Papers: 8

Total Citations: 57

Papers (8)

Provable Offline Preference-Based Reinforcement Learning

$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training

NeurIPS 2025, arXiv

Value-Guided Search for Efficient Chain-of-Thought Reasoning

GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding

Switching the Loss Reduces the Cost in Batch Reinforcement Learning

Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams

Inferring the Long-Term Causal Effects of Long-Term Treatments from Short-Term Experiments

More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning