GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data

Abstract

We propose a generative agent that augments training datasets with synthetic data for model fine-tuning. Unlike prior work, which samples synthetic data uniformly, our agent iteratively generates relevant samples on the fly so that they align with the target distribution. It prioritizes synthetic data that complements difficult training samples, focusing on training samples with high variance in gradient updates. Experiments across several image classification tasks demonstrate the effectiveness of our approach.
