🧬Data

Data Augmentation

Augmenting training data

100 papers, 1,211 total citations

Feb '24 to Jan '26: 324 papers

Top Conferences

ICLR: 28, AAAI: 26, CVPR: 17, ECCV: 16, NeurIPS: 6, ICML: 5

Also includes: data augmentation, augmentation, synthetic data, data generation

Top Papers

#1

Data Filtering Networks

Alex Fang, Albin Madappally Jose, Amit Jain et al.

DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood et al.

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Jianhao Yuan, Jie Zhang, Shuyang Sun et al.

Data Shapley in One Training Run

Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.

ICLR 2025, arXiv:2406.11011

data attribution, data shapley, foundation model pretraining, generative ai copyright (+3 more)

44 citations

#5

Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

Jiangpeng He

Synthetic continued pretraining

Zitong Yang, Neil Band, Shuangping Li et al.

Do Generated Data Always Help Contrastive Learning?

Yifei Wang, Jizhe Zhang, Yisen Wang

Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection

Yuzhen Lin, Wentang Song, Bin Li et al.

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Derong Xu, Xinhang Li, Ziheng Zhang et al.

Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

Xin Zhang, Jiawei Du, Weiying Xie et al.

Dataset Distillation by Automatic Training Trajectories

Dai Liu, Jindong Gu, Hu Cao et al.

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models

Zhaowei Zhu, Jialu Wang, Hao Cheng et al.

Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification

Bohan Li, Xiao Xu, Xinghao Wang et al.

AAAI 2024, arXiv:2302.02070

image augmentation, diffusion models, semantic consistency, image classification (+2 more)

24 citations

#14

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Xuanpu Zhang, Dan Song, Pengxin Zhan et al.

Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation

Guan Gui, Bin-Bin Gao, Jun Liu et al.

Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders

Qichao Shentu, Beibu Li, Kai Zhao et al.

ICLR 2025, arXiv:2405.15273

time series anomaly detection, adaptive bottlenecks, adversarial decoders, multi-domain pre-training (+2 more)

21 citations

#17

Dataset Enhancement with Instance-Level Augmentations

Orest Kupyn, Christian Rupprecht

Frozen Feature Augmentation for Few-Shot Image Classification

Andreas Bär, Neil Houlsby, Mostafa Dehghani et al.

A Comprehensive Augmentation Framework for Anomaly Detection

Lin Jiang, Yaping Yan

AAAI 2024, arXiv:2308.15068

anomaly detection, data augmentation, reconstruction-based approach, simulated anomalies (+4 more)

16 citations

#20

Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning

Jaehun Jung, Seungju Han, Ximing Lu et al.

NeurIPS 2025, arXiv:2505.20161

gradient-based diversification, data diversity metrics, out-of-distribution generalization, synthetic data generation (+3 more)

15 citations

#21

AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation

Yangchao Wu, Tian Yu Liu, Hyoungseob Park et al.

Kill Two Birds with One Stone: Rethinking Data Augmentation for Deep Long-tailed Learning

Binwu Wang, Pengkun Wang, Wei Xu et al.

A Simple Background Augmentation Method for Object Detection with Diffusion Model

Yuhang Li, Xin Dong, Chen Chen et al.

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Yunhao Ge, Yihe Tang, Jiashu Xu et al.

KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy

Qianxiong Xu, Cheng Long, Ziyue Li et al.

Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather

Junsung Park, Kyungmin Kim, Hyunjung Shim

Accelerating Neural Field Training via Soft Mining

Shakiba Kheradmand, Daniel Rebain, Gopal Sharma et al.

Adaptive Self-training Framework for Fine-grained Scene Graph Generation

Kibum Kim, Kanghoon Yoon, Yeonjun In et al.

How to Synthesize Text Data without Model Collapse?

Xuekai Zhu, Daixuan Cheng, Hengli Li et al.

TabDPT: Scaling Tabular Foundation Models on Real Data

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.

Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning

Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto

Cross-Class Feature Augmentation for Class Incremental Learning

Taehoon Kim, JaeYoo Park, Bohyung Han

MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL

Claas Voelcker, Marcel Hussing, Eric Eaton et al.

Effective Training Data Synthesis for Improving MLLM Chart Understanding

Yuwei Yang, Zeyu Zhang, Yunzhong Hou et al.

EventRPG: Event Data Augmentation with Relevance Propagation Guidance

Mingyuan Sun, Donghao Zhang, Zongyuan Ge et al.

On the Limitations of Temperature Scaling for Distributions with Overlaps

Muthu Chidambaram, Rong Ge

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

Zaid Khan, Elias Stengel-Eskin, Jaemin Cho et al.

ICLR 2025, arXiv:2410.06215

data generation agents, teacher environments, sequential decision-making, student feedback mechanisms (+4 more)

8 citations

#38

Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning

Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi

Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin et al.

Factor Augmented Tensor-on-Tensor Neural Networks

Guanhao Zhou, Yuefeng Han, Xiufan Yu

Data Augmentation via Latent Diffusion for Saliency Prediction

Bahar Aydemir, Deblina Bhattacharjee, Tong Zhang et al.

CAPrompt: Cyclic Prompt Aggregation for Pre-Trained Model Based Class Incremental Learning

Qiwei Li, Jiahuan Zhou

Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation

Zhilin Zhu, Xiaopeng Hong, Zhiheng Ma et al.

Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training

Yuanqi Yao, Gang Wu, Kui Jiang et al.

ECCV 2024, arXiv:2411.02149

monocular depth estimation, domain generalization, adversarial training, self-supervised learning (+3 more)

7 citations

#45

Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models

Tianjian Li, Haoran Xu, Philipp Koehn et al.

Understanding and Mitigating Memorization in Diffusion Models for Tabular Data

Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen et al.

PhysAug: A Physical-guided and Frequency-based Data Augmentation for Single-Domain Generalized Object Detection

Xiaoran Xu, Jiangang Yang, Wenhui Shi et al.

Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory

Aymane El Firdoussi, Mohamed El Amine Seddik, Soufiane Hayou et al.

Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

Yanghao Wang, Long Chen

Prompt Augmentation for Self-supervised Text-guided Image Manipulation

Rumeysa Bodur, Binod Bhattarai, Tae-Kyun Kim

Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classifications

Yutong Xia, Runpeng Yu, Yuxuan Liang et al.

Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

Xiaochuan Li, Zichun Yu, Chenyan Xiong

BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks

Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi et al.

DSMix: Distortion-Induced Saliency Map Based Pre-training for No-Reference Image Quality Assessment

Jinsong Shi, Xiaojiang Peng et al.

Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine

Zhaohu Xing, Lihao Liu, Yijun Yang et al.

Semantic-Aware Data Augmentation for Text-to-Image Synthesis

Zhaorui Tan, Xi Yang, Kaizhu Huang

AAAI 2024, arXiv:2312.07951

text-to-image synthesis, semantic data augmentation, attention mechanism, contrastive learning (+4 more)

4 citations

#57

Enhancing Masked Time-Series Modeling via Dropping Patches

Tianyu Qiu, Yi Xie, Hao Niu et al.

Disentangling Tabular Data Towards Better One-Class Anomaly Detection

Jianan Ye, Zhaorui Tan, Yijie Hu et al.

Enhancing Fine-Grained Vision-Language Pretraining with Negative Augmented Samples

Yeyuan Wang, Dehong Gao, Lei Yi et al.

Enhancing Robustness in Incremental Learning with Adversarial Training

Seungju Cho, Hongsin Lee, Changick Kim

Controllable Blur Data Augmentation Using 3D-Aware Motion Estimation

Insoo Kim, Hana Lee, Hyong-Euk Lee et al.

GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data

Zhiteng Li, Lele Chen, Jerone Andrews et al.

AutoData: A Multi-Agent System for Open Web Data Collection

Tianyi Ma, Yiyue Qian, Zheyuan Zhang et al.

Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting

Maochen Yang, Zekun Li, Jian Zhang et al.

Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation

Jinda Xu, Yuhao Song, Daming Wang et al.

VarDrop: Enhancing Training Efficiency by Reducing Variate Redundancy in Periodic Time Series Forecasting

Junhyeok Kang, Yooju Shin, Jae-Gil Lee

Multi-Accurate CATE is Robust to Unknown Covariate Shifts

Angela Zhou, Christoph Kern, Michael Kim

ICLR 2025

heterogeneous treatment effects, conditional average treatment effects, covariate shift robustness, multi-accurate predictors (+4 more)

3 citations

#68

Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models

Negin Raoof, Litu Rout, Giannis Daras et al.

HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

Wentian Qu, Jiahe Li, Jian Cheng et al.

Leveraging SD Map to Augment HD Map-based Trajectory Prediction

Zhiwei Dong, Ran Ding, Wei Li et al.

Scale Efficient Training for Large Datasets

Qing Zhou, Junyu Gao, Qi Wang

Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers

Bum Jun Kim, Sang Woo Kim

Enhancing Noise-Robust Losses for Large-Scale Noisy Data Learning

Max Staats, Matthias Thamm, Bernd Rosenow

Promptable Representation Distribution Learning and Data Augmentation for Gigapixel Histopathology WSI Analysis

Kunming Tang, Zhiguo Jiang, Jun Shi et al.

Representation Space Augmentation for Effective Self-Supervised Learning on Tabular Data

Moonjung Eo, Kyungeun Lee, Hye-Seung Cho et al.

Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality

Alex Fang, Hadi Pouransari, Matt Jordan et al.

FreeAugment: Data Augmentation Search Across All Degrees of Freedom

Tom Bekor, Niv Nayman, Lihi Zelnik-Manor

SeiT++: Masked Token Modeling Improves Storage-efficient Training

Minhyun Lee, Song Park, Byeongho Heo et al.

T-CIL: Temperature Scaling using Adversarial Perturbation for Calibration in Class-Incremental Learning

Seong-Hyeon Hwang, Minsu Kim, Steven Euijong Whang

Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training

Weiwei Cao, Jianpeng Zhang, Zhongyi Shui et al.

Enhancing Adversarial Transferability with Checkpoints of a Single Model’s Training

Shixin Li, Chaoxiang He, Xiaojing Ma et al.

EmoGrowth: Incremental Multi-label Emotion Decoding with Augmented Emotional Relation Graph

Kaicheng Fu, Changde Du, Jie Peng et al.

MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning

Xing Lei, Xuetao Zhang, Donglin Wang

GeoAggregator: An Efficient Transformer Model for Geo-Spatial Tabular Data

Rui Deng, Ziqi Li, Mingshu Wang

Beyond Random Augmentations: Pretraining with Hard Views

Fabio Ferreira, Ivo Rapant, Jörg Franke et al.

Label-Efficient Data Augmentation with Video Diffusion Models for Guidewire Segmentation in Cardiac Fluoroscopy

Shaoyan Pan, Yikang Liu, Lin Zhao et al.

SAVA: Scalable Learning-Agnostic Data Valuation

Samuel Kessler, Tam Le, Vu Nguyen

ICLR 2025, arXiv:2406.01130

data valuation, optimal transport, stochastic gradient methods, entropic regularization (+3 more)

1 citation

#88

EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision

Yiting Dong, Xiang He, Guobin Shen et al.

TDDBench: A Benchmark for Training data detection

Zhihao Zhu, Yi Yang, Defu Lian

InstaTrain: Adaptive Training via Ultra-Fast Natural Annealing within Dynamical Systems

Chuan Liu, Ruibing Song, Chunshu Wu et al.

TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram

Mingxi Lei, Chunwei Ma, Meng Ding et al.

New Algorithms for the Learning-Augmented k-means Problem

Junyu Huang, Qilong Feng, Ziyun Huang et al.

An Augmentation-Aware Theory for Self-Supervised Contrastive Learning

Jingyi Cui, Hongwei Wen, Yisen Wang

Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference

Harry Amad, Zhaozhi Qian, Dennis Frauen et al.

Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

Thomson Yen, Andrew Siah, Haozhe Chen et al.

Boosting Segment Anything Model Towards Open-Vocabulary Learning

Xumeng Han, Longhui Wei, Xuehui Yu et al.

Leveraging Imperfect Restoration for Data Availability Attack

Yi Huang, Jeremy Styborski, Mingzhi Lyu et al.

ECCV 2024

data availability attacks, unlearnable datasets, supervised learning, self-supervised learning (+3 more)

1 citation

#98

MetaAug: Meta-Data Augmentation for Post-Training Quantization

Cuong Pham, Hoang Anh Dung, Cuong Cao Nguyen et al.

Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling

Nannan Li, Kevin Shih, Bryan A. Plummer

CVPR 2025, arXiv:2501.04666

virtual try-on, synthetic data generation, garment extraction, Schrödinger bridge (+4 more)

1 citation

#100

Info-Coevolution: An Efficient Framework for Data Model Coevolution

Ziheng Qin, Hailun Xu, Wei Yew et al.

ICML 2025

1 citation