
Data Augmentation

Augmenting training data

100 papers, 1,211 total citations
Feb '24 to Jan '26: 324 papers
Also includes: data augmentation, augmentation, synthetic data, data generation

Top Papers

#1

Data Filtering Networks

Alex Fang, Albin Madappally Jose, Amit Jain et al.

ICLR 2024
217
citations
#2

DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood et al.

CVPR 2024
85
citations
#3

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Jianhao Yuan, Jie Zhang, Shuyang Sun et al.

ICLR 2024
45
citations
#4

Data Shapley in One Training Run

Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.

ICLR 2025, arXiv:2406.11011
data attribution, data shapley, foundation model pretraining, generative ai copyright
44
citations
#5

Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

Jiangpeng He

CVPR 2024
39
citations
#6

Synthetic continued pretraining

Zitong Yang, Neil Band, Shuangping Li et al.

ICLR 2025
37
citations
#7

Do Generated Data Always Help Contrastive Learning?

Yifei Wang, Jizhe Zhang, Yisen Wang

ICLR 2024
35
citations
#8

Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection

Yuzhen Lin, Wentang Song, Bin Li et al.

ECCV 2024
34
citations
#9

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Derong Xu, Xinhang Li, Ziheng Zhang et al.

AAAI 2025
31
citations
#10

Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

Xin Zhang, Jiawei Du, Weiying Xie et al.

CVPR 2024
30
citations
#11

Dataset Distillation by Automatic Training Trajectories

Dai Liu, Jindong Gu, Hu Cao et al.

ECCV 2024
29
citations
#12

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models

Zhaowei Zhu, Jialu Wang, Hao Cheng et al.

ICLR 2024
26
citations
#13

Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification

Bohan Li, Xiao Xu, Xinghao Wang et al.

AAAI 2024, arXiv:2302.02070
image augmentation, diffusion models, semantic consistency, image classification
24
citations
#14

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Xuanpu Zhang, Dan Song, Pengxin Zhan et al.

CVPR 2025
22
citations
#15

Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation

Guan Gui, Bin-Bin Gao, Jun Liu et al.

ECCV 2024
21
citations
#16

Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders

Qichao Shentu, Beibu Li, Kai Zhao et al.

ICLR 2025, arXiv:2405.15273
time series anomaly detection, adaptive bottlenecks, adversarial decoders, multi-domain pre-training
21
citations
#17

Dataset Enhancement with Instance-Level Augmentations

Orest Kupyn, Christian Rupprecht

ECCV 2024
17
citations
#18

Frozen Feature Augmentation for Few-Shot Image Classification

Andreas Bär, Neil Houlsby, Mostafa Dehghani et al.

CVPR 2024
16
citations
#19

A Comprehensive Augmentation Framework for Anomaly Detection

Lin Jiang, Yaping Yan

AAAI 2024, arXiv:2308.15068
anomaly detection, data augmentation, reconstruction-based approach, simulated anomalies
16
citations
#20

Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning

Jaehun Jung, Seungju Han, Ximing Lu et al.

NeurIPS 2025, arXiv:2505.20161
gradient-based diversification, data diversity metrics, out-of-distribution generalization, synthetic data generation
15
citations
#21

AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation

Yangchao Wu, Tian Yu Liu, Hyoungseob Park et al.

ECCV 2024
15
citations
#22

Kill Two Birds with One Stone: Rethinking Data Augmentation for Deep Long-tailed Learning

Binwu Wang, Pengkun Wang, Wei Xu et al.

ICLR 2024
15
citations
#23

A Simple Background Augmentation Method for Object Detection with Diffusion Model

Yuhang Li, Xin Dong, Chen Chen et al.

ECCV 2024
15
citations
#24

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Yunhao Ge, Yihe Tang, Jiashu Xu et al.

CVPR 2024
14
citations
#25

KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy

Qianxiong Xu, Cheng Long, Ziyue Li et al.

AAAI 2025
13
citations
#26

Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather

Junsung Park, Kyungmin Kim, Hyunjung Shim

ECCV 2024
13
citations
#27

Accelerating Neural Field Training via Soft Mining

Shakiba Kheradmand, Daniel Rebain, Gopal Sharma et al.

CVPR 2024
12
citations
#28

Adaptive Self-training Framework for Fine-grained Scene Graph Generation

Kibum Kim, Kanghoon Yoon, Yeonjun In et al.

ICLR 2024
12
citations
#29

How to Synthesize Text Data without Model Collapse?

Xuekai Zhu, Daixuan Cheng, Hengli Li et al.

ICML 2025
12
citations
#30

TabDPT: Scaling Tabular Foundation Models on Real Data

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.

NeurIPS 2025
12
citations
#31

Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning

Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto

CVPR 2025
11
citations
#32

Cross-Class Feature Augmentation for Class Incremental Learning

Taehoon Kim, JaeYoo Park, Bohyung Han

AAAI 2024
10
citations
#33

MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL

Claas Voelcker, Marcel Hussing, Eric Eaton et al.

ICLR 2025
10
citations
#34

Effective Training Data Synthesis for Improving MLLM Chart Understanding

Yuwei Yang, Zeyu Zhang, Yunzhong Hou et al.

ICCV 2025
10
citations
#35

EventRPG: Event Data Augmentation with Relevance Propagation Guidance

Mingyuan Sun, Donghao Zhang, Zongyuan Ge et al.

ICLR 2024
9
citations
#36

On the Limitations of Temperature Scaling for Distributions with Overlaps

Muthu Chidambaram, Rong Ge

ICLR 2024
8
citations
#37

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

Zaid Khan, Elias Stengel-Eskin, Jaemin Cho et al.

ICLR 2025, arXiv:2410.06215
data generation agents, teacher environments, sequential decision-making, student feedback mechanisms
8
citations
#38

Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning

Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi

ICLR 2025
8
citations
#39

Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin et al.

ECCV 2024
8
citations
#40

Factor Augmented Tensor-on-Tensor Neural Networks

Guanhao Zhou, Yuefeng Han, Xiufan Yu

AAAI 2025
7
citations
#41

Data Augmentation via Latent Diffusion for Saliency Prediction

Bahar Aydemir, Deblina Bhattacharjee, Tong Zhang et al.

ECCV 2024
7
citations
#42

CAPrompt: Cyclic Prompt Aggregation for Pre-Trained Model Based Class Incremental Learning

Qiwei Li, Jiahuan Zhou

AAAI 2025
7
citations
#43

Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation

Zhilin Zhu, Xiaopeng Hong, Zhiheng Ma et al.

ECCV 2024
7
citations
#44

Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training

Yuanqi Yao, Gang Wu, Kui Jiang et al.

ECCV 2024, arXiv:2411.02149
monocular depth estimation, domain generalization, adversarial training, self-supervised learning
7
citations
#45

Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models

Tianjian Li, Haoran Xu, Philipp Koehn et al.

ICLR 2024
6
citations
#46

Understanding and Mitigating Memorization in Diffusion Models for Tabular Data

Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen et al.

ICML 2025
6
citations
#47

PhysAug: A Physical-guided and Frequency-based Data Augmentation for Single-Domain Generalized Object Detection

Xiaoran Xu, Jiangang Yang, Wenhui Shi et al.

AAAI 2025
6
citations
#48

Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory

Aymane El Firdoussi, Mohamed El Amine Seddik, Soufiane Hayou et al.

ICLR 2025
6
citations
#49

Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

Yanghao Wang, Long Chen

CVPR 2025
6
citations
#50

Prompt Augmentation for Self-supervised Text-guided Image Manipulation

Rumeysa Bodur, Binod Bhattarai, Tae-Kyun Kim

CVPR 2024
6
citations
#51

Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classifications

Yutong Xia, Runpeng Yu, Yuxuan Liang et al.

AAAI 2025
5
citations
#52

Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

Xiaochuan Li, Zichun Yu, Chenyan Xiong

ICLR 2025
5
citations
#53

BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks

Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi et al.

ICLR 2025
5
citations
#54

DSMix: Distortion-Induced Saliency Map Based Pre-training for No-Reference Image Quality Assessment

Jinsong Shi, Xiaojiang Peng et al.

ECCV 2024
5
citations
#55

Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine

Zhaohu Xing, Lihao Liu, Yijun Yang et al.

CVPR 2025
5
citations
#56

Semantic-Aware Data Augmentation for Text-to-Image Synthesis

Zhaorui Tan, Xi Yang, Kaizhu Huang

AAAI 2024, arXiv:2312.07951
text-to-image synthesis, semantic data augmentation, attention mechanism, contrastive learning
4
citations
#57

Enhancing Masked Time-Series Modeling via Dropping Patches

Tianyu Qiu, Yi Xie, Hao Niu et al.

AAAI 2025
4
citations
#58

Disentangling Tabular Data Towards Better One-Class Anomaly Detection

Jianan Ye, Zhaorui Tan, Yijie Hu et al.

AAAI 2025
4
citations
#59

Enhancing Fine-Grained Vision-Language Pretraining with Negative Augmented Samples

Yeyuan Wang, Dehong Gao, Lei Yi et al.

AAAI 2025
4
citations
#60

Enhancing Robustness in Incremental Learning with Adversarial Training

Seungju Cho, Hongsin Lee, Changick Kim

AAAI 2025
4
citations
#61

Controllable Blur Data Augmentation Using 3D-Aware Motion Estimation

Insoo Kim, Hana Lee, Hyong-Euk Lee et al.

ICLR 2025
4
citations
#62

GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data

Zhiteng Li, Lele Chen, Jerone Andrews et al.

ICLR 2025
4
citations
#63

AutoData: A Multi-Agent System for Open Web Data Collection

Tianyi Ma, Yiyue Qian, Zheyuan Zhang et al.

NeurIPS 2025
4
citations
#64

Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting

Maochen Yang, Zekun Li, Jian Zhang et al.

CVPR 2025
4
citations
#65

Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation

Jinda Xu, Yuhao Song, Daming Wang et al.

AAAI 2025
3
citations
#66

VarDrop: Enhancing Training Efficiency by Reducing Variate Redundancy in Periodic Time Series Forecasting

Junhyeok Kang, Yooju Shin, Jae-Gil Lee

AAAI 2025
3
citations
#67

Multi-Accurate CATE is Robust to Unknown Covariate Shifts

Angela Zhou, Christoph Kern, Michael Kim

ICLR 2025
heterogeneous treatment effects, conditional average treatment effects, covariate shift robustness, multi-accurate predictors
3
citations
#68

Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models

Negin Raoof, Litu Rout, Giannis Daras et al.

ICLR 2025
3
citations
#69

HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

Wentian Qu, Jiahe Li, Jian Cheng et al.

AAAI 2025
3
citations
#70

Leveraging SD Map to Augment HD Map-based Trajectory Prediction

Zhiwei Dong, Ran Ding, Wei Li et al.

CVPR 2025
3
citations
#71

Scale Efficient Training for Large Datasets

Qing Zhou, Junyu Gao, Qi Wang

CVPR 2025
3
citations
#72

Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers

Bum Jun Kim, Sang Woo Kim

AAAI 2025
2
citations
#73

Enhancing Noise-Robust Losses for Large-Scale Noisy Data Learning

Max Staats, Matthias Thamm, Bernd Rosenow

AAAI 2025
2
citations
#74

Promptable Representation Distribution Learning and Data Augmentation for Gigapixel Histopathology WSI Analysis

Kunming Tang, Zhiguo Jiang, Jun Shi et al.

AAAI 2025
2
citations
#75

Representation Space Augmentation for Effective Self-Supervised Learning on Tabular Data

Moonjung Eo, Kyungeun Lee, Hye-Seung Cho et al.

AAAI 2025
2
citations
#76

Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality

Alex Fang, Hadi Pouransari, Matt Jordan et al.

NeurIPS 2025
2
citations
#77

FreeAugment: Data Augmentation Search Across All Degrees of Freedom

Tom Bekor, Niv Nayman, Lihi Zelnik-Manor

ECCV 2024
2
citations
#78

SeiT++: Masked Token Modeling Improves Storage-efficient Training

Minhyun Lee, Song Park, Byeongho Heo et al.

ECCV 2024
2
citations
#79

T-CIL: Temperature Scaling using Adversarial Perturbation for Calibration in Class-Incremental Learning

Seong-Hyeon Hwang, Minsu Kim, Steven Euijong Whang

CVPR 2025
2
citations
#80

Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training

Weiwei Cao, Jianpeng Zhang, Zhongyi Shui et al.

ICCV 2025
2
citations
#81

Enhancing Adversarial Transferability with Checkpoints of a Single Model’s Training

Shixin Li, Chaoxiang He, Xiaojing Ma et al.

CVPR 2025
2
citations
#82

EmoGrowth: Incremental Multi-label Emotion Decoding with Augmented Emotional Relation Graph

Kaicheng Fu, Changde Du, Jie Peng et al.

ICML 2025
1
citations
#83

MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning

Xing Lei, Xuetao Zhang, Donglin Wang

AAAI 2025
1
citations
#84

GeoAggregator: An Efficient Transformer Model for Geo-Spatial Tabular Data

Rui Deng, Ziqi Li, Mingshu Wang

AAAI 2025
1
citations
#85

Beyond Random Augmentations: Pretraining with Hard Views

Fabio Ferreira, Ivo Rapant, Jörg Franke et al.

ICLR 2025
1
citations
#86

Label-Efficient Data Augmentation with Video Diffusion Models for Guidewire Segmentation in Cardiac Fluoroscopy

Shaoyan Pan, Yikang Liu, Lin Zhao et al.

AAAI 2025
1
citations
#87

SAVA: Scalable Learning-Agnostic Data Valuation

Samuel Kessler, Tam Le, Vu Nguyen

ICLR 2025, arXiv:2406.01130
data valuation, optimal transport, stochastic gradient methods, entropic regularization
1
citations
#88

EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision

Yiting Dong, Xiang He, Guobin Shen et al.

AAAI 2025
1
citations
#89

TDDBench: A Benchmark for Training data detection

Zhihao Zhu, Yi Yang, Defu Lian

ICLR 2025
1
citations
#90

InstaTrain: Adaptive Training via Ultra-Fast Natural Annealing within Dynamical Systems

Chuan Liu, Ruibing Song, Chunshu Wu et al.

ICLR 2025
1
citations
#91

TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram

Mingxi Lei, Chunwei Ma, Meng Ding et al.

ICLR 2025
1
citations
#92

New Algorithms for the Learning-Augmented k-means Problem

Junyu Huang, Qilong Feng, Ziyun Huang et al.

ICLR 2025
1
citations
#93

An Augmentation-Aware Theory for Self-Supervised Contrastive Learning

Jingyi Cui, Hongwei Wen, Yisen Wang

ICML 2025
1
citations
#94

Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference

Harry Amad, Zhaozhi Qian, Dennis Frauen et al.

NeurIPS 2025
1
citations
#95

Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

Thomson Yen, Andrew Siah, Haozhe Chen et al.

NeurIPS 2025
1
citations
#96

Boosting Segment Anything Model Towards Open-Vocabulary Learning

Xumeng Han, Longhui Wei, Xuehui Yu et al.

AAAI 2025
1
citations
#97

Leveraging Imperfect Restoration for Data Availability Attack

Yi Huang, Jeremy Styborski, Mingzhi Lyu et al.

ECCV 2024
data availability attacks, unlearnable datasets, supervised learning, self-supervised learning
1
citations
#98

MetaAug: Meta-Data Augmentation for Post-Training Quantization

Cuong Pham, Hoang Anh Dung, Cuong Cao Nguyen et al.

ECCV 2024
1
citations
#99

Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling

Nannan Li, Kevin Shih, Bryan A. Plummer

CVPR 2025, arXiv:2501.04666
virtual try-on, synthetic data generation, garment extraction, schrödinger bridge
1
citations
#100

Info-Coevolution: An Efficient Framework for Data Model Coevolution

Ziheng Qin, Hailun Xu, Wei Yew et al.

ICML 2025
1
citations