Data Augmentation
Augmenting training data
Top Papers
Data Filtering Networks
Alex Fang, Albin Madappally Jose, Amit Jain et al.
DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models
Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood et al.
Real-Fake: Effective Training Data Synthesis Through Distribution Matching
Jianhao Yuan, Jie Zhang, Shuyang Sun et al.
Data Shapley in One Training Run
Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Jiangpeng He
Synthetic continued pretraining
Zitong Yang, Neil Band, Shuangping Li et al.
Do Generated Data Always Help Contrastive Learning?
Yifei Wang, Jizhe Zhang, Yisen Wang
Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection
Yuzhen Lin, Wentang Song, Bin Li et al.
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation
Derong Xu, Xinhang Li, Ziheng Zhang et al.
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Xin Zhang, Jiawei Du, Weiying Xie et al.
Dataset Distillation by Automatic Training Trajectories
Dai Liu, Jindong Gu, Hu Cao et al.
Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
Zhaowei Zhu, Jialu Wang, Hao Cheng et al.
Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification
Bohan Li, Xiao Xu, Xinghao Wang et al.
BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
Xuanpu Zhang, Dan Song, Pengxin Zhan et al.
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Qichao Shentu, Beibu Li, Kai Zhao et al.
Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
Guan Gui, Bin-Bin Gao, Jun Liu et al.
Dataset Enhancement with Instance-Level Augmentations
Orest Kupyn, Christian Rupprecht
Frozen Feature Augmentation for Few-Shot Image Classification
Andreas Bär, Neil Houlsby, Mostafa Dehghani et al.
A Comprehensive Augmentation Framework for Anomaly Detection
Lin Jiang, Yaping Yan
Kill Two Birds with One Stone: Rethinking Data Augmentation for Deep Long-tailed Learning
Binwu Wang, Pengkun Wang, Wei Xu et al.
AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation
Yangchao Wu, Tian Yu Liu, Hyoungseob Park et al.
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning
Jaehun Jung, Seungju Han, Ximing Lu et al.
A Simple Background Augmentation Method for Object Detection with Diffusion Model
Yuhang Li, Xin Dong, Chen Chen et al.
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Yunhao Ge, Yihe Tang, Jiashu Xu et al.
Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
Junsung Park, Kyungmin Kim, Hyunjung Shim
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee, Haebin Seong, Dong Bok Lee et al.
KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy
Qianxiong Xu, Cheng Long, Ziyue Li et al.
Adaptive Self-training Framework for Fine-grained Scene Graph Generation
Kibum Kim, Kanghoon Yoon, Yeonjun In et al.
TabDPT: Scaling Tabular Foundation Models on Real Data
Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.
How to Synthesize Text Data without Model Collapse?
Xuekai Zhu, Daixuan Cheng, Hengli Li et al.
Accelerating Neural Field Training via Soft Mining
Shakiba Kheradmand, Daniel Rebain, Gopal Sharma et al.
Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning
Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto
Cross-Class Feature Augmentation for Class Incremental Learning
Taehoon Kim, JaeYoo Park, Bohyung Han
Effective Training Data Synthesis for Improving MLLM Chart Understanding
Yuwei Yang, Zeyu Zhang, Yunzhong Hou et al.
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Claas Voelcker, Marcel Hussing, Eric Eaton et al.
EventRPG: Event Data Augmentation with Relevance Propagation Guidance
Mingyuan Sun, Donghao Zhang, Zongyuan Ge et al.
On the Limitations of Temperature Scaling for Distributions with Overlaps
Muthu Chidambaram, Rong Ge
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan, Elias Stengel-Eskin, Jaemin Cho et al.
Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning
Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi
Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin et al.
CAPrompt: Cyclic Prompt Aggregation for Pre-Trained Model Based Class Incremental Learning
Qiwei Li, Jiahuan Zhou
Factor Augmented Tensor-on-Tensor Neural Networks
Guanhao Zhou, Yuefeng Han, Xiufan Yu
Data Augmentation via Latent Diffusion for Saliency Prediction
Bahar Aydemir, Deblina Bhattacharjee, Tong Zhang et al.
Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation
Zhilin Zhu, Xiaopeng Hong, Zhiheng Ma et al.
Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training
Yuanqi Yao, Gang Wu, Kui Jiang et al.
Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models
Tianjian Li, Haoran Xu, Philipp Koehn et al.
PhysAug: A Physical-guided and Frequency-based Data Augmentation for Single-Domain Generalized Object Detection
Xiaoran Xu, Jiangang Yang, Wenhui Shi et al.
Understanding and Mitigating Memorization in Diffusion Models for Tabular Data
Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen et al.
Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory
Aymane El Firdoussi, Mohamed El Amine Seddik, Soufiane Hayou et al.
Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification
Yanghao Wang, Long Chen
Prompt Augmentation for Self-supervised Text-guided Image Manipulation
Rumeysa Bodur, Binod Bhattarai, Tae-Kyun Kim
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi et al.
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Xiaochuan Li, Zichun Yu, Chenyan Xiong
Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classifications
Yutong Xia, Runpeng Yu, Yuxuan Liang et al.
DSMix: Distortion-Induced Saliency Map Based Pre-training for No-Reference Image Quality Assessment
Jinsong Shi, Xiaojiang Peng et al.
Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine
Zhaohu Xing, Lihao Liu, Yijun Yang et al.
Semantic-Aware Data Augmentation for Text-to-Image Synthesis
Zhaorui Tan, Xi Yang, Kaizhu Huang
Enhancing Masked Time-Series Modeling via Dropping Patches
Tianyu Qiu, Yi Xie, Hao Niu et al.
Enhancing Fine-Grained Vision-Language Pretraining with Negative Augmented Samples
Yeyuan Wang, Dehong Gao, Lei Yi et al.
Controllable Blur Data Augmentation Using 3D-Aware Motion Estimation
Insoo Kim, Hana Lee, Hyong-Euk Lee et al.
Disentangling Tabular Data Towards Better One-Class Anomaly Detection
Jianan Ye, Zhaorui Tan, Yijie Hu et al.
Enhancing Robustness in Incremental Learning with Adversarial Training
Seungju Cho, Hongsin Lee, Changick Kim
AutoData: A Multi-Agent System for Open Web Data Collection
Tianyi Ma, Yiyue Qian, Zheyuan Zhang et al.
GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data
Zhiteng Li, Lele Chen, Jerone Andrews et al.
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting
Maochen Yang, Zekun Li, Jian Zhang et al.
Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation
Jinda Xu, Yuhao Song, Daming Wang et al.
VarDrop: Enhancing Training Efficiency by Reducing Variate Redundancy in Periodic Time Series Forecasting
Junhyeok Kang, Yooju Shin, Jae-Gil Lee
HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation
Wentian Qu, Jiahe Li, Jian Cheng et al.
Multi-Accurate CATE is Robust to Unknown Covariate Shifts
Angela Zhou, Christoph Kern, Michael Kim
Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models
Negin Raoof, Litu Rout, Giannis Daras et al.
Leveraging SD Map to Augment HD Map-based Trajectory Prediction
Zhiwei Dong, Ran Ding, Wei Li et al.
Scale Efficient Training for Large Datasets
Qing Zhou, Junyu Gao, Qi Wang
Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers
Bum Jun Kim, Sang Woo Kim
Enhancing Noise-Robust Losses for Large-Scale Noisy Data Learning
Max Staats, Matthias Thamm, Bernd Rosenow
Promptable Representation Distribution Learning and Data Augmentation for Gigapixel Histopathology WSI Analysis
Kunming Tang, Zhiguo Jiang, Jun Shi et al.
Representation Space Augmentation for Effective Self-Supervised Learning on Tabular Data
Moonjung Eo, Kyungeun Lee, Hye-Seung Cho et al.
Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality
Alex Fang, Hadi Pouransari, Matt Jordan et al.
FreeAugment: Data Augmentation Search Across All Degrees of Freedom
Tom Bekor, Niv Nayman, Lihi Zelnik-Manor
SeiT++: Masked Token Modeling Improves Storage-efficient Training
Minhyun Lee, Song Park, Byeongho Heo et al.
T-CIL: Temperature Scaling using Adversarial Perturbation for Calibration in Class-Incremental Learning
Seong-Hyeon Hwang, Minsu Kim, Steven Euijong Whang
Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training
Weiwei Cao, Jianpeng Zhang, Zhongyi Shui et al.
Enhancing Adversarial Transferability with Checkpoints of a Single Model’s Training
Shixin Li, Chaoxiang He, Xiaojing Ma et al.
MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning
Xing Lei, Xuetao Zhang, Donglin Wang
Boosting Segment Anything Model Towards Open-Vocabulary Learning
Xumeng Han, Longhui Wei, Xuehui Yu et al.
GeoAggregator: An Efficient Transformer Model for Geo-Spatial Tabular Data
Rui Deng, Ziqi Li, Mingshu Wang
Beyond Random Augmentations: Pretraining with Hard Views
Fabio Ferreira, Ivo Rapant, Jörg Franke et al.
SAVA: Scalable Learning-Agnostic Data Valuation
Samuel Kessler, Tam Le, Vu Nguyen
NRGBoost: Energy-Based Generative Boosted Trees
João Bravo
Label-Efficient Data Augmentation with Video Diffusion Models for Guidewire Segmentation in Cardiac Fluoroscopy
Shaoyan Pan, Yikang Liu, Lin Zhao et al.
InstaTrain: Adaptive Training via Ultra-Fast Natural Annealing within Dynamical Systems
Chuan Liu, Ruibing Song, Chunshu Wu et al.
TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram
Mingxi Lei, Chunwei Ma, Meng Ding et al.
TDDBench: A Benchmark for Training data detection
Zhihao Zhu, Yi Yang, Defu Lian
An Augmentation-Aware Theory for Self-Supervised Contrastive Learning
Jingyi Cui, Hongwei Wen, Yisen Wang
New Algorithms for the Learning-Augmented k-means Problem
Junyu Huang, Qilong Feng, Ziyun Huang et al.
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework
Thomson Yen, Andrew Siah, Haozhe Chen et al.
Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference
Harry Amad, Zhaozhi Qian, Dennis Frauen et al.
Leveraging Imperfect Restoration for Data Availability Attack
Yi Huang, Jeremy Styborski, Mingzhi Lyu et al.
MetaAug: Meta-Data Augmentation for Post-Training Quantization
Cuong Pham, Hoang Anh Dung, Cuong Cao Nguyen et al.
Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling
Nannan Li, Kevin Shih, Bryan A. Plummer
EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision
Yiting Dong, Xiang He, Guobin Shen et al.