"synthetic data generation" Papers
39 papers found
Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks
Hailong Guo, Bohan Zeng, Yiren Song et al.
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation
Tobias Leemann, Periklis Petridis, Giuseppe Vietri et al.
ChemOrch: Empowering LLMs with Chemical Intelligence via Groundbreaking Synthetic Instructions
Yue Huang, Zhengzhe Jiang, Xiaonan Luo et al.
Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling
Nannan Li, Kevin Shih, Bryan A. Plummer
Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
Jiayi Guo, Zhao Junhao, Chaoqun Du et al.
GLoRa: A Benchmark to Evaluate the Ability to Learn Long-Range Dependencies in Graphs
Dongzhuoran Zhou, Evgeny Kharlamov, Egor Kostylev
GRIP: A Graph-Based Reasoning Instruction Producer
Jiankang Wang, Jianjun Xu, Xiaorui Wang et al.
LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization
Zhenpeng Huang, Jiaqi Li, Zihan Jia et al.
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
Lucas Morin, Valery Weber, Ahmed Nassar et al.
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
Zimu Lu, Aojun Zhou, Ke Wang et al.
MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World
Ankit Dhiman, Manan Shah, R. Venkatesh Babu
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning
Jaehun Jung, Seungju Han, Ximing Lu et al.
Rethinking the Role of Verbatim Memorization in LLM Privacy
Tom Sander, Bargav Jayaraman, Mark Ibrahim et al.
RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation
Tianyi Yan, Wencheng Han, Xia Zhou et al.
Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation
Linda He, Jue Wang, Maurice Weber et al.
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
Bin Wu, Wuxuan Shi, Jinqiao Wang et al.
Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs
Yibo Wang, Hai-Long Sun, Guangda Huzhang et al.
V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation
Hanyue Lou, Jinxiu Liang, Minggui Teng et al.
Valid Inference with Imperfect Synthetic Data
Yewon Byun, Shantanu Gupta, Zachary Lipton et al.
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
Zi Liang, Qingqing Ye, Xuan Liu et al.
3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views
Evangelos Ververas, Polydefkis Gkagkos, Jiankang Deng et al.
CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources
Sikha Pentyala, Mayana Pereira, Martine De Cock
ConSequence: Synthesizing Logically Constrained Sequences for Electronic Health Record Generation
Brandon Theodorou, Shrusti Jain, Cao Xiao et al.
Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes
Nabeel Seedat, Nicolas Huynh, Boris van Breugel et al.
CuTS: Customizable Tabular Synthetic Data Generation
Mark Vero, Mislav Balunovic, Martin Vechev
Data-to-Model Distillation: Data-Efficient Learning Framework
Ahmad Sajedi, Samir Khaki, Lucy Z. Liu et al.
Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model
Junghun Cha, Ali Haider, Seoyun Yang et al.
Differentially Private Sum-Product Networks
Xenia Heilmann, Mattia Cerrato, Ernst Althaus
Differentially Private Synthetic Data via Foundation Model APIs 2: Text
Chulin Xie, Zinan Lin, Arturs Backurs et al.
DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation
Xiaobin Hu, Xu Peng, Donghao Luo et al.
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
Zhenyu Li, Sunqi Fan, Yu Gu et al.
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alexander Havrilla, Sharath Chandra Raparthy, Christoforos Nalmpantis et al.
Image Captioning with Multi-Context Synthetic Data
Position: Will we run out of data? Limits of LLM scaling based on human-generated data
Pablo Villalobos, Anson Ho, Jaime Sevilla et al.
PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs
Charlie Hou, Akshat Shrivastava, Hongyuan Zhan et al.
Sharpness-Aware Data Generation for Zero-shot Quantization
Hoang Dung, Cuong Pham, Trung Le et al.
Speech Self-Supervised Learning Using Diffusion Model Synthetic Data
Heting Gao, Kaizhi Qian, Junrui Ni et al.
UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues
Vandad Davoodnia, Saeed Ghorbani, Marc-André Carbonneau et al.
What is Dataset Distillation Learning?
William Yang, Ye Zhu, Zhiwei Deng et al.