A Workshop at AAAI 2026
27th January 2026
Singapore Expo, Singapore
Welcome to the AAAI'26 Workshop on Shaping Responsible Synthetic Data in the Era of Foundation Models (RSD).
Foundation models (LLMs and multimodal FMs) are increasingly supplemented with synthetic and LLM-generated data: to extend training corpora, fill coverage gaps, and meet privacy and fairness requirements. At the same time, these models serve as powerful generators of synthetic data for downstream applications, such as training specialized ML models, augmenting datasets in privacy-sensitive domains, and generating test cases for system validation.
But synthetic datasets, whether used to train FMs or generated by them for other uses, introduce a range of risks: legal (copyright, consent), security- and privacy-related (leakage, membership inference), ethical (bias amplification), and technical (model collapse, quality degradation).
This workshop examines how synthetic data can be responsibly generated and used to fuel, test, and govern foundation models across their lifecycle (pre-training, fine-tuning, evaluation, auditing), as well as how FM-generated synthetic data impacts downstream applications and systems, and what technical, ethical, and regulatory guardrails are needed across this synthetic data ecosystem.
Topics of interest include:
Synthetic data for pre-training and fine-tuning (RLHF/RLAIF), continual evaluation, and self-training or bootstrapping loops, along with their limitations.
Synthetic counterfactuals and narratives for debugging; quantifying uncertainty; cross-domain benchmarks spanning tabular, time-series, text, vision, and multimodal data.
Synthetic adversarial probes, jailbreak tests, and edge-case simulations; preventing model collapse or shortcut learning through robust real/synthetic mixes.
Cross-disciplinary examinations from law, policy, ethics, and the social sciences; defining what synthetic data is, and questioning its appropriateness and societal impacts; tensions between technological possibility and human authenticity; critical assessments of synthetic data's role in shaping future AI systems.
Metric suites for fidelity, utility, and privacy; responsible generation protocols (e.g., source curation, prompt filtering, DP noise, provenance/watermarking, disclosure “data cards”); validation pipelines and audit checklists; open-source vs. commercial generators.
Differential privacy and other PETs; leakage and membership-inference risks; consent, copyright, and provenance concerns; comparative perspectives on regulation.
This is a non-archival workshop. While accepted papers will be accessible on the workshop website, we do not publish formal proceedings. You may submit work that is:
We are looking for reviewers for the workshop. If you would like to volunteer as a reviewer, please fill out the Call for Reviewers Google Form.
Coming soon
Email us at aaai26-responsiblesyntheticdata@googlegroups.com