Code in Action: Synthetic Data Revisited

By Melanie Courtright, Member, IA Standards Committee.

Scope and Definition Considerations

Interest in and use of synthetic data have evolved significantly over the past two years. Synthetic respondents, simulated personas, and AI-generated datasets are now common in conference discussions, product development, and professional dialogue.

This shift makes it timely to revisit how synthetic data and synthetic participants intersect with the Insights Association Code of Standards and Ethics. The intent is not to slow innovation or endorse it uncritically, but to ensure that adoption is deliberate, transparent, and aligned with the principles that underpin research integrity and public trust.

For purposes of applying the Code, two foundational realities are relevant:

  • Synthetic data and participants are often derived from, or informed by, data originating from real individuals
  • Outputs generated from synthetic data or participants may be used to inform business decisions in ways similar to traditional research outputs

As a result, both participant protection principles and research integrity standards continue to apply.


IA Code Section 2: Primary Data Collection and Consent

The Code requires that researchers obtain informed consent for the collection and use of participant data, including when that data is used in ways that differ materially from its original purpose.

Application to Synthetic Data

When real participant data is used to develop, train, or inform synthetic models:

  • This may constitute a material change from the original purpose of data collection
  • Participants may not reasonably expect their data to contribute to simulated data and respondents, to be included in training data sets, or to enable future modeling applications

Guidance

  • Consent language should clearly account for secondary and future uses of data where applicable
  • If data may be used in modeling or simulation, this should be communicated in clear and understandable terms
  • Where consent does not explicitly cover these uses, researchers should carefully assess whether the data is appropriate for inclusion
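
As one illustration of the last point, the sketch below separates records whose consent explicitly covers modeling from those that need a human assessment. It is a minimal sketch, not a procedure prescribed by the Code: the `ParticipantRecord` type, the `consent_scopes` field, and the `MODELING_SCOPE` label are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical label for a consent scope; not a term defined by the IA Code.
MODELING_SCOPE = "modeling_or_simulation"


@dataclass
class ParticipantRecord:
    """Hypothetical record: an ID plus the uses the participant consented to."""
    participant_id: str
    consent_scopes: set[str] = field(default_factory=set)


def partition_by_consent(records):
    """Split records into those cleared for modeling and those needing review."""
    cleared, needs_review = [], []
    for record in records:
        if MODELING_SCOPE in record.consent_scopes:
            cleared.append(record)
        else:
            # Consent is silent on modeling: hold the record for a human
            # assessment rather than including or discarding it automatically.
            needs_review.append(record)
    return cleared, needs_review


records = [
    ParticipantRecord("p1", {"survey_analysis", MODELING_SCOPE}),
    ParticipantRecord("p2", {"survey_analysis"}),
]
cleared, needs_review = partition_by_consent(records)
```

The second list is a queue for judgment, not a rejection list; whether those records are appropriate for inclusion remains a decision for the researcher.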

IA Code Sections 3, 4, and 6: Artificial Intelligence, Data Protection, and Privacy

The Code requires that personal data be protected against unauthorized access, disclosure, and re-identification.

Application to Synthetic Data

Synthetic models may introduce risk if:

  • Source data contains identifiable or sensitive information
  • Model outputs inadvertently reproduce or reveal elements of original participant data

Guidance

  • Data used in all synthetic modeling, including synthetic participants, digital twins, and generated datasets, should be anonymized and minimized prior to use
  • Researchers should evaluate whether model outputs could enable re-identification, directly or indirectly
  • Appropriate security controls should be maintained throughout the data lifecycle
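
The first two guidance points can be sketched in code. The example below drops fields a model does not need, including direct identifiers, and then flags synthetic records that reproduce a source record's combination of quasi-identifiers. The field names in `DIRECT_IDENTIFIERS` and `QUASI_IDENTIFIERS` are assumptions made for illustration. An exact-match screen like this is only a crude first check and does not amount to a full anonymization or re-identification assessment.

```python
# Hypothetical field names; a real study would define its own lists.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}
QUASI_IDENTIFIERS = ("age_band", "postcode_area", "occupation")


def minimize(record, needed_fields):
    """Keep only the fields the model needs, and never a direct identifier."""
    return {
        key: value
        for key, value in record.items()
        if key in needed_fields and key not in DIRECT_IDENTIFIERS
    }


def exact_match_leaks(source_records, synthetic_records):
    """Return synthetic records whose quasi-identifier combination
    also appears in the source data."""
    source_keys = {
        tuple(rec.get(q) for q in QUASI_IDENTIFIERS) for rec in source_records
    }
    return [
        rec
        for rec in synthetic_records
        if tuple(rec.get(q) for q in QUASI_IDENTIFIERS) in source_keys
    ]


source = [
    {"name": "A. Example", "email": "a@example.com", "age_band": "35-44",
     "postcode_area": "AB1", "occupation": "nurse", "q1": 4},
]
needed = {"age_band", "postcode_area", "occupation", "q1"}
minimized = [minimize(rec, needed) for rec in source]

synthetic = [
    {"age_band": "35-44", "postcode_area": "AB1", "occupation": "nurse", "q1": 5},
    {"age_band": "25-34", "postcode_area": "CD2", "occupation": "teacher", "q1": 3},
]
leaks = exact_match_leaks(minimized, synthetic)
```

Here the first synthetic record is flagged because its age band, postcode area, and occupation match a real participant. Whether such a match creates a real re-identification risk depends on how rare that combination is in the population.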

IA Code Section 9: Research Integrity and Methodological Soundness

The Code requires that research adhere to accepted methodological standards, and that emerging methods be evaluated for validity and reliability.

Application to Synthetic Data

Synthetic approaches may:

  • Generate outputs without direct observation of real participants
  • Produce responses that are plausible, but not grounded in empirical observation

Guidance

  • Researchers should assess whether synthetic outputs are fit for their intended purpose
  • Synthetic data should not be presented as equivalent to observed human responses without appropriate validation
  • Methodological limitations should be clearly understood, documented, and considered in application
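
One simple form of validation is to compare how synthetic and human samples answer the same question. The sketch below computes the total variation distance between the two answer distributions: 0 means identical shares, 1 means no overlap. This measure is an illustrative choice, not one named in the Code; the Code sets no acceptance threshold, and the sample data is invented for the example.

```python
from collections import Counter


def answer_shares(responses):
    """Return each answer's share of the total responses."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


def total_variation_distance(human_responses, synthetic_responses):
    """Half the sum of absolute differences between the two answer distributions."""
    human = answer_shares(human_responses)
    synthetic = answer_shares(synthetic_responses)
    answers = set(human) | set(synthetic)
    return 0.5 * sum(
        abs(human.get(a, 0.0) - synthetic.get(a, 0.0)) for a in answers
    )


# Invented example data: ten answers to one question from each source.
human = ["agree"] * 6 + ["neutral"] * 2 + ["disagree"] * 2
synthetic = ["agree"] * 8 + ["neutral"] * 1 + ["disagree"] * 1
distance = total_variation_distance(human, synthetic)
```

A single-question comparison like this says nothing about relationships between questions or about subgroups, so it is one input to a fitness-for-purpose judgment rather than a complete validation.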

IA Code Sections 1, 4, 8, and 10: Artificial Intelligence, Transparency and Disclosure

The Code emphasizes honesty, transparency, and accurate representation of research methods and findings.

Application to Synthetic Data

Risks of misunderstanding increase when:

  • The nature of the data source is not clearly communicated
  • Outputs are presented without distinguishing between human-derived and synthetic inputs

Guidance

  • The use of synthetic participants should be fully disclosed to research buyers and stakeholders
  • Reporting should clearly indicate whether findings are based on:
    • Human participants
    • Synthetic participants or digital twins
    • A combination of both
  • Methodological descriptions should provide sufficient detail to support informed interpretation
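
The three reporting categories above can be carried with a dataset as an explicit label, so that a disclosure line is generated rather than remembered. The sketch below is one possible approach; the `DataBasis` enum, the function names, and the wording of the disclosure line are hypothetical.

```python
from enum import Enum


class DataBasis(Enum):
    """The three reporting categories listed in the guidance."""
    HUMAN = "Human participants"
    SYNTHETIC = "Synthetic participants or digital twins"
    COMBINED = "A combination of both"


def reporting_basis(n_human, n_synthetic):
    """Classify a set of findings by the participants behind it."""
    if n_human < 0 or n_synthetic < 0:
        raise ValueError("participant counts must not be negative")
    if n_human == 0 and n_synthetic == 0:
        raise ValueError("findings need at least one participant")
    if n_synthetic == 0:
        return DataBasis.HUMAN
    if n_human == 0:
        return DataBasis.SYNTHETIC
    return DataBasis.COMBINED


def disclosure_line(n_human, n_synthetic):
    """Build a one-line disclosure statement for a report."""
    basis = reporting_basis(n_human, n_synthetic)
    return (
        f"Basis of findings: {basis.value} "
        f"(human n={n_human}, synthetic n={n_synthetic})"
    )


line = disclosure_line(300, 200)
```

Reporting the counts alongside the category gives readers a sense of how much of a combined result rests on synthetic input.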

IA Code Section 11: Professionalism and Public Trust

The Code requires researchers to act with integrity and avoid practices that could undermine confidence in the profession.

Application to Synthetic Data

Public trust may be affected if:

  • Synthetic data is used in ways that imply representation of real individuals
  • The distinction between observed behavior and modeled behavior is not clearly maintained

Guidance

  • Synthetic data should not be used in a manner that misrepresents the source of insight
  • Researchers should consider how synthetic approaches may influence confidence in findings and the broader research process
  • The welfare and expectations of human participants should remain central, even when direct interaction is not taking place

Summary

The IA Code of Standards and Ethics provides a durable framework for evaluating emerging methodologies, including synthetic data and synthetic participants. While research tools and technologies continue to evolve, the core obligations remain consistent:

  • Obtain appropriate consent
  • Protect participant data
  • Ensure methodological integrity
  • Maintain transparency in methods and reporting
  • Uphold public trust

Synthetic data does not change these responsibilities. It reinforces the need to apply them with the same level of rigor, discipline, and accountability as any other research approach.

ABOUT THIS SERIES: The Insights Association Code of Standards & Ethics sets the principles that guide ethical and professional market research, insights, and analytics. But how do those standards apply in everyday practice? In this series, members of IA’s Standards Committee bring the Code to life through practical examples, showing how it guides responsible research and decision-making across the industry.

About the Author

Melanie Courtright, Chief Strategy Officer at Sago, is a member of the Insights Association Standards Committee and former CEO of the association.
