Limitations Of Using Synthetic Data In Clinical Trials

Apr 18, 2022
Data illustrations by Storyset

Synthetic data is not perfect, which is why it is crucial to understand its limitations in clinical trials. Knowing them will give you a better idea of what you can do with it and whether you can use it in your research.

Here are the top limitations of using synthetic data in clinical trials:

1. Not Always Reliable In Modeling Outcomes

Unfortunately, synthetic data is not always reliable when it comes to modeling outcomes. In one validation study, researchers found that the outputs of Synthea, an open-source synthetic patient generator, did not always line up closely enough with real-world data. The researchers used Synthea to generate a synthetic population of more than one million Massachusetts residents.

The synthetic residents mirrored the social determinants of health, demographics, and conditions one would expect in a real sample of the state's residents. The researchers then tested this data against the real-world rates of four health quality measures.

These included:

· Controlling high blood pressure

· The rate of complications after a knee or hip replacement

· Colorectal cancer screening

· COPD thirty-day mortality

The results showed that Synthea greatly underestimated deaths from COPD and complications after a hip or knee replacement. By contrast, Synthea was reliable when modeling demographics and the probability of receiving services offered in a healthcare setting.

However, its ability to model health outcomes is limited, so synthetic data may not always be reliable for modeling outcomes.
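The validation approach described above compares the rate a synthetic population produces for a quality measure with the real-world rate for the same measure. It can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual method. The class and function names, the 20% tolerance, and the numbers in the usage example are all hypothetical placeholders and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class MeasureComparison:
    name: str
    synthetic_rate: float  # rate observed in the synthetic population (0-1)
    real_rate: float       # rate reported for the real-world population (0-1)

    @property
    def relative_error(self) -> float:
        """Signed relative error; negative means the synthetic data underestimates."""
        if self.real_rate == 0:
            raise ValueError("real_rate must be non-zero to compute a relative error")
        return (self.synthetic_rate - self.real_rate) / self.real_rate


def flag_measures(comparisons, tolerance=0.2):
    """Return (name, relative_error) for each measure whose synthetic rate deviates
    from the real-world rate by more than `tolerance` (a hypothetical threshold)."""
    return [
        (c.name, round(c.relative_error, 3))
        for c in comparisons
        if abs(c.relative_error) > tolerance
    ]


if __name__ == "__main__":
    # Illustrative numbers only; they are NOT the values reported in the study.
    measures = [
        MeasureComparison("COPD 30-day mortality", synthetic_rate=0.01, real_rate=0.08),
        MeasureComparison("Colorectal cancer screening", synthetic_rate=0.62, real_rate=0.66),
    ]
    for name, err in flag_measures(measures):
        print(f"{name}: relative error {err:+.1%}")
```

With these made-up inputs, only the first measure is flagged. A signed error, rather than an absolute one, is used so that an underestimate like the one reported for COPD deaths shows up as a negative value.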

2. Challenge In Modeling The Effects Of Novel Therapies

Another limitation of synthetic data is that it can struggle to model the effects of novel therapies. Synthetic data models derived from existing datasets can replicate the general trends within those datasets, but they may have trouble reproducing more specific trends. A novel therapy is an extreme case: its effects are not represented in the historical data the model learned from, so the model has nothing to replicate.
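A toy model makes this concrete. The sketch below is a deliberately simple, hypothetical generator, not Synthea's method or that of any particular tool. It learns the mean and standard deviation of an outcome for each treatment in a historical dataset and samples synthetic patients from those distributions. Because a novel therapy never appears in the historical data, the generator can only fall back to the pooled distribution of existing treatments. Whatever effect the new therapy really has is therefore invisible in the synthetic output. The record schema and treatment names are assumptions for illustration.

```python
import random
import statistics
from collections import defaultdict


def fit_outcome_model(records):
    """Fit a per-treatment (mean, stdev) of the outcome from historical records.

    `records` is a list of (treatment, outcome) pairs (hypothetical schema);
    each treatment needs at least two records for statistics.stdev.
    """
    by_treatment = defaultdict(list)
    for treatment, outcome in records:
        by_treatment[treatment].append(outcome)
    model = {t: (statistics.mean(v), statistics.stdev(v)) for t, v in by_treatment.items()}
    all_outcomes = [o for _, o in records]
    # Pooled distribution used as a fallback for treatments never seen in training.
    model["__pooled__"] = (statistics.mean(all_outcomes), statistics.stdev(all_outcomes))
    return model


def generate_synthetic(model, treatment, n, seed=0):
    """Sample n synthetic outcomes for `treatment`.

    An unseen (e.g. novel) treatment silently falls back to the pooled
    distribution: the model has no information about its true effect.
    """
    rng = random.Random(seed)
    mean, sd = model.get(treatment, model["__pooled__"])
    return [rng.gauss(mean, sd) for _ in range(n)]


if __name__ == "__main__":
    historical = [
        ("standard_care", 5.0), ("standard_care", 6.0),
        ("drug_a", 7.0), ("drug_a", 8.0),
    ]
    model = fit_outcome_model(historical)
    # "novel_drug" is absent from the historical data, so its synthetic
    # outcomes simply mirror the average of the existing treatments.
    synthetic_novel = generate_synthetic(model, "novel_drug", n=1000)
    print(statistics.mean(synthetic_novel))
```

Real generators are far more sophisticated than this, but they share the same basic constraint: they can only reproduce patterns that exist in the data they were derived from.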


Nuvanitic Medium is about decoding the millions of wonderful and inspiring stories within the world of synthetic data.