Clinical trials are the most costly, time-consuming and heavily regulated stage of drug development, often costing hundreds of millions of dollars and sometimes more than a billion.

Every month of delay cuts into the patent-protected window that determines a drug's commercial viability, costing companies tens of millions of dollars. More importantly to me, it delays access to treatments that could change patients' lives.

Having worked at Pfizer, I've seen firsthand how long and complex R&D and clinical timelines can be. That is why a recent episode of The Health Pulse podcast caught my attention. In it, Harry Keen, co-founder of Hazy (now part of SAS), and Mark Lambrecht, SAS Global Head of Health and Life Sciences, discussed how synthetic data can accelerate clinical trials, describing a technology that is moving from early experimentation to strategic adoption across the health care and pharmaceutical industries.

What synthetic data really offers

Both speakers emphasize that synthetic data is not a replacement for clinical evidence. It is produced by a generative model that is trained on real clinical datasets until it has learned their internal statistical structure. Once trained, the model can create new, fully artificial patient records that mirror the behaviors, patterns, correlations and variability of the original population without being traceable back to any individual.

The real bottleneck in clinical development isn't the absence of data. It is the inability to use data because it is fragmented, highly sensitive, tightly governed or limited to small patient populations. Synthetic data opens the door to safe experimentation, something Mark described as a cornerstone of responsible AI in health care.

Digital twins: from NASA to pharma

When Mark and Harry spoke about “synthetic twins,” it immediately pulled me back to the engineering roots of this term. I’ve always loved the fact that the first real digital twin wasn’t born in health care at all. It came from NASA during the Apollo programme. Engineers built virtual spacecraft so they could predict behavior, test failures and rehearse the mission from Earth before a single astronaut faced the real risks.

That image aligns perfectly with what synthetic data now enables in pharma. A traditional digital twin represents one real object and is fed by real-world data from it. A synthetic twin starts from a different foundation: it's built on synthetic data, artificial records that capture the statistical patterns of a population without being traceable to any individual.

Instead of modelling a single patient, teams can explore entire virtual cohorts, check eligibility rules, stress-test operational plans, evaluate hypotheses and predict outcomes long before the first real dataset arrives.

Pain points in clinical trials and how synthetic data addresses them

Anyone who has worked closely with clinical operations knows that clinical trials face a set of challenges: data is difficult to collect, slow to share, often incomplete for rare diseases, and heavily restricted by privacy rules.

However, the real slowdown typically begins before any actual data has been collected. Teams spend months waiting for approvals, aligning systems, reconciling standards and validating assumptions, long before the first patient ever enters a trial. As a result, modelling, testing and validation start too late, when changes and mistakes are already expensive and risky.

Synthetic data shifts this timeline. It gives organizations the freedom to model, test and validate much earlier, in a risk-free environment, well before the real data pipeline turns on. Below are the five key pain points and how synthetic data addresses them:

  • Slow access and strict privacy: It can take months to obtain approvals to work with real patient data. From my experience, approval timelines slow teams down long before any scientific problem does. Synthetic datasets can circulate immediately, enabling teams to collaborate much earlier.
  • Rare disease scarcity: Small cohorts limit the ability to model eligibility, endpoints or risk. Synthetic data can expand these cohorts in a statistically realistic manner, providing a more robust planning foundation.
  • Limited ability to test assumptions: Recruitment, eligibility and operational scenarios can be rehearsed before a trial begins.
  • Fragmentation: Data sits in different countries, systems and custodians. Synthetic data allows teams to rehearse integrations and CDISC transformations.
  • Late discovery of errors: Pipeline or model issues often appear only when real data starts flowing. Synthetic datasets can validate infrastructure proactively.
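As an illustration of rehearsing eligibility assumptions before a trial begins, the sketch below applies two versions of a draft protocol's inclusion rules to a synthetic cohort and compares how much of the cohort each would admit. Everything here is hypothetical: the cohort is drawn from hand-picked distributions as a stand-in for a generator trained on real data, and the criteria (age, eGFR, prior treatment lines) are invented for the example.

```python
import random
from dataclasses import dataclass

@dataclass
class SyntheticPatient:
    age: int
    egfr: float       # kidney function, mL/min/1.73 m²
    prior_lines: int  # number of prior treatment lines

def make_cohort(n, seed=0):
    """Hypothetical synthetic cohort. In practice these records would come
    from a generator trained on real data, not hand-picked distributions."""
    rng = random.Random(seed)
    return [
        SyntheticPatient(
            age=rng.randint(18, 90),
            egfr=rng.gauss(75, 20),
            prior_lines=rng.choice([0, 1, 1, 2, 2, 3]),
        )
        for _ in range(n)
    ]

def eligible(p, max_age, min_egfr, max_prior_lines):
    """Hypothetical inclusion rules for a draft protocol."""
    return p.age <= max_age and p.egfr >= min_egfr and p.prior_lines <= max_prior_lines

def eligible_fraction(cohort, **rules):
    """Share of the cohort that passes the given rules."""
    return sum(eligible(p, **rules) for p in cohort) / len(cohort)

cohort = make_cohort(10_000, seed=7)
strict = eligible_fraction(cohort, max_age=65, min_egfr=60, max_prior_lines=1)
relaxed = eligible_fraction(cohort, max_age=75, min_egfr=45, max_prior_lines=2)
print(f"strict: {strict:.1%}, relaxed: {relaxed:.1%}")
```

Because the relaxed rules admit everyone the strict rules do, `relaxed` is never smaller than `strict`; the useful signal is how much each criterion shrinks the eligible pool, which teams can discuss before a single site is activated.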

Synthetic data does not replace clinical evidence. It simply allows teams to prepare clinical trials more thoroughly, more safely and much earlier, long before real patients are involved.

Synthetic data in early-stage drug discovery

What excites me most is that synthetic data is no longer just a tool for operational efficiency; it has become a powerful driver of innovation. It's reshaping early drug discovery and influencing which candidates advance to clinical trials. In my opinion, one of the best examples of 2025 is SandboxAQ's Structurally Augmented IC50 Repository (SAIR), generated using NVIDIA computing infrastructure: a synthetic collection of more than five million 3D protein-ligand structures.

Although SAIR is entirely computationally generated, models trained on it can predict binding affinities far faster than traditional methods. For pharma, the consequence is direct. By improving the quality of early-stage candidates, synthetic data reduces the risk of late-stage failures, resulting in more predictable clinical pipelines. Mark noted that synthetic data is becoming a strategic asset for decision-making long before trials begin.

Operationalising synthetic data: SAS Data Maker

Synthetic data has already proven its value from discovery through development, but putting it to everyday use demands tools with strong governance, auditability and enterprise workflows. SAS Data Maker is built to meet exactly these needs.

Data Maker enables organizations to create statistically realistic synthetic datasets using intuitive, low-code interfaces. It includes differential privacy controls and integrates synthetic data directly into analytics and modelling without risking sensitive information. Data Maker provides a secure platform for simulating patient behaviour and outcomes, testing treatment plans and choosing optimal care paths, all within a governed environment.
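The post does not describe how Data Maker implements its differential privacy controls. As a generic illustration of the underlying idea only, the sketch below shows the textbook Laplace mechanism applied to a count query: noise with scale 1/ε is added so that any single patient's presence or absence shifts the output distribution by at most a bounded factor. This is not Data Maker's implementation; the records and the ε value are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon, rng):
    """Epsilon-differentially private count via the Laplace mechanism.
    A count has sensitivity 1 (adding or removing one patient changes
    the true count by at most 1), so the noise scale is 1 / epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical records: (patient_id, has_condition)
rng = random.Random(2025)
records = [(i, i % 7 == 0) for i in range(1000)]
noisy = dp_count(records, lambda r: r[1], epsilon=0.5, rng=rng)
print(round(noisy, 1))  # noisy value near the true count (143)
```

Smaller values of `epsilon` mean more noise and stronger privacy; larger values mean more accurate answers and weaker protection. Choosing that trade-off is the kind of control a governed platform needs to expose.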

And the industry is responding. Synthetic data is shifting from niche experimentation to a core capability across large enterprises. Pharmaceutical companies want faster iteration, safer data access and more flexibility in early exploration. Synthetic data finally allows them to test and design without waiting for real patient data to arrive.

The regulators' view

Regulators are often seen as a brake on innovation, but in reality they are far more pragmatic about synthetic data. They still draw a strict line: synthetic data cannot replace real clinical evidence and cannot be used to support claims about safety or efficacy. However, they increasingly recognise its value for everything that happens around these boundaries. For example, EMA and HMA are exploring the use of synthetic data in the regulation of medicines (expected in Q4 2025). Similarly, the FDA is running programs to understand the possibilities and limitations of supplementing patient datasets with synthetic data.

From everything I see, regulators are taking a clear direction. They are truly encouraging safe innovation, not blocking it. If the synthetic datasets are transparently generated, properly validated and not misused as clinical evidence, authorities increasingly see them as a responsible way to accelerate development and de-risk AI.

Synthetic data drives clinical trials forward

Working within the pharmaceutical industry and alongside the organizations that drive it forward, I’m convinced that synthetic data is shifting from an experimental concept to a strategic capability. It shortens development cycles, strengthens early science, and, most importantly, reduces the cost of getting things wrong by providing teams with a safe environment where mistakes are inexpensive.

Clinical trials will always rely on real patients. But the path to patients can be shorter, safer and far more efficient. Synthetic data is becoming a foundational element of digital innovation in health care and life sciences. And the organizations that learn to rehearse early will be the ones that win the race to future medicines.

Learn more about SAS Data Maker: SAS Data Maker | SAS

Catch up on all episodes of The Health Pulse podcast: The Health Pulse | SAS



