Simulated Patient Data Fuels a New Tool for Healthcare Innovators

September 2017
Topics: Health IT, Data Analytics, Information Systems, Public Health, Software (General), Software Engineering
Lack of usable patient data can stifle innovation in healthcare. So, MITRE employees designed Synthea, a tool that creates simulated patient records. We offer it as open source software without charge, and the health IT community has embraced it.


One of the biggest changes in healthcare isn't taking place in the operating room. It's coming in bits and bytes of data.

Digital data can open many doors to improving the quality of healthcare, such as enabling researchers to use computer-based patient health records to develop analytical measures and clinical decision-support tools. To get a complete picture of the patient, providers must merge information from electronic health record (EHR) systems across multiple sources; health data interoperability is key to this integration of patient information. It is also key to the promise of a "learning" health system in which science, informatics, incentives, and culture are aligned for continuous improvement and innovation.

However, all these advances require access to vast amounts of information about people, their health conditions, their lifestyles, their health outcomes, and a host of other factors. Because of privacy regulations, that data is extremely hard to come by, even for highly justified causes. Now, MITRE researchers have found a way to create mock patients with realistic health issues to generate the data that's needed.

Our solution is an open source software tool called "Synthea," a contraction of "synthetic" and "health." Synthea generates synthetic patients and their medical histories based on simulated disease incidence. Their ailments and conditions range from allergies and asthma to lung cancer and lupus to total joint replacements and urinary tract infections.

All this information about synthetic patients creates a synthetic "world" that can be fed into EHR systems and services under development. By simulating the health of these patients, from onset of conditions through treatment for some of the most frequent and chronic ailments in the United States, developers have a set of usable, realistic data to work from with no restrictions whatsoever. The approach, methods, and software mechanisms have been published in the Journal of the American Medical Informatics Association. 

Researchers from around the world now use Synthea to move healthcare forward. 

Innovators Stopped "Dead in Our Tracks"

The Synthea saga has its origins in MITRE's work in support of the healthcare IT community. "We needed patient datasets we could use to test software we were developing, such as clinical quality measures," MITRE's Jason Walonoski explains.

However, researchers can't use actual patient data except under rare circumstances. That meant the MITRE team and other innovators needed to turn to patient data that has been "de-identified" and can't be linked to an actual patient.

De-identified data is available but can be very expensive to purchase. In addition, even its use is often restricted and, in some cases, the quality of the data is marginal at best. Another problem, Walonoski says, is that "it's relatively easy to re-identify the data with a real person, and then you could be in violation of the Health Insurance Portability and Accountability Act [HIPAA] and other laws."

According to group leader Dr. Mark Kramer, who focuses on health IT interoperability, "The lack of useful patient data was stifling innovation in healthcare. We were being stopped dead in our tracks." So, five years ago, Kramer and his colleague Dr. Marc Hadley began investigating the possibility of building a solution that could generate synthetic data for healthcare research and analytics.

Kramer dubbed it Synthea.

The team built Synthea with factual human disease models that create synthetic patients and their related health conditions and behaviors to reflect real U.S. populations. MITRE researchers culled clinical care data and academic research to build the Synthea disease models. The team then combined that data with disease incidence and prevalence statistics from the U.S. Centers for Disease Control and Prevention, the National Institutes of Health, and other government sources to create clinical disease "modules" that simulate health events that could occur in a real patient's life. Finally, they added demographic data provided by the United States Census Bureau and other sources.

They tested their work through a demonstration project, SyntheticMass, which models the health information of more than one million Massachusetts residents. (Read "A World of Bay Staters Changing Electronic Health Records" below.)

An Open Source Solution Bridges the Gap

The team knew there would be high demand for Synthea data in the healthcare IT community. There was little doubt in their minds that the solution would have broad impact when implemented as a publicly accessible, open source project.

"Open source is a way to innovate and attract collaborative user communities," Kramer notes. "You can obtain an even higher rate of progress by engaging external communities in the system's development. As Sun Microsystems co-founder Bill Joy once said, 'No matter who you are, most of the smartest people work for someone else.' Open source helps bring in the smartest people."

Researchers can use Synthea's Generic Module Framework (GMF) to create their own illness and disease modules and describe a progression of health states and the transitions between them. The GMF is available to the entire community via the GitHub repository, and the simulated patient health records can be exported in several different standard health data formats to provide researchers additional flexibility.
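The idea of a module as "a progression of health states and the transitions between them" can be illustrated with a small state machine. The sketch below is a simplified, hypothetical model written for this article: the `State` class, the `simulate` function, the state names, and the 10% onset probability are all illustrative assumptions, and it does not reproduce Synthea's actual module format or engine. Each state either moves directly to one successor, picks a successor at random according to probability weights, or ends the simulation.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class State:
    """One hypothetical health state in a disease module."""
    name: str
    direct: Optional[str] = None  # single successor state, if any
    distributed: List[Tuple[str, float]] = field(default_factory=list)  # (successor, weight) pairs
    terminal: bool = False  # True if the simulation stops here


def next_state(state: State, rng: random.Random) -> Optional[str]:
    """Return the name of the next state, or None if this state is terminal."""
    if state.terminal:
        return None
    if state.direct is not None:
        return state.direct
    # Distributed transition: choose a successor in proportion to its weight.
    names = [name for name, _ in state.distributed]
    weights = [weight for _, weight in state.distributed]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(module: Dict[str, State], rng: random.Random,
             start: str = "Initial", max_steps: int = 100) -> List[str]:
    """Walk one synthetic patient through the module, recording visited states."""
    path = [start]
    current = module[start]
    for _ in range(max_steps):
        nxt = next_state(current, rng)
        if nxt is None:
            break
        path.append(nxt)
        current = module[nxt]
    return path


# Hypothetical module: 10% of patients develop the condition, are diagnosed, and are treated.
EXAMPLE_MODULE: Dict[str, State] = {
    "Initial": State("Initial", distributed=[("Onset", 0.1), ("Terminal", 0.9)]),
    "Onset": State("Onset", direct="Diagnosis"),
    "Diagnosis": State("Diagnosis", direct="Treatment"),
    "Treatment": State("Treatment", direct="Terminal"),
    "Terminal": State("Terminal", terminal=True),
}

if __name__ == "__main__":
    rng = random.Random(42)  # seeded so runs are reproducible
    paths = [simulate(EXAMPLE_MODULE, rng) for _ in range(1000)]
    onset_share = sum("Onset" in p for p in paths) / len(paths)
    print(f"Share of synthetic patients with onset: {onset_share:.3f}")
```

Running many synthetic patients through the same module and counting outcomes is what lets the simulated population reflect the chosen incidence statistics; in a real module those weights would come from sources such as the CDC and NIH data described above.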

"People around the world are using the data from Synthea, and folks from as far away as Australia and New Zealand are collaborating with us on improvements," Walonoski says. "It truly is a global project, which we are developing and offering as open source code, in the public interest."

The Increasing Role of Data in Healthcare

"More and more, the healthcare field is about digital patient data," Kramer adds. "We're helping the next generation of healthcare workers in the broadest sense—from a new class of doctors to nursing informatics specialists—to understand and work with realistic patient data."

"We're also enabling innovators to get into health IT because they now have access to realistic health data free of legal, privacy, security, and intellectual property restrictions," Walonoski says. "They're using Synthea data to develop software applications, advance health data sharing, and perform preliminary testing on their new, innovative digital healthcare services."

Synthea is also part of the Standard Health Record Collaborative, an open source, health data interoperability effort started by MITRE. The collaborative's focus is to develop a Standard Health Record (SHR) and the technological infrastructure that will make it easier to deliver patient-centric services. "The type of information provided by Synthea is critical to developing health record standardization," says SHR team leader Andre Quina of MITRE. "We're aiming for a single, high-quality SHR that takes a 'one human, one health record' approach to digital healthcare."

Additionally, the data standards organization Health Level Seven® International singled out Synthea and SyntheticMass this past spring for their contributions to health IT research following Walonoski's presentation at the HL7-AMIA Datathon.

Synthea holds other promises as well.

"Synthea also simulates the behaviors of patients, not only their healthy and non-healthy behaviors, but also their care-seeking behaviors," Kramer says. "In the future, we might be able to capture the impact of accessibility to health services and the costs of health services on the health outcomes of a population. And, intriguingly, maybe we can even make predictions about the impact of specific health policies on the overall health of a community."

—by Jim Chido

