An artificial intelligence (AI) algorithm developed by Geisinger researchers that analyzes echocardiogram videos predicted one-year all-cause mortality more accurately than three out of four expert cardiologists and than other predictors commonly used in clinical practice, a study in Nature Biomedical Engineering demonstrated.

Using the model improved the predictive accuracy of cardiologists, the team at the Danville, Pa.-based health care system found. The sensitivity of the mortality predictions increased by 13%, while specificity remained about the same.

"Our goal is to develop computer algorithms to improve patient care," said Alvaro Ulloa Cerna, author and senior data scientist in the Department of Translational Data Science and Informatics at Geisinger. "In this case, we're excited that our algorithm was able to help cardiologists improve their predictions about patients, since decisions about treatment and interventions are based on these types of clinical predictions."

Mortality predictions guide care in many cardiology conditions and are particularly important in heart failure. Despite new medications and devices, 40% of heart failure patients die within a year of first hospitalization.

For these patients, the predictive model could significantly improve care and outcomes. “Perhaps the clearest opportunity is for patients with heart failure,” Cerna told BioWorld. “Clinicians currently use the Seattle Heart Failure Score in their planning for patient treatment. An improvement to that score that is more accurate and leverages imaging features can provide clinicians with highly accurate prediction of outcome and may change their treatment plan for patients.”

Compared with the Seattle Heart Failure (SHF) score, the model provided greater predictive power not only one year out from the echocardiogram, but for a subsequent nine-year period. The AI had a notably better negative predictive value than SHF, 89% vs. 83%, indicating that it could be used to screen low-risk patients and reduce unnecessary interventions. The AI also outperformed pooled cohort equations and models that used 158 variables from echocardiograms and electronic health records.
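
For readers unfamiliar with the metric, negative predictive value is the share of patients flagged as low risk who in fact survive. The short Python sketch below shows the arithmetic. The function name and the patient counts are hypothetical, chosen only to illustrate the formula, and are not taken from the study.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """Return NPV = TN / (TN + FN).

    true_negatives:  patients predicted to survive who did survive
    false_negatives: patients predicted to survive who died
    """
    predicted_negative = true_negatives + false_negatives
    if predicted_negative == 0:
        raise ValueError("NPV is undefined when no patient is predicted low risk")
    return true_negatives / predicted_negative


# Hypothetical counts for illustration only (not study data):
# of 100 patients called low risk, 89 survived and 11 died.
npv = negative_predictive_value(true_negatives=89, false_negatives=11)
print(f"NPV = {npv:.2f}")
```

A higher value means fewer patients labeled low risk go on to die, which is why the metric matters for ruling out unnecessary interventions.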

Building the AI

“Imaging is critical to treatment decisions in most modern medical specialties and has also become one of the most data rich components of the electronic health record,” the authors said. One echocardiogram generates 10 to 50 videos with about 3,000 images.

Those images convey extensive information about heart anatomy and function, but busy cardiologists have little time to analyze them or integrate their data into a steady stream of laboratory values, vital signs, diagnostic results, and other imaging studies. A neural network, however, can analyze the images and incorporate information from other data sources quickly and accurately.

The Geisinger model is a convolutional neural network trained on raw image pixel data from 812,278 echocardiographic videos from 34,362 patients treated at the health care system over the last decade. Using raw image pixel data circumvents biases unwittingly programmed into an AI by human perception and pattern-recognition limits.

A proof of concept found that using all echocardiography video views yielded an area under the curve (AUC) of 0.83 (95% CI, 0.83-0.84) for one-year mortality, while using 158 selected variables from echocardiograms and health records produced an AUC of 0.81 (95% CI, 0.80-0.82). Using all echocardiography views plus the 158 variables only slightly increased the accuracy of the model.
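
The area under the curve reported here can be read as the probability that the model assigns a higher risk score to a randomly chosen patient who died than to a randomly chosen patient who survived. The Python sketch below computes it that way, by comparing every such pair. It is a minimal illustration, not the researchers' evaluation code; the function name, labels and scores are all made up for the example.

```python
def pairwise_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 if the patient died within the window, 0 if the patient survived
    scores: predicted risk, higher meaning more likely to die
    """
    died = [s for y, s in zip(labels, scores) if y == 1]
    survived = [s for y, s in zip(labels, scores) if y == 0]
    if not died or not survived:
        raise ValueError("AUC needs at least one patient in each outcome group")

    wins = 0.0
    for d in died:
        for s in survived:
            if d > s:
                wins += 1.0   # pair ranked correctly
            elif d == s:
                wins += 0.5   # ties count as half
    return wins / (len(died) * len(survived))


# Made-up example: six patients, three of whom died.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
print(round(pairwise_auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, so the gap between 0.81 and 0.83 is a modest but measurable improvement in how well patients are ordered by risk.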

"We were excited to find that machine learning can leverage unstructured datasets such as medical images and videos to improve on a wide range of clinical prediction models," said Chris Haggerty, co-senior author and assistant professor in the Department of Translational Data Science and Informatics at Geisinger.

The researchers tested the model’s predictive ability against four expert cardiologists using two data sets: a survey set of 600 participants, half of whom died within one year of echocardiography and half of whom survived, and a second set of 2,404 heart failure patients who had 3,384 echocardiograms. The model yielded an AUC of 0.84 (95% CI, 0.81-0.87), while the aggregated score of the cardiologists had an AUC of 0.68 (95% CI, 0.64-0.71).

The cardiologists and researchers did not identify any “human-detectible anatomic findings that were key to the outcome,” Cerna noted.

The success of the model has encouraged the researchers to explore the data available in these videos more deeply to further increase its value. “We’ve only used a small portion of Geisinger’s rich echocardiogram archive,” Cerna said. “Thanks to Geisinger Research support, we can now extract more than 10 million videos and plan to improve the model’s accuracy with more echo data and more clinical data from patients’ electronic health record (EHR) and electrocardiograms.”