They say necessity is the mother of invention. In healthcare, one of the most remarkable inventions born of the pandemic is the National COVID Cohort Collaborative (N3C). The vision of the scientists and researchers behind the N3C is to turn massive amounts of already available data into new knowledge urgently needed to study COVID-19 and identify potential treatments.
The speed at which the N3C has moved is astonishing: In just a little over six months, the initiative was launched, opened to biomedical researchers and produced its first publication.
“With N3C, the goal is to collect and harmonize electronic clinical, laboratory and diagnostic data from hospitals and healthcare institutions around the nation, following the highest standards for data security and confidentiality,” said Mike Kurilla, MD, PhD, director of clinical innovation at the National Center for Advancing Translational Sciences of the NIH. “This real-world data will allow us to evaluate both short-term as well as long-term consequences of COVID-19, including related co-morbidities, therapeutic approaches and risk factors that can predict better (or worse) outcomes.”
Tell Bennett, MD, associate professor in the University of Colorado School of Medicine and director of Informatics in the Colorado Clinical and Translational Sciences Institute (CCTSI), has been helping to lead the N3C nationally. He is also the first author of the first paper to be published from N3C data, “The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction.”
How the study worked
The N3C Data Enclave is a secure cloud-based data and computing environment designed to facilitate virtual access to clinical data provided by hospitals nationwide. The University of Colorado Anschutz Medical Campus contributes data from both Children’s Hospital Colorado and UCHealth. Data are updated up to two times per week and have been standardized and harmonized to one common data model to generate efficient and minimally biased results.
Using the N3C Data Enclave, researchers analyzed electronic medical record data from more than 1.9 million patients from 34 medical centers nationwide. Today, Bennett says, the enclave holds data from approximately 3 million patients, a number that will continue to grow as patient data are added. In this retrospective cohort study, Bennett and his co-authors focused on more than 174,000 adults with COVID-19. They stratified patients by demographics and by a World Health Organization COVID-19 severity scale. They then used multivariable logistic regression to evaluate differences between groups over time, and they characterized the vital signs and laboratory values of COVID-19 patients at different severities, providing the foundation for predictive analytics.
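For readers unfamiliar with multivariable logistic regression, the sketch below shows the general idea on a small synthetic cohort. It is not the authors' code and uses no N3C data. The two predictors (`age_z`, `spo2_z`), the "true" effect sizes and the plain gradient-descent fitting routine are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit an intercept plus one weight per predictor by batch
    gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Synthetic, illustrative cohort. These are NOT N3C data.
# Hypothetical effects: intercept, standardized age, standardized SpO2.
random.seed(0)
TRUE_W = [-1.0, 1.5, -0.8]
X, y = [], []
for _ in range(400):
    age_z = random.gauss(0, 1)
    spo2_z = random.gauss(0, 1)
    p_severe = sigmoid(TRUE_W[0] + TRUE_W[1] * age_z + TRUE_W[2] * spo2_z)
    X.append([age_z, spo2_z])
    y.append(1 if random.random() < p_severe else 0)  # 1 = severe course

weights = fit_logistic(X, y)
# Exponentiated coefficients are adjusted odds ratios per one-unit change.
odds_ratios = {"age_z": math.exp(weights[1]), "spo2_z": math.exp(weights[2])}
print(odds_ratios)
```

An odds ratio above 1 means the predictor is associated with higher odds of a severe course once the other predictor is accounted for, and a value below 1 means lower odds. A real analysis would normally use an established statistics package and would report confidence intervals.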
Bennett said there were three main goals of the paper. “The first was to characterize the N3C cohort to introduce people to it. The second was to show the richness of data available in N3C about hospitalized patients. And last, we used rich inpatient data and machine learning (ML) to build a severity prediction model from the first day they [patients] were in the hospital.”
What they found
Of the patients with a positive COVID test, 32,472 (18.6%) were hospitalized. The median length of hospital stay was five days. Mortality (including discharge to hospice) was 11.6% among hospitalized patients. Others have reported that inpatient mortality has decreased over time. The study confirmed this: inpatient mortality decreased from 16.4% in March and April to 8.6% in September and October. Their data also showed that clinical severity shifted toward less-invasive mechanical ventilation and/or ECMO as the pandemic progressed. Moreover, the ML predictions held up when tested against the actual data.
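The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this article. Because the percentages are rounded, the derived counts are approximate, and the variable names are illustrative.

```python
# Figures quoted in the article.
hospitalized = 32_472
hospitalized_share = 0.186    # 18.6% of adults with a positive test
inpatient_mortality = 0.116   # includes discharge to hospice
mortality_early = 0.164       # March and April
mortality_late = 0.086        # September and October

# Derived, approximate values (the inputs are rounded percentages).
implied_positive_adults = hospitalized / hospitalized_share
implied_deaths = hospitalized * inpatient_mortality
absolute_drop_points = (mortality_early - mortality_late) * 100
relative_drop = (mortality_early - mortality_late) / mortality_early

print(f"Implied adults with a positive test: {implied_positive_adults:,.0f}")
print(f"Implied inpatient deaths or hospice discharges: {implied_deaths:,.0f}")
print(f"Absolute drop in mortality: {absolute_drop_points:.1f} percentage points")
print(f"Relative drop in mortality: {relative_drop:.1%}")
```

The implied cohort of roughly 174,600 adults is consistent with the "more than 174,000" figure given earlier. The fall from 16.4% to 8.6% is a drop of 7.8 percentage points, or a relative reduction of about 48%.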
Promise for the future
Bennett said the ML models have the potential to be useful to clinicians treating patients in the hospital when paired with electronic health records. “These models tell us about the most powerful predictors of severity. If health systems decided to implement these models in the background, they could be surfaced and made available to physicians in the electronic health record,” he said. Another way the ML models could be used is by providing clinicians with a ranked list of variables that predict severity for each patient, which could potentially help clinicians make decisions.
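The "ranked list of variables" idea can be sketched for one simple case. The example assumes a linear (logistic-style) model in which each variable's contribution to a patient's risk score is its coefficient multiplied by the patient's standardized value. The article does not say which model type the authors used, and the coefficients, variable names and patient values below are hypothetical.

```python
def rank_contributions(coefficients, patient):
    """Return (variable, contribution) pairs, largest absolute
    contribution first. Contribution = coefficient * patient value."""
    contributions = {
        name: coefficients[name] * patient[name] for name in coefficients
    }
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical model coefficients for standardized inputs (not from the paper).
COEFFICIENTS = {
    "age_z": 0.9,
    "spo2_z": -1.2,
    "creatinine_z": 0.5,
    "heart_rate_z": 0.3,
}

# Hypothetical patient, expressed as standard deviations from the cohort mean.
patient = {
    "age_z": 0.5,
    "spo2_z": -2.0,
    "creatinine_z": 1.0,
    "heart_rate_z": -0.2,
}

ranked = rank_contributions(COEFFICIENTS, patient)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

For this made-up patient, low oxygen saturation ranks first because it combines a large coefficient with a value far from the mean. More complex models would need a different attribution method to produce a comparable list.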
“The N3C project is exciting to me because it merges the two halves of my work life. My ICU experience and direct experience taking care of patients with COVID-19 has been important to making sure the work I was doing in N3C was relevant and clinically meaningful. With a cloud-based enclave and very large data and complex data structures, it takes informaticists to do effective work in that space. Having a foot in both camps has been really useful,” Bennett said.
Bennett hopes his colleagues at CU Anschutz will take advantage of N3C. Current projects run the gamut from social determinants of health to machine learning on laboratory results. He said, “People are approaching the data from different angles and clinical domains. As examples, there are teams working on the effect of COVID-19 on people who are immunosuppressed, those who have cancer and those who have diabetes.”
He said the next project he is eager to tackle relates to children and COVID-19. “We are waiting for a little more data to accumulate, but I think that a national-level analysis of the effects of COVID-19 on kids will be an important contribution.”
This work is supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under award number U24TR002306 and individual CTSA Program grants under PAR-18-940, PAR-18-464, PAR-15-304, RFA-TR-14-009. The CCTSI is a partner/collaborator of the N3C, which is funded by NIH’s National Center for Advancing Translational Sciences.
Publishing CTSA Program Hub’s Name
Center for Data to Health