The National COVID Cohort Collaborative (N3C): A National Data Sharing Partnership to Fight COVID-19

Abstract

Background:
America’s grim COVID-19 statistics tell a story of human suffering and loss on an epic scale. But there is more to that story than scary numbers: a hopeful plot twist in which patients’ health data are transformed into knowledge that guides our way out of the crisis. The National COVID Cohort Collaborative (N3C) is a new effort to collect, harmonize, and collaboratively analyze comprehensive electronic health record (EHR) data on coronavirus patients from across the US. It is a synergistic partnership among the CTSA Program hubs, the National Center for Data to Health (CD2H), and distributed clinical data networks (PCORnet, OHDSI, ACT, and TriNetX), with NCATS serving as the secure steward of the data. The N3C provides a powerful data analytics resource that will fill a critical knowledge gap: efforts are underway to use N3C to understand COVID-19 disease progression, the effects of treatment, candidate drugs for repurposing, and long-term effects. The COVID-19 pandemic has created an opportunity to leverage a framework already established throughout the CTSA Program and the wider community: CTSA and CD2H tools and resources, NCATS cloud resources, and collaborative informatics networks.

We’re open at covid.cd2h.org!
Built and launched in just a few short months, the N3C is now accepting both data and analytics. Researchers from more than 300 institutions are already involved. Whether you are a clinician, a data scientist, or anything in between, there are numerous ways to contribute; learn more at covid.cd2h.org.

N3C has organized 15 Domain Teams of clinicians, machine learning experts, informaticists, and other leaders, which enable researchers with shared interests to analyze data within the N3C Data Enclave and to collaborate more efficiently in a team science environment. N3C encourages researchers of all levels to join a Domain Team that represents their interests or to suggest new clinical areas to explore.

Benefits to CTSAs:
- Provides access to large-scale COVID-19 data from hospitals across the nation.
- Can be used to generate pilot data for grant proposals.
- Provides opportunities for KL2 and TL1 Scholars.
- Offers opportunities to leverage or expand expertise in statistics, machine learning (ML) analytics, and natural language processing (NLP) methods, and provides access to community tools, software, and datasets.
- Provides opportunities for team science that leverages cloud computing infrastructure for scalable, nationwide dissemination.

Current statistics:
- Over 1 million patient records (over 1 billion rows of data) from more than 35 institutions, expected to double in the near future
- Over 15 Domain Teams focusing on diverse topics ranging from imaging to pregnancy to social determinants of health
- 4 supported programming languages and analysis tools (R, Python, Apache Spark SQL, and Contour)
- Over 1,200 researchers from over 300 institutions, representing every US state as well as 6 foreign countries

N3C Communications Materials are available on the N3C website (https://covid.cd2h.org/N3C). Updated Media Kits, Communications Guidance, and Promotional Packets will be released by NCATS soon!

This work has been funded through NCATS and the National Institutes of Health under award number U24 TR002306.

Contact Email