Informatics Community Using EHR Data to Answer COVID-19 Questions: Featuring the Long-COVID Clinical Domain Team


The National COVID Cohort Collaborative (N3C) continues to grow as a robust EHR data resource, now representing over 3 million persons and 4 billion rows of data from more than 42 sites. Currently, 24 multidisciplinary Domain Teams, composed of clinical and subject matter experts, statisticians, informaticists, and machine learning specialists, are addressing the most pressing clinical questions. N3C data can be used to understand COVID-19’s impact on health, collect pilot data for grant submissions, train algorithms on larger datasets, inform clinical trial design, learn how to use tools for large-scale COVID-19 data, and validate results.

This issue features the Long-COVID Clinical Domain Team, which aims to define and characterize patients with long-term sequelae of SARS-CoV-2 infection. These patients continue to experience several symptoms for an extended period after recovering from the initial effects of COVID-19. A Long-COVID phenotype will support prognostic characterization of different substrata, potentially enable more precise care management, and greatly inform prospective interventional studies. The NIH has also just launched a new initiative to study Long COVID to help answer underlying questions surrounding this phenomenon. Read the announcement here.

Longitudinal, multimodal research is necessary for precision medicine. A use case for Long-COVID asks such questions as: Who has Long-COVID? How effective are existing treatments? How do viral/host variants correlate with outcomes? How can we best design a Long-COVID trial? How can we deploy, evaluate, and refine care guidelines quickly and effectively over time? Defining who will have Long-COVID poses a challenge for several reasons. The presence of the COVID-19 ICD-10 code alone is not sufficient, because we have yet to create a valid and reliable phenotype. Patients who have the ICD-10 code will be extremely heterogeneous and will therefore make a poor cohort for prospective studies. To define Long-COVID and create sub-classifications, we need multimodal, longitudinal classification of patients (EHR data, imaging data, self-reported data, viral and host genomic data, etc.).

The Long-COVID Domain Team is co-led by: Melissa Haendel, PhD, FACMI, Oregon Health & Science University; Emily Pfaff, MSIS, PhD, University of North Carolina Chapel Hill; Joel Saltz, MD, PhD, Stony Brook University; Christopher Chute, MD, DrPH, Johns Hopkins University; and Tell Bennett, MD, University of Colorado Anschutz Medical Campus.

N3C Domain Teams

  • COVID-19
  • Translational Team Science
  • Data Repositories

Center for Data to Health

CTSA Program In Action Goals
Goal 1: Train and Cultivate the Translational Science Workforce
Goal 5: Advance the Use of Cutting-Edge Informatics