University of California Health creates centralized data set to accelerate COVID-19 research

University of California Health

Drawing on electronic health records from across its academic health system, University of California Health has developed a unified, secure data set for COVID-19 research. The data set, a HIPAA Limited Data Set of clinical information containing more than 460 million data points, is accessible to researchers across the entire UC system. It enables them to rapidly compare the treatments given to previous patients in order to help future ones.

"Aggregating and using our collective clinical data in this safe and responsible way is one of a series of initiatives to speed up 'bench to bedside' research to treatment," said Atul Butte, chief data scientist for University of California Health and a distinguished professor at UCSF. "With the scale of the pandemic, we need as many UC researchers as possible to work on treatment options. This diverse data set, which is already integrated, may contain insights into COVID-19 that they may not find elsewhere, and access to it can make their work more efficient. This type of data set may provide a window into patterns they might not otherwise have been able to identify."

The University of California COVID Research Data Set (UC CORDS) simplifies the process a researcher would otherwise have to go through to assemble a critical mass of detailed clinical data and patient variables for meaningful comparisons. Once a researcher's request is validated, the researcher gains access to systemwide data from UC Health's five academic health centers. UC CORDS follows the U.S. Department of Health & Human Services definition of a HIPAA Limited Data Set and excludes key direct identifiers of the individual or of the individual's relatives, employers, or household members.

The extent and richness of the information will continue to grow as University of California hospitals at UC Davis Health, UC San Diego Health, UCI Health, UCLA Health and UCSF Health care for a growing number of patients with COVID-19. Because UC hospitals are distributed across California, the data draws on a broad cross-section of the state's diverse population.

Patient diversity, along with details about age, pre-existing medical conditions, medications, and previous treatments, is essential to ensure that findings are not unintentionally skewed by a homogeneous study population. "Inequities in health care can start as early as the research phase," said Dr. Carrie L. Byington, executive vice president of University of California Health and an infectious disease expert. "We will always strive to avoid perpetuating inequities." Byington called for the creation of UC CORDS through the University of California's Biomedical Research Acceleration, Integration & Development consortium (UC BRAID), the university's five Clinical and Translational Science Award institutions, and University of California Health's Data Warehouse team in the Center for Data-driven Insights and Innovation (CDI2).

Having completed a pre-release phase, UC CORDS has drawn interest from hundreds of researchers who want to use the data. One of them is Jonathan Watanabe, PharmD, associate director and founding associate dean of pharmacy assessment and quality at the UC Irvine Susan and Henry Samueli College of Health Sciences, who is using the data set to study the use of telehealth and the selection of medications during the pandemic.

“A significant benefit of UC CORDS is that it gives you insights into clinical practices in much closer to real-time and is representative of a broader patient population than any one organization would have on its own, which is critical for research during the pandemic,” said Watanabe. “This kind of approach to aggregating and sharing data is what we need to create more accessible large, long-term data sets that help avoid a rush to conclusions based on questionable correlations and selection bias.” Watanabe also noted that UC CORDS saves time by harmonizing and pre-integrating types of data that would normally come from multiple, separate data sets, if they were available at all.

Even as University of California Health seeks to accelerate progress among its own researchers, it is actively engaged in national efforts to find effective COVID-19 therapies. These include the ACT Network of the National Institutes of Health's National Center for Advancing Translational Sciences (NCATS), which is working to develop open access to de-identified electronic medical record data from a national network of leading academic health centers, and the Diagnostic Evidence Accelerator, a collaboration of the Reagan-Udall Foundation for the Food and Drug Administration and Friends of Cancer Research.

"The scale and speed of the pandemic call for unprecedented internal cooperation and collaboration, and we can meet that need while maintaining safe, respectful, and responsible use of this data," said Butte.

CTSA Program In Action Goals
Goal 1: Train and Cultivate the Translational Science Workforce
Goal 4: Innovate Processes to Increase the Quality and Efficiency of Translational Research, Particularly of Multisite Trials