Targeting underrepresented populations in precision medicine through multi-source data integration

Abstract:

The growing number of large-scale biobanks and institutional data networks has created unique opportunities to integrate patients’ genomic and electronic health record data across diverse sources for studying complex human diseases. Such integration is especially valuable for addressing the diminished model performance in minority and disadvantaged groups that results from their low representation in biomedical research. In this talk, I will introduce several statistical learning methods that target underrepresented populations by integrating data from multiple biobanks, different ancestries, and related outcomes. These methods protect data privacy by learning from the pre-trained models of each source dataset without sharing patient-level data, and they account for potential data heterogeneity. We provide theoretical guarantees for model performance and insights into when an external data source can help the target population. We demonstrate that our methods outperform benchmark methods, with examples using data from the UK Biobank and the electronic Medical Records and Genomics (eMERGE) Network.

Event Details

Please let us know if you require an accommodation in order to participate in this event. Accommodations may include live captioning, ASL interpreters, and/or captioned media and accessible documents from recorded events. We recommend submitting requests at least 5 days in advance.

University of Pittsburgh