130 Desoto Street, Pittsburgh, 15261


"Integrative Cellular Deconvolution for Omics Data" Department of Biostatistics and Health Data Science, School of Public Health. 

Advisor and Committee Chair: Chris McKennan

Abstract: 

Deconvolution of bulk omics data estimates cell type proportions within tissue samples, enabling cell
type-specific (CTS) analyses and de-confounded downstream analyses, among other applications. Existing
methods struggle with rare or correlated cell types and ignore inter-individual variability in CTS omics profiles.

This dissertation introduces four novel deconvolution models that integrate biological information or multiple
references to address these challenges for three types of omics data.

In Chapter 2, we propose Hierarchical Deconvolution (HiDecon), which uses single-cell RNA sequencing
references and incorporates a hierarchical cell type tree to model cell differentiation relationships. HiDecon
corrects estimation biases by coordinating fraction estimates across tree layers. This enables accurate
estimation of rare or similar cell types and identifies associations between cell subtypes and Alzheimer’s
disease.

Chapter 3 introduces BLEND, a hierarchical Bayesian method for bulk RNA-seq cellular deconvolution with
individualized reference integration. Current methods often overlook variation in CTS expression across samples,
ignore discrepancies between bulk and single-cell data, or lack guidance on reference data selection and integration.
BLEND addresses these issues by integrating the available single-cell references to learn the most suitable
reference for each bulk sample. We show that BLEND significantly outperforms state-of-the-art methods and provides reliable insights into Alzheimer’s disease progression.

Having demonstrated that reference blending improves bulk RNA-seq deconvolution, Chapter 4 extends this
strategy to array-based DNA methylation (DNAm) data with BLEND-M. In addition to capturing inter-individual variability in CTS DNAm, BLEND-M explicitly models the heteroscedasticity of methylation levels across marker CpGs, ensuring robustness to noisy markers. We demonstrate BLEND-M’s superior performance by benchmarking it against existing deconvolution methods on realistic simulated data and real datasets, and illustrate its utility by developing a risk prediction model for childhood atopic asthma.

Chapter 5 further extends the personalized reference blending framework to cell-free DNA methylation
deconvolution with cf-TREBLE. By accounting for inter-subject variability in CTS DNAm, cf-TREBLE introduces a
personalized deconvolution model whose parameters can be estimated up to 300 times faster than with existing
methods. We demonstrate its superior performance on realistic simulated data and apply cf-TREBLE to identify biomarkers and develop risk prediction models for endometriosis and adenomyosis. These gynecological conditions affect many women, cause severe pain as well as fertility and pregnancy complications, and currently require invasive surgery for diagnosis.

Public health significance: This dissertation introduces integrative computational methods for cellular
deconvolution of omics data. These methods improve on existing deconvolution methods by integrating biological information or heterogeneous datasets, enabling cell type-specific analyses and non-invasive disease diagnosis.

Event Details

Please let us know if you require an accommodation in order to participate in this event. Accommodations may include live captioning, ASL interpreters, and/or captioned media and accessible documents from recorded events. Submitting requests at least 5 days in advance is recommended.
