130 Desoto Street, Pittsburgh, 15261


"Statistical Modeling for High-dimensional Omics Studies for Congruence, Heterogeneity and Clustering" - Public Health/Biostatistics

Committee:
George Tseng (advisor and committee chair)

Abstract: 

High-dimensional omics data generated from high-throughput technologies capture molecular intricacy and variations, providing comprehensive insights into the pathological development of human diseases. However, statistical quantification of heterogeneity and congruence can be difficult both within a cohort and across studies due to the high dimensionality. This dissertation focuses on methodology development for cross-species congruence analysis for transcriptomic responses (Chapter 2), multivariate guided clustering for disease subtyping (Chapter 3) and multiple clustering in omics data (Chapter 4).

In Chapter 2, we propose a congruence analysis framework for transcriptomic response analysis by developing quantitative concordance/discordance scores that incorporate data variability, followed by pathway-centric downstream investigation. The framework can be applied to cross-species or cross-tissue studies to help researchers numerically quantify and visually identify the molecular mechanisms and pathway subnetworks that are best or least mimicked by model organisms, providing a foundation for hypothesis generation and subsequent translational decisions.

In Chapter 3, we propose a multivariate guided clustering model (mgClust) to identify homogeneous molecular subtypes of a complex disease that are collectively associated with multiple disease-related clinical variables. The two main components, a disease subtyping model and a multivariate clinical variable association model, interact with each other through a latent subtyping variable. Through extensive simulations, we show that mgClust improves clustering and feature selection performance over existing methods while accurately selecting clinical variables. Application to a lung disease dataset shows its benefit in enhancing interpretation and mechanistic understanding.

In Chapter 4, we propose a model-based multiple clustering algorithm to simultaneously discover multiple meaningful partitions of samples. Views with heterogeneous partitions are obtained through the competition of likelihoods among mixture models, while clusters within each view are determined through the competition among individual Gaussian distributions. A relative likelihood of mixture models is proposed in the E-step of the Expectation-Maximization algorithm to enhance the view assignment, and a tight clustering initialization is used to encourage dissimilar views. Application to multiple human brain tissue datasets shows its effectiveness in capturing multiple distinct perspectives nested in high-dimensional omics data.
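For readers unfamiliar with the Expectation-Maximization algorithm mentioned above, the following is a minimal sketch of standard EM for a single one-dimensional Gaussian mixture, where each sample's cluster membership is decided by competition among the component densities. It is not the dissertation's method: it does not include the relative likelihood, the view assignment, or the tight clustering initialization, and all function names (`e_step`, `m_step`, `fit_gmm`) and the starting values are assumptions made for illustration.

```python
import numpy as np

def e_step(x, weights, means, variances):
    """Return responsibilities: posterior probability of each component per sample."""
    x = np.asarray(x, dtype=float)[:, None]             # shape (n, 1)
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)
                      + (x - means) ** 2 / variances)   # shape (n, k)
    log_joint = np.log(weights) + log_pdf
    log_joint -= log_joint.max(axis=1, keepdims=True)   # stabilise before exponentiating
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)       # each row sums to 1

def m_step(x, resp):
    """Re-estimate weights, means and variances from the responsibilities."""
    x = np.asarray(x, dtype=float)
    nk = resp.sum(axis=0)                                # effective cluster sizes
    weights = nk / len(x)
    means = (resp * x[:, None]).sum(axis=0) / nk
    variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, np.maximum(variances, 1e-6)  # floor avoids zero variance

def fit_gmm(x, init_means, n_iter=50):
    """Alternate E- and M-steps for a fixed number of iterations (illustrative only)."""
    means = np.asarray(init_means, dtype=float)
    k = len(means)
    weights = np.full(k, 1.0 / k)
    variances = np.ones(k)
    for _ in range(n_iter):
        resp = e_step(x, weights, means, variances)
        weights, means, variances = m_step(x, resp)
    return weights, means, variances, e_step(x, weights, means, variances)
```

Calling `fit_gmm(x, [1.0, 8.0])` on a one-dimensional array `x` returns the fitted weights, means, variances, and the final responsibilities; taking `resp.argmax(axis=1)` gives a hard cluster label per sample.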

Contribution to public health: The framework proposed in Chapter 2 provides a quantitative approach to identify biomarkers, pathways and topological gene regulatory modules that are best or least mimicked by the model organism, which will facilitate hypothesis generation and translational guidance of animal models. The model proposed in Chapter 3 can identify disease subtypes that are associated with clinical variables of interest, which has important implications for precision medicine. Chapter 4 provides a tool for simultaneously generating multiple partitions of samples reflecting different perspectives of the dataset, facilitating the exploration of publicly available omics data and the discovery of new knowledge about diseases.

Event Details

Please let us know if you require an accommodation in order to participate in this event. Accommodations may include live captioning, ASL interpreters, and/or captioned media and accessible documents from recorded events. We recommend submitting requests at least 5 days in advance.
