Thursday, June 17, 2021 2:00pm to 4:00pm
About this Event
Doctoral Candidate Peng Liu defends his dissertation on "Outcome Guided Diseases Subtyping and Power Calculation for High Dimensional Omics Studies".
Advisors: Lu Tang, PhD, & George Tseng, ScD, Department of Biostatistics
Committee Members:
ABSTRACT:
With the rapid advancement of high-throughput technologies and their falling costs, omics experiments have gained popularity and generated a large amount of high-dimensional data in the public domain, raising various statistical and computational challenges in the design and analysis of omics experiments. Integrative analysis of multi-omics data can improve accuracy, reproducibility, and interpretability relative to single-omic studies. This dissertation focuses on disease subtyping (Chapters 2 & 3) and power calculation (Chapter 4) in the analysis of high-dimensional omics studies.
In Chapter 2, we proposed an outcome-guided disease subtyping framework called ogClust. Disease subtyping with omics data usually relies on conventional clustering methods, which primarily identify subpopulations with similar patterns in gene features. Because outcome information is not considered during clustering, the identified disease subtypes are often not associated with the outcome. ogClust is based on a generative latent class model that unifies two components, a disease subtyping model and an outcome association model, linked through the latent class. The framework uses a continuous or survival clinical outcome to guide subtyping, identifies disease subtypes together with their driving genes, and ensures that the resulting subtypes are associated with the disease outcome of interest.
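To make the two linked components concrete, here is a toy sketch, not ogClust itself. It assumes a single gene feature x, a multinomial logistic subtyping model P(Z=j | x), and a Gaussian outcome model Y | Z=j ~ N(beta_j, sigma^2), fitted by EM with no sparsity penalty and no survival outcome. The function name fit_toy_outcome_guided and the simulated data are hypothetical.

```python
import math
import random


def log_normal_pdf(y, mean, sd):
    # Log density of N(mean, sd^2) evaluated at y.
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def softmax(scores):
    # Numerically stable softmax over log-scale scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_toy_outcome_guided(x, y, k=2, n_iter=200, inner_steps=5, lr=0.5):
    # Toy latent class model (assumed, not the ogClust implementation):
    #   subtyping model: P(Z=j | x) = softmax_j(g0_j + g1_j * x), class 0 as reference
    #   outcome model:   Y | Z=j ~ N(beta_j, sigma^2)
    n = len(y)
    gamma = [[0.0, 0.0] for _ in range(k)]
    ys = sorted(y)
    beta = [ys[int((j + 0.5) * n / k)] for j in range(k)]  # spread initial class means
    mean_y = sum(y) / n
    sigma = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    w = []
    for _ in range(n_iter):
        # E-step: posterior membership combines the gene-based prior and the outcome likelihood.
        w = []
        for xi, yi in zip(x, y):
            scores = [g[0] + g[1] * xi + log_normal_pdf(yi, b, sigma) for g, b in zip(gamma, beta)]
            w.append(softmax(scores))
        # M-step, outcome model: weighted class means and pooled standard deviation.
        for j in range(k):
            wj = max(sum(wi[j] for wi in w), 1e-12)
            beta[j] = sum(wi[j] * yi for wi, yi in zip(w, y)) / wj
        sigma = math.sqrt(sum(wi[j] * (yi - beta[j]) ** 2 for wi, yi in zip(w, y) for j in range(k)) / n)
        sigma = max(sigma, 1e-3)
        # M-step, subtyping model: gradient ascent on the weighted multinomial log-likelihood.
        for _ in range(inner_steps):
            grads = [[0.0, 0.0] for _ in range(k)]
            for xi, wi in zip(x, w):
                p = softmax([g[0] + g[1] * xi for g in gamma])
                for j in range(1, k):
                    grads[j][0] += (wi[j] - p[j]) / n
                    grads[j][1] += (wi[j] - p[j]) * xi / n
            for j in range(1, k):
                gamma[j][0] += lr * grads[j][0]
                gamma[j][1] += lr * grads[j][1]
    return gamma, beta, sigma, w


# Hypothetical data: the gene feature drives subtype membership, and the subtype drives the outcome.
rng = random.Random(1)
x, y = [], []
for _ in range(300):
    xi = rng.gauss(0, 1)
    zi = 1 if rng.random() < 1 / (1 + math.exp(-3 * xi)) else 0
    x.append(xi)
    y.append(rng.gauss(4.0 if zi else 0.0, 0.5))

gamma, beta, sigma, w = fit_toy_outcome_guided(x, y)
print("class means:", [round(b, 2) for b in beta], "sigma:", round(sigma, 2))
```

Because the outcome term enters the E-step, the recovered subtypes differ in outcome by construction, which is the property the abstract emphasizes.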
In Chapter 3, we extended the ogClust model by integrating multi-omics data and incorporating biological information via the sparse overlapping group lasso to improve the accuracy and interpretability of feature selection and disease subtyping. An EM algorithm combined with the alternating direction method of multipliers (ADMM) is used for fast optimization.
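A sparse group lasso penalty is usually handled through its proximal operator, which an ADMM solver applies group by group. The sketch below shows that operator for a single group, assuming the penalty lam1*||b||_1 + lam2*||b||_2. For overlapping groups, a common ADMM strategy (assumed here, not taken from the dissertation) gives each group its own copy of the shared coefficients so the per-group proximal steps decouple. The function names are hypothetical.

```python
import math


def soft_threshold(v, t):
    # Elementwise lasso shrinkage: sign(v_i) * max(|v_i| - t, 0).
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def group_shrink(v, t):
    # Group lasso shrinkage: zero the whole block if ||v||_2 <= t, else scale it toward zero.
    norm = math.sqrt(sum(vi * vi for vi in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * vi for vi in v]


def sparse_group_prox(v, lam1, lam2):
    # Proximal operator of lam1*||b||_1 + lam2*||b||_2 for one group:
    # soft-threshold each coefficient first, then shrink the surviving block.
    return group_shrink(soft_threshold(v, lam1), lam2)


print(sparse_group_prox([4.0, -5.0, 0.3], 1.0, 2.5))  # → [1.5, -2.0, 0.0]
```

The two-stage form gives within-group sparsity (from the l1 term) and whole-group selection (from the l2 term), which is what lets a group-structured penalty encode prior biological pathway information.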
In Chapter 4, we proposed a power calculation and study design method, "MethylSeqDesign", for bisulfite DNA methylation sequencing (Methyl-Seq) studies. Power calculation for Methyl-Seq is statistically challenging because sample size and sequencing depth must be considered simultaneously and because methylation data are complex and large in scale. The proposed method uses pilot data for power calculation and experimental design of Methyl-Seq experiments. It fits a mixture model to the p-value distribution from the pilot data and applies a parametric bootstrap procedure based on approximated Wald test statistics to infer genome-wide power, which guides the choice of optimal sample size and sequencing depth. The performance of the method was evaluated with simulations, and two real examples were analyzed to illustrate the approach.
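The p-value mixture step can be illustrated with a beta-uniform mixture, one common choice that is assumed here because the abstract does not name the mixture family: null p-values are Uniform(0, 1) and alternative p-values follow Beta(a, 1) with 0 < a < 1. The sketch fits this mixture by EM and reports the fraction of alternative loci expected below a cutoff at the pilot sample size only. It does not include the parametric bootstrap over Wald statistics that MethylSeqDesign uses to extrapolate to other sample sizes and sequencing depths. Function names and simulated data are hypothetical.

```python
import math
import random


def fit_beta_uniform_mixture(pvals, n_iter=500, tol=1e-8):
    # Assumed model (not necessarily MethylSeqDesign's):
    #   f(p) = pi0 * 1 + (1 - pi0) * a * p**(a - 1),  0 < a <= 1
    p = [min(max(v, 1e-12), 1.0) for v in pvals]  # guard against log(0)
    logp = [math.log(v) for v in p]
    pi0, a = 0.5, 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each p-value comes from the alternative.
        w = []
        for v in p:
            alt = (1 - pi0) * a * v ** (a - 1)
            w.append(alt / (pi0 + alt))
        # M-step: closed-form updates for pi0 and the Beta(a, 1) shape parameter.
        new_pi0 = 1 - sum(w) / len(w)
        new_a = min(-sum(w) / sum(wi * li for wi, li in zip(w, logp)), 1.0)
        done = abs(new_pi0 - pi0) < tol and abs(new_a - a) < tol
        pi0, a = new_pi0, new_a
        if done:
            break
    return pi0, a


def alt_rejection_rate(a, alpha):
    # P(p < alpha | alternative) under the fitted Beta(a, 1) component,
    # i.e. average power at the pilot sample size for a per-locus cutoff alpha.
    return alpha ** a


# Hypothetical pilot data: 1600 null loci and 400 alternative loci with true a = 0.2.
rng = random.Random(0)
null_p = [rng.random() for _ in range(1600)]
alt_p = [rng.random() ** (1 / 0.2) for _ in range(400)]  # inverse CDF of Beta(0.2, 1)
pi0_hat, a_hat = fit_beta_uniform_mixture(null_p + alt_p)
print(f"pi0 = {pi0_hat:.3f}, a = {a_hat:.3f}, power at 0.001 = {alt_rejection_rate(a_hat, 1e-3):.3f}")
```

Both M-step updates have closed forms, which keeps the fit cheap enough to repeat across many bootstrap replicates in a genome-wide setting.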
Please let us know if you require an accommodation in order to participate in this event. Accommodations may include live captioning, ASL interpreters, and/or captioned media and accessible documents from recorded events. At least 5 days' advance notice is recommended.