Events Calendar

17 Jun
Event Type

Defenses

Topic

Research

Target Audience

Faculty, Graduate Students

University Unit
Department of Biostatistics

Peng Liu: Outcome Guided Disease Subtyping and Power Calculation for High Dimensional Omics Studies

Doctoral candidate Peng Liu defends his dissertation, "Outcome Guided Disease Subtyping and Power Calculation for High Dimensional Omics Studies."

Advisors: Lu Tang, PhD, & George Tseng, ScD, Department of Biostatistics

Committee Members: 

  • Yongseok Park, PhD, Department of Biostatistics
  • Daniel Weeks, PhD, Department of Human Genetics

ABSTRACT:

With the rapid advancement of high-throughput technologies and falling costs, omics experiments have gained popularity and generated a large amount of high-dimensional data in the public domain. This growth gives rise to various statistical and computational challenges in the design and analysis of omics experiments. Integrative analysis of multi-omics data can improve on the accuracy, reproducibility, and interpretability of single-omics studies. This proposal focuses on addressing disease subtyping (Chapters 2 and 3) and power calculation issues (Chapter 4) in the analysis of high-dimensional omics studies.

In Chapter 2, we proposed an outcome-guided disease subtyping framework called ogClust. Disease subtyping with omics data usually applies conventional clustering methods, which primarily aim to identify subpopulations with similar patterns in gene features. Because outcome information is not considered in clustering, the identified disease subtypes are often not associated with the outcome. ogClust is based on a generative latent class model that unifies two components, a disease subtyping model and an outcome association model, linked through the latent class. The framework uses a continuous or survival clinical outcome to guide subtype discovery, identifies disease subtypes together with their driving genes, and guarantees that the resulting subtypes are associated with the disease of interest.

In Chapter 3, we extended the ogClust model by integrating multi-omics data and incorporating biological information via the sparse overlapping group lasso to improve the accuracy and interpretability of feature selection and disease subtyping. An EM algorithm combined with the alternating direction method of multipliers (ADMM) is applied for fast optimization.

In Chapter 4, we proposed a power calculation and study design method, "MethylSeqDesign", for bisulfite DNA methylation sequencing (Methyl-Seq) studies. The need to consider sample size and sequencing depth simultaneously, together with the complexity and large scale of methylation data, brings statistical challenges for power calculation. The proposed method uses pilot data for power calculation and experimental design in Methyl-Seq experiments. The approach is based on a mixture model fitted to the p-value distribution from pilot data and a parametric bootstrap procedure based on approximated Wald test statistics to infer genome-wide power for optimal sample size and sequencing depth. The performance of the method was evaluated with simulations. Two real examples are analyzed to illustrate our approach.
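
As a rough illustration of what fitting a mixture model to a p-value distribution can look like, the sketch below fits a beta-uniform mixture by EM. This is an assumption made for illustration only: the abstract does not state which mixture form MethylSeqDesign uses, and the names `simulate_pvalues` and `fit_bum` are hypothetical, not part of any package. Null p-values are modeled as Uniform(0, 1) and non-null p-values as Beta(a, 1) with a < 1.

```python
import math
import random


def simulate_pvalues(n, null_frac, a):
    """Draw n p-values: Uniform(0,1) with prob. null_frac, else Beta(a, 1)."""
    pvals = []
    for _ in range(n):
        u = random.random()
        if random.random() < null_frac:
            pvals.append(u)               # null: uniform
        else:
            pvals.append(u ** (1.0 / a))  # non-null: inverse CDF of Beta(a, 1)
    return pvals


def fit_bum(pvalues, n_iter=1000, eps=1e-12):
    """EM fit of f(p) = lam + (1 - lam) * a * p**(a - 1).

    Returns (lam, a): the estimated null proportion and the shape of the
    non-null component. Illustrative sketch only (assumed model form).
    """
    logp = [math.log(min(max(p, eps), 1.0)) for p in pvalues]
    n = len(logp)
    lam, a = 0.5, 0.5  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each p-value is non-null
        w = []
        for lp in logp:
            alt = (1.0 - lam) * a * math.exp((a - 1.0) * lp)
            w.append(alt / (lam + alt))
        sw = sum(w)
        swlp = sum(wi * lp for wi, lp in zip(w, logp))
        if swlp == 0.0:
            break  # degenerate input (all p-values equal to 1)
        # M-step: closed-form updates for the weighted Beta(a, 1) likelihood
        lam = min(max(1.0 - sw / n, 1e-6), 1.0 - 1e-6)
        a = -sw / swlp
    return lam, a


if __name__ == "__main__":
    random.seed(0)
    pilot = simulate_pvalues(4000, null_frac=0.7, a=0.2)
    lam_hat, a_hat = fit_bum(pilot)
    print(f"estimated null proportion: {lam_hat:.3f}, shape a: {a_hat:.3f}")
```

In a power calculation of this general kind, the estimated null proportion and non-null component would feed into the downstream resampling step; the sketch stops at the mixture fit and does not attempt the parametric bootstrap described above.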

Dial-In Information

Join Zoom Meeting

https://pitt.zoom.us/j/5823155813

Passcode: 2021

Thursday, June 17 at 2:00 p.m. to 4:00 p.m.

Virtual Event
