Doctoral Candidate Xianling Wang defends her dissertation on "Latent variable models for analyses with diagnostic tests and missing covariates"

Advisor: Gong Tang, PhD, Department of Biostatistics

Committee Members:

  • Chaeryon Kang, PhD, Department of Biostatistics
  • Christopher McKennan, PhD, Department of Statistics
  • Lu Tang, PhD, Department of Biostatistics
  • Jonathan Yabes, PhD, Department of Medicine

ABSTRACT:

This dissertation concerns statistical analyses with latent variables under two scenarios. Many discrete diagnostic markers, such as breast cancer tumor grade, are important prognostic factors yet suffer from poor reproducibility because of their subjective nature. When multiple independent ratings are available, latent class models are the usual choice for statistical inference. However, the model parameters are estimable only up to a permutation of the labels of the underlying truth. For settings in which an auxiliary variable associated with the underlying truth through a known trend is observed, we proposed a joint model that achieves global identification and yields more efficient estimates. We also provided a remedy for a specific violation of the conditional independence assumption in those classical models. The methods were illustrated in the analysis of a tumor grade reading dataset from the National Surgical Adjuvant Breast and Bowel Project (NSABP). The improved efficiency was also demonstrated through simulation studies.

The second part of this dissertation concerns regression analyses in which a covariate is subject to missing values arising from a hierarchical missing data mechanism. In electronic medical records (EMR) data, some important biomarkers, such as lab test results, are missing for various reasons. Patients in remission are less likely to undergo those specialized tests. Furthermore, records of tested patients may be missing because of how the EMR data are assembled. In practice, the exact nature of such missingness is unknown to the investigators. Standard methods such as the maximum likelihood method and inverse probability weighting typically ignore such heterogeneity and may produce biased estimates. We introduced a latent variable model that captures the hierarchical missing data process and yields valid parameter estimates. The maximum likelihood method was used for estimation and inference. The proposed method was applied to a motivating EMR dataset from an inflammatory bowel disease registry at the University of Pittsburgh Medical Center. The performance of the proposed method was evaluated through simulation studies.

Public health significance: We proposed novel statistical methods to address missing data under two different scenarios. Because they yield valid inference under those circumstances, the proposed methods have important public health implications.

 

Event Details

Please let us know if you require an accommodation in order to participate in this event. Accommodations may include live captioning, ASL interpreters, and/or captioned media and accessible documents from recorded events. We recommend submitting requests at least 5 days in advance.


 

Dial-in Information

Join Zoom Meeting

https://pitt.zoom.us/j/92136945175

Passcode: 2021
