Marie Tuft of the Department of Biostatistics defends her dissertation on "Statistical Learning for the Spectral Analysis of Time Series Data". 

Spectral analysis of biological processes poses a wide variety of complications. Statistical learning techniques in both the frequentist and Bayesian frameworks are required to overcome the unique and varied challenges of analyzing these data in a meaningful way. This dissertation presents new methodologies to address problems in multivariate stationary and univariate nonstationary time series analysis.

The first method is motivated by the analysis of heart rate variability (HRV) time series. Because HRV is nonstationary, it poses a unique challenge: localized, accurate, and interpretable descriptions of both frequency and time are required. By reframing this question in a reduced-rank regression setting, we propose a novel approach that produces a low-dimensional, empirical basis that is localized in bands of time and frequency. To estimate this frequency-time basis, we apply penalized reduced-rank regression with singular value decomposition to the localized discrete Fourier transform. An adaptive sparse fused lasso penalty is applied to the left and right singular vectors, resulting in low-dimensional measures that are interpretable as localized bands in time and frequency. We then apply this method to interpret the power spectrum of HRV measured on a single person over the course of a night.
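
As a rough illustration of the idea behind the first method, the following Python sketch computes a localized discrete Fourier transform on sliding windows and extracts a low-rank time-frequency basis with a plain truncated singular value decomposition. It is a simplified stand-in under stated assumptions, not the dissertation's estimator: it omits the adaptive sparse fused lasso penalty, uses a synthetic signal in place of HRV data, and the function names, window length, and step size are hypothetical choices.

```python
import numpy as np

def local_periodogram(x, win_len, step):
    """Localized DFT: tapered periodogram on sliding windows.

    Returns an array of shape (n_freq, n_windows), where
    n_freq = win_len // 2 + 1.
    """
    taper = np.hanning(win_len)
    cols = []
    for start in range(0, len(x) - win_len + 1, step):
        seg = x[start:start + win_len]
        seg = (seg - seg.mean()) * taper          # demean, then taper
        cols.append(np.abs(np.fft.rfft(seg)) ** 2 / np.sum(taper ** 2))
    return np.column_stack(cols)

def low_rank_basis(P, rank):
    """Unpenalized truncated SVD of the time-frequency matrix P.

    Left singular vectors describe frequency patterns and right
    singular vectors describe time patterns. (Assumption: the
    penalized version described above would additionally make these
    vectors sparse and piecewise constant.)
    """
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Synthetic nonstationary series: the dominant frequency switches
# from 0.05 to 0.20 cycles per sample halfway through.
rng = np.random.default_rng(0)
n = 2048
t = np.arange(n)
x = np.where(t < n // 2,
             np.sin(2 * np.pi * 0.05 * t),
             np.sin(2 * np.pi * 0.20 * t))
x = x + 0.1 * rng.standard_normal(n)

P = local_periodogram(x, win_len=128, step=64)
U, s, Vt = low_rank_basis(P, rank=2)

# Rank-2 reconstruction and its relative Frobenius error
approx = (U * s) @ Vt
rel_err = np.linalg.norm(P - approx) / np.linalg.norm(P)
```

In this toy setting two components are enough to reproduce most of the time-frequency matrix, because the signal occupies one frequency band in the first half and another in the second half.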

The second method considers the analysis of high-dimensional, 64-channel resting-state electroencephalography (rsEEG) recorded on a group of first-episode psychosis subjects compared to the rsEEG from a group of healthy controls. This type of analysis poses two challenges: first, estimating the spectral density matrix in a high-dimensional setting, and second, incorporating covariates into the estimate of the spectral density. To overcome the challenge of dimensionality, we use a Bayesian factor model, which decomposes the Fourier transform of the time series into a matrix of factors and a vector of factor loadings. The factor model is then embedded into a mixture model with covariate-dependent mixture weights, resulting in a power spectrum conditioned on any number or combination of covariates. The method is then applied to examine differences in the power spectrum for first-episode psychosis subjects vs. healthy controls.
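
To give a sense of how a factor structure reduces dimensionality in the frequency domain, the Python sketch below estimates a spectral density matrix in one frequency band as low-rank loadings plus a constant noise level. This is only an assumption-laden analogue: it uses an eigendecomposition of the band-averaged periodogram matrix (in the style of probabilistic PCA) rather than the Bayesian factor model or the covariate-dependent mixture described above, and the function name, band limits, and simulated 8-channel data are hypothetical.

```python
import numpy as np

def factor_spectral_matrix(X, n_factors, band):
    """Low-rank-plus-noise estimate of the spectral matrix in a band.

    X has shape (n_time, n_channels); band = (low, high) in cycles
    per sample. Returns (estimate, raw), both (n_channels, n_channels).
    """
    n = X.shape[0]
    D = np.fft.rfft(X - X.mean(axis=0), axis=0) / np.sqrt(n)
    freqs = np.fft.rfftfreq(n)
    Db = D[(freqs >= band[0]) & (freqs < band[1])]   # (m, p)

    # Band-averaged periodogram matrix: mean over k of d_k d_k^H
    raw = Db.T @ Db.conj() / Db.shape[0]

    # Eigenvalues are returned in ascending order
    w, V = np.linalg.eigh(raw)
    noise = w[:-n_factors].mean()                    # average minor eigenvalue
    scale = np.sqrt(np.maximum(w[-n_factors:] - noise, 0.0))
    loadings = V[:, -n_factors:] * scale             # (p, n_factors)

    estimate = loadings @ loadings.conj().T + noise * np.eye(X.shape[1])
    return estimate, raw

# Simulated data: 8 channels driven by one shared factor plus noise
rng = np.random.default_rng(1)
n_time, n_channels = 1024, 8
factor = rng.standard_normal((n_time, 1))
X = factor @ np.ones((1, n_channels)) + 0.5 * rng.standard_normal((n_time, n_channels))

estimate, raw = factor_spectral_matrix(X, n_factors=1, band=(0.05, 0.15))
rel_err = np.linalg.norm(estimate - raw) / np.linalg.norm(raw)
```

The estimate is described by one loading vector and one noise level instead of a full 8-by-8 Hermitian matrix, which is the kind of reduction that becomes important with 64 channels.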

Public health significance: As collection methods for time series data of biological processes become ubiquitous in biomedical research, there is an increasing need for statistical methodology that is robust enough to handle the complexity and potentially high dimensionality of the data while retaining the flexibility needed to answer real-world questions of interest posed by clinicians.

 Committee members:

  • Robert Krafty, advisor, Department of Biostatistics
  • Stewart Anderson, Department of Biostatistics
  • Ada Youk, Department of Biostatistics
  • Scott Rothenberger, Division of General Internal Medicine

Event Details

Please let us know if you require an accommodation in order to participate in this event. Accommodations may include live captioning, ASL interpreters, and/or captioned media and accessible documents from recorded events. We recommend submitting requests at least 5 days in advance.

University of Pittsburgh