Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/137066
Type: Thesis
Title: Fully Automated Parameter Estimation for Mixtures of Factor Analyzers
Author: Davey, John Colm
Issue Date: 2022
School/Discipline: School of Mathematical Sciences
Abstract: Mixture models are a family of statistical models that can model datasets with underlying sub-population structures effectively. This thesis focuses on one particular mixture model, called the Mixtures of Factor Analyzers (MFA) model [Ghahramani et al., 1997], which is a multivariate clustering model more parsimonious than the well known Gaussian mixture model (GMM). The MFA model has two hyperparameters, g, the number of components, and q, the number of factors per component. When these are assumed to be known in advance, approximate maximum likelihood estimates for the remaining model parameters can be obtained using Expectation Maximisation (EM)-type algorithms [Dempster et al., 1977] [Ghahramani et al., 1997] [McLachlan and Peel, 2000] [Zhao and Yu, 2008]. This work reviews methods for fitting the MFA model in the more realistic case where its two hyperparameters are not known a priori. A systematic comparison of seven methods for fitting the MFA model when its hyperparameters are unknown is conducted. The methods are compared based on their ability to infer the two hyperparameters accurately, as well as general model fit, clustering accuracy and the length of time taken to fit the model. The results suggest that a naive grid search over both hyperparameters performs the best on all of the metrics except for the time taken to fit the models. The Infinite Mixtures of Infinite Factor Analyzers (IMIFA) algorithm [Murphy et al., 2020] also performs well on most of the metrics. However, like the naive search, IMIFA is also very computationally intensive. The Automatic Mixture of Factor Analyzers (AMFA) algorithm [Wang and Lin, 2020] is a viable alternative when available computation time is limited, as it often performs comparably to the naive search and IMIFA, but with greatly reduced computation times.
To facilitate the comparison, the R package autoMFA is created, which implements five methods for the automated fitting of the MFA model and is available on the Comprehensive R Archive Network (CRAN). A limitation of the MFA model is its inability to deal with asymmetrical cluster shapes, which is a consequence of using multivariate Gaussian component densities. The Mixtures of Mean-Variance Mixture of Normal Distribution Factor Analyzers (MMVMNFA) family is proposed as a generalisation of the MFA model, which permits asymmetrical component densities. A new EM-type algorithm for parameter estimation of MMVMNFA models is developed. Based on its performance in the comparison, the AMFA algorithm is selected and generalised to the MMVMNFA family. Six specific instances of the MMVMNFA family are considered, and the steps for the EM-type algorithm are derived for each. The Julia package FactorMixtures is created, which contains implementations of each of these algorithms. The six instances are tested on two synthetic datasets and two real-world datasets, where their superior ability to capture heavy-tailed data and data exhibiting multivariate skewness is demonstrated in comparison to the standard MFA model, which cannot effectively capture either of these properties.
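Illustrative note (not part of the thesis): the parsimony claim and the cost of a naive grid search can be made concrete with a small parameter count. The sketch below assumes the common MFA parameterisation with component-specific factor loadings and component-specific diagonal noise matrices; the thesis may use a different variant. The function names and the data dimension p are hypothetical and are not taken from autoMFA or FactorMixtures.

```python
def gmm_free_params(p: int, g: int) -> int:
    """Free parameters of a g-component GMM with unrestricted covariances in p dimensions."""
    weights = g - 1                      # mixing proportions sum to one
    means = g * p
    covariances = g * p * (p + 1) // 2   # one symmetric p x p matrix per component
    return weights + means + covariances


def mfa_free_params(p: int, g: int, q: int) -> int:
    """Free parameters of a g-component MFA with q factors per component.

    Assumes component-specific loadings and diagonal noise (an assumption,
    not necessarily the exact variant studied in the thesis).
    """
    weights = g - 1
    means = g * p
    # p x q loading matrix, less q(q-1)/2 for rotational invariance
    loadings = g * (p * q - q * (q - 1) // 2)
    noise = g * p                        # diagonal noise variances
    return weights + means + loadings + noise


def hyperparameter_grid(g_max: int, q_max: int) -> list[tuple[int, int]]:
    """All (g, q) pairs a naive grid search would have to fit."""
    return [(g, q) for g in range(1, g_max + 1) for q in range(1, q_max + 1)]


if __name__ == "__main__":
    p, g, q = 10, 3, 2
    print("GMM parameters:", gmm_free_params(p, g))
    print("MFA parameters:", mfa_free_params(p, g, q))
    print("Models to fit on a 3 x 2 grid:", len(hyperparameter_grid(3, 2)))
```

The grid grows as g_max times q_max, and each pair requires at least one full EM fit, which is consistent with the naive search being the slowest option in the comparison.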
Advisor: Glonek, Garique
Lee, Sharon
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Mathematical Sciences, 2022
Keywords: mixture models
mixtures of factor analyzers
factor analysis
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
Appears in Collections: Research Theses

Files in This Item:
File: Davey2022_PhD.pdf
Size: 3.05 MB
Format: Adobe PDF

