Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/136013
Type: Thesis
Title: Quantifying and reducing biases in paleogenomic research
Author: Oliva, Adrien
Issue Date: 2022
School/Discipline: School of Biological Sciences
Abstract: The emergence of high-throughput DNA sequencing technologies has enabled the sequencing of genomes at unprecedented rates and low costs. In parallel, paleogenomics research, which extends the study of ancient DNA molecules to whole genomes, has led to a number of transformative discoveries in evolutionary biology, environmental sciences, and even medicine. However, ancient DNA has a number of properties that make it challenging to investigate, including short fragment length, contamination, and damage, which are often exacerbated at genomic scales. It is now clear that numerous biases are pervasive in paleogenomics investigations, and their influence on downstream inferences is undeniable. In this thesis, I present results from a series of interrelated empirical studies that investigate issues related to reference bias and reproducibility in paleogenomics, providing the relevant historical and technical background in the introduction. In Chapter 1 of this thesis, I benchmark a range of short read alignment methods and algorithms available to paleogenomicists and quantify the impact of reference bias on downstream inferences. I show that the current standard alignment method in the paleogenomics field, i.e., using the BWA-aln software with specific settings developed during the early stages of paleogenomics, is still one of the best available tools for minimising the impact of reference bias. However, reference bias can be decreased even further when using NovoAlign software and an augmented version of the linear reference that incorporates known variants using IUPAC characters. In Chapter 2, I extend this investigation by applying recently developed variation graph methods to paleogenomic datasets, and assess their impact on a series of contentious population genetics inferences when compared to the two best-performing traditional (linear) alignment methods identified in Chapter 1. Consistent with the results from Chapter 1, the added variation captured by variation graphs makes them less susceptible to reference bias than linear alignments (including IUPAC-augmented methods). I also show that changes in bioinformatic parameters and sample choice can lead to subtle but significant differences in statistical inferences that could impact interpretations. Therefore, in Chapter 3, I emphasise the importance of reproducibility in paleogenomic research, and make a series of recommendations regarding the minimum reported information required across all key steps of data processing and analyses to ensure reproducibility of paleogenomic results. Finally, in the discussion chapter, I summarise my findings and discuss their implications for the field of paleogenomics as well as potential directions for future research. Ultimately, this knowledge should help improve the reliability and robustness of paleogenomic inferences, leading to an improved understanding of population history and evolutionary phenomena.
Advisor: Llamas, Bastien
Souilmi, Yassine
Tobler, Raymond
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Biological Sciences, 2022
Keywords: Bioinformatics
Ancient DNA
Paleogenomic
Pangenome
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
Appears in Collections: Research Theses

Files in This Item:
File               Description  Size      Format
Oliva2022_PhD.pdf               16.28 MB  Adobe PDF

