
Robust and efficient software for reference-free genomic diversity analysis of GBS data on diploid and polyploid species

Bibliographic Details
Main Authors: Parra Salazar, Andrea, Gomez, Jorge, Lozano Arce, Daniela, Reyes Herrera, Paula H., Duitama, Jorge
Format: article
Language: English
Published: Cold Spring Harbor Laboratory (CSH) 2024
Subjects:
Online Access: https://www.biorxiv.org/content/10.1101/2020.11.28.402131v1
http://hdl.handle.net/20.500.12324/39793
https://doi.org/10.1101/2020.11.28.402131
Description
Summary: Genotype-by-sequencing (GBS) is a widely used, cost-effective technique to obtain large numbers of genetic markers from populations. Although a standard reference-based pipeline can be followed to analyze these reads, a reference genome is still not available for a large number of species. Hence, several research groups require reference-free approaches to generate the genetic variability information that can be obtained from a GBS experiment. Unfortunately, tools to perform de novo analysis of GBS reads are scarce, and some of the existing solutions are difficult to operate under the different settings generated by existing GBS protocols. In this manuscript we describe a novel algorithm to perform reference-free variant detection and genotyping from GBS reads. Inexact searches on a dynamic hash table of consensus sequences allow efficient read clustering and sorting. This algorithm was integrated into the Next Generation Sequencing Experience Platform (NGSEP) so that it can use the state-of-the-art variant detector already implemented in this tool. We performed benchmark experiments with three different real populations of plants and animals with different structures and ploidies, sequenced with different GBS protocols at different read depths. These experiments show that NGSEP has comparable, and in some cases better, accuracy and always better computational efficiency compared to existing solutions. We expect that this new development will be useful for several research groups conducting population genetic studies in a wide variety of species.
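The summary mentions inexact searches on a dynamic hash table of consensus sequences as the basis for read clustering. The sketch below is only a toy illustration of that general idea, not the NGSEP implementation: the function names (`hamming`, `neighbors`, `cluster_reads`), the prefix length `k`, the mismatch threshold, and the choice to use the first read of each cluster as its "consensus" are all assumptions made for this example. The sorting step is not shown.

```python
def hamming(a, b):
    """Count mismatches over the shared length of two sequences."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(kmer, alphabet="ACGT"):
    """Yield the k-mer itself, then every sequence at Hamming distance 1."""
    yield kmer
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def cluster_reads(reads, k=8, max_mismatches=2):
    """Greedy clustering: hypothetical parameters, for illustration only."""
    table = {}     # prefix k-mer -> list of cluster ids (grows as clusters appear)
    clusters = []  # cluster id -> {"consensus": str, "reads": [str, ...]}
    for read in reads:
        key = read[:k]
        hit = None
        # Inexact search: probe the exact key and its one-mismatch neighbors.
        for probe in neighbors(key):
            for cid in table.get(probe, ()):
                if hamming(read, clusters[cid]["consensus"]) <= max_mismatches:
                    hit = cid
                    break
            if hit is not None:
                break
        if hit is None:
            # No close consensus found: start a new cluster seeded by this read.
            hit = len(clusters)
            clusters.append({"consensus": read, "reads": []})
            table.setdefault(key, []).append(hit)
        clusters[hit]["reads"].append(read)
    return clusters


if __name__ == "__main__":
    demo = ["ACGTACGTAAGG", "ACGTACGTAAGC", "TCGTACGTAAGG", "GGGGCCCCTTTT"]
    for cid, cluster in enumerate(cluster_reads(demo)):
        print(cid, cluster["consensus"], len(cluster["reads"]))
```

Probing one-mismatch neighbors of the key lets a read with a sequencing error or a true variant in its prefix still reach the right cluster, at the cost of 3k extra lookups per read. A real tool would also update the consensus as reads accumulate, which this sketch omits.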