Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP

Background: The recent development and availability of different genotype by sequencing (GBS) protocols has provided a cost-effective approach to high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initia...

Bibliographic Details
Main authors: Perea, Claudia Samantha; Hoz, Juan Fernando de la; Cruz, Daniel Felipe; Lobaton, Juan David; Izquierdo, Paulo; Quintero, Juan Camilo; Raatz, Bodo; Duitama, Jorge
Format: Journal Article
Language: English
Published: Springer 2016
Subjects: genomics, bioinformatics, genetic markers, DNA, database, data analysis
Online access: https://hdl.handle.net/10568/76973
author Perea, Claudia Samantha
Hoz, Juan Fernando de la
Cruz, Daniel Felipe
Lobaton, Juan David
Izquierdo, Paulo
Quintero, Juan Camilo
Raatz, Bodo
Duitama, Jorge
collection Repository of Agricultural Research Outputs (CGSpace)
description Background: The recent development and availability of different genotype by sequencing (GBS) protocols has provided a cost-effective approach to high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with known restriction enzymes to generate sequencing fragments at predictable and reproducible sites. This allows thousands of genetic markers to be genotyped in populations of hundreds of individuals. Because GBS protocols achieve parallel genotyping through high-throughput sequencing (HTS), every GBS protocol must include a bioinformatics pipeline for the analysis of HTS data. Our bioinformatics group recently developed the Next Generation Sequencing Eclipse Plugin (NGSEP) for accurate, efficient, and user-friendly analysis of HTS data.
Results: Here we present the latest functionalities implemented in NGSEP in the context of the analysis of GBS data. We implemented a one-step wizard to perform parallel read alignment, variant identification, and genotyping from HTS reads sequenced from entire populations. We added filters for variants, samples, and genotype calls, as well as the calculation of summary statistics (overall and per sample) and of diversity statistics per site. NGSEP includes a module that translates genotype calls into some of the most widely used input formats, for integration with tools that perform downstream analyses such as population structure analysis, construction of genetic maps, genetic mapping of complex traits, and phenotype prediction for genomic selection. We assessed the accuracy of NGSEP on two highly heterozygous F1 cassava populations and on an inbred common bean population, and we showed that NGSEP provides similar or better accuracy than other widely used software packages for variant detection such as GATK, Samtools, and Tassel.
Conclusions: NGSEP is a powerful, accurate, and efficient bioinformatics software tool for the analysis of HTS data, and one of the best bioinformatics packages for facilitating the analysis of GBS experiments and maximizing the genomic variability information obtained from them for population genomics.
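The genotype-call filtering and per-sample summary statistics mentioned in the abstract can be illustrated with a minimal Python sketch. This is not NGSEP code and does not use NGSEP's interface. It assumes VCF-style tab-separated records with a GT:GQ FORMAT column, and the function names (filter_genotypes, call_rates), the sample names, the records, and the quality threshold of 20 are all hypothetical choices made for illustration.

```python
# Illustrative only: not NGSEP code. Assumes VCF-style data lines in which
# column 9 (FORMAT) names the per-sample subfields, e.g. GT:GQ.

def filter_genotypes(vcf_line, min_gq):
    """Replace genotype calls whose GQ is below min_gq with a missing call."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    gt_idx, gq_idx = keys.index("GT"), keys.index("GQ")
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        # Trailing subfields may be omitted; treat an absent GQ as missing.
        gq = values[gq_idx] if gq_idx < len(values) else "."
        if gq == "." or int(gq) < min_gq:
            values[gt_idx] = "./."
        fields[i] = ":".join(values)
    return "\t".join(fields)


def call_rates(vcf_lines, sample_names):
    """Fraction of sites with a non-missing genotype, per sample."""
    called = [0] * len(sample_names)
    total = 0
    for line in vcf_lines:
        fields = line.rstrip("\n").split("\t")
        gt_idx = fields[8].split(":").index("GT")
        total += 1
        for j, sample_field in enumerate(fields[9:]):
            if "." not in sample_field.split(":")[gt_idx]:
                called[j] += 1
    return {name: (count / total if total else 0.0)
            for name, count in zip(sample_names, called)}


# Hypothetical two-sample, two-site input.
samples = ["S1", "S2"]
lines = [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:45\t1/1:12",
    "chr1\t200\t.\tC\tT\t60\tPASS\t.\tGT:GQ\t0/0:30\t0/1:40",
]
filtered = [filter_genotypes(line, 20) for line in lines]
rates = call_rates(filtered, samples)
print(rates)
```

In this toy input, the call for S2 at the first site has GQ 12 and is set to missing, so S2 retains a call at one of the two sites while S1 retains both.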
format Journal Article
id CGSpace76973
institution CGIAR Consortium
language English
publishDate 2016
publisher Springer
record_format dspace
spelling CGSpace76973 2025-03-13T09:44:13Z 2016-08 2016-09-06T16:38:11Z 2016-09-06T16:38:11Z Journal Article https://hdl.handle.net/10568/76973 en Open Access Springer Perea, Claudia; De La Hoz, Juan Fernando; Cruz, Daniel Felipe; Lobaton, Juan David; Izquierdo, Paulo; Quintero, Juan Camilo; Raatz, Bodo; Duitama, Jorge. 2016. Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP. BMC Genomics 17(Suppl 5):498.
title Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP
topic genomics
bioinformatics
genetic markers
dna
database
data analysis
url https://hdl.handle.net/10568/76973