Benchmarking database systems for Genomic Selection implementation

Bibliographic Details
Main Authors: Nti-Addae, Y., Matthews, D., Jun Ulat, V., Syed, R., Sempéré, G., Pétel, A., Renner, J., Larmande, Pierre, Guignon, Valentin, Jones, E., Robbins, K.
Format: Journal Article
Language: English
Published: Oxford University Press 2019
Subjects: information systems, information storage, data, databases, genotypes, plant breeding
Online Access: https://hdl.handle.net/10568/105632
author Nti-Addae, Y.
Matthews, D.
Jun Ulat, V.
Syed, R.
Sempéré, G.
Pétel, A.
Renner, J.
Larmande, Pierre
Guignon, Valentin
Jones, E.
Robbins, K.
collection Repository of Agricultural Research Outputs (CGSpace)
description With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. Making effective use of this information requires DNA extraction and marker production facilities that can efficiently deploy the desired set of markers across samples, with a turnaround rapid enough to allow selection before crosses need to be made. In reality, breeders often have only a short window in which to make decisions once they have collected all their phenotyping data and received the corresponding genotyping data. This makes it challenging to organize the information and use it in downstream analyses that support breeders' decisions. Implementing genomic selection routinely as part of breeding programs therefore requires an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. Results: We found that data extraction times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can work more efficiently with both orientations of the allele matrix.
format Journal Article
id CGSpace105632
institution CGIAR Consortium
language English
publishDate 2019
publisher Oxford University Press
record_format dspace
spelling CGSpace105632 2025-11-12T05:44:24Z 2019-01-01 2019-11-05T09:55:52Z 2019-11-05T09:55:52Z Journal Article https://hdl.handle.net/10568/105632 en Open Access application/pdf Oxford University Press
Nti-Addae, Y.; Matthews, D.; Jun Ulat, V.; Syed, R.; Sempéré, G.; Pétel, A.; Renner, J.; Larmande, P.; Guignon, V.; Jones, E.; Robbins, K. (2019) Benchmarking database systems for Genomic Selection implementation. Database vol 2019, Article ID: baz096. ISSN: 1758-0463
title Benchmarking database systems for Genomic Selection implementation
topic information systems
information storage
data
databases
genotypes
plant breeding
url https://hdl.handle.net/10568/105632