SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa

Bibliographic details
Main authors: Mansueto, Locedie; Fuentes, Roven Rommel; Chebotarov, Dmytro; Borja, Frances Nikki; Detras, Jeffrey; Abriol-Santos, Juan Miguel; Palis, Kevin; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Hamilton, Ruaraidh Sackville; McNally, Kenneth L.; Alexandrov, Nickolai; Mauleon, Ramil
Format: Journal Article
Language: English
Published: Elsevier, 2016
Online access: https://hdl.handle.net/10568/165183
Collection: Repository of Agricultural Research Outputs (CGSpace)
Abstract: The 3000 Rice Genomes Project generated a large dataset of genomic variation in the world's most important crop, Oryza sativa L. Using the Burrows-Wheeler Aligner (BWA) and the Genome Analysis Toolkit (GATK) variant calling on this dataset, we identified ∼40 million single-nucleotide polymorphisms (SNPs). Five reference genomes of rice representing the major variety groups were used: Nipponbare (temperate japonica), IR 64 (indica), 93–11 (indica), DJ 123 (aus), and Kasalath (aus). The results are accessible through the Rice SNP-Seek Database (http://snp-seek.irri.org) and through the web services of its application programming interface (API). We incorporated legacy phenotypic and passport data for the sequenced varieties from the International Rice Genebank Collection Information System (IRGCIS), as well as gene models from several rice annotation projects. The massive genotypic data in SNP-Seek are stored in hierarchical data format 5 (HDF5) files for quick retrieval. Germplasm, phenotypic, and genomic data are stored in a relational database management system (RDBMS) using the Chado schema, which allows controlled vocabularies from biological ontologies to be used as query constraints in SNP-Seek. In this paper, we discuss the datasets stored in SNP-Seek, the architecture of the database and web application, and the interoperability methodologies in place, and we present a few use cases demonstrating the utility of SNP-Seek for diversity analysis and molecular breeding.
Record ID: CGSpace165183
Institution: CGIAR Consortium
Date issued: 2016-11
Access: Open Access
Citation: Mansueto, Locedie; Fuentes, Roven Rommel; Chebotarov, Dmytro; Borja, Frances Nikki; Detras, Jeffrey; Abriol-Santos, Juan Miguel; Palis, Kevin; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Hamilton, Ruaraidh Sackville; McNally, Kenneth L.; Alexandrov, Nickolai and Mauleon, Ramil. 2016. SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa. Current Plant Biology, Volume 7-8, p. 16-25.