Data augmentation enhances plant-genomic-enabled predictions

Genomic selection (GS) is revolutionizing plant breeding. However, its practical implementation is still challenging, since there are many factors that affect its accuracy. For this reason, this research explores data augmentation with the goal of improving its accuracy. Deep neural networks with da...

Bibliographic Details
Main authors: Montesinos-Lopez, Osval A., Solis-Camacho, Mario Alberto, Crespo Herrera, Leonardo A., Saint Pierre, Carolina, Huerta Prado, Gloria Isabel, Ramos-Pulido, Sofia, Al-Nowibet, Khalid, Fritsche-Neto, Roberto, Gerard, Guillermo S., Montesinos-Lopez, Abelardo, Crossa, José
Format: Journal Article
Language: English
Published: MDPI 2024
Subjects: marker-assisted selection, plant breeding, data, genomes
Online access: https://hdl.handle.net/10568/159826
_version_ 1855533306094813184
author Montesinos-Lopez, Osval A.
Solis-Camacho, Mario Alberto
Crespo Herrera, Leonardo A.
Saint Pierre, Carolina
Huerta Prado, Gloria Isabel
Ramos-Pulido, Sofia
Al-Nowibet, Khalid
Fritsche-Neto, Roberto
Gerard, Guillermo S.
Montesinos-Lopez, Abelardo
Crossa, José
author_sort Montesinos-Lopez, Osval A.
collection Repository of Agricultural Research Outputs (CGSpace)
description Genomic selection (GS) is revolutionizing plant breeding. However, its practical implementation remains challenging because many factors affect its prediction accuracy. This research therefore explores data augmentation as a way to improve that accuracy. Deep neural networks with data augmentation (DA) generate synthetic data from the original training set in order to enlarge it and improve the prediction performance of any statistical or machine learning algorithm. There is substantial empirical evidence of the success of DA in computer vision applications. For this reason, DA was explored in the context of GS using 14 real datasets. We found empirical evidence that DA is a powerful tool for improving prediction accuracy for the top lines: accuracy improved for the top lines in all 14 datasets under study. On average, across datasets and traits, the gain in prediction performance of the DA approach relative to the conventional method in the top 20% of lines in the testing set was 108.4% in terms of the normalized root mean square error (NRMSE) and 107.4% in terms of the mean arctangent absolute percentage error (MAAPE). However, performance on the whole testing set was worse. We encourage more empirical evaluations to support our findings.
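The two error metrics named in the description, and the comparison restricted to the top 20% of lines, can be illustrated with a minimal Python sketch. Several details here are assumptions rather than facts from this record: that NRMSE is the RMSE divided by the mean of the observed values, that the "top" lines are those with the highest observed values in the testing set, and that the percentage gain is computed as 100 * (conventional error - DA error) / DA error. The function names are hypothetical and are not taken from the authors' code.

```python
import math


def nrmse(observed, predicted):
    """RMSE divided by the mean of the observed values (assumed normalization)."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return math.sqrt(mse) / (sum(observed) / n)


def maape(observed, predicted):
    """Mean arctangent absolute percentage error; observed values must be nonzero."""
    terms = [math.atan(abs((o - p) / o)) for o, p in zip(observed, predicted)]
    return sum(terms) / len(terms)


def top_fraction(observed, predicted, fraction=0.2):
    """Keep the lines with the highest observed values (assumed selection rule)."""
    k = max(1, round(len(observed) * fraction))
    order = sorted(range(len(observed)), key=lambda i: observed[i], reverse=True)[:k]
    return [observed[i] for i in order], [predicted[i] for i in order]


def relative_gain(conventional_error, da_error):
    """Percentage gain of DA over the conventional method (assumed formula)."""
    return 100.0 * (conventional_error - da_error) / da_error
```

Under this assumed gain formula, a gain above 100% means the conventional error is more than twice the DA error on the selected lines. A typical use would call `top_fraction` on the testing set first, then compute `nrmse` or `maape` for each method on that subset, and finally pass the two errors to `relative_gain`.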
format Journal Article
id CGSpace159826
institution CGIAR Consortium
language English
publishDate 2024
publishDateRange 2024
publishDateSort 2024
publisher MDPI
publisherStr MDPI
record_format dspace
spelling CGSpace159826 2025-12-08T10:29:22Z 2024-03 2024-11-15T14:51:00Z 2024-11-15T14:51:00Z Journal Article https://hdl.handle.net/10568/159826 en Open Access application/pdf MDPI Montesinos-López, O. A., Solis-Camacho, M. A., Crespo-Herrera, L., Saint Pierre, C., Huerta Prado, G. I., Ramos-Pulido, S., Al-Nowibet, K., Fritsche-Neto, R., Gerard, G. S., Montesinos-López, A., & Crossa, J. (2024). Data augmentation enhances plant-genomic-enabled predictions. Genes, 15(3), 286. https://doi.org/10.3390/genes15030286
title Data augmentation enhances plant-genomic-enabled predictions
topic marker-assisted selection
plant breeding
data
genomes
url https://hdl.handle.net/10568/159826