Sparse kernel models provide optimization of training set design for genomic prediction in multiyear wheat breeding data

Bibliographic Details
Main authors: López Cruz, Marco; Dreisigacker, Susanne; Crespo-Herrera, Leonardo A.; Bentley, Alison R.; Singh, Ravi P.; Poland, Jesse A.; Shrestha, Sandesh; Huerta Espino, Julio; Velu, Govindan; Juliana, Philomin; Mondal, Suchismita; Pérez Rodriguez, Paulino; Crossa, José
Format: Journal Article
Language: English
Published: Wiley, 2022
Subjects: marker-assisted selection; training; wheat; breeding
Online access: https://hdl.handle.net/10568/126293
collection Repository of Agricultural Research Outputs (CGSpace)
description The success of genomic selection (GS) in breeding schemes relies on its ability to provide accurate predictions of unobserved lines at early stages. Multigeneration data provide opportunities to increase the training data size and, thus, the likelihood of extracting useful information from ancestors to improve prediction accuracy. Genomic best linear unbiased predictions (GBLUPs) are obtained by borrowing information through kinship relationships between individuals. Multigeneration data usually become heterogeneous, with complex family relationship patterns that grow increasingly entangled with each generation. Under these conditions, historical data may not be optimal for model training, because they can compromise accuracy. The sparse selection index (SSI) is a method for training set (TRN) optimization in which each training individual contributes to the predictions of some, but not all, predicted subjects. We added a trimming step to the original SSI (trimmed SSI) to remove training individuals that contribute little to prediction. Using a large multigeneration (8 yr) wheat (Triticum aestivum L.) grain yield dataset (n = 68,836), we found that accuracy increased as more years were included in the TRN, with GBLUP accuracy improving by ∼0.05 when 5 yr of historical data were used instead of only 1 yr. The SSI showed a small gain over GBLUP accuracy but with a substantial reduction in TRN size. These reduced TRNs contained a similar number of subjects from each training generation. Our results suggest that the SSI provides a more stable ranking of genotypes than GBLUP as the TRN becomes larger.
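The abstract contrasts GBLUP, where every training individual contributes to every prediction, with the SSI, where each predicted individual draws on only a subset of the training set. The sketch below illustrates that contrast on small simulated data. It assumes the SSI can be written as an L1-penalized version of the GBLUP index weights for each predicted individual, solved here by coordinate descent; the function names `soft_threshold` and `ssi_weights`, the penalty `lam`, and the variance ratio `theta` are hypothetical illustrative choices, not the authors' code, and the trimming step of the trimmed SSI is not shown.

```python
import numpy as np


def soft_threshold(z: float, lam: float) -> float:
    """Lasso shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    return float(np.sign(z) * max(abs(z) - lam, 0.0))


def ssi_weights(G_trn, g_i, theta, lam, max_iter=1000, tol=1e-10):
    """Hypothetical helper: index weights b for one predicted individual i,
    by coordinate descent on
        0.5 * b'(G_trn + theta*I) b - g_i'b + lam * ||b||_1.
    lam = 0 recovers the GBLUP weights (G_trn + theta*I)^-1 g_i;
    lam > 0 sets some weights exactly to zero (a sparse training set for i).
    """
    P = G_trn + theta * np.eye(G_trn.shape[0])
    b = np.zeros(len(g_i))
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(len(b)):
            # g_i[j] minus the contribution of all other coordinates
            r_j = g_i[j] - P[j] @ b + P[j, j] * b[j]
            new_bj = soft_threshold(r_j, lam) / P[j, j]
            max_step = max(max_step, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_step < tol:  # stop once no weight moves appreciably
            break
    return b


# --- Toy data (simulated; not the wheat data described above) ---
rng = np.random.default_rng(1)
n, p = 150, 300
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardized markers
G = X @ X.T / p                                    # genomic relationship matrix
u = X @ rng.normal(0.0, np.sqrt(1.0 / p), size=p)  # true genetic values, var ~ 1
y = u + rng.normal(0.0, 1.0, size=n)               # phenotypes, h2 ~ 0.5
trn, tst = np.arange(120), np.arange(120, n)
theta = 1.0                                        # sigma_e^2 / sigma_u^2 at h2 = 0.5

G_trn = G[np.ix_(trn, trn)]
y_c = y[trn] - y[trn].mean()                       # centered training phenotypes

# GBLUP: every training individual contributes to every prediction.
gblup = G[np.ix_(tst, trn)] @ np.linalg.solve(G_trn + theta * np.eye(len(trn)), y_c)

# SSI: one sparse weight vector per predicted individual.
lam = 0.02                                         # assumed, not tuned
W = np.array([ssi_weights(G_trn, G[trn, i], theta, lam) for i in tst])
ssi = W @ y_c

print("GBLUP accuracy:", np.corrcoef(gblup, y[tst])[0, 1])
print("SSI accuracy:  ", np.corrcoef(ssi, y[tst])[0, 1])
print("mean support:  ", (W != 0).sum(axis=1).mean(), "of", len(trn))
```

With `lam = 0` the weights reduce to the GBLUP weights, so the penalty is the only difference between the two predictors in this sketch; larger `lam` values shrink each individual's training subset further. The penalty would typically be tuned (for example, by cross-validation) rather than fixed as it is here.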
id CGSpace126293
institution CGIAR Consortium
citation Lopez‐Cruz, M., Dreisigacker, S., Crespo‐Herrera, L., Bentley, A. R., Singh, R., Poland, J., Shrestha, S., Huerta‐Espino, J., Govindan, V., Juliana, P., Mondal, S., Pérez‐Rodríguez, P., & Crossa, J. (2022). Sparse kernel models provide optimization of training set design for genomic prediction in multiyear wheat breeding data. The Plant Genome, 15(4). https://doi.org/10.1002/tpg2.20254
date 2022-12
access Open Access, application/pdf
topic marker-assisted selection
training
wheat
breeding