Sparse kernel models provide optimization of training set design for genomic prediction in multiyear wheat breeding data
The success of genomic selection (GS) in breeding schemes relies on its ability to provide accurate predictions of unobserved lines at early stages. Multigeneration data provides opportunities to increase the training data size and thus, the likelihood of extracting useful information from ancestors...
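| Main authors: | López Cruz, Marco; Dreisigacker, Susanne; Crespo-Herrera, Leonardo A.; Bentley, Alison R.; Singh, Ravi P.; Poland, Jesse A.; Shrestha, Sandesh; Huerta Espino, Julio; Velu, Govindan; Juliana, Philomin; Mondal, Suchismita; Pérez Rodriguez, Paulino; Crossa, José |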
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | Wiley 2022 |
| Subjects: | |
| Online access: | https://hdl.handle.net/10568/126293 |
| _version_ | 1855531818521985024 |
|---|---|
| author | López Cruz, Marco; Dreisigacker, Susanne; Crespo-Herrera, Leonardo A.; Bentley, Alison R.; Singh, Ravi P.; Poland, Jesse A.; Shrestha, Sandesh; Huerta Espino, Julio; Velu, Govindan; Juliana, Philomin; Mondal, Suchismita; Pérez Rodriguez, Paulino; Crossa, José |
| collection | Repository of Agricultural Research Outputs (CGSpace) |
| description | The success of genomic selection (GS) in breeding schemes relies on its ability to provide accurate predictions of unobserved lines at early stages. Multigeneration data provides opportunities to increase the training data size and thus the likelihood of extracting useful information from ancestors to improve prediction accuracy. Genomic best linear unbiased predictions (GBLUPs) are obtained by borrowing information through kinship relationships between individuals. Multigeneration data usually becomes heterogeneous, with complex family relationship patterns that are increasingly entangled with each generation. Under these conditions, historical data may not be optimal for model training, as the accuracy could be compromised. The sparse selection index (SSI) is a method for training set (TRN) optimization in which training individuals provide predictions to some but not all predicted subjects. We added a trimming process to the original SSI (trimmed SSI) to remove training individuals that are less important for prediction. Using a large multigeneration (8 yr) wheat (Triticum aestivum L.) grain yield dataset (n = 68,836), we found increases in accuracy as more years are included in the TRN, with improvements of ∼0.05 in the GBLUP accuracy when using 5 yr of historical data relative to using only 1 yr. The SSI method showed a small gain over the GBLUP accuracy but with a substantial reduction in TRN size. These reduced TRNs were formed with a similar number of subjects from each training generation. Our results suggest that the SSI provides a more stable ranking of genotypes than the GBLUP as the TRN becomes larger. |
| format | Journal Article |
| id | CGSpace126293 |
| institution | CGIAR Consortium |
| language | English |
| publishDate | 2022 |
| publisher | Wiley |
| record_format | dspace |
| spelling | CGSpace126293 2025-11-06T13:07:22Z marker-assisted selection training wheat breeding 2022-12 2022-12-23T12:17:36Z 2022-12-23T12:17:36Z Journal Article https://hdl.handle.net/10568/126293 en Open Access application/pdf Wiley Lopez‐Cruz, M., Dreisigacker, S., Crespo‐Herrera, L., Bentley, A. R., Singh, R., Poland, J., Shrestha, S., Huerta‐Espino, J., Govindan, V., Juliana, P., Mondal, S., Pérez‐Rodríguez, P., & Crossa, J. (2022). Sparse kernel models provide optimization of training set design for genomic prediction in multiyear wheat breeding data. The Plant Genome, 15(4). Portico. https://doi.org/10.1002/tpg2.20254 |
| title | Sparse kernel models provide optimization of training set design for genomic prediction in multiyear wheat breeding data |
| topic | marker-assisted selection training wheat breeding |
| url | https://hdl.handle.net/10568/126293 |
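
The description above contrasts the GBLUP, which borrows information from every training individual through kinship, with the SSI, in which each predicted line is supported by only some training individuals. A minimal Python sketch of that contrast follows. Everything in it is an assumption made for illustration and is not taken from this record: the L1-penalized objective, the coordinate-descent solver, the function names `gblup_weights` and `sparse_weights`, the variance ratio `lam0`, the penalty `lam`, and the simulated markers and phenotypes. It is not the authors' implementation and does not include their trimming step.

```python
import numpy as np

def gblup_weights(G_trn, g_i, lam0):
    """Dense index weights: solve (G_trn + lam0 * I) b = g_i.
    Every training individual receives a (generally nonzero) weight."""
    n = G_trn.shape[0]
    return np.linalg.solve(G_trn + lam0 * np.eye(n), g_i)

def sparse_weights(G_trn, g_i, lam0, lam, n_iter=200):
    """Sparse index weights (assumed formulation): minimize
    0.5 * b'(G_trn + lam0 * I)b - g_i'b + lam * sum(|b|)
    by cyclic coordinate descent with soft-thresholding."""
    A = G_trn + lam0 * np.eye(G_trn.shape[0])
    b = np.zeros(g_i.size)
    for _ in range(n_iter):
        for j in range(b.size):
            # Covariance with the target minus the part already
            # explained by the other coordinates.
            r = g_i[j] - A[j] @ b + A[j, j] * b[j]
            b[j] = np.sign(r) * max(abs(r) - lam, 0.0) / A[j, j]
    return b

# Simulated placeholder data: 30 training lines plus 1 line to predict.
rng = np.random.default_rng(0)
n_trn, n_markers = 30, 200
X = rng.standard_normal((n_trn + 1, n_markers))
X -= X.mean(axis=0)                  # center each marker
G = X @ X.T / n_markers              # genomic relationship matrix
G_trn = G[:n_trn, :n_trn]            # kinship among training lines
g_i = G[n_trn, :n_trn]               # kinship of the target with training lines
y_trn = rng.standard_normal(n_trn)   # placeholder (centered) phenotypes
lam0 = 1.0                           # assumed residual/genetic variance ratio

b_dense = gblup_weights(G_trn, g_i, lam0)
b_sparse = sparse_weights(G_trn, g_i, lam0, lam=0.5 * np.abs(g_i).max())

u_hat_dense = b_dense @ y_trn        # prediction using all training lines
u_hat_sparse = b_sparse @ y_trn      # prediction using a subset
n_supporting = np.count_nonzero(b_sparse)
print(f"training lines supporting the sparse prediction: {n_supporting} of {n_trn}")
```

In this sketch, training lines whose weight is exactly zero contribute nothing to the prediction of the target line, so repeating the solve for each predicted line yields a different supporting subset per line. Setting `lam=0` recovers the dense weights, and a sufficiently large `lam` drops every training line.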