Methodology for the identification of relevant loci for milk traits in dairy cattle, using machine learning algorithms


Bibliographic Details
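Main Authors: Raschia, Maria Agustina; Ríos, Pablo Javier; Maizon, Daniel Omar; Demitrio, Daniel Arturo; Poli, Mario Andres
Format: Article (info:ar-repo/semantics/artículo)
Language: English
Published: Elsevier 2022
Subjects: Single Nucleotide Polymorphism; Dairy Cattle; Milk Production; Milk Protein; Bioinformatics; Loci; Milk Fat Content; Machine Learning Algorithms
Online Access: http://hdl.handle.net/20.500.12123/11954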
https://www.sciencedirect.com/science/article/pii/S2215016122001145
https://doi.org/10.1016/j.mex.2022.101733
Collection: INTA Digital
Description: Machine learning methods are considered efficient for identifying single nucleotide polymorphisms (SNPs) underlying a trait of interest. This study aimed to build predictive models with machine learning algorithms in order to identify the loci that best explain the variance in milk traits of dairy cattle. Further objectives were to validate the results by comparison with previously reported relevant regions and to retrieve the pathways overrepresented among the genes flanking relevant SNPs. Regression models based on the XGBoost (XGB), LightGBM (LGB), and Random Forest (RF) algorithms were trained with estimated breeding values for milk production (EBVM), milk fat content (EBVF), and milk protein content (EBVP) as phenotypes and genotypes at 40,417 SNPs as predictor variables. To evaluate model efficiency, metrics comparing actual and predicted values were computed on validation folds (XGB and LGB) and on out-of-bag data (RF). Fewer than 4,500 relevant SNPs were retrieved for each trait. Among the genes flanking them, signaling and transmembrane transporter activities were overrepresented. The trained models:
• Predicted breeding values for animals not included in the dataset.
• Were efficient in identifying a subset of SNPs explaining phenotypic variation.
The results obtained with the XGB and LGB algorithms agreed with previous results. Therefore, the proposed method could be applied in future association studies on milk traits.
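The evaluation step described above (comparing actual and predicted breeding values on validation folds) can be illustrated with a minimal sketch. The record does not say which metrics the authors used, so RMSE and the Pearson correlation are assumptions here, as are the function names, the fold count, and the toy data. The predictions are assumed to be held-out predictions produced by a model trained elsewhere; no XGBoost, LightGBM, or Random Forest model is fitted in this example.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def evaluate_folds(actual, predicted, k=5, seed=0):
    """Return one (RMSE, Pearson r) pair per validation fold."""
    results = []
    for fold in k_fold_indices(len(actual), k, seed):
        a = [actual[i] for i in fold]
        p = [predicted[i] for i in fold]
        results.append((rmse(a, p), pearson_r(a, p)))
    return results


# Toy data (hypothetical): simulated breeding values and noisy predictions.
rng = random.Random(42)
ebv_actual = [rng.gauss(0.0, 1.0) for _ in range(100)]
ebv_predicted = [v + rng.gauss(0.0, 0.1) for v in ebv_actual]

for fold_id, (err, r) in enumerate(evaluate_folds(ebv_actual, ebv_predicted)):
    print(f"fold {fold_id}: RMSE={err:.3f}, r={r:.3f}")
```

In a real analysis the model would be refitted on the remaining folds before each validation fold is scored. For Random Forest, the out-of-bag predictions would take the place of the held-out predictions used here.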
ID: INTA11954
Institution: Instituto Nacional de Tecnología Agropecuaria (INTA, Argentina)
Instituto de Genética
Author affiliations:
Raschia, Maria Agustina. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Genética; Argentina
Ríos, Pablo J. Universidad de Buenos Aires; Argentina
Ríos, Pablo J. Universidad Nacional de La Plata. Facultad de Ciencias Exactas; Argentina
Maizon, Daniel Omar. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Anguil; Argentina
Maizon, Daniel Omar. Universidad Nacional de La Pampa. Facultad de Agronomía; Argentina
Demitrio, Daniel Arturo. Instituto Nacional de Tecnología Agropecuaria (INTA). Dirección General de Sistemas de Información, Comunicación y Procesos. Gerencia de Informática y Gestión de la Información; Argentina
Demitrio, Daniel Arturo. Universidad Nacional de La Plata. Facultad de Ciencias Exactas; Argentina
Poli, Mario Andres. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Genética; Argentina
Poli, Mario Andres. Universidad del Salvador. Facultad de Ciencias Agrarias y Veterinaria; Argentina
Dates: 2022-05-26T17:34:45Z; 2022
Type: info:ar-repo/semantics/artículo; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion
ISSN: 2215-0161
Funding:
info:eu-repograntAgreement/INTA/2019-PE-E6-I145-001/2019-PE-E6-I145-001/AR./Mejora genética objetiva para aumentar la eficiencia de los sistemas de producción animal. (Objective genetic improvement to increase the efficiency of animal production systems)
info:eu-repograntAgreement/INTA/2019-PT-E9-I180-001/2019-PT-E9-I180-001/AR./TICs y gestión de Big Data (ICTs and Big Data management)
info:eu-repograntAgreement/INTA/2019-PT-E6-I513-001/2019-PT-E6-I513-001/AR./Plataforma de mejoramiento animal (Animal breeding platform)
Rights: info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
File format: application/pdf
Source: MethodsX 9 : 101733 (2022)
topic Single Nucleotide Polymorphism
Dairy Cattle
Milk Production
Milk Protein
Bioinformatics
Loci
Polimorfismo de un Solo Nucleótidos
Ganado de Leche
Producción Lechera
Proteínas de la Leche
Bioinformática
Milk Fat Content
Machine Learning Algorithms
Contenido de Grasa Láctea
Algoritmos de Aprendizaje Automático