Coupling Process-Based Models and Machine Learning Algorithms for Predicting Yield and Evapotranspiration of Maize in Arid Environments

Crop yield prediction is critical for investigating the yield gap and potential adaptations to environmental and management factors in arid regions. Crop models (CMs) are powerful tools for predicting yield and water use, but they still have some limitations and uncertainties; therefore, combining t...


Bibliographic Details
Main authors: Attia, Ahmed; Govind, Ajit; Qureshi, Asad Sarwar; Feike, Til; Rizk, Mosa Sayed; Shabana, Mahmoud Mohamed Abd ElHay; Kheir, Ahmed M.S.
Format: Journal Article
Language: English
Published: MDPI 2024
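Subjects: water use; random forest; dssat models; xgboost; super learner; lasso regression; hyperparameters tuning; feature importance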
Online access: https://hdl.handle.net/10568/152177
author Attia, Ahmed
Govind, Ajit
Qureshi, Asad Sarwar
Feike, Til
Rizk, Mosa Sayed
Shabana, Mahmoud Mohamed Abd ElHay
Kheir, Ahmed M.S.
collection Repository of Agricultural Research Outputs (CGSpace)
description Crop yield prediction is critical for investigating the yield gap and potential adaptations to environmental and management factors in arid regions. Crop models (CMs) are powerful tools for predicting yield and water use, but they still have some limitations and uncertainties; therefore, combining them with machine learning algorithms (MLs) could improve predictions and reduce uncertainty. To that end, the DSSAT-CERES-Maize model was calibrated in one location and validated in others across Egypt with varying agro-climatic zones. Following that, the dynamic model (CERES-Maize) was used for long-term simulation (1990–2020) of maize grain yield (GY) and evapotranspiration (ET) under a wide range of management and environmental factors. Detailed outputs from three growing seasons of field experiments in Egypt, as well as CERES-Maize outputs, were used to train and test six machine learning algorithms (linear regression, ridge regression, lasso regression, K-nearest neighbors, random forest, and XGBoost), resulting in more than 1.5 million simulated yield and evapotranspiration scenarios. Seven warming years (i.e., 1991, 1998, 2002, 2005, 2010, 2013, and 2020) were chosen from a 31-year dataset to test MLs, while the remaining 23 years were used to train the models. The ensemble model (super learner) and XGBoost outperformed the other models in predicting GY and ET for maize, as evidenced by R² values greater than 0.82 and RRMSE less than 9%. The broad range of management practices, when averaged across all locations and 31 years of simulation, not only reduced the adverse impact of environmental factors but also increased GY and reduced ET. Moving beyond prediction and interpreting the outputs from Lasso and XGBoost, and using global and local SHAP values, we found that the most important features for predicting GY and ET are maximum temperature, minimum temperature, available water content, soil organic carbon, irrigation, cultivars, soil texture, solar radiation, and planting date. Determining the most important features is critical for assisting farmers and agronomists in prioritizing such features over other factors in order to increase yield and resource efficiency. The combination of CMs and ML algorithms is a powerful tool for predicting yield and water use in arid regions, which are particularly vulnerable to climate change and water scarcity.
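The year-based hold-out and the two skill scores named in the abstract (R² and RRMSE) can be illustrated with a minimal sketch. This is not the authors' code. The function names and the record layout (dicts with a `year` key) are hypothetical, and RRMSE is assumed here to be the RMSE divided by the observed mean, expressed in percent, since the abstract does not define it.

```python
from math import sqrt

# Hold-out years listed in the abstract; all other years go to training.
TEST_YEARS = {1991, 1998, 2002, 2005, 2010, 2013, 2020}


def split_by_year(records, test_years=TEST_YEARS):
    """Split records (dicts with a 'year' key, a hypothetical layout) by year."""
    train = [r for r in records if r["year"] not in test_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rrmse(observed, predicted):
    """Relative RMSE in percent (assumed definition: RMSE / observed mean)."""
    n = len(observed)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)
```

Splitting by whole years, rather than by random rows, keeps every scenario from a held-out season out of training, so the scores reflect performance on weather the models have not seen.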
format Journal Article
id CGSpace152177
institution CGIAR Consortium
language English
publishDate 2024
publisher MDPI
record_format dspace
spelling CGSpace152177 (record timestamps: 2026-01-15T02:02:22Z; 2024-09-11T17:14:44Z). Open Access, application/pdf, language: en. Citation: Ahmed Attia, Ajit Govind, Asad Sarwar Qureshi, Til Feike, Mosa Sayed Rizk, Mahmoud Mohamed Abd ElHay Shabana, Ahmed M. S. Kheir. (12/11/2022). Coupling Process-Based Models and Machine Learning Algorithms for Predicting Yield and Evapotranspiration of Maize in Arid Environments. Water, 14 (22).
title Coupling Process-Based Models and Machine Learning Algorithms for Predicting Yield and Evapotranspiration of Maize in Arid Environments
topic water use
random forest
dssat models
xgboost
super learner
lasso regression
hyperparameters tuning
feature importance
url https://hdl.handle.net/10568/152177