Coupling Process-Based Models and Machine Learning Algorithms for Predicting Yield and Evapotranspiration of Maize in Arid Environments
Crop yield prediction is critical for investigating the yield gap and potential adaptations to environmental and management factors in arid regions. Crop models (CMs) are powerful tools for predicting yield and water use, but they still have some limitations and uncertainties; therefore, combining t...
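| Main authors: | Attia, Ahmed; Govind, Ajit; Qureshi, Asad Sarwar; Feike, Til; Rizk, Mosa Sayed; Shabana, Mahmoud Mohamed Abd ElHay; Kheir, Ahmed M.S. |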
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | MDPI 2024 |
| Subjects: | |
| Online access: | https://hdl.handle.net/10568/152177 |
| _version_ | 1855516537185632256 |
|---|---|
| author | Attia, Ahmed Govind, Ajit Qureshi, Asad Sarwar Feike, Til Rizk, Mosa Sayed Shabana, Mahmoud Mohamed Abd ElHay Kheir, Ahmed M.S. |
| collection | Repository of Agricultural Research Outputs (CGSpace) |
| description | Crop yield prediction is critical for investigating the yield gap and potential adaptations to environmental and management factors in arid regions. Crop models (CMs) are powerful tools for predicting yield and water use, but they still have some limitations and uncertainties; therefore, combining them with machine learning algorithms (MLs) could improve predictions and reduce uncertainty. To that end, the DSSAT-CERES-maize model was calibrated in one location and validated in others across Egypt with varying agro-climatic zones. Following that, the dynamic model (CERES-Maize) was used for long-term simulation (1990–2020) of maize grain yield (GY) and evapotranspiration (ET) under a wide range of management and environmental factors. Detailed outputs from three growing seasons of field experiments in Egypt, as well as CERES-maize outputs, were used to train and test six machine learning algorithms (linear regression, ridge regression, lasso regression, K-nearest neighbors, random forest, and XGBoost), resulting in more than 1.5 million simulated yield and evapotranspiration scenarios. Seven warming years (i.e., 1991, 1998, 2002, 2005, 2010, 2013, and 2020) were chosen from a 31-year dataset to test MLs, while the remaining 23 years were used to train the models. The Ensemble model (super learner) and XGBoost outperform other models in predicting GY and ET for maize, as evidenced by R² values greater than 0.82 and RRMSE less than 9%. The broad range of management practices, when averaged across all locations and 31 years of simulation, not only reduced the hazard impact of environmental factors but also increased GY and reduced ET. Moving beyond prediction and interpreting the outputs from Lasso and XGBoost, and using global and local SHAP values, we found that the most important features for predicting GY and ET are maximum temperatures, minimum temperature, available water content, soil organic carbon, irrigation, cultivars, soil texture, solar radiation, and planting date. Determining the most important features is critical for assisting farmers and agronomists in prioritizing such features over other factors in order to increase yield and resource efficiency values. The combination of CMs and ML algorithms is a powerful tool for predicting yield and water use in arid regions, which are particularly vulnerable to climate change and water scarcity. |
| format | Journal Article |
| id | CGSpace152177 |
| institution | CGIAR Consortium |
| language | English |
| publishDate | 2024 |
| publisher | MDPI |
| record_format | dspace |
| spelling | CGSpace152177 2026-01-15T02:02:22Z 2024-09-11T17:14:44Z 2024-09-11T17:14:44Z Journal Article https://hdl.handle.net/10568/152177 en Open Access application/pdf MDPI Ahmed Attia, Ajit Govind, Asad Sarwar Qureshi, Til Feike, Mosa Sayed Rizk, Mahmoud Mohamed Abd ElHay Shabana, Ahmed M. S. Kheir. (12/11/2022). Coupling Process-Based Models and Machine Learning Algorithms for Predicting Yield and Evapotranspiration of Maize in Arid Environments. WATER, 14 (22). |
| title | Coupling Process-Based Models and Machine Learning Algorithms for Predicting Yield and Evapotranspiration of Maize in Arid Environments |
| topic | water use random forest dssat models xgboost super learner lasso regression hyperparameters tuning feature importance |
| url | https://hdl.handle.net/10568/152177 |
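
The hold-out design and accuracy metrics mentioned in the description can be illustrated with a minimal sketch. This is not the authors' code. It assumes records are tuples whose first item is the year (a hypothetical layout), takes the seven test years from the description, and assumes the common definition of RRMSE as RMSE divided by the mean of the observations, expressed in percent. All function names are illustrative.

```python
import math

# Test years as listed in the record's description; every other year is
# treated as training data. This mirrors the described hold-out design only.
TEST_YEARS = {1991, 1998, 2002, 2005, 2010, 2013, 2020}


def split_by_year(records):
    """Split records (tuples whose first item is the year) into train/test."""
    train = [r for r in records if r[0] not in TEST_YEARS]
    test = [r for r in records if r[0] in TEST_YEARS]
    return train, test


def rrmse(observed, predicted):
    """Relative RMSE in percent (assumed definition: RMSE / mean observed)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Splitting on whole years rather than on random rows keeps every scenario from a given season on one side of the split, so the test scores reflect performance on years the model has not seen.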