Predicting aflatoxin risk in maize using machine learning and satellite data in East and Southern Africa

Bibliographic Details
Main Authors: Gachoki, S., Muthoni, F.K., Mahuku, G., Atehnkeng, J., Njeru, N., Kamau, J., Tripathi, L.
Format: Journal Article
Language: English
Published: Walter de Gruyter GmbH, 2025
Subjects: maize, zea mays, aflatoxins, machine learning
Online Access: https://hdl.handle.net/10568/175102
collection Repository of Agricultural Research Outputs (CGSpace)
description Mycotoxin contamination in staple cereals such as maize poses significant health risks to humans and livestock worldwide. Aspergillus flavus, the primary aflatoxin-producing fungus, is influenced by climate, soil type, nutrients, and crop management practices. This study mapped aflatoxin risk in maize and its drivers using ensemble (gradient boosting, adaptive boosting, random forest) and non-ensemble (support vector machine, neural network, naïve Bayes, k-nearest neighbours) machine learning methods. We analysed 907 pre-harvest samples from Kenya, Uganda, Malawi, and Tanzania collected between 2009 and 2022. Aflatoxin levels were categorized into low (<5 ppb), medium (>5–20 ppb), and high (>20 ppb) risk classes. Models were trained on biophysical variables (temperature, moisture, topography, and soil) using space-time cross-validation with an 80:20 training-to-testing split. During training, the random forest (ranger) model showed consistent performance, with a balanced accuracy of 51%, an overall accuracy of 48%, and an F1-score of 32%. On the test set, gradient boosting achieved the highest balanced accuracy (62%), with class-based F1-scores of 67% (low), 45% (medium), and 41% (high). External validation on 2020 data indicated that the model generalizes to the low-risk class (F1-score 71%), whereas the medium- and high-risk classes scored much lower (5% and 14%, respectively) owing to small sample sizes. The gradient boosting model, which performed best on unseen data, was used for variable importance analysis and spatial predictions. Key risk factors were precipitation (higher risk below 200 mm), elevation (higher risk below 1000 m a.s.l.), and soil temperature (18–24 °C for low-to-moderate risk; 24–27 °C for higher risk). Spatial predictions indicated the highest risk along the coast, and projections suggest that rising temperatures could expand high-risk areas inland. Despite the models' limitations, the findings can guide sampling strategies and early action. Future research should incorporate crop management practices and finer-resolution environmental and socio-economic data to improve model reliability.
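The risk classes and evaluation metrics named in the abstract can be illustrated with a short sketch. It is not the authors' code (the record includes none) and it rests on one labeled assumption: the abstract defines low as <5 ppb and medium as >5–20 ppb, leaving exactly 5 ppb unassigned, so the hypothetical `risk_class` helper places 5 ppb in the medium class. Balanced accuracy is computed as the mean of per-class recall, and each class-based F1-score as the one-vs-rest harmonic mean of precision and recall. All function names and the toy concentrations below are hypothetical illustrations, not study data.

```python
from typing import Dict, Sequence

CLASSES = ("low", "medium", "high")


def risk_class(ppb: float) -> str:
    """Map an aflatoxin concentration in ppb to a risk class.

    Assumption: the abstract leaves exactly 5 ppb unassigned (low is <5,
    medium is >5-20), so this sketch puts 5 ppb in "medium".
    High is strictly >20 ppb.
    """
    if ppb < 5:
        return "low"
    if ppb <= 20:
        return "medium"
    return "high"


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str],
                 classes: Sequence[str] = CLASSES) -> Dict[str, float]:
    """One-vs-rest F1-score for each class (0.0 when it is undefined)."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str],
                      classes: Sequence[str] = CLASSES) -> float:
    """Mean recall over the classes that actually occur in y_true."""
    recalls = []
    for c in classes:
        n_true = sum(1 for t in y_true if t == c)
        if n_true == 0:
            continue  # recall is undefined for a class with no true samples
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / n_true)
    if not recalls:
        raise ValueError("y_true contains none of the given classes")
    return sum(recalls) / len(recalls)


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy concentrations and predictions, not data from the study.
    observed_ppb = [1.2, 3.0, 8.5, 15.0, 25.0, 40.0]
    predicted = ["low", "medium", "medium", "low", "high", "medium"]
    truth = [risk_class(x) for x in observed_ppb]
    print(truth)
    print(round(overall_accuracy(truth, predicted), 3))
    print(round(balanced_accuracy(truth, predicted), 3))
    print({c: round(f, 3) for c, f in per_class_f1(truth, predicted).items()})
```

Because balanced accuracy averages recall class by class, a model that mostly predicts the majority class scores lower on it than on overall accuracy, which is why both are worth reporting when some risk classes have few samples.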
id CGSpace175102
institution CGIAR Consortium
record_format dspace
Citation: Gachoki, S., Muthoni, F.K., Mahuku, G., Atehnkeng, J., Njeru, N., Kamau, J. & Tripathi, L. (2025). Predicting aflatoxin risk in maize using machine learning and satellite data in East and Southern Africa. World Mycotoxin Journal, 1-20.
Access: Limited Access
Dates: 2025-04-08; 2025-06-16T09:42:21Z; 2025-10-30T15:14:34Z