| Summary: | Mycotoxin contamination in staple cereals such as maize poses significant health risks to humans and livestock worldwide. The fungus Aspergillus flavus, the primary aflatoxin producer, is influenced by climate, soil type, nutrients, and crop management practices. This study mapped aflatoxin risk in maize and its drivers using ensemble (gradient boosting, adaptive boosting, random forest) and non-ensemble (support vector machine, neural networks, naïve Bayes, k-nearest neighbours) machine learning methods. We analysed 907 pre-harvest samples from Kenya, Uganda, Malawi, and Tanzania collected between 2009 and 2022. Aflatoxin levels were categorized into low (<5 ppb), medium (>5–20 ppb), and high (>20 ppb) risk classes. Models were trained on biophysical variables (temperature, moisture, topography, and soil) using space-time cross-validation with an 80:20 training-to-testing split. The trained random forest (ranger) model showed consistent performance, with a balanced accuracy of 51%, an overall accuracy of 48%, and an F1-score of 32%. On the test set, gradient boosting achieved the highest balanced accuracy (62%), with class-based F1-scores of 67% (low), 45% (medium), and 41% (high). In external validation on 2020 data, the model generalized to the low-risk class (F1-score of 71%), whereas the medium- and high-risk classes had much lower F1-scores (5% and 14%, respectively) owing to small sample sizes. The gradient boosting model, which performed best on unseen data, was used for variable importance and spatial predictions. Key risk factors were precipitation (higher risk below 200 mm), elevation (higher risk below 1000 m asl), and soil temperature (18–24 °C for low-to-moderate risk; 24–27 °C for higher risk). Spatial predictions indicated major risks along the coast, with projections suggesting that rising temperatures could expand high-risk areas inland. Despite model limitations, these findings can guide better sampling strategies and early action. 
Future research should incorporate crop management practices and finer-resolution environmental and socio-economic data to improve model reliability.
|