Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy

Context: Collection and analysis of large volumes of on-farm production data are widely seen as key to understanding yield variability among farmers and improving resource-use efficiency. Objective: The aim of this study was to assess the performance of statistical and machine learning methods to exp...

Bibliographic Details
Main Authors:	Silva, João Vasco; Heerwaarden, Joost van; Reidsma, Pytrik; Laborte, Alice G.; Tesfaye, Kindie; Ittersum, Martin K. van
Format:	Journal Article
Language:	English
Published:	Elsevier 2023
Subjects:	sustainable intensification; machine learning; big data; agronomy
Online Access:	https://hdl.handle.net/10568/131409

_version_	1855514556789424128
author	Silva, João Vasco; Heerwaarden, Joost van; Reidsma, Pytrik; Laborte, Alice G.; Tesfaye, Kindie; Ittersum, Martin K. van
collection	Repository of Agricultural Research Outputs (CGSpace)
description	Context: Collection and analysis of large volumes of on-farm production data are widely seen as key to understanding yield variability among farmers and improving resource-use efficiency. Objective: The aim of this study was to assess the performance of statistical and machine learning methods to explain and predict crop yield across thousands of farmers’ fields in contrasting farming systems worldwide. Methods: A large database of 10,940 field-year combinations from three countries in different stages of agricultural intensification was analyzed. Random effects models were used to partition crop yield variability and random forest models were used to explain and predict crop yield within a cross-validation scheme with data re-sampling over space and time. Results: Yield variability in relative terms was smallest for wheat and barley in the Netherlands and for wheat in Ethiopia, intermediate for rice in the Philippines, and greatest for maize in Ethiopia. Random forest models comprising a total of 87 variables explained a maximum of 65% of cereal yield variability in the Netherlands and less than 45% of cereal yield variability in Ethiopia and in the Philippines. Crop management related variables were important to explain and predict cereal yields in Ethiopia, while predictive (i.e., known before the growing season) climatic variables and explanatory (i.e., known during or after the growing season) climatic variables were most important to explain and predict cereal yield variability in the Philippines and in the Netherlands, respectively. Finally, model cross-validation for regions or years not seen during model training reduced the R² considerably for most crop × country combinations, while for wheat in the Netherlands this was model dependent. Conclusion: Big data from farmers’ fields is useful to explain on-farm yield variability to some extent, but not to predict it across time and space. Significance: The results call for moderate expectations towards big data and machine learning in agronomic studies, particularly for smallholder farms in the tropics where model performance was poorest independently of the variables considered and the cross-validation scheme used.
format	Journal Article
id	CGSpace131409
institution	CGIAR Consortium
language	English
publishDate	2023
publisher	Elsevier
record_format	dspace
spelling	CGSpace131409 2025-11-12T04:55:15Z 2023-10 2023-08-08T07:28:11Z 2023-08-08T07:28:11Z Journal Article https://hdl.handle.net/10568/131409 en Open Access application/pdf Elsevier Silva, João Vasco, Joost van Heerwaarden, Pytrik Reidsma, Alice G. Laborte, Kindie Tesfaye, and Martin K. van Ittersum. "Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy." Field Crops Research 302 (2023): 109063.
title	Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy
topic	sustainable intensification; machine learning; big data; agronomy
url	https://hdl.handle.net/10568/131409

Similar Items