Parametric and machine learning approaches to examine yield differences between control and treatment considering outliers and statistical biases: The case of insect resistant/herbicide tolerant (IR/HT) maize in Honduras

Robust impact assessment methods need credible yield, costs, and other production performance parameter estimates. Sample data issues and the realities of producer heterogeneity and markets, including endogeneity, simultaneity, and outliers can affect such parameters. Methods have continued to evolv...


Bibliographic Details
Main authors: Falck-Zepeda, José B.; Zambrano, Patricia; Sanders, Arie; Trabanino, Carlos Rogelio
Format: Working paper
Language: English
Published: International Food Policy Research Institute 2025
Subjects: maize, yields, impact assessment, agriculture, data, capacity building, machine learning, parametric programming, herbicide resistance
Online access: https://hdl.handle.net/10568/174327
author Falck-Zepeda, José B.
Zambrano, Patricia
Sanders, Arie
Trabanino, Carlos Rogelio
collection Repository of Agricultural Research Outputs (CGSpace)
description Robust impact assessment methods need credible estimates of yield, costs, and other production performance parameters. Sample data issues and the realities of producer heterogeneity and markets, including endogeneity, simultaneity, and outliers, can affect such parameters. Methods have continued to evolve that may address data issues identified in the earlier literature examining the impacts of genetically modified (GM) crops, especially those of conventional field-level surveys. These methods may themselves have limitations, introduce trade-offs, and may not always succeed in addressing such issues. Experimental methods such as randomized controlled trials have been proposed to address several control/treatment data issues, but they may not suit every situation and issue, and may be more expensive and complex than conventional field surveys. Furthermore, experimental methods may have the unfortunate effect of crowding out impact assessors from low- and middle-income countries. The continued search for alternatives that help address the shortcomings of conventional surveys remains critical. Previously, existing assessment methods were applied to the impact assessment of insect resistant and herbicide tolerant maize adoption in Honduras in 2008 and 2012. Results from these assessments identified endogeneity issues, such as self-selection and simultaneity, occurring together with influential outliers. Procedures used to address these issues independently showed trade-offs between addressing endogeneity and addressing outliers. Thus, the need continues to identify methods that address both issues simultaneously while minimizing, as much as possible, the impact of method trade-offs. We structured this paper as follows. First, we review the literature to delineate data and assessment issues potentially affecting robust performance indicators such as yield and cost differentials. Second, we discuss and apply four types of approaches that can be used to obtain robust performance estimates for yield and cost differentials: 1) Robust Instrumental Variables, 2) Instrumental Variable Regressions, 3) Control/Treatment, and 4) Machine Learning methods that are amenable to robust strategies for dealing with outliers, including Random Forest and a Stacking regression approach that allows for a number of “base learners”. We apply these approaches to the pooled 2008 and 2012 Honduras field surveys. Third, we discuss implications for impact assessment results and implementation limitations, especially in low- and middle-income countries. We further discuss and draw some conclusions regarding methodological issues for consideration by impact assessors and stakeholders.
format Working paper
id CGSpace174327
institution CGIAR Consortium
language English
publishDate 2025
publisher International Food Policy Research Institute
record_format dspace
spelling CGSpace174327 2025-12-08T10:06:44Z 2025-04-24 2025-04-25T16:03:58Z 2025-04-25T16:03:58Z Working Paper https://hdl.handle.net/10568/174327 en Open Access application/pdf International Food Policy Research Institute Falck-Zepeda, José B.; Zambrano, Patricia; Sanders, Arie; and Trabanino, Carlos Rogelio. 2025. Parametric and machine learning approaches to examine yield differences between control and treatment considering outliers and statistical biases: The case of insect resistant/herbicide tolerant (IR/HT) maize in Honduras. IFPRI Discussion Paper 2334. Washington, DC: International Food Policy Research Institute. https://hdl.handle.net/10568/174327
title Parametric and machine learning approaches to examine yield differences between control and treatment considering outliers and statistical biases: The case of insect resistant/herbicide tolerant (IR/HT) maize in Honduras
topic maize
yields
impact assessment
agriculture
data
capacity building
machine learning
parametric programming
herbicide resistance
url https://hdl.handle.net/10568/174327