Parametric and machine learning approaches to examine yield differences between control and treatment considering outliers and statistical biases: The case of insect resistant/herbicide tolerant (IR/HT) maize in Honduras
Robust impact assessment methods need credible estimates of yield, cost, and other production performance parameters. Sample data issues and the realities of producer heterogeneity and markets, including endogeneity, simultaneity, and outliers, can affect such parameters. Methods have continued to evolv...
| Main Authors: | Falck-Zepeda, José B.; Zambrano, Patricia; Sanders, Arie; Trabanino, Carlos Rogelio |
|---|---|
| Format: | Working paper |
| Language: | English |
| Published: | International Food Policy Research Institute, 2025 |
| Subjects: | |
| Online Access: | https://hdl.handle.net/10568/174327 |
| _version_ | 1855542235984035840 |
|---|---|
| author | Falck-Zepeda, José B. Zambrano, Patricia Sanders, Arie Trabanino, Carlos Rogelio |
| collection | Repository of Agricultural Research Outputs (CGSpace) |
| description | Robust impact assessment methods need credible estimates of yield, cost, and other production performance parameters. Sample data issues and the realities of producer heterogeneity and markets, including endogeneity, simultaneity, and outliers, can affect such parameters. Methods have continued to evolve that may address data issues identified in the earlier literature examining the impacts of genetically modified (GM) crops, especially those of conventional field-level surveys. These methods may themselves have limitations, may introduce trade-offs, and may not always succeed in addressing such issues. Experimental methods such as randomized controlled trials have been proposed to address several control/treatment data issues, but these may not be suitable for every situation and issue, and may be more expensive and complex than conventional field surveys. Furthermore, experimental methods may have the unfortunate outcome of crowding out impact assessors from low- and middle-income countries. The continued search for alternatives that help address the shortcomings of conventional surveys remains critical. Previously, existing assessment methods were applied to the impact assessment of insect resistant and herbicide tolerant maize adoption in Honduras in 2008 and 2012. Results from those assessments identified endogeneity issues, such as self-selection and simultaneity, occurring together with influential outliers. Procedures used to address these issues independently showed trade-offs between addressing endogeneity and addressing outliers. Thus, the need continues to identify methods that address both issues simultaneously while minimizing, as much as possible, the impact of method trade-offs. We structured this paper as follows. First, we review the literature to delineate data and assessment issues potentially affecting robust performance indicators such as yield and cost differentials. Second, we discuss and apply four types of approaches that can be used to obtain robust performance estimates for yield and cost differentials: 1) Robust Instrumental Variables, 2) Instrumental Variable Regressions, 3) Control/Treatment, and 4) Machine Learning methods that are amenable to robust strategies for dealing with outliers, including Random Forest and a Stacking regression approach that allows for a number of “base learners”. We apply these approaches to the pooled 2008 and 2012 Honduras field surveys. Third, we discuss implications for impact assessment results and implementation limitations, especially in low- and middle-income countries. We further discuss and draw some conclusions regarding methodological issues for consideration by impact assessors and stakeholders. |
| format | Working paper |
| id | CGSpace174327 |
| institution | CGIAR Consortium |
| language | English |
| publishDate | 2025 |
| publisher | International Food Policy Research Institute |
| record_format | dspace |
| spelling | CGSpace174327 2025-12-08T10:06:44Z Parametric and machine learning approaches to examine yield differences between control and treatment considering outliers and statistical biases: The case of insect resistant/herbicide tolerant (IR/HT) maize in Honduras Falck-Zepeda, José B. Zambrano, Patricia Sanders, Arie Trabanino, Carlos Rogelio maize yields impact assessment agriculture data capacity building machine learning parametric programming herbicide resistance 2025-04-24 2025-04-25T16:03:58Z 2025-04-25T16:03:58Z Working Paper https://hdl.handle.net/10568/174327 en Open Access application/pdf International Food Policy Research Institute Falck-Zepeda, José B.; Zambrano, Patricia; Sanders, Arie; and Trabanino, Carlos Rogelio. 2025. Parametric and machine learning approaches to examine yield differences between control and treatment considering outliers and statistical biases: The case of insect resistant/herbicide tolerant (IR/HT) maize in Honduras. IFPRI Discussion Paper 2334. Washington, DC: International Food Policy Research Institute. https://hdl.handle.net/10568/174327 |
| title | Parametric and machine learning approaches to examine yield differences between control and treatment considering outliers and statistical biases: The case of insect resistant/herbicide tolerant (IR/HT) maize in Honduras |
| topic | maize yields impact assessment agriculture data capacity building machine learning parametric programming herbicide resistance |
| url | https://hdl.handle.net/10568/174327 |
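
The description in this record names self-selection as one source of endogeneity and lists instrumental variable regressions among the approaches applied. The sketch below is only an illustration of that general idea, not the authors' specification: it uses synthetic data (not the Honduras surveys), a continuous "adoption" variable for simplicity, and a single hypothetical instrument. All names (`simulate`, `ols_slope`, `iv_slope`) and all coefficients are assumptions made for the example. It compares a naive OLS slope with the single-instrument IV estimator, cov(z, y) / cov(z, d), when an unobserved factor drives both adoption and yield.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(d, y):
    """Slope from regressing y on d (with an intercept)."""
    return cov(d, y) / cov(d, d)


def iv_slope(z, d, y):
    """Single-instrument IV estimate of the effect of d on y."""
    return cov(z, y) / cov(z, d)


def simulate(n=5000, true_effect=1.0, seed=0):
    """Generate synthetic farms; every coefficient here is an assumption."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        instrument = rng.gauss(0, 1)  # assumed to affect yield only via adoption
        ability = rng.gauss(0, 1)     # unobserved; drives adoption AND yield
        adoption = 0.8 * instrument + ability + rng.gauss(0, 1)
        outcome = true_effect * adoption + 2.0 * ability + rng.gauss(0, 1)
        z.append(instrument)
        d.append(adoption)
        y.append(outcome)
    return z, d, y


z, d, y = simulate()
print("OLS slope:", round(ols_slope(d, y), 3))
print("IV slope: ", round(iv_slope(z, d, y), 3))
```

Under these assumptions the OLS slope absorbs part of the unobserved "ability" effect and overstates the true effect of 1.0, while the IV slope should land close to it. The sketch does nothing about outliers; the description notes that handling endogeneity and outliers together is precisely where trade-offs arise.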