Secondary data analysis using evidence-based Bayesian networks with an application to investigate the determinants of childhood stunting

Secondary data – data previously collected by other researchers for a different purpose – offers a cost-effective and readily available resource for research and policy or program design but presents challenges due to the lack of control of sampling design or data. Bayesian Networks (BN) are well-su...

Bibliographic Details
Main authors: Yet, Barbaros; Öykü Başerdem, Elif; Rosenstock, Todd
Format: Journal Article
Language: English
Published: Elsevier 2024
Subjects: data analysis
Online access: https://hdl.handle.net/10568/173693
author Yet, Barbaros
Öykü Başerdem, Elif
Rosenstock, Todd
collection Repository of Agricultural Research Outputs (CGSpace)
description Secondary data (data previously collected by other researchers for a different purpose) offers a cost-effective and readily available resource for research and for policy or program design, but it presents challenges because the analyst has no control over the sampling design or the data. Bayesian Networks (BNs) are well suited to guiding secondary data analysis: their graphical structure can encode domain knowledge about the causal relationships among factors, and secondary data can be used to learn the nature and strength of these relationships. To build BNs from a combination of knowledge and secondary data, the causal structure is first built from expert knowledge and published evidence, and the parameters are then learned from the data. However, the variables in secondary data often match the variables in the causal BN structure only imperfectly. When ad hoc structural modifications are made to match the structure to the data, the link between the parameterized model and the supporting knowledge and evidence is lost. This paper presents a systematic method of building BNs based on secondary data. We build the BN structure from published evidence and expert interviews, carefully documenting the origin of the evidence for each relation in the BN. We use formal BN abstraction operations to match the expert structure with the secondary data. The causal and associational implications of applying abstraction operations are traced, making it possible to link the original BN with the parameterized model and to trace it back to more complicated models when additional data become available. The method is demonstrated by building a BN model of the drivers of childhood stunting. The model brings together the rich published evidence in this domain in a BN structure and evidence base, while its parameters are learned from the Demographic and Health Survey (DHS) datasets for India and Senegal. We compared the BNs built by our approach with BNs learned purely from secondary data using structure learning algorithms. None of the learning algorithms produced structures close to the evidence-based model. In contrast, the abstraction operations clearly establish the link between our models and the evidence. The stunting case study demonstrates the advantages of having a clear evidence base and of building a formal link between the evidence and secondary data using abstraction. The resulting models and supporting evidence can be browsed in an online tool.
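The two-step workflow the abstract describes (fix the structure from evidence, abstract away what the data cannot support, then learn parameters) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the variable names, the toy records (which are not DHS data), the helper functions `remove_node` and `learn_cpts`, and the use of Laplace-smoothed counts. The node-removal step is a simplified stand-in for a formal abstraction operation; it reconnects the removed node's parents to its children, and a faithful marginalization may also require edges among those children when there is more than one.

```python
from collections import Counter
from itertools import product


def remove_node(parents, node):
    """Drop `node` from a DAG given as {child: [parents]}.

    Simplified abstraction: each child of `node` inherits the parents of
    `node`, so the dependence that ran through it is not silently lost.
    """
    inherited = parents[node]
    abstracted = {}
    for child, pas in parents.items():
        if child == node:
            continue
        kept = [p for p in pas if p != node]
        if node in pas:
            kept += [p for p in inherited if p not in kept]
        abstracted[child] = kept
    return abstracted


def learn_cpts(parents, records, states, alpha=1.0):
    """Estimate P(child | parents) from counts with Laplace smoothing."""
    cpts = {}
    for child, pas in parents.items():
        counts = Counter((tuple(r[p] for p in pas), r[child]) for r in records)
        table = {}
        for combo in product(*(states[p] for p in pas)):
            total = sum(counts[(combo, s)] for s in states[child])
            total += alpha * len(states[child])
            table[combo] = {
                s: (counts[(combo, s)] + alpha) / total for s in states[child]
            }
        cpts[child] = table
    return cpts


# Hypothetical evidence-based structure; "diet_quality" is assumed to be
# missing from the survey, so it is abstracted away before learning.
expert = {
    "maternal_education": [],
    "sanitation": [],
    "diet_quality": ["maternal_education"],
    "stunted": ["diet_quality", "sanitation"],
}
abstracted = remove_node(expert, "diet_quality")

states = {
    "maternal_education": ["low", "high"],
    "sanitation": ["poor", "good"],
    "stunted": ["no", "yes"],
}
# Made-up toy records, for illustration only.
records = [
    {"maternal_education": "low", "sanitation": "poor", "stunted": "yes"},
    {"maternal_education": "low", "sanitation": "poor", "stunted": "yes"},
    {"maternal_education": "low", "sanitation": "poor", "stunted": "no"},
    {"maternal_education": "high", "sanitation": "good", "stunted": "no"},
    {"maternal_education": "high", "sanitation": "good", "stunted": "no"},
    {"maternal_education": "low", "sanitation": "good", "stunted": "no"},
]
cpts = learn_cpts(abstracted, records, states)
print(abstracted["stunted"])
print(cpts["stunted"][("poor", "low")])
```

Keeping the original `expert` dictionary alongside `abstracted` mirrors the idea of tracing the parameterized model back to the fuller structure: if a survey later records the missing variable, the parameters can be learned on `expert` directly.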
format Journal Article
id CGSpace173693
institution CGIAR Consortium
language English
publishDate 2024
publisher Elsevier
record_format dspace
spelling CGSpace173693 2025-10-26T12:51:48Z 2024-12 2025-03-18T13:03:44Z 2025-03-18T13:03:44Z Journal Article https://hdl.handle.net/10568/173693 en Open Access Elsevier Yet, B.; Öykü Başerdem, E.; Rosenstock, T. (2024) Secondary data analysis using evidence-based Bayesian networks with an application to investigate the determinants of childhood stunting. Expert Systems with Applications 256: 124940. ISSN: 0957-4174
title Secondary data analysis using evidence-based Bayesian networks with an application to investigate the determinants of childhood stunting
topic data analysis
url https://hdl.handle.net/10568/173693