Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya

Large datasets are often not amenable to analysis using traditional single-step approaches. Here, our general objective was to apply imputation techniques, principal component analysis (PCA), elastic net and generalized linear models to a large dataset in a systematic approach to extract the most me...

Bibliographic Details
Main authors: Tremblay, M., Dahm, J.S., Wamae, C.N., Glanville, William A. de, Fèvre, Eric M., Dopfer, D.
Format: Journal Article
Language: English
Published: Cambridge University Press 2015
Subjects: cattle, health, zoonoses, infectious diseases, epidemiology
Online access: https://hdl.handle.net/10568/65161
author Tremblay, M.
Dahm, J.S.
Wamae, C.N.
Glanville, William A. de
Fèvre, Eric M.
Dopfer, D.
collection Repository of Agricultural Research Outputs (CGSpace)
description Large datasets are often not amenable to analysis using traditional single-step approaches. Here, our general objective was to apply imputation techniques, principal component analysis (PCA), elastic net and generalized linear models to a large dataset in a systematic approach to extract the most meaningful predictors for a health outcome. We extracted predictors for Plasmodium falciparum infection from a large covariate dataset with a limited number of observations, using data from the People, Animals, and their Zoonoses (PAZ) project to demonstrate these techniques. The data, collected from 415 homesteads in western Kenya, contained over 1500 variables that describe the health, environment, and social factors of the humans, livestock, and the homesteads in which they reside. The wide, sparse dataset was simplified to 42 predictors of P. falciparum malaria infection, and wealth rankings were produced for all homesteads. The 42 predictors make biological sense and are supported by previous studies. The systematic data-mining approach we used would make many large datasets more manageable and informative for decision-making processes and health policy prioritization.
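Two of the steps named in the description, imputation followed by elastic net selection of predictors for a binary outcome, can be illustrated with a minimal sketch. This is not the authors' code and does not use the PAZ data: the toy data are synthetic, the PCA step is omitted, and every name and setting below (`impute_median`, `standardize`, `elastic_net_logistic`, `lam`, `alpha`, the learning rate) is a hypothetical choice made for illustration. Median imputation and proximal gradient descent are assumed here as simple stand-ins; the record does not say which imputation method or solver the study used.

```python
import math
import random
from statistics import median


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def impute_median(rows):
    """Replace None entries with the median of the observed values in that column."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed) if observed else 0.0)
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def standardize(rows):
    """Centre each column and scale it to unit variance (constant columns are only centred)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x towards zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def elastic_net_logistic(X, y, lam=0.1, alpha=0.9, lr=0.5, n_iter=300):
    """Fit logistic loss + lam * (alpha * L1 + (1 - alpha) / 2 * L2) by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        for j in range(p):
            # Gradient step on the smooth part (loss + ridge), then soft-threshold for the L1 part.
            step = w[j] - lr * (grad_w[j] + lam * (1.0 - alpha) * w[j])
            w[j] = soft_threshold(step, lr * lam * alpha)
    return w, b


# Synthetic toy data: 6 covariates, only the first two drive the outcome,
# and roughly 10% of the covariate values are missing.
rng = random.Random(0)
raw, y = [], []
for _ in range(200):
    x = [rng.gauss(0.0, 1.0) for _ in range(6)]
    y.append(1 if rng.random() < sigmoid(2.0 * x[0] - 2.0 * x[1]) else 0)
    raw.append([None if rng.random() < 0.1 else v for v in x])

X = standardize(impute_median(raw))
w, b = elastic_net_logistic(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("coefficients:", [round(wj, 3) for wj in w])
print("selected predictor indices:", selected)
```

The L1 part of the penalty sets weak coefficients exactly to zero, which is what lets a wide covariate set shrink to a short list of predictors, while the L2 part stabilizes the fit when covariates are correlated. In a workflow like the one described, the surviving predictors would then be passed to a generalized linear model.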
format Journal Article
id CGSpace65161
institution CGIAR Consortium
language English
publishDate 2015
publisher Cambridge University Press
record_format dspace
spelling CGSpace65161 2024-04-25T06:00:26Z 2015-12 2015-04-21T14:39:50Z 2015-04-21T14:39:50Z Journal Article https://hdl.handle.net/10568/65161 en Open Access Cambridge University Press Tremblay, M., Dahm, J.S., Wamae, C.N., de Glanville, W.A., Fèvre, E.M. and Döpfer, D. 2015. Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya. Epidemiology and Infection 143(16): 3538-3545.
title Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya
topic cattle
health
zoonoses
infectious diseases
epidemiology
url https://hdl.handle.net/10568/65161