Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya
Large datasets are often not amenable to analysis using traditional single-step approaches. Here, our general objective was to apply imputation techniques, principal component analysis (PCA), elastic net and generalized linear models to a large dataset in a systematic approach to extract the most me...
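The summary above names imputation and elastic net as two stages of the reduction. The following is a minimal sketch of those two stages only, and it is not the authors' code. It assumes mean imputation, a Gaussian response rather than the binary infection outcome, and synthetic data, and it omits the PCA and generalized linear model stages. All function names (`impute_mean`, `standardize`, `elastic_net`) and parameter values are hypothetical.

```python
import random

def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        seen = [r[j] for r in rows if r[j] is not None]
        means.append(sum(seen) / len(seen))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def standardize(rows):
    """Centre each column and scale it to unit variance (constant columns become zeros)."""
    n = len(rows)
    scaled = []
    for col in zip(*rows):
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5 or 1.0
        scaled.append([(v - mu) / sd for v in col])
    return [list(r) for r in zip(*scaled)]

def soft_threshold(z, g):
    """Shrink z towards zero by g; values within [-g, g] become exactly zero."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Cyclic coordinate descent minimising
    (1/2n)*||y - b0 - X b||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    Assumes the columns of X are already centred, so b0 is the mean of y."""
    n, p = len(X), len(X[0])
    b0 = sum(y) / n
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        for j in range(p):
            xj = [X[i][j] for i in range(n)]
            zj = sum(v * v for v in xj) / n
            if zj == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(xj[i] * (resid[i] + xj[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (zj + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                beta[j] = new
    return b0, beta

# Synthetic demonstration: 60 observations, 8 covariates, about 10% missing values.
# Only covariates 0 and 3 truly drive the outcome.
random.seed(0)
n_obs, n_vars = 60, 8
truth = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_obs)]
y = [2.0 * r[0] - 1.5 * r[3] + random.gauss(0, 0.1) for r in truth]
observed = [[None if random.random() < 0.1 else v for v in r] for r in truth]

X = standardize(impute_mean(observed))
b0, beta = elastic_net(X, y)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("non-zero coefficients at columns:", selected)
```

The L1 part of the penalty sets weak coefficients exactly to zero, which is what makes elastic net usable for shrinking a wide covariate set; the L2 part stabilises the fit when covariates are correlated.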
| Main authors: | Tremblay, M.; Dahm, J.S.; Wamae, C.N.; de Glanville, W.A.; Fèvre, E.M.; Döpfer, D. |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | Cambridge University Press, 2015 |
| Subjects: | |
| Online access: | https://hdl.handle.net/10568/65161 |
| Field | Value |
|---|---|
| author | Tremblay, M.; Dahm, J.S.; Wamae, C.N.; de Glanville, William A.; Fèvre, Eric M.; Döpfer, D. |
| collection | Repository of Agricultural Research Outputs (CGSpace) |
| description | Large datasets are often not amenable to analysis using traditional single-step approaches. Here, our general objective was to apply imputation techniques, principal component analysis (PCA), elastic net and generalized linear models to a large dataset in a systematic approach to extract the most meaningful predictors for a health outcome. We extracted predictors for Plasmodium falciparum infection from a large covariate dataset with a limited number of observations. To demonstrate these techniques, we used data from the People, Animals, and their Zoonoses (PAZ) project: data collected from 415 homesteads in western Kenya contained over 1500 variables that describe the health, environment, and social factors of the humans, livestock, and the homesteads in which they reside. The wide, sparse dataset was reduced to 42 predictors of P. falciparum malaria infection, and wealth rankings were produced for all homesteads. The 42 predictors make biological sense and are supported by previous studies. The systematic data-mining approach we used would make many large datasets more manageable and informative for decision-making processes and health policy prioritization. |
| format | Journal Article |
| id | CGSpace65161 |
| institution | CGIAR Consortium |
| language | English |
| publishDate | 2015 |
| publisher | Cambridge University Press |
| record_format | dspace |
| spelling | CGSpace65161; 2024-04-25T06:00:26Z; 2015-12; 2015-04-21T14:39:50Z; Journal Article; https://hdl.handle.net/10568/65161; en; Open Access; Cambridge University Press. Tremblay, M., Dahm, J.S., Wamae, C.N., de Glanville, W.A., Fèvre, E.M. and Döpfer, D. 2015. Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya. Epidemiology and Infection 143(16): 3538-3545. |
| title | Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya |
| topic | cattle health zoonoses infectious diseases epidemiology |
| url | https://hdl.handle.net/10568/65161 |