Nonlinear projection methods for visualizing Barcode data and application on two data sets

Developing tools for visualizing DNA sequences is an important issue in the Barcoding context. Visualizing Barcode data can be put in a purely statistical context, unsupervised learning. Clustering methods combined with projection methods have two closely linked objectives, visualizing and finding s...


Bibliographic Details
Main authors: Olteanu, M; Nicolas, V; Schaeffer, B; Denys, C; Missoup, A-D; Kennis, Jan; Larédo, C.
Format: Journal Article
Language: English
Published: Wiley 2013
Subjects: nucleotide sequences, algorithms, computer analysis, mathematics and statistics, genetics, biotechnology
Online access: https://hdl.handle.net/10568/95300
_version_ 1855514889987031040
author Olteanu, M
Nicolas, V
Schaeffer, B
Denys, C
Missoup, A-D
Kennis, Jan
Larédo, C.
author_browse Denys, C
Kennis, Jan
Larédo, C.
Missoup, A-D
Nicolas, V
Olteanu, M
Schaeffer, B
author_facet Olteanu, M
Nicolas, V
Schaeffer, B
Denys, C
Missoup, A-D
Kennis, Jan
Larédo, C.
author_sort Olteanu, M
collection Repository of Agricultural Research Outputs (CGSpace)
description Developing tools for visualizing DNA sequences is an important issue in the Barcoding context. Visualizing Barcode data can be framed in a purely statistical context, that of unsupervised learning. Clustering methods combined with projection methods have two closely linked objectives: visualizing the data and finding structure in it. Multidimensional scaling (MDS) and self‐organizing maps (SOM) are unsupervised statistical tools for data visualization. Both algorithms map data onto a lower-dimensional manifold: MDS looks for a projection that best preserves pairwise distances, while SOM preserves the topology of the data. Both algorithms were initially developed for Euclidean data, and the conditions necessary for their proper application were not satisfied by Barcode data. We developed a workflow consisting of four steps: collapse the data into distinct sequences; compute a dissimilarity matrix; run a modified version of SOM for dissimilarity matrices to structure the data and reduce dimensionality; project the results using MDS. This methodology was applied to Astraptes fulgerator and Hylomyscus, an African rodent with debated taxonomy. We obtained very good results for both data sets. The results were robust against unbalanced species. All the species in Astraptes were displayed in very distinct groups in the various visualizations, except for LOHAMP and FABOV, which were mixed up. For Hylomyscus, our findings were consistent with known species, confirmed the existence of four unnamed taxa and suggested the existence of potentially new species.
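The workflow summarized in the description above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of p-distance (proportion of differing sites) as the dissimilarity, and the use of classical (Torgerson) MDS are all assumptions made here, and the modified SOM step for dissimilarity matrices is left out entirely. The sketch assumes sequences that are already aligned and of equal length.

```python
import numpy as np


def collapse(seqs):
    """Collapse sequences into distinct ones, keeping how often each occurs."""
    counts = {}
    for s in seqs:
        counts[s] = counts.get(s, 0) + 1
    distinct = sorted(counts)
    return distinct, [counts[s] for s in distinct]


def p_distance_matrix(distinct):
    """Pairwise proportion of differing sites (assumed dissimilarity).

    Assumes aligned sequences of equal length.
    """
    arr = np.array([list(s) for s in distinct])
    n = len(distinct)
    d = np.zeros((n, n))
    for i in range(n):
        # Compare every sequence with sequence i, site by site.
        d[i] = (arr != arr[i]).mean(axis=1)
    return d


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS of a dissimilarity matrix into k dimensions."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared dissimilarities.
    b = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    # Negative eigenvalues can occur for non-Euclidean dissimilarities; clip them.
    top_vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(top_vals)
```

A call such as `classical_mds(p_distance_matrix(collapse(seqs)[0]))` returns one 2-D coordinate per distinct sequence. In the workflow described by the record, the MDS projection is applied to the output of the SOM step, not directly to the raw dissimilarities, so this shortcut only illustrates the first, second and last steps.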
format Journal Article
id CGSpace95300
institution CGIAR Consortium
language English
publishDate 2013
publishDateRange 2013
publishDateSort 2013
publisher Wiley
publisherStr Wiley
record_format dspace
spelling CGSpace953002025-06-17T08:23:17Z Nonlinear projection methods for visualizing Barcode data and application on two data sets Olteanu, M Nicolas, V Schaeffer, B Denys, C Missoup, A-D Kennis, Jan Larédo, C. nucleotide sequences algorithms computer analysis mathematics and statistics genetics biotechnology Developing tools for visualizing DNA sequences is an important issue in the Barcoding context. Visualizing Barcode data can be put in a purely statistical context, unsupervised learning. Clustering methods combined with projection methods have two closely linked objectives, visualizing and finding structure in the data. Multidimensional scaling (MDS) and Self‐organizing maps (SOM) are unsupervised statistical tools for data visualization. Both algorithms map data onto a lower dimensional manifold: MDS looks for a projection that best preserves pairwise distances while SOM preserves the topology of the data. Both algorithms were initially developed for Euclidean data and the conditions necessary to their good implementation were not satisfied for Barcode data. We developed a workflow consisting in four steps: collapse data into distinct sequences; compute a dissimilarity matrix; run a modified version of SOM for dissimilarity matrices to structure the data and reduce dimensionality; project the results using MDS. This methodology was applied to Astraptes fulgerator and Hylomyscus, an African rodent with debated taxonomy. We obtained very good results for both data sets. The results were robust against unbalanced species. All the species in Astraptes were well displayed in very distinct groups in the various visualizations, except for LOHAMP and FABOV that were mixed up. For Hylomyscus, our findings were consistent with known species, confirmed the existence of four unnamed taxa and suggested the existence of potentially new species. 
2013-01 2018-07-03T11:02:45Z 2018-07-03T11:02:45Z Journal Article https://hdl.handle.net/10568/95300 en Limited Access Wiley Olteanu, M., Nicolas, V., Schaeffer, B., Denys, C., Missoup, A-D., Kennis, Jan, Larédo, C. 2013. Nonlinear projection methods for visualizing Barcode data and application on two data sets. Molecular Ecology Resources, 13(6): 976-990. https://doi.org/10.1111/1755-0998.12047
spellingShingle nucleotide sequences
algorithms
computer analysis
mathematics and statistics
genetics
biotechnology
Olteanu, M
Nicolas, V
Schaeffer, B
Denys, C
Missoup, A-D
Kennis, Jan
Larédo, C.
Nonlinear projection methods for visualizing Barcode data and application on two data sets
title Nonlinear projection methods for visualizing Barcode data and application on two data sets
title_full Nonlinear projection methods for visualizing Barcode data and application on two data sets
title_fullStr Nonlinear projection methods for visualizing Barcode data and application on two data sets
title_full_unstemmed Nonlinear projection methods for visualizing Barcode data and application on two data sets
title_short Nonlinear projection methods for visualizing Barcode data and application on two data sets
title_sort nonlinear projection methods for visualizing barcode data and application on two data sets
topic nucleotide sequences
algorithms
computer analysis
mathematics and statistics
genetics
biotechnology
url https://hdl.handle.net/10568/95300