Nonlinear projection methods for visualizing Barcode data and application on two data sets
| Main authors: | Olteanu, M; Nicolas, V; Schaeffer, B; Denys, C; Missoup, A-D; Kennis, Jan; Larédo, C. |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | Wiley, 2013 |
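| Subjects: | nucleotide sequences; algorithms; computer analysis; mathematics and statistics; genetics; biotechnology |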
| Online access: | https://hdl.handle.net/10568/95300 |
| Field | Value |
|---|---|
| author | Olteanu, M; Nicolas, V; Schaeffer, B; Denys, C; Missoup, A-D; Kennis, Jan; Larédo, C. |
| collection | Repository of Agricultural Research Outputs (CGSpace) |
| description | Developing tools for visualizing DNA sequences is an important issue in the Barcoding context. Visualizing Barcode data can be framed as a purely statistical problem of unsupervised learning. Clustering methods combined with projection methods have two closely linked objectives: visualizing the data and finding structure in them. Multidimensional scaling (MDS) and self-organizing maps (SOM) are unsupervised statistical tools for data visualization. Both algorithms map data onto a lower-dimensional manifold: MDS looks for a projection that best preserves pairwise distances, while SOM preserves the topology of the data. Both algorithms were initially developed for Euclidean data, and the conditions required for their proper use are not satisfied by Barcode data. We developed a workflow consisting of four steps: (1) collapse the data into distinct sequences; (2) compute a dissimilarity matrix; (3) run a modified version of SOM for dissimilarity matrices to structure the data and reduce dimensionality; (4) project the results using MDS. This methodology was applied to *Astraptes fulgerator* and to *Hylomyscus*, an African rodent genus with a debated taxonomy. We obtained very good results for both data sets, and the results were robust to unbalanced species. All the species in *Astraptes* appeared as clearly distinct groups in the various visualizations, except for LOHAMP and FABOV, which were mixed up. For *Hylomyscus*, our findings were consistent with known species, confirmed the existence of four unnamed taxa and suggested the existence of potentially new species. |
| format | Journal Article |
| id | CGSpace95300 |
| institution | CGIAR Consortium |
| language | English |
| publishDate | 2013 |
| publisher | Wiley |
| citation | Olteanu, M., Nicolas, V., Schaeffer, B., Denys, C., Missoup, A-D., Kennis, Jan, Larédo, C. 2013. Nonlinear projection methods for visualizing Barcode data and application on two data sets. *Molecular Ecology Resources*, 13(6): 976-990. https://doi.org/10.1111/1755-0998.12047 |
| issued | 2013-01 |
| access | Limited Access |
| title | Nonlinear projection methods for visualizing Barcode data and application on two data sets |
| topic | nucleotide sequences; algorithms; computer analysis; mathematics and statistics; genetics; biotechnology |
| url | https://hdl.handle.net/10568/95300 |
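The four-step workflow in the description can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes aligned sequences of equal length, uses the p-distance (fraction of differing sites) as the dissimilarity, substitutes a generic batch median SOM (each prototype restricted to an observed sequence) for the paper's modified SOM, whose exact variant this record does not specify, and projects the SOM prototypes with classical (Torgerson) MDS. All function names and the toy sequences are hypothetical.

```python
import numpy as np


def collapse_sequences(seqs):
    """Step 1: collapse sequences into distinct haplotypes, keeping multiplicities."""
    counts = {}
    for s in seqs:
        counts[s] = counts.get(s, 0) + 1
    uniq = list(counts)  # order of first appearance
    return uniq, np.array([counts[s] for s in uniq])


def p_distance_matrix(seqs):
    """Step 2: pairwise p-distance, the fraction of aligned sites that differ.

    Assumption: sequences are already aligned and of equal length.
    """
    arr = np.array([list(s) for s in seqs])
    d = np.zeros((len(seqs), len(seqs)))
    for i in range(len(seqs)):
        d[i] = (arr != arr[i]).mean(axis=1)
    return d


def median_som(d, weights, grid=(2, 2), n_iter=20, seed=0):
    """Step 3: batch median SOM driven only by the dissimilarity matrix d.

    Hypothetical stand-in for the paper's modified SOM: each map unit's
    prototype is the index of one observed sequence.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    units = np.array([(r, c) for r in range(rows) for c in range(cols)])
    k = len(units)
    grid_d2 = ((units[:, None, :] - units[None, :, :]) ** 2).sum(axis=-1)
    protos = rng.choice(len(d), size=k, replace=len(d) < k)
    sigma0 = max(rows, cols) / 2
    for t in range(n_iter):
        sigma = max(sigma0 * (1 - t / n_iter), 0.3)  # shrinking neighborhood radius
        h = np.exp(-grid_d2 / (2 * sigma ** 2))      # k x k neighborhood kernel
        bmu = d[:, protos].argmin(axis=1)            # best-matching unit per sequence
        w = weights[None, :] * h[:, bmu]             # k x n: weight of sequence i for unit j
        # New prototype of unit j: the sequence minimizing the weighted sum of dissimilarities.
        protos = (w @ d).argmin(axis=1)
    bmu = d[:, protos].argmin(axis=1)
    return protos, bmu


def classical_mds(d, dim=2):
    """Step 4: classical (Torgerson) MDS of a dissimilarity matrix."""
    n = len(d)
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j                      # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(b)                   # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "TTGTACGA", "TTGAACGA", "TTGAACGA"]
    uniq, counts = collapse_sequences(reads)
    d = p_distance_matrix(uniq)
    protos, bmu = median_som(d, counts, grid=(2, 2))
    coords = classical_mds(d[np.ix_(protos, protos)])  # 2-D positions of the map units
    for seq, unit in zip(uniq, bmu):
        print(seq, "-> unit", unit)
    print(coords)
```

Restricting prototypes to observed sequences is one common way to run a SOM when only a dissimilarity matrix is available, since averaging DNA sequences is not meaningful; relational SOM variants are another option. Collapsing identical sequences first and passing their counts as weights keeps the dissimilarity matrix small without discarding how often each haplotype occurs.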