A Natural Language Processing Pipeline to extract phenotypic data from formal taxonomic descriptions with a focus on flagellate plants

Assembling large-scale phenotypic datasets for evolutionary and biodiversity studies of plants can be extremely difficult and time-consuming. New semi-automated Natural Language Processing (NLP) pipelines can extract phenotypic data from taxonomic descriptions, and their performance can be enhanced...


Bibliographic Details
Main Authors: Endara L., Burleigh G., Cooper L., Jaiswal, P., Laporte, Marie-Angélique
Format: Conference Paper
Language: English
Published: 2018
Subjects: data processing, ontology, taxonomy, mastigophora, phenotypes
Online Access: https://hdl.handle.net/10568/100813
author Endara L.
Burleigh G.
Cooper L.
Jaiswal, P.
Laporte, Marie-Angélique
collection Repository of Agricultural Research Outputs (CGSpace)
description Assembling large-scale phenotypic datasets for evolutionary and biodiversity studies of plants can be extremely difficult and time-consuming. New semi-automated Natural Language Processing (NLP) pipelines can extract phenotypic data from taxonomic descriptions, and their performance can be enhanced by incorporating information from ontologies such as the Plant Ontology (PO) and the Plant Trait Ontology (TO). These ontologies are powerful tools for comparing phenotypes across taxa in large-scale evolutionary and ecological analyses, but they focus largely on terms associated with flowering plants. We describe a bottom-up approach to identify terms from flagellate plants (including bryophytes, lycophytes, ferns, and gymnosperms) that can be added to existing plant ontologies. We first parsed a large corpus of electronic taxonomic descriptions using the Explorer of Taxon Concepts tool (http://taxonconceptexplorer.org/) and identified terms specific to flagellate plants that were missing from the existing ontologies. We extracted new structure and trait terms, and we are currently incorporating the missing structure terms into the PO and modifying the definitions of existing terms to expand their coverage to flagellate plants. We will incorporate trait terms into the TO in the near future.
format Conference Paper
id CGSpace100813
institution CGIAR Consortium
language English
publishDate 2018
record_format dspace
spelling CGSpace100813 2025-11-05T07:50:53Z
2018 2019-04-16T14:00:31Z 2019-04-16T14:00:31Z Conference Paper https://hdl.handle.net/10568/100813 en Open Access application/pdf
Endara L.; Burleigh G.; Cooper L.; Jaiswal P.; Laporte M-A.; Cui H. (2018) A Natural Language Processing Pipeline to extract phenotypic data from formal taxonomic descriptions with a focus on flagellate plants. In: Jaiswal P.; Cooper, L.; Haendel, M.A.; Mungall, C.J. (eds.) International Conference on Biological Ontology (ICBO 2018), Proceedings of the 9th International Conference on Biological Ontology, Corvallis, Oregon, USA, August 7-10, 2018, 4 p. ISSN: 1613-0073
title A Natural Language Processing Pipeline to extract phenotypic data from formal taxonomic descriptions with a focus on flagellate plants
topic data processing
ontology
taxonomy
mastigophora
phenotypes
url https://hdl.handle.net/10568/100813
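The abstract's gap-finding step (terms extracted from taxonomic descriptions that are absent from an existing ontology) can be illustrated with a minimal sketch. This is not the authors' pipeline or the Explorer of Taxon Concepts tool: the function names, the normalization rule, and the sample term lists are all hypothetical and chosen only for illustration.

```python
import re


def normalize(term: str) -> str:
    """Lowercase a term and collapse whitespace, hyphens, and underscores
    so that spelling variants compare equal (an assumed, simplistic rule)."""
    return re.sub(r"[\s\-_]+", " ", term.strip().lower())


def missing_terms(extracted, ontology_labels):
    """Return the sorted, de-duplicated extracted terms that have no
    matching label in the ontology after normalization."""
    known = {normalize(label) for label in ontology_labels}
    candidates = {normalize(term) for term in extracted}
    return sorted(candidates - known)


# Hypothetical inputs: NOT actual output of any parser, and NOT the
# actual contents of the Plant Ontology.
extracted = ["Sporangium", "leaf", "elater", "Leaf", "peristome tooth"]
ontology_labels = ["leaf", "sporangium", "stem"]

print(missing_terms(extracted, ontology_labels))
# → ['elater', 'peristome tooth']
```

In practice the candidate list would still need review by a domain expert, since exact string matching cannot tell a genuinely missing term from a synonym of an existing one.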