MusaDeepMosaic: Development of a machine learning genomic mosaic classifier tool.

Bibliographic Details
Main Authors: Vicens, Romain; Cenci, Alberto; Guillaume, Martin; Sardos, Julie; Rouard, Mathieu; Breton, Catherine
Format: Poster
Language: English
Published: 2024
Subjects: genomics, machine learning, musa
Online Access: https://hdl.handle.net/10568/148863
Collection: Repository of Agricultural Research Outputs (CGSpace)

Description

Machine learning and deep learning offer promising prospects for analyzing biological data and images, particularly for genomic characterization, where they bring automation, reproducibility, and accuracy to biological image and genomic analysis. Genomic diversity can be represented by SNP markers encoded as images (bitmaps), which gives an operational computational representation of genetic complexity. Such bitmap images can be processed by machine learning algorithms [1] or other automated tools, enabling faster and more accurate analysis of genomic data.

Our recent research has focused on genetic variation within different banana populations, using single nucleotide polymorphism (SNP) markers. This approach enabled us to define SNP diversity groups linked to ancestral genomes [2, 3, 4]. The genomes of cultivated varieties were then visualized as colored mosaics, which provided a distinctive visual representation of the complex ancestry of the banana genome. All cultivated varieties display a composite chromosomal structure: a complex mosaic of segments from different wild species and subspecies, which was curated into groups [6]. However, precisely defining these groups and attributing them to specific ancestors requires expert manual work.

The present study describes MusaDeepMosaic, a tool intended to facilitate this classification through pattern recognition. The method uses a machine learning model that combines an image-based visualization module, which transforms mosaic plots into images, with a convolutional neural network (CNN)-based classification module adapted to our case. The first step was to define reference groups, which allowed us to segment the data into training classes. To improve the accuracy of the model, however, we needed to enlarge our dataset substantially. Because the data were produced with different sequencing technologies, we normalized them.
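The transformation of a mosaic plot into an image could be sketched as below, assuming each chromosome's mosaic is available as a list of (start, end, group) segments in base pairs. The segment format, the group names, the color table and the default width of 100 windows are hypothetical, not taken from the poster. Resampling every chromosome to the same number of windows is one simple, assumed way to give all plots identical dimensions regardless of marker density or sequencing platform; the authors' own normalization may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (start_bp, end_bp, ancestry_group) per segment
Segment = Tuple[int, int, str]
RGB = Tuple[int, int, int]

# Hypothetical color table: one color per ancestral group
PALETTE: Dict[str, RGB] = {
    "group_A": (230, 25, 75),
    "group_B": (60, 180, 75),
    "group_C": (0, 130, 200),
}
UNASSIGNED: RGB = (128, 128, 128)  # grey for uncovered or unknown regions


def rasterize_chromosome(segments: List[Segment], chrom_length: int,
                         width: int = 100) -> List[RGB]:
    """Resample one chromosome's mosaic into `width` equal windows.

    Each window takes the color of the segment covering its midpoint, so
    chromosomes of any length end up as rows of identical width.
    """
    row: List[RGB] = []
    for i in range(width):
        midpoint = (i + 0.5) * chrom_length / width
        color = UNASSIGNED
        for start, end, group in segments:
            if start <= midpoint < end:
                color = PALETTE.get(group, UNASSIGNED)
                break
        row.append(color)
    return row


def rasterize_genome(mosaic: Dict[str, Tuple[int, List[Segment]]],
                     width: int = 100) -> List[List[RGB]]:
    """Stack one row per chromosome, in sorted chromosome order, into a bitmap."""
    return [rasterize_chromosome(segments, length, width)
            for _, (length, segments) in sorted(mosaic.items())]


if __name__ == "__main__":
    # Toy mosaic with hypothetical chromosome names, lengths and groups
    toy_mosaic = {
        "chr01": (1000, [(0, 600, "group_A"), (600, 1000, "group_B")]),
        "chr02": (500, [(0, 500, "group_C")]),
    }
    for row in rasterize_genome(toy_mosaic, width=10):
        print(row)
```

The result is a plain nested list of RGB tuples, so no imaging library is needed for the sketch; in practice it could be written to an image file or converted to an array before being fed to a CNN.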
The dataset was then expanded with a data augmentation method adapted to our data, which enlarged the corpus without compromising quality. The ResNet-50 model, a 50-layer deep convolutional neural network introduced in 2015 for image recognition, was used in this study. Optimized for accuracy and fast processing, ResNet-50 will be integrated into an automated system that characterizes newly genotyped individuals, analyzes their genetic data, and automatically assigns them to the appropriate diversity groups. Initial clustering-based results of our data augmentation and normalization are encouraging.

The training dataset of MusaDeepMosaic consists of 1,483 simulated and 317 experimental plots (1,800 in total) representing the cultivar groups; the test dataset comprises 200 plots and the validation dataset 178 plots. MusaDeepMosaic achieved high accuracy (0.97 to 1). This type of classifier will complement the VcfHunter tools [2, 3, 4, 5] for analyzing and characterizing the diversity of the cultivars held in the International Musa Transit Center (Alliance Bioversity CIAT, CGIAR).
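The automatic assignment of newly genotyped individuals to diversity groups could work roughly as sketched below. The sketch assumes the classifier outputs one raw score (logit) per diversity group, as a ResNet-50 does when its final fully connected layer is resized to the number of groups (in torchvision, by replacing `model.fc` with an `nn.Linear` of the appropriate output size). The softmax step, the 0.9 confidence threshold, the group names, and routing low-confidence plots back to expert curation are illustrative assumptions, not details from the poster. The `accuracy` helper uses the standard definition: correct assignments divided by plots evaluated.

```python
import math
from typing import Dict, List, Optional, Tuple


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw per-group scores (logits) into probabilities summing to 1."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {group: math.exp(score - top) for group, score in scores.items()}
    total = sum(exps.values())
    return {group: value / total for group, value in exps.items()}


def assign_group(scores: Dict[str, float],
                 threshold: float = 0.9) -> Tuple[Optional[str], float]:
    """Return (best_group, probability).

    best_group is None when the top probability is below the (hypothetical)
    threshold, meaning the plot should be sent back for expert curation.
    """
    probs = softmax(scores)
    best = max(probs, key=lambda group: probs[group])
    confidence = probs[best]
    return (best if confidence >= threshold else None), confidence


def accuracy(predicted: List[str], expected: List[str]) -> float:
    """Fraction of plots whose predicted group matches the reference group."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("expected two non-empty label lists of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


if __name__ == "__main__":
    # Hypothetical group names and scores for one newly genotyped plot
    print(assign_group({"group_1": 4.2, "group_2": 0.3, "group_3": -1.0}))
    print(accuracy(["g1", "g2", "g2"], ["g1", "g2", "g3"]))
```

Keeping a rejection threshold, rather than always taking the top-scoring group, is one way to leave the hardest mosaics to the expert manual work the abstract describes.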

Institution: CGIAR Consortium
Date issued: 2024-06
Access: Open Access (PDF)
Citation: Vicens, R.; Cenci, A.; Guillaume, M.; Sardos, J.; Rouard, M.; Breton, C. (2024) MusaDeepMosaic: Development of a machine learning genomic mosaic classifier tool. 1 p.