Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset

This dataset is part of the Sample Earth initiative, a global effort to build open, high-quality reference data for improving the accuracy and inclusiveness of land-cover maps. It contains GPS-located land-cover samples that can be used to train and validate AI models that generate detailed, accurat...


Bibliographic Details
Main Authors: Vantalon, Thibaud; Luong, Phuong Thi; Perez Escobar, Jorge Andres; Tello Dagua, Jhon Jairo; Phan, Trong Van; Nguyen, Hang; Hong Nguyen; Hoa Nguyen; Reymondin, Louis
Format: Dataset
Language: English
Published: 2025
Subjects: artificial intelligence; reference points; land classification
Online Access: https://hdl.handle.net/10568/178010
_version_ 1855526919713325056
author Vantalon, Thibaud
Luong, Phuong Thi
Perez Escobar, Jorge Andres
Tello Dagua, Jhon Jairo
Phan, Trong Van
Nguyen, Hang
Hong Nguyen
Hoa Nguyen
Reymondin, Louis
collection Repository of Agricultural Research Outputs (CGSpace)
description This dataset is part of the Sample Earth initiative, a global effort to build open, high-quality reference data for improving the accuracy and inclusiveness of land-cover maps. It contains GPS-located land-cover samples that can be used to train and validate AI models that generate detailed, accurate maps, with a focus on coffee and cocoa production systems. The data were collected across Vietnam and Ghana, combining expert interpretation of high-resolution satellite imagery (Google Earth, Planet) with a smaller subset of ground-truth observations. Each point is labeled and quality-controlled to represent a diverse range of land-cover types commonly found within and around smallholder production areas. The classification scheme includes 10 main classes (such as coffee, cocoa, orchard, natural forests) and 68 sub-classes (such as full sun coffee, coffee intercropped with black pepper). While the primary goal is to distinguish coffee and cocoa systems from other land uses, the dataset also supports broader applications such as agricultural monitoring, deforestation analysis, ecosystem service mapping, land-use planning, and suitability modeling. By providing transparent, well-validated training data, this dataset contributes to Sample Earth’s broader objective: strengthening AI-based land monitoring tools and supporting global efforts, including the EU Deforestation Regulation (EUDR), to ensure sustainable, deforestation-free agricultural supply chains. The dataset is designed to grow continuously, incorporating new commodities, timeframes, and countries over time.
Methodology: The dataset was developed primarily through expert visual interpretation of high-resolution satellite imagery from Google Earth and Planet, collected between 2019 and 2022. A smaller subset of points in the Central Highlands of Vietnam was derived from field observations, providing additional ground-truth validation. To improve interpreter accuracy and contextual understanding, field visits and Google Street View assessments were conducted in both Vietnam and Ghana. These activities helped experts recognize local land-use patterns and distinguish among different crop and landscape types. All sample points were digitized and standardized using QGIS, with attributes including class ID, crop type, sampling date, and associated metadata to ensure consistency and interoperability. This combination of expert interpretation, localized training, and structured data management yields a consistent, machine-learning–ready dataset suitable for land-cover mapping and model training workflows.
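The description says the points carry a class ID, crop type, and sampling date and are meant for training and validating models. As a minimal sketch of how such points might be divided into training and validation sets while keeping every class represented in both, the Python below performs a per-class (stratified) split. The field names (`point_id`, `class_id`, `crop_type`, `sampling_date`), the class ID values, and the placeholder records are assumptions made for illustration; the actual attribute names and codes should be taken from the published files.

```python
import random
from collections import defaultdict


def make_placeholder_samples(points_per_class=8):
    """Build synthetic records; these are NOT real Sample Earth data."""
    # Hypothetical class IDs and labels, chosen only for this example.
    classes = [(1, "coffee"), (2, "cocoa"), (3, "natural forest")]
    samples = []
    for class_id, crop_type in classes:
        for i in range(points_per_class):
            samples.append({
                "point_id": f"{class_id}-{i}",
                "class_id": class_id,
                "crop_type": crop_type,
                "sampling_date": "2021-01-01",  # placeholder date
            })
    return samples


def stratified_split(samples, key="class_id", val_fraction=0.25, seed=0):
    """Split samples into (train, val), drawing val_fraction from each class."""
    rng = random.Random(seed)  # local generator keeps the split reproducible
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[key]].append(sample)

    train, val = [], []
    for label in sorted(by_class):
        group = list(by_class[label])
        rng.shuffle(group)
        # Hold out at least one point per class when the class has more than one.
        n_val = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


if __name__ == "__main__":
    train, val = stratified_split(make_placeholder_samples(), seed=42)
    print(len(train), "training points;", len(val), "validation points")
```

Splitting per class rather than over the whole set keeps rare sub-classes from vanishing from the validation set. The same function could be keyed on a sub-class attribute instead, if the files provide one.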
format Dataset
id CGSpace178010
institution CGIAR Consortium
language English
publishDate 2025
record_format dspace
spelling CGSpace178010 2025-11-18T22:32:11Z Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset Vantalon, Thibaud; Luong, Phuong Thi; Perez Escobar, Jorge Andres; Tello Dagua, Jhon Jairo; Phan, Trong Van; Nguyen, Hang; Hong Nguyen; Hoa Nguyen; Reymondin, Louis artificial intelligence; reference points; land classification 2025-11 2025-11-18T22:28:40Z 2025-11-18T22:28:40Z Dataset https://hdl.handle.net/10568/178010 en Open Access Vantalon, T.; Luong, P.T.; Perez Escobar, J.A.; Tello Dagua, J.J.; Phan, T.V.; Nguyen, H.; Hong Nguyen; Hoa Nguyen; Reymondin, L. (2025) Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset. https://doi.org/10.7910/DVN/U7HWY1
title Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset
topic artificial intelligence
reference points
land classification
url https://hdl.handle.net/10568/178010