Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset

This dataset is part of the Sample Earth initiative, a global effort to build open, high-quality reference data for improving the accuracy and inclusiveness of land-cover maps. It contains GPS-located land-cover samples that can be used to train and validate AI models that generate detailed, accurat...

Bibliographic Details
Main Authors:	Vantalon, Thibaud; Luong, Phuong Thi; Perez Escobar, Jorge Andres; Tello Dagua, Jhon Jairo; Phan, Trong Van; Nguyen, Hang; Hong Nguyen; Hoa Nguyen; Reymondin, Louis
Format:	Dataset
Language:	English
Published:	2025
Subjects:	artificial intelligence; reference points; land classification
Online Access:	https://hdl.handle.net/10568/178010

_version_	1855526919713325056
author	Vantalon, Thibaud; Luong, Phuong Thi; Perez Escobar, Jorge Andres; Tello Dagua, Jhon Jairo; Phan, Trong Van; Nguyen, Hang; Hong Nguyen; Hoa Nguyen; Reymondin, Louis
author_browse	Hoa Nguyen; Hong Nguyen; Luong, Phuong Thi; Nguyen, Hang; Perez Escobar, Jorge Andres; Phan, Trong Van; Reymondin, Louis; Tello Dagua, Jhon Jairo; Vantalon, Thibaud
author_facet	Vantalon, Thibaud; Luong, Phuong Thi; Perez Escobar, Jorge Andres; Tello Dagua, Jhon Jairo; Phan, Trong Van; Nguyen, Hang; Hong Nguyen; Hoa Nguyen; Reymondin, Louis
author_sort	Vantalon, Thibaud
collection	Repository of Agricultural Research Outputs (CGSpace)
description	This dataset is part of the Sample Earth initiative, a global effort to build open, high-quality reference data for improving the accuracy and inclusiveness of land-cover maps. It contains GPS-located land-cover samples that can be used to train and validate AI models that generate detailed, accurate maps, with a focus on coffee and cocoa production systems. The data were collected across Vietnam and Ghana, combining expert interpretation of high-resolution satellite imagery (Google Earth, Planet) with a smaller subset of ground-truth observations. Each point is labeled and quality-controlled to represent a diverse range of land-cover types commonly found within and around smallholder production areas. The classification scheme includes 10 main classes (such as coffee, cocoa, orchard, natural forests) and 68 sub-classes (such as full sun coffee and coffee intercropped with black pepper). While the primary goal is to distinguish coffee and cocoa systems from other land uses, the dataset also supports broader applications such as agricultural monitoring, deforestation analysis, ecosystem service mapping, land-use planning, and suitability modeling. By providing transparent, well-validated training data, this dataset contributes to Sample Earth’s broader objective: strengthening AI-based land monitoring tools and supporting global efforts, including the EU Deforestation Regulation (EUDR), to ensure sustainable, deforestation-free agricultural supply chains. The dataset is designed to grow continuously, incorporating new commodities, timeframes, and countries over time.
Methodology: The dataset was developed primarily through expert visual interpretation of high-resolution satellite imagery from Google Earth and Planet, collected between 2019 and 2022. A smaller subset of points in the Central Highlands of Vietnam was derived from field observations, providing additional ground-truth validation. To enhance interpreter accuracy and contextual understanding, field visits and Google Street View assessments were conducted in both Vietnam and Ghana. These activities helped experts better recognize local land-use patterns and distinguish among different crop and landscape types. All sample points were digitized and standardized using QGIS, with attributes including class ID, crop type, sampling date, and associated metadata to ensure consistency and interoperability. This combined approach of expert interpretation, localized training, and structured data management ensured a high-quality, consistent, and machine-learning–ready dataset suitable for land-cover mapping and model training workflows.
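The description says the points can be used to train and validate models and that each point carries a class ID, crop type, and sampling date. As a minimal sketch of what that could look like in practice, the Python below splits labeled points into training and validation sets class by class, so that every class appears in both. It is not taken from the dataset or its documentation: the field names (point_id, class_id, crop_type, lon, lat, sampling_date), the three classes, and every value are hypothetical and synthetic.

```python
import random
from collections import defaultdict

# Synthetic stand-in records. Field names and values are assumptions made
# for illustration; they do not come from the published dataset.
CLASSES = {1: "coffee", 2: "cocoa", 3: "natural forest"}
samples = [
    {
        "point_id": f"{class_id}-{i}",
        "class_id": class_id,
        "crop_type": name,
        "lon": 108.0 + 0.01 * i,
        "lat": 12.5 + 0.01 * class_id,
        "sampling_date": "2021-01-01",
    }
    for class_id, name in CLASSES.items()
    for i in range(8)
]


def stratified_split(points, key, val_fraction=0.25, seed=0):
    """Split points into (train, validation), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for point in points:
        by_class[point[key]].append(point)

    train, val = [], []
    for label in sorted(by_class):
        group = list(by_class[label])
        rng.shuffle(group)
        # Hold out at least one point per class when the class has more than one.
        n_val = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


train, val = stratified_split(samples, "class_id")
print(len(train), "training points,", len(val), "validation points")
```

Splitting per class keeps rare classes from vanishing from the validation set. A purely random split can still place neighboring points on both sides, which tends to make validation scores look better than they are, so a spatially blocked split may be worth considering for real use.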
format	Dataset
id	CGSpace178010
institution	CGIAR Consortium
language	English
publishDate	2025
publishDateRange	2025
publishDateSort	2025
record_format	dspace
spelling	CGSpace178010 2025-11-18T22:32:11Z Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset Vantalon, Thibaud; Luong, Phuong Thi; Perez Escobar, Jorge Andres; Tello Dagua, Jhon Jairo; Phan, Trong Van; Nguyen, Hang; Hong Nguyen; Hoa Nguyen; Reymondin, Louis artificial intelligence; reference points; land classification 2025-11 2025-11-18T22:28:40Z 2025-11-18T22:28:40Z Dataset https://hdl.handle.net/10568/178010 en Open Access Vantalon, T.; Luong, P.T.; Perez Escobar, J.A.; Tello Dagua, J.J.; Phan, T.V.; Nguyen, H.; Hong Nguyen; Hoa Nguyen; Reymondin, L. (2025) Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset. https://doi.org/10.7910/DVN/U7HWY1
title	Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset
title_sort	sample earth machine learning ready land cover reference dataset
topic	artificial intelligence; reference points; land classification
url	https://hdl.handle.net/10568/178010

Similar Items