Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset
This dataset is part of the Sample Earth initiative, a global effort to build open, high-quality reference data for improving the accuracy and inclusiveness of land-cover maps. It contains GPS-located land-cover samples that can be used to train and validate AI models that generate detailed, accurat...
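| Main Authors: | Vantalon, Thibaud; Luong, Phuong Thi; Perez Escobar, Jorge Andres; Tello Dagua, Jhon Jairo; Phan, Trong Van; Nguyen, Hang; Hong Nguyen; Hoa Nguyen; Reymondin, Louis |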
|---|---|
| Format: | Dataset |
| Language: | English |
| Published: | 2025 |
| Subjects: | artificial intelligence; reference points; land classification |
| Online Access: | https://hdl.handle.net/10568/178010 |
| _version_ | 1855526919713325056 |
|---|---|
| author | Vantalon, Thibaud; Luong, Phuong Thi; Perez Escobar, Jorge Andres; Tello Dagua, Jhon Jairo; Phan, Trong Van; Nguyen, Hang; Hong Nguyen; Hoa Nguyen; Reymondin, Louis |
| author_browse | Hoa Nguyen; Hong Nguyen; Luong, Phuong Thi; Nguyen, Hang; Perez Escobar, Jorge Andres; Phan, Trong Van; Reymondin, Louis; Tello Dagua, Jhon Jairo; Vantalon, Thibaud |
| author_facet | Vantalon, Thibaud; Luong, Phuong Thi; Perez Escobar, Jorge Andres; Tello Dagua, Jhon Jairo; Phan, Trong Van; Nguyen, Hang; Hong Nguyen; Hoa Nguyen; Reymondin, Louis |
| author_sort | Vantalon, Thibaud |
| collection | Repository of Agricultural Research Outputs (CGSpace) |
| description | This dataset is part of the Sample Earth initiative, a global effort to build open, high-quality reference data for improving the accuracy and inclusiveness of land-cover maps. It contains GPS-located land-cover samples that can be used to train and validate AI models that generate detailed, accurate maps, with a focus on coffee and cocoa production systems.
The data were collected across Vietnam and Ghana, combining expert interpretation of high-resolution satellite imagery (Google Earth, Planet) with a smaller subset of ground-truth observations. Each point is labeled and quality-controlled to represent a diverse range of land-cover types commonly found within and around smallholder production areas. The classification scheme includes 10 main classes (such as coffee, cocoa, orchard, natural forests) and 68 sub-classes (such as full sun coffee, coffee intercropped with black pepper,
While the primary goal is to distinguish coffee and cocoa systems from other land uses, the dataset also supports broader applications such as agricultural monitoring, deforestation analysis, ecosystem service mapping, land-use planning, and suitability modeling.
By providing transparent, well-validated training data, this dataset contributes to Sample Earth’s broader objective: strengthening AI-based land monitoring tools and supporting global efforts, including the EU Deforestation Regulation (EUDR), to ensure sustainable, deforestation-free agricultural supply chains.
The dataset is designed to grow continuously, incorporating new commodities, timeframes, and countries over time.
Methodology: The dataset was developed primarily through expert visual interpretation of high-resolution satellite imagery from Google Earth and Planet, collected between 2019 and 2022. A smaller subset of points in the Central Highlands of Vietnam was derived from field observations, providing additional ground-truth validation.
To enhance interpreter accuracy and contextual understanding, field visits and Google Street View assessments were conducted in both Vietnam and Ghana. These activities helped experts better recognize local land-use patterns and distinguish among different crop and landscape types.
All sample points were digitized and standardized using QGIS, with attributes including class ID, crop type, sampling date, and associated metadata to ensure consistency and interoperability.
This combined approach of expert interpretation, localized training, and structured data management ensured a high-quality, consistent, and machine-learning–ready dataset suitable for land-cover mapping and model training workflows. |
| format | Dataset |
| id | CGSpace178010 |
| institution | CGIAR Consortium |
| language | English |
| publishDate | 2025 |
| publishDateRange | 2025 |
| publishDateSort | 2025 |
| record_format | dspace |
| spelling | CGSpace178010 2025-11-18T22:32:11Z Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset Vantalon, Thibaud; Luong, Phuong Thi; Perez Escobar, Jorge Andres; Tello Dagua, Jhon Jairo; Phan, Trong Van; Nguyen, Hang; Hong Nguyen; Hoa Nguyen; Reymondin, Louis artificial intelligence; reference points; land classification 2025-11 2025-11-18T22:28:40Z 2025-11-18T22:28:40Z Dataset https://hdl.handle.net/10568/178010 en Open Access Vantalon, T.; Luong, P.T.; Perez Escobar, J.A.; Tello Dagua, J.J.; Phan, T.V.; Nguyen, H.; Hong Nguyen; Hoa Nguyen; Reymondin, L. (2025) Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset. https://doi.org/10.7910/DVN/U7HWY1 |
| spellingShingle | artificial intelligence reference points land classification Vantalon, Thibaud Luong, Phuong Thi Perez Escobar, Jorge Andres Tello Dagua, Jhon Jairo Phan, Trong Van Nguyen, Hang Hong Nguyen Hoa Nguyen Reymondin, Louis Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset |
| title | Sample Earth: Machine-Learning–Ready Land-Cover Reference Dataset |
| title_sort | sample earth machine learning ready land cover reference dataset |
| topic | artificial intelligence; reference points; land classification |
| url | https://hdl.handle.net/10568/178010 |
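
The description lists class ID, crop type, and sampling date among the point attributes, and states that the points can be used to train and validate models. The following is a minimal Python sketch of one way such points might be divided into training and validation sets while keeping every class represented in both. The field names (`lon`, `lat`, `class_id`, `crop_type`, `sampling_date`), the class codes, the coordinates, and the `stratified_split` helper are all assumptions made for illustration; they are not taken from the published files, whose actual schema should be checked before use.

```python
import random
from collections import defaultdict

# Hypothetical records: field names, class codes, and coordinates are
# illustrative only and do not come from the published dataset.
SAMPLES = [
    {"lon": 108.00 + i * 0.01, "lat": 12.60, "class_id": 1,
     "crop_type": "coffee", "sampling_date": "2021-03-14"}
    for i in range(4)
] + [
    {"lon": -1.60 - i * 0.01, "lat": 6.70, "class_id": 2,
     "crop_type": "cocoa", "sampling_date": "2020-11-02"}
    for i in range(4)
]


def stratified_split(samples, validation_fraction=0.25, seed=0):
    """Split points into (train, validation) lists, class by class."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible

    # Group the points by their class label.
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample["class_id"]].append(sample)

    train, validation = [], []
    for class_id in sorted(by_class):
        points = by_class[class_id][:]
        rng.shuffle(points)
        # Hold out at least one point per class, but never the whole class;
        # a class with a single point stays entirely in the training set.
        n_val = max(1, round(len(points) * validation_fraction))
        n_val = min(n_val, len(points) - 1)
        validation.extend(points[:n_val])
        train.extend(points[n_val:])
    return train, validation


if __name__ == "__main__":
    train, validation = stratified_split(SAMPLES)
    print(len(train), "training points,", len(validation), "validation points")
```

Splitting within each class rather than across the whole table keeps rare sub-classes from ending up only in the training set or only in the validation set. Because neighbouring points are often spatially correlated, a split by region or by sampling date may be more appropriate for some applications.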