| Title: | Precision Agriculture Data Analysis |
|---|---|
| Description: | Spatial data depuration and delineation of homogeneous zones (management zones) for precision agriculture. The package includes functions that implement protocols for data cleaning, management zone delineation, and zone comparison; the protocols are described in Paccioretti et al. (2020) <doi:10.1016/j.compag.2020.105556>. |
| Authors: | Pablo Paccioretti [aut, cre, cph], Mariano Córdoba [aut], Franca Giannini-Kurina [aut], Mónica Balzarini [aut] |
| Maintainer: | Pablo Paccioretti <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.2.9000 |
| Built: | 2026-05-20 09:47:29 UTC |
| Source: | https://github.com/ppaccioretti/paar |
A dataset containing barley grain yield recorded using calibrated commercial yield monitors mounted on combines equipped with DGPS.
barley
A data frame with 7395 rows and 3 variables:
X: X coordinate, in meters
Y: Y coordinate, in meters
Yield: grain yield, in tons per hectare
Coordinate reference system is "WGS 84 / UTM zone 20S", epsg:32720
Bind outlier condition to an object.
## S3 method for class 'paar'
cbind(..., deparse.level = 1)
... |
objects to bind. |
deparse.level |
integer controlling the construction of labels in
the case of non-matrix-like arguments (for the default method): |
cbind called with m.
Compares variable means across spatial zones using a spatially-adjusted least significant difference (LSD) approach based on kriging variance.
The function accounts for spatial variability by estimating semivariograms and deriving a spatial variance component, which is then used to assess differences between zone means.
compare_zone(
  data,
  variable,
  zonesCol,
  alpha = 0.05,
  join = sf::st_nearest_feature,
  returnLSD = FALSE,
  grid_dim
)
data |
an |
variable |
either:
|
zonesCol |
|
alpha |
|
join |
function used in |
returnLSD |
|
grid_dim |
|
When variable is an external sf object, values are interpolated
using ordinary kriging before comparison. Otherwise, cross-validation of the
variogram model is used to estimate spatial variance.
Pairwise comparisons between zones are evaluated using a spatially-adjusted LSD criterion, in which the spatial variance component is derived from kriging variance.
Results are presented using compact letter displays to indicate groups of zones that are not significantly different.
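The pairwise step can be pictured with a minimal base R sketch. It assumes an LSD value has already been obtained; the zone means and the LSD below are made-up numbers, and the code is an illustration rather than the package's implementation.

```r
# Hypothetical zone means and LSD value (illustrative numbers only)
zone_means <- c(Z1 = 102.4, Z2 = 103.1, Z3 = 106.0)
lsd <- 1.5

# Absolute difference between every pair of zone means
abs_diff <- abs(outer(zone_means, zone_means, "-"))

# Two zones differ when their means are further apart than the LSD
differs <- abs_diff > lsd
print(differs)
```

With these numbers, Z1 and Z2 would share a letter in a compact letter display, while Z3 would receive a different one.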
A list with:
list of data frames with mean comparisons per variable
data frame with descriptive statistics and spatial variance
Paccioretti, P., Córdoba, M., & Balzarini, M. (2020). FastMapping: Software to create field maps and identify management zones in precision agriculture. Computers and Electronics in Agriculture, 175, 105556. doi:10.1016/j.compag.2020.105556
library(sf)
data(wheat, package = "paar")

## Convert to an sf object
wheat <- sf::st_as_sf(wheat, coords = c("x", "y"), crs = 32720)

clusters <- paar::kmspc(
  wheat,
  variables = c('CE30', 'CE90', 'Elev', 'Pe', 'Tg'),
  number_cluster = 3:4
)

data_clusters <- cbind(wheat, clusters$cluster)

compare_zone(data_clusters, "Elev", "Cluster_3")
Filters spatial point data by removing erroneous observations based on geometric, statistical, and spatial criteria. The function implements a sequential depuration workflow commonly used in precision agriculture.
depurate(
  x,
  y,
  toremove = c("edges", "outlier", "inlier"),
  crs = NULL,
  buffer = -10,
  ylimitmax = NA,
  ylimitmin = 0,
  sdout = 3,
  ldist = 0,
  udist = 40,
  criteria = c("LM", "MP"),
  zero.policy = NULL,
  poly_border = NULL
)
x |
An |
y |
A |
toremove |
A |
crs |
Coordinate reference system used when transforming longitude/latitude data. Can be an EPSG code or proj4string. |
buffer |
A |
ylimitmax |
Numeric upper bound for |
ylimitmin |
Numeric lower bound for |
sdout |
Numeric multiplier for standard deviation used to detect global outliers. |
ldist |
Numeric lower distance bound for neighborhood definition. |
udist |
Numeric upper distance bound for neighborhood definition. |
criteria |
Character vector specifying spatial outlier detection
methods: |
zero.policy |
Logical. If |
poly_border |
Optional |
The depuration process is applied in a fixed sequence:
Edge removal ("edges")
Global outlier removal ("outlier")
Spatial outlier removal ("inlier")
The toremove argument controls which of these steps are applied,
but **does not modify the order of execution**.
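As a small illustration of this rule, the sketch below reorders a user-supplied selection into the fixed sequence. The helper `ordered_steps` is hypothetical and only mirrors the behavior described above; it is not a function of the package.

```r
# Fixed internal order of the depuration steps, as listed above
step_order <- c("edges", "outlier", "inlier")

# Hypothetical helper: returns the requested steps in execution order
ordered_steps <- function(toremove) {
  step_order[step_order %in% toremove]
}

# The order in which steps are requested does not matter
ordered_steps(c("inlier", "edges"))
```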
Available procedures are:
"edges": Removes points located within a specified buffer distance from
the field boundary. The boundary is computed using a concave hull
(concaveman), or a convex hull if that package is not available.
"outlier": Removes global outliers based on:
user-defined limits (ylimitmin, ylimitmax)
statistical thresholds defined as mean ± sdout × standard deviation
"inlier": Identifies and removes spatial outliers using:
Local Moran's I statistic ("LM")
Moran scatterplot influence ("MP")
Default parameter values are tuned for precision agriculture datasets (e.g., yield maps).
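The global outlier step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the yield values are made up, the limits are not the package defaults, the mean and standard deviation are assumed to be computed after the limit filter, and the condition labels are hypothetical.

```r
# Made-up yield values (t/ha): 30 typical values plus three suspicious ones
yield <- c(rep(c(2.8, 3.0, 3.2), 10), 6.0, 0.1, 14)

# Illustrative settings; the limits are not the package defaults
ylimitmin <- 0.5
ylimitmax <- 10
sdout <- 3

# Step 1: flag values outside the user-defined limits
outside_limits <- yield < ylimitmin | yield > ylimitmax

# Step 2: mean and SD computed on the values that passed step 1 (assumption)
kept <- yield[!outside_limits]
lower <- mean(kept) - sdout * sd(kept)
upper <- mean(kept) + sdout * sd(kept)
sd_outlier <- !outside_limits & (yield < lower | yield > upper)

# Reason for removal (hypothetical labels), NA when the observation is retained
condition <- rep(NA_character_, length(yield))
condition[outside_limits] <- "outside limits"
condition[sd_outlier] <- "outlier"
table(condition, useNA = "ifany")
```

Filtering by limits first keeps extreme values such as 14 from inflating the standard deviation used in the second step.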
An object of class paar (list) with:
Filtered sf object
Character vector indicating the reason each observation
was removed (or NA if retained)
Vega, A., Córdoba, M., Castro-Franco, M. et al. (2019). Protocol for automating error removal from yield maps. Precision Agriculture, 20, 1030–1044. doi:10.1007/s11119-018-09632-8
library(sf)
data(barley, package = 'paar')

# Convert to an sf object
barley <- st_as_sf(barley, coords = c("X", "Y"), crs = 32720)

depurated <- depurate(barley, "Yield")

# Summary of depurated data
summary(depurated)

# Keep only depurated data
depurated_data <- depurated$depurated_data

# Combine the condition for all data
all_data_condition <- cbind(depurated, barley)
Performs fuzzy k-means clustering on tabular data (non-spatial).
This function is a lightweight wrapper around e1071::cmeans,
providing a vectorized workflow and clustering quality indices.
It is primarily intended as a fallback method when spatial clustering
(e.g., kmspc) cannot be applied, such as when only one variable
is available.
fuzzy_k_means(
  data,
  variables,
  number_cluster = 3:5,
  fuzzyness = 1.2,
  distance = "euclidean"
)
data |
an |
variables |
|
number_cluster |
|
fuzzyness |
|
distance |
|
Missing values are removed prior to clustering. Observations with missing
values are reintroduced in the output with NA cluster assignments.
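The handling of missing values can be sketched in base R. This is only an illustration: the data are made up, and hard k-means (`stats::kmeans`) stands in for the fuzzy algorithm so that the example needs no extra packages.

```r
# Made-up single-variable data with two missing values
values <- data.frame(Tg = c(2.1, 2.3, NA, 5.8, 6.1, NA, 2.2, 6.0))

# Rows with complete information
complete <- stats::complete.cases(values)

# Hard k-means stands in for the fuzzy algorithm in this sketch
set.seed(1)
fit <- stats::kmeans(values[complete, , drop = FALSE], centers = 2)

# Reinsert the assignments, leaving NA where the input was incomplete
cluster <- rep(NA_integer_, nrow(values))
cluster[complete] <- fit$cluster
cluster
```

The output has one entry per input row, so it can be bound back to the original data without realigning indices.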
Clustering is performed for each value in number_cluster, and
several indices are returned to assist in selecting the optimal number
of clusters:
Xie-Beni index
Partition coefficient
Partition entropy
Summary index
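For orientation, two of these indices can be computed by hand from a membership matrix. The sketch below uses the textbook definitions of the partition coefficient and partition entropy on a made-up matrix; the package may scale or normalize them differently.

```r
# Made-up membership matrix: 4 observations (rows) x 2 clusters (columns);
# each row sums to 1 and no entry is exactly 0 (log(0) is undefined)
u <- matrix(
  c(0.9, 0.1,
    0.8, 0.2,
    0.3, 0.7,
    0.5, 0.5),
  ncol = 2, byrow = TRUE
)
n <- nrow(u)

# Partition coefficient: mean of squared memberships (1 = crisp partition)
partition_coefficient <- sum(u^2) / n

# Partition entropy: 0 for a crisp partition, larger when memberships are fuzzy
partition_entropy <- -sum(u * log(u)) / n

c(PC = partition_coefficient, PE = partition_entropy)
```

A higher partition coefficient and a lower partition entropy both point to a less fuzzy, better separated partition.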
A list with:
cluster: data.frame with cluster assignments for each
evaluated number of clusters
indices: data.frame with clustering validity indices
summaryResults: data.frame with clustering metrics
library(sf)
data(wheat, package = 'paar')

# Transform the data.frame into a sf object
wheat_sf <- st_as_sf(wheat, coords = c('x', 'y'), crs = 32720)

# Run the fuzzy_k_means function
fuzzy_k_means_results <- fuzzy_k_means(
  wheat_sf,
  variables = 'Tg',
  number_cluster = 2:4
)

# Print the summaryResults
fuzzy_k_means_results$summaryResults

# Print the indices
fuzzy_k_means_results$indices

# Print the cluster
head(fuzzy_k_means_results$cluster, 5)

# Combine the results in a single object
wheat_clustered <- cbind(wheat_sf, fuzzy_k_means_results$cluster)

# Plot the results
plot(wheat_clustered[, "Cluster_2"])
Performs clustering of spatial data using a combination of spatial Principal Component Analysis (PCA) and fuzzy k-means clustering.
The workflow consists of:
Dimensionality reduction using spatial PCA
Selection of components based on explained spatial variance
Fuzzy clustering over selected components
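The component selection step can be sketched with ordinary PCA from base R. This is a stand-in, not spatial PCA: it uses `stats::prcomp` on the built-in `USArrests` dataset, and it assumes that the fewest components whose cumulative explained variance reaches the threshold are kept.

```r
# Ordinary PCA on a built-in dataset; a stand-in for the spatial PCA step
pca <- stats::prcomp(USArrests, center = TRUE, scale. = TRUE)

# Cumulative percentage of variance explained by the components
explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)
cumulative <- cumsum(explained)

# Keep the fewest components whose cumulative share reaches the threshold
explainedVariance <- 70
n_components <- which(cumulative >= explainedVariance)[1]

# Scores of the retained components, which would then feed the clustering
scores <- pca$x[, seq_len(n_components), drop = FALSE]
dim(scores)
```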
kmspc(
  data,
  variables,
  number_cluster = 3:5,
  explainedVariance = 70,
  ldist = 0,
  udist = 40,
  center = TRUE,
  fuzzyness = 1.2,
  distance = "euclidean",
  zero.policy = FALSE,
  only_spca_results = TRUE,
  all_results = FALSE
)
data |
an |
variables |
|
number_cluster |
|
explainedVariance |
|
ldist, udist
|
|
center |
centering option passed to PCA:
|
fuzzyness |
|
distance |
|
zero.policy |
Logical. If |
only_spca_results |
|
all_results |
|
Spatial relationships are defined using distance-based neighbors
(spdep::dnearneigh). These relationships are incorporated into the
spatial PCA analysis to extract spatially structured components.
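To make the idea of distance-based neighbors concrete, the base R sketch below builds neighbor lists on a made-up grid without calling spdep. The treatment of the band limits (strictly greater than `ldist`, up to and including `udist`) is an assumption of this sketch.

```r
# Made-up coordinates (meters): a 3 x 3 grid with 30 m spacing
coords <- expand.grid(x = c(0, 30, 60), y = c(0, 30, 60))

ldist <- 0
udist <- 40

# Pairwise Euclidean distances between all points
d <- as.matrix(dist(coords))

# Neighbors: points within the distance band, excluding the point itself
# (strict lower bound, inclusive upper bound: an assumption of this sketch)
neighbors <- lapply(seq_len(nrow(d)), function(i) {
  unname(which(d[i, ] > ldist & d[i, ] <= udist))
})

# Number of neighbors per point
lengths(neighbors)
```

With a 40 m upper bound, diagonal points (about 42.4 m apart) are not neighbors, so corner points have two neighbors and the central point has four.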
Clustering is performed using fuzzy c-means over selected spatial components. Several indices are computed to help determine the optimal number of clusters:
Xie-Beni index
Partition coefficient
Partition entropy
Summary index (normalized combination)
A list with the following elements:
cluster: data.frame with cluster assignments for each evaluated number of clusters
indices: data.frame with clustering validity indices
summaryResults: data.frame with clustering metrics (iterations, SSDW)
(optional) PCA and/or spatial PCA summaries depending on arguments
library(sf)
data(wheat, package = 'paar')

# Transform the data.frame into a sf object
wheat_sf <- st_as_sf(wheat, coords = c('x', 'y'), crs = 32720)

# Run the kmspc function
kmspc_results <- kmspc(wheat_sf, number_cluster = 2:4)

# Print the summaryResults
kmspc_results$summaryResults

# Print the indices
kmspc_results$indices

# Print the cluster
head(kmspc_results$cluster, 5)

# Combine the results in a single object
wheat_clustered <- cbind(wheat_sf, kmspc_results$cluster)

# Plot the results
plot(wheat_clustered[, "Cluster_2"])
Print paar objects
## S3 method for class 'paar'
print(x, n = 3, ...)
x |
an object used to select a method. |
n |
an integer vector specifying maximum number of rows or elements to print. |
... |
further arguments passed to or from other methods. |
invisible object x
Print summarized paar object
## S3 method for class 'summary.paar'
print(x, digits, ...)
x |
an object used to select a method. |
digits |
minimal number of significant digits, see
|
... |
further arguments passed to or from other methods. |
A data.frame with the summarized condition of the object.
Performs a modified t-test to assess the correlation between variables
while accounting for spatial autocorrelation. This implementation wraps
SpatialPack::modified.ttest.
spatial_t_test(data, variables)
data |
An |
variables |
A |
The function computes pairwise correlations between the specified variables
and adjusts the significance test to account for spatial dependence using
coordinates. If data is an sf object, coordinates are extracted
automatically. Otherwise, coordinates must be provided as an object with two
columns.
A data.frame with the following columns:
Name of the first variable
Name of the second variable
Estimated correlation coefficient
P-value adjusted for spatial autocorrelation
if (requireNamespace("SpatialPack", quietly = TRUE)) {
  library(sf)
  data(wheat, package = 'paar')

  # Transform the data.frame into a sf object
  wheat_sf <- st_as_sf(wheat, coords = c('x', 'y'), crs = 32720)

  # Run spatial t test
  t_test_results <- spatial_t_test(
    wheat_sf,
    variables = c('CE30', 'CE90')
  )

  # Print the t_test_results
  t_test_results
}
Summarizing paar objects
## S3 method for class 'paar'
summary(object, ...)
object |
an object for which a summary is desired. |
... |
additional arguments affecting the summary produced. |
An object of class summary.paar (data.frame) with the following columns:
condition a character vector with the final condition.
n a numeric vector with the number of rows for each condition.
percentage a numeric vector with the percentage of rows for each condition.
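A table with this shape can be mimicked in base R from a vector of conditions. The sketch below is illustrative only: the condition vector is made up, and the label used for retained observations is hypothetical.

```r
# Made-up condition vector: NA marks retained observations
condition <- c(NA, "edges", NA, "outlier", NA, NA, "inlier", NA, "edges", NA)

# Label retained rows explicitly before counting (label is hypothetical)
final <- ifelse(is.na(condition), "normal point", condition)

# Count rows per condition and express the counts as percentages
counts <- table(final)
summary_df <- data.frame(
  condition = names(counts),
  n = as.vector(counts),
  percentage = round(100 * as.vector(counts) / length(final), 1)
)
summary_df
```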
A database from a wheat (Triticum aestivum L.) production field (60 ha) under continuous agriculture, located in south-eastern Pampas, Argentina.
wheat
A data frame with 5982 rows and 7 variables:
x: X coordinate, in meters
y: Y coordinate, in meters
CE30: apparent electrical conductivity taken at 0–30 cm
CE90: apparent electrical conductivity taken at 0–90 cm
Elev: elevation, in meters
Pe: soil depth, in centimeters
Tg: wheat grain yield
Coordinate reference system is "WGS 84 / UTM zone 20S", epsg:32720.
Wheat grain yield was recorded in 2009 using calibrated commercial yield monitors mounted on combines equipped with DGPS. Soil ECa measurements were taken using a Veris 3100 (Veris Technologies Inc., Salina, KS, USA). Soil depth was measured using a hydraulic penetrometer on a 30 × 30 m regular grid (Peralta et al., 2015). Re-gridding was performed to obtain values of all variables at each intersection point of a 10 × 10 m grid.
Peralta, N.R., Costa, J.L., Balzarini, M., Castro Franco, M., Córdoba, M., & Bullock, D. (2015). Delineation of management zones to improve nitrogen management of wheat. Computers and Electronics in Agriculture, 110, 103–113. doi:10.1016/j.compag.2014.10.017
Paccioretti, P., Córdoba, M., & Balzarini, M. (2020). FastMapping: Software to create field maps and identify management zones in precision agriculture. Computers and Electronics in Agriculture, 175, 105556. doi:10.1016/j.compag.2020.105556