Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

cd_round

Identify Datasets with Rasterized Coordinates


Description

Flags datasets with periodicity patterns indicative of a rasterized (lattice) collection scheme, as often obtain from e.g. atlas data. Using a combination of autocorrelation and sliding-window outlier detection to identify periodicity patterns in the data. See https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13152 for further details and a description of the algorithm

Usage

cd_round(
  x,
  lon = "decimallongitude",
  lat = "decimallatitude",
  ds = "dataset",
  T1 = 7,
  reg_out_thresh = 2,
  reg_dist_min = 0.1,
  reg_dist_max = 2,
  min_unique_ds_size = 4,
  graphs = TRUE,
  test = "both",
  value = "clean",
  verbose = TRUE
)

Arguments

x

data.frame. Containing geographical coordinates and species names.

lon

character string. The column with the longitude coordinates. Default = “decimallongitude”.

lat

character string. The column with the latitude coordinates. Default = “decimallatitude”.

ds

a character string. The column with the dataset of each record. In case x should be treated as a single dataset, identical for all records. Default = “dataset”.

T1

numeric. The threshold for outlier detection in a in an interquantile range based test. This is the major parameter to specify the sensitivity of the test: lower values, equal higher detection rate. Values between 7-11 are recommended. Default = 7.

reg_out_thresh

numeric. Threshold on the number of equal distances between outlier points. See details. Default = 2.

reg_dist_min

numeric. The minimum detection distance between outliers in degrees (the minimum resolution of grids that will be flagged). Default = 0.1.

reg_dist_max

numeric. The maximum detection distance between outliers in degrees (the maximum resolution of grids that will be flagged). Default = 2.

min_unique_ds_size

numeric. The minimum number of unique locations (values in the tested column) for datasets to be included in the test. Default = 4.

graphs

logical. If TRUE, diagnostic plots are produced. Default = TRUE.

test

character string. Indicates which column to test. Either “lat” for latitude, “lon” for longitude, or “both” for both. In the latter case datasets are only flagged if both test are failed. Default = “both”

value

character string. Defining the output value. See value.

verbose

logical. If TRUE reports the name of the test and the number of records flagged.

Value

Depending on the ‘value’ argument, either a data.frame with summary statistics and flags for each dataset (“dataset”) or a data.frame containing the records considered correct by the test (“clean”) or a logical vector (“flagged”), with TRUE = test passed and FALSE = test failed/potentially problematic. Default = “clean”.

Note

See https://ropensci.github.io/CoordinateCleaner/ for more details and tutorials.

See Also

Other Datasets: cd_ddmm()

Examples

#simulate bias grid, one degree resolution, 10% error on a 1000 records dataset
#simulate biased fraction of the data, grid resolution = 1 degree
#simulate non-biased fraction of the data
  bi <- sample(3 + 0:5, size = 100, replace = TRUE)
  mu <- runif(3, 0, 15)
  sig <- runif(3, 0.1, 5)
  cl <- rnorm(n = 900, mean = mu, sd = sig)
  lon <- c(cl, bi)
  
  bi <- sample(9:13, size = 100, replace = TRUE)
  mu <- runif(3, 0, 15)
  sig <- runif(3, 0.1, 5)
  cl <- rnorm(n = 900, mean = mu, sd = sig)
  lat <- c(cl, bi)
  
  #add biased data
  
  inp <- data.frame(decimallongitude = lon,
                    decimallatitude = lat,
                    dataset = "test")
            
          
  #run test
  ## Not run: 
  cd_round(inp, value = "dataset")
  
## End(Not run)

CoordinateCleaner

Automated Cleaning of Occurrence Records from Biological Collections

v2.0-18
GPL-3
Authors
Alexander Zizka [aut, cre], Daniele Silvestro [ctb], Tobias Andermann [ctb], Josue Azevedo [ctb], Camila Duarte Ritter [ctb], Daniel Edler [ctb], Harith Farooq [ctb], Andrei Herdean [ctb], Maria Ariza [ctb], Ruud Scharn [ctb], Sten Svanteson [ctb], Niklas Wengstrom [ctb], Vera Zizka [ctb], Alexandre Antonelli [ctb], Irene Steves [rev] (Irene reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/210>), Francisco Rodriguez-Sanchez [rev] (Francisco reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/210>)
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.