CoordinateCleaner: cc_dupl – R documentation


cc_dupl

Identify Duplicated Records

Description

Removes or flags duplicated records based on species name and coordinates, as well as user-defined additional columns. True (specimen) duplicates or duplicates from the same species can make up the bulk of records in a biological collection database, but are undesirable for many analyses. Both can be flagged with this function; identifying true specimen duplicates requires enough additional information, supplied through the `additions` argument.
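The difference between the two kinds of duplicates can be sketched in base R. This is a minimal illustration, not the package's implementation: `flag_duplicates` is a hypothetical helper, and the assumption is that a record counts as a duplicate when all compared columns match an earlier row.

```r
# Hypothetical helper, not part of CoordinateCleaner.
# TRUE = first occurrence (test passed), FALSE = repeats an earlier row
# on the compared columns.
flag_duplicates <- function(x, cols) {
  !duplicated(x[, cols, drop = FALSE])
}

occ <- data.frame(
  species          = c("a", "a", "a", "b"),
  decimallongitude = c(10, 10, 10, 10),
  decimallatitude  = c(5, 5, 5, 5),
  collector.number = c(1001, 1001, 354, 1001)
)

# Species and coordinates only: rows 2 and 3 repeat row 1.
by_species <- flag_duplicates(
  occ, c("species", "decimallongitude", "decimallatitude")
)

# Adding the collector number: only row 2 still repeats row 1.
by_specimen <- flag_duplicates(
  occ, c("species", "decimallongitude", "decimallatitude", "collector.number")
)
```

With only species and coordinates compared, every repeated locality of a species is flagged. Adding specimen-level columns narrows the flags to records that are likely the same specimen.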

Usage

cc_dupl(
  x,
  lon = "decimallongitude",
  lat = "decimallatitude",
  species = "species",
  additions = NULL,
  value = "clean",
  verbose = TRUE
)

Arguments

`x`	a data.frame containing geographical coordinates and species names.
`lon`	character string. The column with the longitude coordinates. Default = “decimallongitude”.
`lat`	character string. The column with the latitude coordinates. Default = “decimallatitude”.
`species`	character string. The column with the species name. Default = “species”.
`additions`	a vector of character strings. Additional columns to be included in the test for duplication, for example collector name and collector number, as in the examples below.
`value`	character string. Defines the output value. See Value.
`verbose`	logical. If TRUE, reports the name of the test and the number of records flagged.

Value

Depending on the ‘value’ argument, either a data.frame containing the records considered correct by the test (“clean”) or a logical vector (“flagged”), with TRUE = test passed and FALSE = test failed/potentially problematic. Default = “clean”.
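A logical vector of this shape can be used to inspect flagged records before discarding them. The sketch below is base R only; `flags` is a hand-written stand-in for what `value = "flagged"` would be expected to return for this toy data, not actual output of the function.

```r
occ <- data.frame(
  species          = c("a", "a", "b"),
  decimallongitude = c(10, 10, 10),
  decimallatitude  = c(5, 5, 5)
)

# Assumed stand-in for cc_dupl(occ, value = "flagged"):
# TRUE = test passed, FALSE = flagged as a duplicate.
flags <- c(TRUE, FALSE, TRUE)

n_flagged    <- sum(!flags)    # number of records flagged
flagged_rows <- occ[!flags, ]  # look at the flagged records first
kept         <- occ[flags, ]   # records that passed the test
```

Subsetting with the flag vector should give the same rows that the “clean” output keeps, while leaving the original data.frame untouched for comparison.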

Examples

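library(CoordinateCleaner)

# 100 random records on an integer grid; shorter columns are recycled
x <- data.frame(species = letters[1:10],
                decimallongitude = sample(x = 0:10, size = 100, replace = TRUE),
                decimallatitude = sample(x = 0:10, size = 100, replace = TRUE),
                collector = "Bonpl",
                collector.number = c(1001, 354),
                collection = rep(c("K", "WAG", "FR", "P", "S"), 20))

# Logical vector: FALSE marks duplicates by species and coordinates
cc_dupl(x, value = "flagged")
# data.frame of records kept when collector and collector number are also compared
cc_dupl(x, additions = c("collector", "collector.number"))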

CoordinateCleaner

Automated Cleaning of Occurrence Records from Biological Collections

v2.0-18

GPL-3

Authors

Alexander Zizka [aut, cre], Daniele Silvestro [ctb], Tobias Andermann [ctb], Josue Azevedo [ctb], Camila Duarte Ritter [ctb], Daniel Edler [ctb], Harith Farooq [ctb], Andrei Herdean [ctb], Maria Ariza [ctb], Ruud Scharn [ctb], Sten Svanteson [ctb], Niklas Wengstrom [ctb], Vera Zizka [ctb], Alexandre Antonelli [ctb], Irene Steves [rev] (Irene reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/210>), Francisco Rodriguez-Sanchez [rev] (Francisco reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/210>)
