Identify Duplicated Records
Removes or flags duplicated records based on species name and coordinates, as well as user-defined additional columns. True (specimen) duplicates and duplicates from the same species can make up the bulk of records in a biological collection database, but are undesirable for many analyses. This function can flag both; identifying true specimen duplicates requires enough additional information, supplied through the additions argument.
cc_dupl(
  x,
  lon = "decimallongitude",
  lat = "decimallatitude",
  species = "species",
  additions = NULL,
  value = "clean",
  verbose = TRUE
)
x | data.frame. Containing geographical coordinates and species names.
lon | character string. The column with the longitude coordinates. Default = "decimallongitude".
lat | character string. The column with the latitude coordinates. Default = "decimallatitude".
species | character string. The column with the species name. Default = "species".
additions | vector of character strings. Additional columns to be included in the test for duplication, for example collector name and collector number. Default = NULL.
value | character string. Defines the output type, either "clean" or "flagged". Default = "clean".
verbose | logical. If TRUE, reports the name of the test and the number of records flagged. Default = TRUE.
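The effect of the additions argument can be illustrated with a minimal base R sketch. It uses `duplicated()` directly rather than cc_dupl itself, so it shows only the idea and is not necessarily how the package implements the test; the toy data frame `recs` and its values are made up for illustration.

```r
# Toy records; column names follow the cc_dupl() defaults.
recs <- data.frame(
  species = c("a", "a", "a", "b"),
  decimallongitude = c(10, 10, 10, 5),
  decimallatitude = c(50, 50, 50, 20),
  collector = c("Bonpl", "Bonpl", "Humb", "Bonpl"),
  stringsAsFactors = FALSE
)

# Species plus coordinates only: rows 2 and 3 repeat row 1.
core <- c("species", "decimallongitude", "decimallatitude")
dup_core <- duplicated(recs[, core])

# Adding the collector column: only row 2 still repeats row 1,
# because row 3 was collected by someone else.
dup_extra <- duplicated(recs[, c(core, "collector")])
```

Each extra column narrows what counts as a duplicate, which is why additional fields such as collector name and number help separate true specimen duplicates from repeated observations of the same species at one locality.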
Depending on the "value" argument, either a data.frame containing the records considered correct by the test ("clean") or a logical vector ("flagged"), with TRUE = test passed and FALSE = test failed/potentially problematic. Default = "clean".
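The relationship between the two output modes can be sketched in base R. The helper `flag_dupl_sketch` below is hypothetical and only mimics the behaviour described above under the assumption that the first occurrence of each record passes the test; it is not the package's code.

```r
# Hypothetical helper mimicking the two output modes; not the package's code.
flag_dupl_sketch <- function(x, cols, value = "clean") {
  # TRUE = first occurrence (test passed), FALSE = repeat of an earlier row.
  keep <- !duplicated(x[, cols, drop = FALSE])
  switch(value,
    clean = x[keep, , drop = FALSE],
    flagged = keep,
    stop("value must be 'clean' or 'flagged'")
  )
}

recs <- data.frame(
  species = c("a", "a", "b"),
  decimallongitude = c(10, 10, 5),
  decimallatitude = c(50, 50, 20)
)
cols <- c("species", "decimallongitude", "decimallatitude")

flags <- flag_dupl_sketch(recs, cols, value = "flagged")  # logical vector
clean <- flag_dupl_sketch(recs, cols, value = "clean")    # filtered data.frame
```

Under this sketch, the "clean" output is the input subset by the "flagged" vector, so requesting "flagged" is useful when the flags should be stored alongside the records instead of dropping rows.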
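# Simulated records: 100 rows, shorter vectors are recycled
x <- data.frame(
  species = letters[1:10],
  decimallongitude = sample(x = 0:10, size = 100, replace = TRUE),
  decimallatitude = sample(x = 0:10, size = 100, replace = TRUE),
  collector = "Bonpl",
  collector.number = c(1001, 354),
  collection = rep(c("K", "WAG", "FR", "P", "S"), 20)
)

# Return a logical vector instead of the cleaned data.frame
cc_dupl(x, value = "flagged")

# Also compare collector name and number when testing for duplicates
cc_dupl(x, additions = c("collector", "collector.number"))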