Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

cf_range

Identify Fossils with Extreme Age Ranges


Description

Removes or flags records with an unexpectedly large temporal range, based on a quantile outlier test.

Usage

cf_range(
  x,
  lon = "decimallongitude",
  lat = "decimallatitude",
  min_age = "min_ma",
  max_age = "max_ma",
  taxon = "accepted_name",
  method = "quantile",
  mltpl = 5,
  size_thresh = 7,
  max_range = 500,
  uniq_loc = FALSE,
  value = "clean",
  verbose = TRUE
)

Arguments

x

data.frame. Containing fossil records with taxon names, ages, and geographic coordinates.

lon

character string. The column with the longitude coordinates. To identify unique records if uniq_loc = TRUE. Default = “decimallongitude”.

lat

character string. The column with the longitude coordinates. Default = “decimallatitude”. To identify unique records if uniq_loc = T.

min_age

character string. The column with the minimum age. Default = “min_ma”.

max_age

character string. The column with the maximum age. Default = “max_ma”.

taxon

character string. The column with the taxon name. If “”, searches for outliers over the entire dataset, otherwise per specified taxon. Default = “accepted_name”.

method

character string. Defining the method for outlier selection. See details. Either “quantile” or “mad”. Default = “quantile”.

mltpl

numeric. The multiplier of the interquartile range (method == 'quantile') or median absolute deviation (method == 'mad') to identify outliers. See details. Default = 5.

size_thresh

numeric. The minimum number of records needed for a dataset to be tested. Default = 10.

max_range

numeric. A absolute maximum time interval between min age and max age. Only relevant for method = “time”.

uniq_loc

logical. If TRUE only single records per location and time point (and taxon if taxon != "") are used for the outlier testing. Default = T.

value

character string. Defining the output value. See value.

verbose

logical. If TRUE reports the name of the test and the number of records flagged.

Value

Depending on the ‘value’ argument, either a data.frame containing the records considered correct by the test (“clean”) or a logical vector (“flagged”), with TRUE = test passed and FALSE = test failed/potentially problematic . Default = “clean”.

Note

See https://ropensci.github.io/CoordinateCleaner/ for more details and tutorials.

See Also

Other fossils: cf_age(), cf_equal(), cf_outl()

Examples

minages <- runif(n = 11, min = 0.1, max = 25)
x <- data.frame(species = c(letters[1:10], "z"),
                lng = c(runif(n = 9, min = 4, max = 16), 75, 7),
                lat = c(runif(n = 11, min = -5, max = 5)),
                min_ma = minages, 
                max_ma = minages + c(runif(n = 10, min = 0, max = 5), 25))

cf_range(x, value = "flagged", taxon = "")

CoordinateCleaner

Automated Cleaning of Occurrence Records from Biological Collections

v2.0-18
GPL-3
Authors
Alexander Zizka [aut, cre], Daniele Silvestro [ctb], Tobias Andermann [ctb], Josue Azevedo [ctb], Camila Duarte Ritter [ctb], Daniel Edler [ctb], Harith Farooq [ctb], Andrei Herdean [ctb], Maria Ariza [ctb], Ruud Scharn [ctb], Sten Svanteson [ctb], Niklas Wengstrom [ctb], Vera Zizka [ctb], Alexandre Antonelli [ctb], Irene Steves [rev] (Irene reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/210>), Francisco Rodriguez-Sanchez [rev] (Francisco reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/210>)
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.