Remove redundant words from the beginnings of character strings
In Qualtrics data, we sometimes find repeated words in column names. For whatever reason, some variable names begin with the same word twice, as in "Philadelphia_Philadelphia_3". This function changes the vector c("Philadelphia_Philadelphia_3", "Denver_Denver_4") to c("Philadelphia_3", "Denver_4"). It is non-destructive: strings that do not begin with a repeated word are returned unchanged.
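The core idea can be illustrated with a regular-expression backreference: capture the first word, then match any further copies of that same word. The helper dedupe_prefix() below is hypothetical and is not the deduper() source; it assumes the only separator is an underscore and that every repeated word is followed by one.

```r
## Hypothetical helper, NOT the deduper() implementation.
## Assumes "_" is the only separator between repeated words.
dedupe_prefix <- function(x) {
  ## ^([^_]+)_  captures the first word and its underscore
  ## (\\1_)+    matches one or more further copies of "word_"
  ## "\\1_"     puts back a single copy of "word_"
  sub("^([^_]+)_(\\1_)+", "\\1_", x, perl = TRUE)
}

dedupe_prefix(c("Philadelphia_Philadelphia_3", "Denver_Denver_4", "Boston_5"))
```

Strings without a leading repeat, such as "Boston_5", do not match the pattern, so sub() returns them untouched.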
deduper(x, sep = ",_\\s-", n = NULL)
x: Character vector.
sep: Delimiter. A regular expression indicating the points at which to split the strings before checking for duplicates. The default, ",_\\s-", looks for repeats separated by a comma, an underscore, a hyphen, or a single whitespace character.
n: Limit on the number of duplicates to remove. The default, NULL, removes all duplicates at the beginning of a string.
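A rough sketch of how sep and n might combine in one pattern, assuming (this page does not confirm it) that sep is spliced into a regular-expression character class and that n bounds how many leading copies are deleted. dedupe_sketch() is a hypothetical name, and its results are not guaranteed to match deduper() on every input.

```r
## Hypothetical sketch, NOT the deduper() source.
## Assumes sep is a set of characters that is safe inside [...]
## (for example, it must not contain "]" or start with "^").
dedupe_sketch <- function(x, sep = ",_\\s-", n = NULL) {
  cls <- paste0("[", sep, "]")
  ## NULL: drop every extra leading copy; otherwise drop at most n copies
  extra <- if (is.null(n)) {
    "*"
  } else {
    stopifnot(n >= 1)
    paste0("{0,", as.integer(n) - 1L, "}")
  }
  pattern <- paste0(
    "^(.+?)", cls,              # first copy of the repeated unit plus a separator
    "(?:\\1", cls, ")", extra,  # further copies that will also be deleted
    "(?=\\1", cls, "|\\1$)"     # lookahead: one copy must remain afterwards
  )
  sub(pattern, "", x, perl = TRUE)
}

x <- c("Philadelphia_Philadelphia_3", "Denver Denver_4", "Den_Den_Den_5", "Boston_5")
dedupe_sketch(x)
dedupe_sketch("Den_Den_Den_Den_5", n = 2)
```

Deleting the leading copies and checking the survivor with a lookahead, rather than rebuilding it in the replacement string, leaves the separator after the last copy as it was, so "Denver Denver_4" becomes "Denver_4".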
The cleaned-up character vector.
Paul Johnson <pauljohn@ku.edu>
x <- c("Philadelphia_Philadelphia_3", "Denver_Denver_4",
       "Den_Den_Den_Den_Den_Den_Den_5")
deduper(x)
deduper(x, n = 2)
deduper(x, n = 3)
deduper(x, n = 4)
x <- c("Philadelphia,Philadelphia_3", "Denver Denver_4")
## Shows that a comma is also detected by default
deduper(x)
## Works even if the delimiter is inside the matched string,
## or if the separators vary
x <- c("Den_5_Den_5_Den_5,Den_5 Den_5")
deduper(x)
## Generate a random test vector
x <- replicate(10, paste(sample(letters, 5), collapse = ""))
n <- c(paste0("_", sample(1:10, 5)), rep("", 5))
x <- paste0(x, "_", x, n, n)
x
deduper(x)