Statistics Canada Name Coding
The modified Statistics Canada name coding procedure
statcan(word, maxCodeLen = 4, clean = TRUE)
word | string or vector of strings to encode
maxCodeLen | maximum length of the resulting encodings, in characters
clean | if TRUE, return NA (with a warning) for inputs containing letters unknown to statcan
The variable word is the name to be encoded. The variable
maxCodeLen is the limit on how long the returned name code
should be. The default is 4.
The statcan algorithm is only defined for inputs over the
standard French alphabet. Non-alphabetical characters are removed
from the string in a locale-dependent fashion. This strips spaces,
hyphens, and numbers. Other letters, such as "Ü," may be permissible
in the current locale but are unknown to statcan. For inputs
outside of its known range, the output is undefined: NA is
returned and a warning is thrown. If clean is
FALSE, statcan attempts to process the strings anyway. The
default is TRUE.
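The input handling described above can be approximated in base R. This is a minimal sketch, not the package's implementation: the helper name check_statcan_input is hypothetical, and it assumes that the "known range" is the 26 unaccented letters A to Z. What happens to accented letters depends on the current locale, so no output is shown for them.

```r
# Hypothetical helper, not part of the phonics package: approximates the
# input handling described above using base R only.
check_statcan_input <- function(word, clean = TRUE) {
  # Drop spaces, hyphens, digits, and anything else the current locale
  # does not classify as a letter.
  stripped <- gsub("[^[:alpha:]]", "", toupper(word))

  # Assumption: the known alphabet is the unaccented letters A-Z.
  unknown <- grepl("[^A-Z]", stripped, perl = TRUE)

  if (clean && any(unknown)) {
    # Mirror the documented behaviour: warn and return NA for those inputs.
    warning("unknown characters found, returning NA for those inputs")
    stripped[unknown] <- NA_character_
  }
  stripped
}

# Punctuation, spaces, and digits are stripped before encoding.
check_statcan_input("O'Neil-Smith 3rd")

# Whether a letter such as "\u00fc" survives the stripping step depends on
# the current locale; with clean = FALSE the string is passed on regardless.
check_statcan_input("M\u00fcller", clean = FALSE)
```

With clean = TRUE, the sketch replaces any input that still contains an unknown letter with NA and emits a single warning for the whole vector, which is one plausible reading of the behaviour documented here.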
the Statistics Canada encoded character vector
James P. Howard, II, "Phonetic Spelling Algorithm Implementations for R," Journal of Statistical Software, vol. 95, no. 8, (2020), p. 1–21, <10.18637/jss.v095.i08>.
Billy T. Lynch and William L. Arends. "Selection of surname coding procedure for the SRS record linkage system." United States Department of Agriculture, Sample Survey Research Branch, Research Division, Washington, 1977.
statcan("William")
statcan(c("Peter", "Peady"))
statcan("Stevenson", maxCodeLen = 8)