SAX

Symbolic Aggregate approXimation


Description

This function converts a numeric time series into a series of letters with a specified length and alphabet.

Usage

SAX(x, alphabet_size, PAA_number,
breakpoints = "gaussian", collapse = NULL)

Arguments

x

a numeric vector.

alphabet_size

a numeric vector of length 1 setting the size of the alphabet.

PAA_number

a numeric vector of length 1 setting the number of elements (subsequences) of the Piecewise Aggregate Approximation (PAA).

breakpoints

either a character vector ("gaussian", "quantiles") or a numeric vector specifying the sorted values of the breakpoints along the distribution of x. See details and examples.

collapse

a character vector of length 1 specifying how to collapse the output letters, see paste. By default (NULL) the letters are returned as separate elements.

Details

The SAX method was developed to reduce the dimensionality of a numerical series into a short chain of characters. SAX follows a two-step process: (1) Piecewise Aggregate Approximation (PAA) and (2) conversion of the PAA sequence into a series of letters.

PAA consists of a Z-normalisation of the series, a segmentation of the series of length n into w segments (w being set by PAA_number), and the computation of the average of each segment.
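The three PAA steps can be sketched in base R as follows. This is only an illustration: paa_sketch is a hypothetical helper, not a seewave function, and the way samples are assigned to segments when n is not a multiple of w is an assumption that may differ from what SAX does internally.

```r
# Hypothetical helper, not part of seewave: a minimal PAA sketch.
paa_sketch <- function(x, w) {
  stopifnot(is.numeric(x), length(x) > 1, w >= 1, length(x) >= w)
  # Step 1: Z-normalisation (zero mean, unit standard deviation).
  # A constant series has sd(x) == 0 and cannot be normalised this way.
  z <- (x - mean(x)) / sd(x)
  # Step 2: assign each of the n samples to one of w contiguous segments
  # (assumed assignment rule for this sketch).
  segment <- ceiling(seq_along(z) * w / length(z))
  # Step 3: average of each segment.
  as.numeric(tapply(z, segment, mean))
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
paa <- paa_sketch(x, w = 4)
length(paa)  # 4, one value per segment
```

With equal-sized segments, as here, the segment means of a Z-normalised series themselves average to zero.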

The conversion of the PAA into a series of letters is achieved by assigning each PAA value to a letter, using breakpoints that divide a Gaussian distribution into equiprobable regions, one per letter. This process therefore assumes that the numeric series x follows a Gaussian distribution. To relax this normality assumption, the breakpoints can instead be taken from the quantiles of the original data distribution, or specified directly as values along the distribution of x. See the examples.
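The letter assignment can be sketched in base R with qnorm and findInterval. The helper letters_sketch and its reference argument are hypothetical names introduced for this illustration, not part of seewave, and the handling of values that fall exactly on a breakpoint is an assumption. When reference is supplied, it is assumed to be on the same scale as the PAA values.

```r
# Hypothetical helper, not part of seewave: map PAA values to letters.
letters_sketch <- function(paa, alphabet_size, reference = NULL) {
  stopifnot(alphabet_size >= 2, alphabet_size <= 26)
  # Probabilities that split a distribution into equiprobable regions.
  probs <- seq(0, 1, length.out = alphabet_size + 1)
  cuts <- if (is.null(reference)) {
    qnorm(probs)                                    # Gaussian breakpoints
  } else {
    quantile(reference, probs = probs, names = FALSE)  # empirical quantiles
  }
  # Only the inner breakpoints matter: values below the first one get "a",
  # values at or above the last one get the last letter.
  inner <- cuts[2:alphabet_size]
  letters[findInterval(paa, inner) + 1]
}

# Gaussian quartile breakpoints are about -0.67, 0 and 0.67
letters_sketch(c(-1.2, -0.3, 0.1, 0.9), alphabet_size = 4)
# "a" "b" "c" "d"
```

Passing reference switches the sketch from Gaussian breakpoints to quantiles of the supplied data, which mirrors the idea behind breakpoints = "quantiles" without claiming to reproduce its exact result.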

Value

A character vector. When collapse is NULL, its length equals the PAA_number argument; when collapse is not NULL, it is a single string whose number of letters equals the PAA_number argument.
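The difference between the two output forms can be illustrated with paste alone. The letters below are made up for the illustration and are not the output of SAX on any real data.

```r
# Made-up letters standing in for a SAX result with PAA_number = 4
sax_letters <- c("a", "c", "c", "b")

length(sax_letters)                         # 4 separate elements
collapsed <- paste(sax_letters, collapse = "")
collapsed                                   # "accb"
length(collapsed)                           # 1 string
nchar(collapsed)                            # 4 letters
paste(sax_letters, collapse = "-")          # "a-c-c-b"
```

Note that with a non-empty separator such as "-", the string is longer than PAA_number characters even though it still holds PAA_number letters.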

Note

SAX has been used to search for similar time series in a soundscape database (Kasten et al., 2012).

Author(s)

Laurent Lellouch. An improvement added by Pavel Senin.

References

Kasten, E.P., Gage, S.H., Fox, J. & Joo, W. (2012). The remote environmental assessment laboratory's acoustic library: an archive for studying soundscape ecology. Ecological Informatics, 12, 50-67.

Lin, J., Keogh, E., Lonardi, S. & Chiu, B. (June 2003). A symbolic representation of time series with implications for streaming algorithms. Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. San Diego, California, USA.

See Also

Examples

library(seewave)
data(tico)
# second column of the soundscape spectrum of tico
spec <- soundscapespec(tico, plot=FALSE)[,2]
SAX(spec, alphabet_size = 5, PAA_number = 10)

# change breakpoints
SAX(spec, alphabet_size = 5, PAA_number = 10, breakpoints="quantiles")
SAX(spec, alphabet_size = 5, PAA_number = 10, breakpoints=c(0, 0.5, 0.75, 1))
SAX(spec, alphabet_size = 5, PAA_number = 10, breakpoints=c(0, 0.33, 0.66, 1))

# different output formats
SAX(spec, alphabet_size = 5, PAA_number = 10, collapse="")
SAX(spec, alphabet_size = 5, PAA_number = 10, collapse="-")

seewave

Sound Analysis and Synthesis

v2.1.6
GPL (>= 2)
Authors
Jerome Sueur <sueur@mnhn.fr> [cre, au], Thierry Aubin [au], Caroline Simonis [au], Laurent Lellouch [main ctrb], Ethan C. Brown [ctrb], Marion Depraetere [ctrb], Camille Desjonqueres [ctrb], Francois Fabianek [ctrb], Amandine Gasc [ctrb], Eric Kasten [ctrb], Stefanie LaZerte [ctrb], Jonathan Lees [ctrb], Jean Marchal [ctrb], Andre Mikulec [ctrb], Sandrine Pavoine [ctrb], David Pinaud [ctrb], Alicia Stotz [ctrb], Luis J. Villanueva-Rivera [ctrb], Zev Ross [ctrb], Carl G. Witthoft [ctrb], Hristo Zhivomirov [ctrb].
Initial release
2020-06-28
