Prediction Analysis of Categorical Data
Performs a Prediction Analysis of Categorical Data.
pamCat(data, cl, theta = NULL, n.theta = 10, newdata = NULL, newcl = NULL)
data |
a numeric matrix composed of the integers between 1 and n.cat,
where n.cat is the number of levels each of the variables represented
by the rows of |
cl |
a numeric vector of length |
theta |
a numeric vector consisting of the strictly positive values of the shrinkage parameter used
in the Prediction Analysis. If |
n.theta |
an integer specifying the number of values for the shrinkage parameter of the
Prediction Analysis. Ignored if |
newdata |
a numeric matrix composed of the integers between 1 and n.cat.
Must have the same number of rows as |
newcl |
a numeric vector of length |
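The input coding described above can be prepared and checked in base R before calling pamCat. The following is a minimal sketch, not part of the package: the genotype labels, the object names raw and lev, and the assumption that cl holds one label per column of data (as in the example at the end of this page) are illustrative only.

```r
# Hypothetical raw genotypes: 4 variables (rows) x 6 observations (columns).
raw <- matrix(c("AA", "AB", "BB", "AA", "AB", "AB",
                "BB", "BB", "AB", "AA", "AA", "AB",
                "AB", "AA", "AA", "BB", "AB", "BB",
                "AA", "BB", "AB", "AB", "BB", "AA"),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("SNP", 1:4), NULL))

# Assumed set of levels shared by all variables.
lev <- c("AA", "AB", "BB")
n.cat <- length(lev)

# Recode the labels to the integers 1, ..., n.cat, keeping the matrix shape.
data <- matrix(match(raw, lev), nrow = nrow(raw), dimnames = dimnames(raw))

# Assumed class labels: one per column, coded 1, 2, ...
cl <- rep(1:2, each = 3)

# Basic sanity checks on the coding before any analysis.
stopifnot(is.matrix(data),
          !anyNA(data),
          all(data %in% seq_len(n.cat)),
          length(cl) == ncol(data))
```

Using match against a fixed vector of levels keeps the coding identical across variables, which matters if a second matrix is later passed as newdata.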
An object of class pamCat
composed of
mat.chisq |
a matrix with m rows and n.cl columns consisting of the classwise values of Pearson's chi-square statistic for each of the m variables. |
mat.obs |
a matrix with m rows and n.cat * n.cl columns
in which each row shows a contingency table between the corresponding variable and |
mat.exp |
a matrix of the same size as |
mat.theta |
a data frame consisting of the numbers of variables used in the classification
of the observations in |
tab.cl |
a table summarizing the values of the response, i.e. the class labels. |
n.cat |
n.cat. |
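To make the observed counts, expected counts, and classwise chi-square values more concrete, the sketch below computes them for a single variable in base R. It is an illustration under stated assumptions, not the internal code of pamCat: it assumes that the expected counts are those under independence of variable and class, that a classwise value is the class's contribution to the usual Pearson statistic, and that a contingency table is flattened column by column into one row. The data in x and cl are made up.

```r
# One hypothetical variable with levels 1..3, observed on 8 observations.
x  <- c(1, 2, 3, 1, 2, 2, 3, 3)
cl <- c(1, 1, 1, 1, 2, 2, 2, 2)
n.cat <- 3
n.cl  <- 2

# Observed contingency table: n.cat rows (levels) x n.cl columns (classes).
obs <- unclass(table(factor(x, levels = seq_len(n.cat)),
                     factor(cl, levels = seq_len(n.cl))))

# Expected counts under independence of variable and class.
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Assumed classwise values: each class's share of Pearson's statistic.
chisq.cl <- colSums((obs - expd)^2 / expd)

# Assumed flattening: the table becomes one row of length n.cat * n.cl.
row.obs <- as.vector(obs)
```

Under these assumptions the classwise values sum to the ordinary Pearson statistic for the table, which gives a quick consistency check against chisq.test(obs, correct = FALSE).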
Holger Schwender, holger.schwender@udo.edu
Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.
## Not run:
# Generate a data set consisting of 2000 rows (variables) and 50 columns
# (observations). Assume that the first 25 observations belong to class 1,
# and the other 25 observations to class 2.
mat <- matrix(sample(3, 100000, TRUE), 2000)
rownames(mat) <- paste("SNP", 1:2000, sep = "")
cl <- rep(1:2, each = 25)

# Apply PAM for categorical data to this matrix, and compute the
# misclassification rate on the training set, i.e. on mat.
pam.out <- pamCat(mat, cl)
pam.out

# Now generate a new data set consisting of 20 observations,
# and predict the classes of these observations using the
# value of theta that has led to the smallest misclassification
# rate in pam.out.
mat2 <- matrix(sample(3, 40000, TRUE), 2000)
rownames(mat2) <- paste("SNP", 1:2000, sep = "")
predict(pam.out, mat2)

# Let's assume that the predicted classes are the real classes
# of the observations. Then, mat2 can also be used in pamCat
# to compute the misclassification rate.
cl2 <- predict(pam.out, mat2)
pamCat(mat, cl, newdata = mat2, newcl = cl2)
## End(Not run)