SETRED generic method
SETRED is a variant of the self-training classification method (selfTraining) with a different addition mechanism.
The SETRED classifier is initially trained with a
reduced set of labeled examples. Then it is iteratively retrained with its own most
confident predictions over the unlabeled examples. SETRED uses an amending scheme
to avoid the introduction of noisy examples into the enlarged labeled set. For each
iteration, the mislabeled examples are identified using the local information provided
by the neighborhood graph.
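The idea of using neighborhood information to spot a doubtful label can be sketched in base R. This is only an illustration under simplifying assumptions: it builds a k-nearest-neighbour graph from a distance matrix and flags an instance when most of its neighbours carry a different label. The actual method relies on a statistical test controlled by theta rather than this simple majority rule, and the helper name cut_edge_fraction is hypothetical, not part of SSLR.

```r
# Hypothetical helper (not part of SSLR): for each instance, the fraction of its
# k nearest neighbours (taken from distance matrix D) whose label differs.
cut_edge_fraction <- function(D, labels, k = 3) {
  n <- nrow(D)
  vapply(seq_len(n), function(i) {
    d <- D[i, ]
    d[i] <- Inf                    # exclude the instance itself
    nn <- order(d)[seq_len(k)]     # indexes of the k nearest neighbours
    mean(labels[nn] != labels[i])  # share of edges joining different labels
  }, numeric(1))
}

# Toy data: two well-separated groups on a line, one label deliberately wrong
x <- c(0, 0.1, 0.2, 0.3, 5, 5.1, 5.2, 5.3)
labels <- c("a", "a", "b", "a", "b", "b", "b", "b")  # third label is noise
D <- as.matrix(dist(x))

frac <- cut_edge_fraction(D, labels, k = 3)
suspect <- which(frac > 0.5)  # instances whose neighbourhood mostly disagrees
print(suspect)
```

In this toy case only the deliberately mislabeled instance ends up in suspect; a filter of this kind would keep it out of the enlarged labeled set.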
setredG(y, D, gen.learner, gen.pred, theta = 0.1, max.iter = 50, perc.full = 0.7)
y | A vector with the labels of the training instances. Unlabeled instances are marked in this vector with the value NA.
D | A distance matrix between all the training instances. This matrix is used to construct the neighborhood graph.
gen.learner | A function for training a supervised base classifier. This function must take two parameters, indexes and cls, where indexes indicates the instances to use and cls specifies the classes of those instances.
gen.pred | A function for predicting the class probabilities. This function must take two parameters, model and indexes, where model is a classifier trained with gen.learner and indexes indicates the instances to predict.
theta | Rejection threshold to test the critical region. Default is 0.1.
max.iter | Maximum number of iterations of the self-labeling process. Default is 50.
perc.full | A number between 0 and 1. The self-training process stops once the proportion of newly labeled examples reaches this value. Default is 0.7.
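The contract between gen.learner and gen.pred can be illustrated without any extra package. The sketch below uses a toy nearest-centroid classifier as a stand-in base learner. The data, the centroid model, and the exp(-distance) scoring are all assumptions made for illustration, and the output shape (one row per instance, one column per class) is assumed by analogy with what predict returns for a knn3 model.

```r
# Toy training data: six instances with two features; NA marks unlabeled ones
xtrain <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
ytrain <- factor(c("a", NA, "a", "b", "b", NA))

# gen.learner: receives the instances to use (indexes) and their classes (cls)
gen.learner <- function(indexes, cls) {
  cls <- factor(cls)
  xs <- xtrain[indexes, , drop = FALSE]
  # One centroid per class, stored as the rows of a matrix
  centroids <- t(sapply(levels(cls), function(l) {
    colMeans(xs[cls == l, , drop = FALSE])
  }))
  list(centroids = centroids)
}

# gen.pred: receives the fitted model and the instances to predict (indexes)
gen.pred <- function(model, indexes) {
  xs <- xtrain[indexes, , drop = FALSE]
  # Euclidean distance from every instance to every class centroid
  d <- apply(model$centroids, 1, function(ctr) {
    sqrt(rowSums(sweep(xs, 2, ctr)^2))
  })
  d <- matrix(d, nrow = nrow(xs),
              dimnames = list(NULL, rownames(model$centroids)))
  w <- exp(-d)    # arbitrary monotone scoring, not a calibrated probability
  w / rowSums(w)  # normalise so that each row sums to one
}

labeled <- which(!is.na(ytrain))
model <- gen.learner(labeled, ytrain[labeled])
prob <- gen.pred(model, which(is.na(ytrain)))
pred <- colnames(prob)[max.col(prob, ties.method = "first")]
print(pred)
```

Both functions close over xtrain, so only row indexes travel between them. That is what lets the same pair work whether the base classifier consumes feature vectors or rows of a distance matrix.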
setredG can be helpful in those cases where the method selected as base classifier needs learner and pred functions with other specifications. For more information about the general SETRED method, see the setred function. Essentially, setred is a wrapper of setredG.
A list object of class "setredG" containing:
model | The final base classifier trained using the enlarged labeled set.
instances.index | The indexes of the training instances used to train the model. These indexes include the initial labeled instances and the newly labeled instances, and they are relative to the y argument.
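Because the returned indexes are relative to y, they can be split into the initially labeled instances and those labeled during self-training. The following sketch uses a mock list in place of a fitted object; its values are made up purely for illustration.

```r
# Mock of the returned list (made-up values, not the output of a real fit)
y <- factor(c("a", NA, "b", NA, "a", NA))
md <- list(model = NULL, instances.index = c(1, 3, 5, 2, 6))

initial <- which(!is.na(y))                           # labeled from the start
added   <- setdiff(md$instances.index, initial)       # labeled during self-training
never   <- setdiff(seq_along(y), md$instances.index)  # still unlabeled at the end

print(added)
print(never)
```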
library(SSLR)
library(caret)

## Load Wine data set
data(wine)

cls <- which(colnames(wine) == "Wine")
x <- wine[, -cls] # instances without classes
y <- wine[, cls]  # the classes
x <- scale(x)     # scale the attributes

## Prepare data
set.seed(20)
# Use 50% of instances for training
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx, ] # training instances
ytrain <- y[tra.idx]   # classes of training instances
# Use 70% of train instances as unlabeled set
tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove class information of unlabeled instances

# Use the other 50% of instances for inductive testing
tst.idx <- setdiff(1:length(y), tra.idx)
xitest <- x[tst.idx, ] # testing instances
yitest <- y[tst.idx]   # classes of testing instances

# Compute distances between training instances
D <- as.matrix(proxy::dist(x = xtrain, method = "euclidean", by_rows = TRUE))

## Example: Training from a set of instances with 1-NN (knn3) as base classifier.
gen.learner <- function(indexes, cls) caret::knn3(x = xtrain[indexes, ], y = cls, k = 1)
gen.pred <- function(model, indexes) predict(model, xtrain[indexes, ])

trControl_SETRED1 <- list(D = D, gen.learner = gen.learner, gen.pred = gen.pred)
md1 <- train_generic(ytrain, method = "setredG", trControl = trControl_SETRED1)
# md1 <- setredG(y = ytrain, D, gen.learner, gen.pred)

cls1 <- predict(md1$model, xitest, type = "class")
table(cls1, yitest)
confusionMatrix(cls1, yitest)$overall[1]

## Example: Training from a distance matrix with 1-NN (oneNN) as base classifier
gen.learner <- function(indexes, cls) {
  m <- SSLR::oneNN(y = cls)
  attr(m, "tra.idxs") <- indexes # remember which instances the model was built on
  m
}

gen.pred <- function(model, indexes) {
  tra.idxs <- attr(model, "tra.idxs")
  d <- D[indexes, tra.idxs] # distances from the instances to predict to the training ones
  prob <- predict(model, d, distance.weighting = "none")
  prob
}

trControl_SETRED2 <- list(D = D, gen.learner = gen.learner, gen.pred = gen.pred)
md2 <- train_generic(ytrain, method = "setredG", trControl = trControl_SETRED2)

# Distances from the test instances to the instances used to train the final model
ditest <- proxy::dist(x = xitest, y = xtrain[md2$instances.index, ],
                      method = "euclidean", by_rows = TRUE)
cls2 <- predict(md2$model, ditest, type = "class")
table(cls2, yitest)
confusionMatrix(cls2, yitest)$overall[1]