CoBC generic method
CoBC is a semi-supervised learning algorithm in the co-training style. It trains N classifiers with the learning scheme defined in gen.learner, using a reduced set of labeled examples. In each iteration, an unlabeled example is labeled for a classifier if the most confident classifications assigned by the other N-1 classifiers agree on the proposed label. The candidate unlabeled examples are selected at random from a pool of size u.
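The agreement rule described above can be illustrated with a minimal sketch. The helper `select_agreed` and its arguments are hypothetical and are not part of SSLR; it assumes each companion classifier returns a probability matrix with one row per pool instance and one column per class, and it measures confidence as the mean of the companions' maximum probabilities, which may differ from what the package does internally.

```r
# Hypothetical helper (not part of SSLR): select pool instances that all
# companion classifiers label identically, most confident first.
# probs:   list of probability matrices, one per companion classifier
#          (rows = pool instances, columns = classes)
# classes: class names, in the column order of the matrices
select_agreed <- function(probs, classes, n_best = 1) {
  n <- nrow(probs[[1]])
  # Class proposed by each companion (one column per companion)
  votes <- matrix(sapply(probs, function(p) max.col(p, ties.method = "first")),
                  nrow = n)
  # TRUE where every companion proposes the same class
  agree <- apply(votes, 1, function(v) length(unique(v)) == 1)
  # Assumed confidence measure: mean of the companions' maximum probabilities
  conf <- rowMeans(matrix(sapply(probs, function(p) apply(p, 1, max)), nrow = n))
  cand <- which(agree)
  cand <- head(cand[order(conf[cand], decreasing = TRUE)], n_best)
  data.frame(index = cand,
             label = classes[votes[cand, 1]],
             confidence = conf[cand])
}

# Two companions, three pool instances, two classes
p1 <- matrix(c(0.90, 0.10,
               0.40, 0.60,
               0.20, 0.80), ncol = 2, byrow = TRUE)
p2 <- matrix(c(0.80, 0.20,
               0.70, 0.30,
               0.05, 0.95), ncol = 2, byrow = TRUE)
sel <- select_agreed(list(p1, p2), classes = c("a", "b"), n_best = 2)
print(sel)
```

In this toy input the companions disagree on the second instance, so only the first and third instances are eligible to be labeled for the remaining classifier.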
coBCG(y, gen.learner, gen.pred, N = 3, perc.full = 0.7, u = 100, max.iter = 50)
y |
A vector with the labels of training instances. In this vector the
unlabeled instances are specified with the value |
gen.learner |
A function for training |
gen.pred |
A function for predicting the probabilities per class.
This function must take two parameters, model and indexes, where the model
is a classifier trained with |
N |
The number of classifiers used as committee members. All these classifiers
are trained using the |
perc.full |
A number between 0 and 1. If the percentage of new labeled examples reaches this value the self-labeling process is stopped. Default is 0.7. |
u |
Number of unlabeled instances in the pool. Default is 100. |
max.iter |
Maximum number of iterations to execute in the self-labeling process. Default is 50. |
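As an illustration of the gen.learner and gen.pred contract, the following sketch defines a nearest-centroid learner in base R. The names `centroid.learner` and `centroid.pred`, the use of the built-in iris data, and the softmax-style conversion of distances into probabilities are assumptions made for this example. The two-argument signatures mirror the ones used in the examples on this page; that the probability columns should follow the class levels is also an assumption.

```r
# Toy data: the built-in iris measurements stand in for the training instances
xtrain <- scale(as.matrix(iris[, 1:4]))
ytrain <- iris$Species

# gen.learner(indexes, cls): train on the instances selected by `indexes`,
# whose labels are given in `cls`. The "model" here is the class centroids.
centroid.learner <- function(indexes, cls) {
  cls <- droplevels(factor(cls))
  centroids <- t(sapply(levels(cls), function(l)
    colMeans(xtrain[indexes[cls == l], , drop = FALSE])))
  list(centroids = centroids)
}

# gen.pred(model, indexes): return a matrix of class probabilities with
# one row per instance in `indexes` and one column per class.
centroid.pred <- function(model, indexes) {
  x <- xtrain[indexes, , drop = FALSE]
  # Squared Euclidean distance from every instance to every centroid
  d <- apply(model$centroids, 1, function(ctr) rowSums(sweep(x, 2, ctr)^2))
  d <- matrix(d, nrow = nrow(x),
              dimnames = list(NULL, rownames(model$centroids)))
  w <- exp(-d)     # closer centroids receive larger weights
  w / rowSums(w)   # normalise each row so it sums to 1
}

lab.idx <- seq(1, 150, by = 5)   # pretend only these instances are labeled
m <- centroid.learner(lab.idx, ytrain[lab.idx])
prob <- centroid.pred(m, 1:150)
head(prob)
```

Both functions refer to instances only through their indexes, so the data itself (here `xtrain`) stays in the enclosing environment; a pair like this could then be passed as `gen.learner` and `gen.pred`.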
coBCG can be helpful when the method selected as base classifier needs learner and pred functions with other specifications. For more information about the general coBC method, see the coBC function. Essentially, the coBC function is a wrapper of the coBCG function.
A list object of class "coBCG" containing:
model |
The final N base classifiers trained using the enlarged labeled set. |
model.index |
List of N vectors of indexes related to the training instances used by each classifier. These indexes are relative to the y argument. |
instances.index |
The indexes of all training instances used to train the N models. These indexes include the initial labeled instances and the newly labeled instances. These indexes are relative to the y argument. |
model.index.map |
List of N vectors with the same information as model.index, but the indexes are relative to the instances.index vector. |
classes |
The levels of the y factor. |
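The relationship between the indexes relative to y and the indexes relative to instances.index can be shown with a small sketch. The numbers below are made up, and building the union with `sort(unique(...))` is an assumption; the ordering used inside the package may differ.

```r
# Made-up indexes (relative to y) of the instances used by each of 3 classifiers
model.index <- list(c(12, 3, 40), c(3, 7, 40, 25), c(12, 7))

# All instances used by any classifier (assumed here to be the sorted union)
instances.index <- sort(unique(unlist(model.index)))

# The same information, expressed as positions within instances.index
model.index.map <- lapply(model.index,
                          function(idx) match(idx, instances.index))

print(instances.index)
print(model.index.map)

# Round trip: mapping back through instances.index recovers model.index
recovered <- lapply(model.index.map, function(pos) instances.index[pos])
```

A mapping of this kind is useful when a distance matrix is computed only against the instances in instances.index, because its columns are then addressed by position rather than by the original index in y.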
library(SSLR)
library(caret)

## Load Wine data set
data(wine)
cls <- which(colnames(wine) == "Wine")
x <- wine[, -cls] # instances without classes
y <- wine[, cls]  # the classes
x <- scale(x)     # scale the attributes

## Prepare data
set.seed(20)
# Use 50% of instances for training
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx,] # training instances
ytrain <- y[tra.idx]  # classes of training instances
# Use 70% of train instances as unlabeled set
tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove class information of unlabeled instances

# Use the other 50% of instances for inductive testing
tst.idx <- setdiff(1:length(y), tra.idx)
xitest <- x[tst.idx,] # testing instances
yitest <- y[tst.idx]  # classes of testing instances

## Example: Training from a set of instances with 1-NN (knn3) as base classifier.
gen.learner1 <- function(indexes, cls)
  caret::knn3(x = xtrain[indexes,], y = cls, k = 1)
gen.pred1 <- function(model, indexes)
  predict(model, xtrain[indexes,])

set.seed(1)
trControl_coBCG <- list(gen.learner = gen.learner1, gen.pred = gen.pred1)
md1 <- train_generic(ytrain, method = "coBCG", trControl = trControl_coBCG)

# Predict probabilities per instances using each model
h.prob <- lapply(
  X = md1$model,
  FUN = function(m) predict(m, xitest)
)
# Combine the predictions
cls1 <- coBCCombine(h.prob, md1$classes)
table(cls1, yitest)
confusionMatrix(cls1, yitest)$overall[1]

## Example: Training from a distance matrix with 1-NN (oneNN) as base classifier.
dtrain <- as.matrix(proxy::dist(x = xtrain, method = "euclidean", by_rows = TRUE))
gen.learner2 <- function(indexes, cls) {
  m <- SSLR::oneNN(y = cls)
  attr(m, "tra.idxs") <- indexes
  m
}

gen.pred2 <- function(model, indexes) {
  tra.idxs <- attr(model, "tra.idxs")
  d <- dtrain[indexes, tra.idxs]
  prob <- predict(model, d, distance.weighting = "none")
  prob
}

set.seed(1)
trControl_coBCG2 <- list(gen.learner = gen.learner2, gen.pred = gen.pred2)
md2 <- train_generic(ytrain, method = "coBCG", trControl = trControl_coBCG2)

# Predict probabilities per instances using each model
ditest <- proxy::dist(x = xitest, y = xtrain[md2$instances.index,],
                      method = "euclidean", by_rows = TRUE)
h.prob <- list()
ninstances <- nrow(dtrain)
for (i in 1:length(md2$model)) {
  m <- md2$model[[i]]
  D <- ditest[, md2$model.index.map[[i]]]
  h.prob[[i]] <- predict(m, D)
}
# Combine the predictions
cls2 <- coBCCombine(h.prob, md2$classes)
table(cls2, yitest)
confusionMatrix(cls2, yitest)$overall[1]
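The combination step in the examples above takes one probability matrix per committee member and returns a single predicted class per instance. A minimal sketch of one way to do this is to average the matrices and pick the most probable class. The function `combine_mean` is hypothetical; that coBCCombine works the same way is an assumption and is not confirmed by this page.

```r
# Hypothetical stand-in for the combination step: average the committee's
# probability matrices and pick the most probable class per instance.
# h.prob:  list of probability matrices (rows = instances, columns = classes)
# classes: class names, in the column order of the matrices
combine_mean <- function(h.prob, classes) {
  avg <- Reduce(`+`, h.prob) / length(h.prob)   # element-wise mean
  factor(classes[max.col(avg, ties.method = "first")], levels = classes)
}

# Three committee members, two instances, two classes
h1 <- matrix(c(0.6, 0.4,
               0.2, 0.8), ncol = 2, byrow = TRUE)
h2 <- matrix(c(0.3, 0.7,
               0.1, 0.9), ncol = 2, byrow = TRUE)
h3 <- matrix(c(0.8, 0.2,
               0.4, 0.6), ncol = 2, byrow = TRUE)
pred <- combine_mean(list(h1, h2, h3), classes = c("a", "b"))
print(pred)
```

Averaging probabilities lets a confident member outweigh a hesitant one, unlike a plain majority vote over hard labels.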