BayesMallows: label_switching – R documentation


label_switching

Checking for Label Switching in the Mallows Mixture Model

Description

Label switching can be a problem when fitting mixture models. The algorithm of Stephens (2000), implemented in the label.switching package (Papastamoulis 2016), allows label switching to be assessed after MCMC. At the moment, this is the only available option in the BayesMallows package. Stephens' algorithm requires the individual cluster probabilities of each assessor to be saved in each iteration of the MCMC algorithm. As this potentially requires much memory, the current implementation of compute_mallows saves these cluster probabilities to a csv file in each iteration. The example below shows how to perform such a check for label switching in practice.

Beware that this functionality is under development. Later releases might let the user determine the directory and filenames of the csv files.
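
To make the problem concrete, the following base R sketch builds a toy array of cluster probabilities with the layout that Stephens' algorithm expects (iterations, assessors, clusters) and swaps the two cluster labels halfway through the chain. All names and numbers are invented for illustration; nothing here is produced by compute_mallows, and the permutation convention used for relabelling (row i gives the column order for iteration i) is an assumption of this sketch.

```r
# Toy example (hypothetical data, not BayesMallows output):
# 6 MCMC iterations, 4 assessors, 2 clusters
n_iter <- 6
n_assessors <- 4
n_clusters <- 2

# Assessors 1-2 belong mostly to one cluster, assessors 3-4 to the other
base_probs <- rbind(c(0.9, 0.1), c(0.8, 0.2), c(0.2, 0.8), c(0.1, 0.9))

prob_array <- array(dim = c(n_iter, n_assessors, n_clusters))
for (i in seq_len(n_iter)) {
  # Simulate label switching: the labels are swapped in iterations 4-6
  prob_array[i, , ] <- if (i <= 3) base_probs else base_probs[, c(2, 1)]
}

# Averaging over iterations without relabelling hides the cluster structure
naive_mean <- apply(prob_array, c(2, 3), mean)

# Undo the switch with one permutation per iteration
# (assumed convention: row i reorders the cluster columns of iteration i)
perms <- rbind(matrix(c(1, 2), 3, 2, byrow = TRUE),
               matrix(c(2, 1), 3, 2, byrow = TRUE))
relabelled <- prob_array
for (i in seq_len(n_iter)) {
  relabelled[i, , ] <- prob_array[i, , perms[i, ]]
}
relabelled_mean <- apply(relabelled, c(2, 3), mean)
```

In this toy case every entry of naive_mean is pulled towards 0.5, while relabelled_mean recovers base_probs. This is the kind of distortion a label switching check is meant to detect.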

References

Papastamoulis P (2016). “label.switching: An R Package for Dealing with the Label Switching Problem in MCMC Outputs.” Journal of Statistical Software, Code Snippets, 69(1), 1–24. ISSN 1548-7660, doi: 10.18637/jss.v069.c01, https://www.jstatsoft.org/v069/c01.

Stephens M (2000). “Dealing with label switching in mixture models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(4), 795–809. doi: 10.1111/1467-9868.00265, https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-9868.00265.

Examples

## Not run: 
  # This example shows how to assess if label switching happens in BayesMallows

  library(BayesMallows)
  # We start by creating a directory in which csv files with individual
  # cluster probabilities should be saved in each step of the MCMC algorithm
  dir.create("./test_label_switch")
  # Next, we go into this directory
  setwd("./test_label_switch/")
  # For comparison, we run compute_mallows with and without saving the cluster
  # probabilities. The purpose of this is to assess the time it takes to save
  # the cluster probabilities
  system.time(m <- compute_mallows(rankings = sushi_rankings,
                                   n_clusters = 6, nmc = 2000, save_clus = TRUE,
                                   save_ind_clus = FALSE))
  # With these options, compute_mallows will save cluster_probs2.csv,
  # cluster_probs3.csv, ..., cluster_probs[nmc].csv.
  system.time(m <- compute_mallows(rankings = sushi_rankings, n_clusters = 6,
                                   nmc = 2000, save_clus = TRUE,
                                   save_ind_clus = TRUE))

  # Next, we check convergence of alpha
  assess_convergence(m)

  # We set the burnin to 1000
  burnin <- 1000

  # Find all files that were saved. Note that the first file saved is cluster_probs2.csv
  cluster_files <- list.files(pattern = "cluster\\_probs[[:digit:]]+\\.csv")

  # Check the size of the files that were saved.
  paste(sum(file.size(cluster_files)) * 1e-6, "MB")

  # Find the iteration each file corresponds to, by extracting its number
  library(stringr)
  iteration_number <- as.integer(str_extract(cluster_files, "[:digit:]+"))
  # Remove all files before burnin
  file.remove(cluster_files[iteration_number <= burnin])
  # Update the vector of files, after the deletion
  cluster_files <- list.files(pattern = "cluster\\_probs[[:digit:]]+\\.csv")
  # Create 3d array, with dimensions (iterations, assessors, clusters)
  prob_array <- array(dim = c(length(cluster_files), m$n_assessors, m$n_clusters))
  # Read each file, adding to the right element of the array
  library(readr)
  for(i in seq_along(cluster_files)){
    prob_array[i, , ] <- as.matrix(
      read_delim(cluster_files[[i]], delim = ",",
                 col_names = FALSE, col_types = paste(rep("d", m$n_clusters),
                                                      collapse = "")))
  }

  library(dplyr)
  library(tidyr)
  # Create an integer array of latent allocations, as this is required by label.switching
  z <- m$cluster_assignment %>%
    filter(iteration > burnin) %>%
    mutate(value = as.integer(str_extract(value, "[:digit:]+"))) %>%
    spread(key = assessor, value = value, sep = "_") %>%
    select(-iteration) %>%
    as.matrix()

  # Now apply Stephens' algorithm
  library(label.switching)
  ls <- label.switching("STEPHENS", z = z, K = m$n_clusters, p = prob_array)

  # Check the proportion of cluster assignments that were switched
  mean(apply(ls$permutations$STEPHENS, 1, function(x) any(x != seq_len(m$n_clusters))))

  # Remove the rest of the csv files
  file.remove(cluster_files)
  # Move up one directory
  setwd("..")
  # Remove the directory in which the csv files were saved
  unlink("./test_label_switch", recursive = TRUE)

## End(Not run)
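
One caveat with the file handling in this example: list.files() returns names in alphabetical order, so cluster_probs10.csv sorts before cluster_probs2.csv. With burnin = 1000 and nmc = 2000, all remaining file names have the same number of digits and the order is unaffected, but with other settings the rows of prob_array may not line up with the rows of z. A minimal base R sketch of sorting by iteration number, using made-up file names:

```r
# Hypothetical file names, in the alphabetical order list.files() would return
cluster_files <- c("cluster_probs10.csv", "cluster_probs100.csv",
                   "cluster_probs2.csv", "cluster_probs9.csv")

# Extract the iteration number from each name by dropping all non-digits
iteration_number <- as.integer(gsub("[^0-9]", "", cluster_files))

# Reorder the file names so that they follow the MCMC iterations
cluster_files <- cluster_files[order(iteration_number)]
iteration_number <- sort(iteration_number)
```

Applying the same reordering before the files are read into prob_array keeps the first dimension of the array in iteration order.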

BayesMallows

Bayesian Preference Learning with the Mallows Rank Model

v1.0.1

GPL-3

Authors

Oystein Sorensen [aut, cre] (<https://orcid.org/0000-0003-0724-3542>), Valeria Vitelli [aut] (<https://orcid.org/0000-0002-6746-0453>), Marta Crispino [aut], Qinghua Liu [aut], Cristina Mollica [aut], Luca Tardella [aut]

Initial release
