udpipe_accuracy

Evaluate the accuracy of your UDPipe model on holdout data


Description

Get precision, recall and F1 measures for word and sentence segmentation and for upos, xpos and morphological features annotation, as well as UAS and LAS dependency scores, on holdout data in conllu format.
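
As a reminder of what these measures mean, here is a minimal sketch on made-up toy data. It is not the code the package runs internally: the counts and the `gold` and `pred` data frames are hypothetical. Precision, recall and F1 follow their standard definitions, UAS is the share of tokens with the correct head, and LAS additionally requires the correct dependency relation.

```r
# Toy segmentation counts (hypothetical numbers, for illustration only)
n_system  <- 10  # words produced by the model
n_gold    <- 12  # words in the holdout annotation
n_correct <- 9   # words present in both

precision <- n_correct / n_system
recall    <- n_correct / n_gold
f1        <- 2 * precision * recall / (precision + recall)

# Toy dependency annotations for 4 tokens (hypothetical)
gold <- data.frame(head = c(2, 0, 2, 3),
                   deprel = c("nsubj", "root", "obj", "amod"))
pred <- data.frame(head = c(2, 0, 2, 2),
                   deprel = c("nsubj", "root", "iobj", "amod"))

# UAS: correct head; LAS: correct head and correct relation label
uas <- mean(pred$head == gold$head)
las <- mean(pred$head == gold$head & pred$deprel == gold$deprel)

round(c(precision = precision, recall = recall, f1 = f1, uas = uas, las = las), 3)
```

LAS can never exceed UAS, because every token counted for LAS must already have the correct head.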

Usage

udpipe_accuracy(
  object,
  file_conllu,
  tokenizer = c("default", "none"),
  tagger = c("default", "none"),
  parser = c("default", "none")
)

Arguments

object

an object of class udpipe_model as returned by udpipe_load_model

file_conllu

the full path to a file on disk containing holdout data in conllu format

tokenizer

a character string of length 1, either 'default' or 'none'. Use 'none' to leave the tokenizer out of the evaluation

tagger

a character string of length 1, either 'default' or 'none'. Use 'none' to leave the tagger out of the evaluation

parser

a character string of length 1, either 'default' or 'none'. Use 'none' to leave the parser out of the evaluation
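
The usage shows each of these three arguments with a two-element default vector. A common R idiom for such signatures is `match.arg`, which selects the first element when the argument is not supplied and rejects values outside the listed choices. The sketch below illustrates that idiom with a hypothetical function `resolve_choice`; that `udpipe_accuracy` resolves its arguments this way is an assumption.

```r
# Hypothetical helper, not part of udpipe: shows how match.arg resolves
# a c("default", "none") default to a single string.
resolve_choice <- function(tokenizer = c("default", "none")) {
  match.arg(tokenizer)
}

resolve_choice()        # argument omitted: the first choice is used
resolve_choice("none")  # explicit choice

# A value outside the listed choices raises an error
result <- try(resolve_choice("other"), silent = TRUE)
inherits(result, "try-error")
```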

Value

a list with the following elements

  • accuracy: A character vector with accuracy metrics.

  • error: A character string with possible errors when calculating the accuracy metrics.
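
Since the result carries both the metrics and a possible error message, it can be convenient to check `error` before reading `accuracy`. The following is a minimal sketch using a hypothetical helper `report_accuracy` and a mock result with placeholder text. It assumes that `error` is an empty string (or absent) when nothing went wrong, which the documentation above does not state explicitly.

```r
# Hypothetical helper, not part of udpipe: report a udpipe_accuracy result.
# Assumption: `error` is empty (or missing) when the calculation succeeded.
report_accuracy <- function(metrics) {
  err <- metrics$error
  if (length(err) > 0 && any(nzchar(err))) {
    warning("Accuracy calculation reported: ", paste(err, collapse = " "))
    return(invisible(FALSE))
  }
  # Print one metric line per row
  cat(metrics$accuracy, sep = "\n")
  invisible(TRUE)
}

# Mock result with placeholder text, standing in for real output
mock <- list(accuracy = c("metric line 1", "metric line 2"), error = "")
ok <- report_accuracy(mock)
```

With a real model, `report_accuracy(udpipe_accuracy(ud_dutch, file_conllu))` would be the equivalent call.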

Examples

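## download the Dutch LassySmall model; the example is skipped if the download fails
model <- udpipe_download_model(language = "dutch-lassysmall")
if(!model$download_failed){
  ud_dutch <- udpipe_load_model(model$file_model)

  ## holdout data in conllu format shipped with the package
  file_conllu <- system.file(package = "udpipe", "dummydata", "traindata.conllu")

  ## all three components at their default setting
  metrics <- udpipe_accuracy(ud_dutch, file_conllu)
  metrics$accuracy

  ## tokenizer set to "none", tagger and parser at their default
  metrics <- udpipe_accuracy(ud_dutch, file_conllu, 
                             tokenizer = "none", tagger = "default", parser = "default")
  metrics$accuracy

  ## tokenizer and tagger set to "none", parser at its default
  metrics <- udpipe_accuracy(ud_dutch, file_conllu, 
                             tokenizer = "none", tagger = "none", parser = "default")
  metrics$accuracy

  ## tokenizer at its default, tagger and parser set to "none"
  metrics <- udpipe_accuracy(ud_dutch, file_conllu, 
                             tokenizer = "default", tagger = "none", parser = "none")
  metrics$accuracy
}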


## cleanup for CRAN only - you probably want to keep your model if you have downloaded it
if(file.exists(model$file_model)) file.remove(model$file_model)

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

v0.8.5
MPL-2.0
Authors
Jan Wijffels [aut, cre, cph], BNOSAC [cph], Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic [cph], Milan Straka [ctb, cph], Jana Straková [ctb, cph]
Initial release