udpipe: udpipe_accuracy – R documentation


udpipe_accuracy

Evaluate the accuracy of your UDPipe model on holdout data

Description

Get precision, recall and F1 measures for finding words, sentences, UPOS tags, XPOS tags and morphological features, as well as unlabeled and labeled attachment scores (UAS and LAS) for dependency parsing, on holdout data in CoNLL-U format.
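As a rough illustration of what these measures are, the sketch below computes precision, recall and F1 from item counts, and UAS (share of words with the correct head) and LAS (correct head and correct dependency relation) from two token-aligned tables. It is a simplified base R sketch, not udpipe's implementation: the function names and the columns `head_token_id` and `dep_rel` are assumptions chosen for illustration, and it assumes gold and predicted tokenization are identical, whereas scoring on system tokenization first requires aligning the words.

```r
# Precision, recall and F1 from counts: items proposed by the system,
# items in the gold data, and items that match between the two
prf1 <- function(n_system, n_gold, n_correct) {
  precision <- if (n_system > 0) n_correct / n_system else 0
  recall <- if (n_gold > 0) n_correct / n_gold else 0
  f1 <- if (precision + recall > 0) {
    2 * precision * recall / (precision + recall)
  } else {
    0
  }
  c(precision = precision, recall = recall, f1 = f1)
}

# UAS / LAS over two data frames with one row per word, in the same order.
# Assumes identical tokenization; the column names are illustrative only.
attachment_scores <- function(gold, pred) {
  stopifnot(nrow(gold) == nrow(pred))
  head_ok <- as.character(gold$head_token_id) == as.character(pred$head_token_id)
  label_ok <- head_ok & as.character(gold$dep_rel) == as.character(pred$dep_rel)
  c(UAS = mean(head_ok), LAS = mean(label_ok))
}

prf1(n_system = 8, n_gold = 10, n_correct = 6)  # precision 0.75, recall 0.6, f1 ~0.667
```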

Usage

udpipe_accuracy(
  object,
  file_conllu,
  tokenizer = c("default", "none"),
  tagger = c("default", "none"),
  parser = c("default", "none")
)

Arguments

`object`	an object of class `udpipe_model` as returned by `udpipe_load_model`
`file_conllu`	the full path to a file on disk containing holdout data in conllu format
`tokenizer`	a character string of length 1, which is either 'default' or 'none'
`tagger`	a character string of length 1, which is either 'default' or 'none'
`parser`	a character string of length 1, which is either 'default' or 'none'
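The signature pattern `c("default", "none")` is conventionally resolved in R with `match.arg()`, which makes the first choice the default and rejects any other value. The sketch below assumes udpipe_accuracy() follows that convention; `evaluation_options()` is a hypothetical stand-in, not part of udpipe. It also enumerates all eight tokenizer/tagger/parser settings with `expand.grid()`.

```r
# Hypothetical stand-in using the same argument pattern as udpipe_accuracy()
evaluation_options <- function(tokenizer = c("default", "none"),
                               tagger = c("default", "none"),
                               parser = c("default", "none")) {
  # match.arg() picks the first choice when the argument is left at its
  # default vector, and stops with an error for values outside the choices
  tokenizer <- match.arg(tokenizer)
  tagger <- match.arg(tagger)
  parser <- match.arg(parser)
  c(tokenizer = tokenizer, tagger = tagger, parser = parser)
}

evaluation_options()                 # all three resolve to "default"
evaluation_options(parser = "none")  # only the parser is switched off

# All 2 x 2 x 2 = 8 combinations of the three settings
configs <- expand.grid(tokenizer = c("default", "none"),
                       tagger = c("default", "none"),
                       parser = c("default", "none"),
                       stringsAsFactors = FALSE)
nrow(configs)
```

Each row of `configs` could then be passed to udpipe_accuracy(), for example with `Map()`, to compare the settings side by side; the Examples section below tries four of them.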

Value

a list with 3 elements, including:

accuracy: A character vector with accuracy metrics.
error: A character string with possible errors encountered when calculating the accuracy metrics.
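Since the returned list carries an `error` element next to `accuracy`, a caller may want to check it before using the metrics. A minimal sketch, assuming that an empty string, `NA` or `NULL` in `error` means nothing went wrong (this page does not spell that out); `accuracy_or_stop()` is a hypothetical helper, not part of udpipe.

```r
# Hypothetical helper: return the accuracy lines, or stop if the error
# element reports a problem. Treats NULL, NA and "" as "no error", which
# is an assumption about how udpipe_accuracy() fills that element.
accuracy_or_stop <- function(metrics) {
  err <- metrics$error
  has_error <- !is.null(err) && any(!is.na(err) & nzchar(err))
  if (has_error) {
    stop("udpipe_accuracy reported: ", paste(err, collapse = "; "))
  }
  metrics$accuracy
}
```

With a model loaded as in the Examples, this would be used as `accuracy_or_stop(udpipe_accuracy(ud_dutch, file_conllu))`.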

References

https://ufal.mff.cuni.cz/udpipe, https://universaldependencies.org/format.html
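The holdout file must follow the CoNLL-U format described at the second link: word lines with 10 tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line ending each sentence. A minimal base R reader for inspecting such a file before evaluation might look like the sketch below; `read_conllu_words()` is hypothetical, and it deliberately drops multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `1.1`).

```r
# Minimal CoNLL-U reader: one row per ordinary word line with the 10
# standard columns plus a sentence counter.
read_conllu_words <- function(lines) {
  cols <- c("id", "form", "lemma", "upos", "xpos",
            "feats", "head", "deprel", "deps", "misc")
  blank <- !nzchar(trimws(lines))
  # Sentence counter: starts at 1 and increments at each blank line
  sentence <- cumsum(blank) + 1L
  # Skip blank separators and comment lines
  keep <- !blank & !startsWith(lines, "#")
  fields <- strsplit(lines[keep], "\t", fixed = TRUE)
  stopifnot(all(lengths(fields) == 10L))
  df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(df) <- cols
  df$sentence <- sentence[keep]
  # Keep ordinary words only: integer IDs, no ranges or decimal IDs
  df[grepl("^[0-9]+$", df$id), , drop = FALSE]
}
```

For a file on disk this would be called as `read_conllu_words(readLines(file_conllu, encoding = "UTF-8"))`.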

Examples

library(udpipe)

## Download and load the Dutch LassySmall model (requires internet access)
model <- udpipe_download_model(language = "dutch-lassysmall")
if (!model$download_failed) {
  ud_dutch <- udpipe_load_model(model$file_model)

  ## Annotated data in CoNLL-U format shipped with the package
  file_conllu <- system.file(package = "udpipe", "dummydata", "traindata.conllu")

  ## Tokenizer, tagger and parser all enabled
  metrics <- udpipe_accuracy(ud_dutch, file_conllu)
  metrics$accuracy

  ## Tokenizer switched off; tagger and parser enabled
  metrics <- udpipe_accuracy(ud_dutch, file_conllu,
                             tokenizer = "none", tagger = "default", parser = "default")
  metrics$accuracy

  ## Only the parser enabled
  metrics <- udpipe_accuracy(ud_dutch, file_conllu,
                             tokenizer = "none", tagger = "none", parser = "default")
  metrics$accuracy

  ## Only the tokenizer enabled
  metrics <- udpipe_accuracy(ud_dutch, file_conllu,
                             tokenizer = "default", tagger = "none", parser = "none")
  metrics$accuracy
}

## Cleanup for CRAN only - you probably want to keep your model if you have downloaded it
if (file.exists(model$file_model)) file.remove(model$file_model)
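Accuracy is only meaningful on sentences the model was not trained on. If you have a single annotated CoNLL-U file, one way to set aside holdout data is to split it at sentence boundaries (blank lines), so no sentence is cut in half. The sketch below does this in base R; `split_conllu()`, its arguments and the 20% default are illustrative assumptions, not udpipe functionality. Note that it calls `set.seed()`, which changes the global random number state.

```r
# Hypothetical helper: split a CoNLL-U file into a training file and a
# holdout file at sentence boundaries. Comment lines stay with their sentence.
split_conllu <- function(file, train_file, holdout_file,
                         holdout_fraction = 0.2, seed = 1) {
  lines <- readLines(file, encoding = "UTF-8")
  keep <- nzchar(trimws(lines))
  # Group the non-blank lines into sentence blocks separated by blank lines
  blocks <- split(lines[keep], cumsum(!keep)[keep])
  n_holdout <- max(1L, round(length(blocks) * holdout_fraction))
  stopifnot(n_holdout < length(blocks))
  set.seed(seed)
  idx <- sample(seq_along(blocks), n_holdout)
  # Write each block followed by the blank line that ends a sentence
  write_blocks <- function(b, path) {
    writeLines(unlist(lapply(b, c, "")), path, useBytes = TRUE)
  }
  write_blocks(blocks[-idx], train_file)
  write_blocks(blocks[idx], holdout_file)
  invisible(c(train = length(blocks) - n_holdout, holdout = n_holdout))
}
```

The resulting holdout file can then be passed as `file_conllu` to udpipe_accuracy().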

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

Version: 0.8.5

License: MPL-2.0

Authors

Jan Wijffels [aut, cre, cph], BNOSAC [cph], Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic [cph], Milan Straka [ctb, cph], Jana Straková [ctb, cph]
