Spam Classification Task
Spam data set from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/spambase). The data set was collected at Hewlett-Packard Labs to classify emails as spam or non-spam. 57 variables indicate the frequency of certain words and characters in each email. The positive class is set to "spam".
R6::R6Class inheriting from TaskClassif.
mlr_tasks$get("spam")
tsk("spam")
Task type: “classif”
Dimensions: 4601x58
Properties: “twoclass”
Has Missings: FALSE
Target: “type”
Features: “address”, “addresses”, “all”, “business”, “capitalAve”, “capitalLong”, “capitalTotal”, “charDollar”, “charExclamation”, “charHash”, “charRoundbracket”, “charSemicolon”, “charSquarebracket”, “conference”, “credit”, “cs”, “data”, “direct”, “edu”, “email”, “font”, “free”, “george”, “hp”, “hpl”, “internet”, “lab”, “labs”, “mail”, “make”, “meeting”, “money”, “num000”, “num1999”, “num3d”, “num415”, “num650”, “num85”, “num857”, “order”, “original”, “our”, “over”, “parts”, “people”, “pm”, “project”, “re”, “receive”, “remove”, “report”, “table”, “technology”, “telnet”, “will”, “you”, “your”
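The construction calls and meta information above can be checked interactively. The following is a minimal sketch, assuming the mlr3 package is installed; the variable names task and task_alt are illustrative only.

```r
library(mlr3)

# Retrieve the task via the sugar function and via the dictionary.
task = tsk("spam")
task_alt = mlr_tasks$get("spam")

# Inspect the fields that correspond to the meta information listed above.
print(task$task_type)      # task type
print(c(task$nrow, task$ncol))  # dimensions (rows, columns including the target)
print(task$properties)     # task properties
print(task$target_names)   # name of the target column
print(task$positive)       # positive class
print(head(task$feature_names))  # first few feature names
```

The values printed should match the entries listed above (for example, target "type" and positive class "spam").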
Creators: Mark Hopkins, Erik Reeber, George Forman, Jaap Suermondt. Hewlett-Packard Labs, 1501 Page Mill Rd., Palo Alto, CA 94304
Donor: George Forman (gforman at nospam hpl.hp.com) 650-857-7835
Preprocessing: Columns have been renamed. Preprocessed data taken from the kernlab package.
Dua, Dheeru, Graff, Casey (2017). “UCI Machine Learning Repository.” http://archive.ics.uci.edu/ml/.
Dictionary of Tasks: mlr_tasks
as.data.table(mlr_tasks) for a complete table of all (also dynamically created) Tasks.
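As a brief illustration of the dictionary lookup, the sketch below converts mlr_tasks to a table and filters the row for this task. It assumes that mlr3 and data.table are installed and that the resulting table has the columns key, task_type, nrow and ncol; the variable names are illustrative.

```r
library(mlr3)
library(data.table)

# Convert the task dictionary into a data.table with one row per task.
tasks = as.data.table(mlr_tasks)

# Select the entry for the spam task and show a few descriptive columns.
spam_row = tasks[key == "spam"]
print(spam_row[, .(key, task_type, nrow, ncol)])
```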