StatMeasures: dqcategorical – R documentation

StatMeasures

dqcategorical

Data quality check of categorical variables

Description

Takes a dataset and returns a summary of its categorical variables

Usage

dqcategorical(data)

Arguments

data

a data.frame or data.table

Details

When exploring a dataset, it is important to know the distribution of its categorical variables. dqcategorical produces an output that answers several questions about such variables: how many distinct categories the variable has, what the categories are, and what the frequency and percentage frequency of each category is.

But first, it is critical to identify the categorical variables in the data. They may be stored as integer, numeric or character. All such variables should be converted to factor; the factorise function in this package does this easily.
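For readers who want to see what the conversion step amounts to, here is a minimal base R sketch, assuming a plain 'data.frame'. It does not use factorise; the data and the names df and catCols are made up for illustration.

```r
# Hypothetical data: 'grade' is stored as integer, 'city' as character
df <- data.frame(grade = c(1L, 2L, 2L, 3L),
                 city = c('Pune', 'Delhi', 'Pune', 'Delhi'),
                 income = c(10.5, 20.1, 15.3, 30.2),
                 stringsAsFactors = FALSE)

# Columns the analyst has decided to treat as categorical
catCols <- c('grade', 'city')

# Convert only those columns to factor; 'income' is left unchanged
df[catCols] <- lapply(df[catCols], as.factor)

# Check the resulting column classes
sapply(df, class)
```

Choosing which columns are categorical remains a judgement call; a column's type alone does not settle it, since an integer column may hold either counts or category codes.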

The function identifies all factor variables, produces an output for each of them and returns a consolidated summary. It works for both 'data.frame' and 'data.table' inputs, but the output summary is always a 'data.frame'.
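The steps described above can be approximated in base R as follows. This is a rough sketch under stated assumptions, not the package's implementation: the function name summariseFactors, the output column names and the row order are all hypothetical and may differ from what dqcategorical returns.

```r
# Sketch only: a base R approximation of a categorical summary.
# The function name and column names are assumptions, not StatMeasures' own.
summariseFactors <- function(data) {
  # Coerce so that a 'data.table' input is handled like a 'data.frame'
  data <- as.data.frame(data)

  # Identify the factor columns
  factorCols <- names(data)[vapply(data, is.factor, logical(1))]

  # Build one small summary per factor column
  perVariable <- lapply(factorCols, function(col) {
    counts <- table(data[[col]])  # NA values are excluded by default
    data.frame(variable = col,
               categoryIndex = seq_along(counts),
               category = names(counts),
               frequency = as.vector(counts),
               percentage = 100 * as.vector(counts) / sum(counts),
               stringsAsFactors = FALSE)
  })

  # Stack the per-variable summaries; NULL if there are no factor columns
  do.call(rbind, perVariable)
}

# Usage on a small hypothetical 'data.frame'
df <- data.frame(phone = factor(c('IP', 'SN', 'HO', 'IP', 'SN',
                                  'IP', 'HO', 'SN', 'IP', 'SN')))
result <- summariseFactors(df)
```

In this sketch, categories appear in factor-level order and percentages are computed over non-missing values only; both are choices made for the illustration.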

Value

a data.frame containing the variable, category index, category, category frequency and percentage frequency for every factor variable

Author(s)

Akash Jain

Examples

# A 'data.frame'
df <- data.frame(phone = c('IP', 'SN', 'HO', 'IP', 'SN', 'IP', 'HO', 'SN', 'IP', 'SN'),
                 colour = c('black', 'blue', 'green', 'blue', 'black', 'silver', 'black',
                 'white', 'black', 'green'))

# Factorise categorical variables
df <- factorise(data = df, colNames = c('phone', 'colour'))

# Generate a data quality report of categorical variables
summaryCategorical <- dqcategorical(data = df)

StatMeasures

Easy Data Manipulation, Data Quality and Statistical Checks

v1.0

GPL-2

Authors

Akash Jain

Initial release

2015-03-24
