Version: v1.3.0

Concepts

Encoders

One-hot encoder

One-hot encoding is a process in which each category of a categorical variable is converted into its own new column, and each row is assigned a binary value of 1 or 0 in those columns: 1 in the column that matches the row's category and 0 everywhere else.

Before and after one-hot encoding:

Color (before) | Yellow (after) | Green (after) | Red (after)
Yellow         | 1              | 0             | 0
Green          | 0              | 1             | 0
Red            | 0              | 0             | 1
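The table above can be reproduced with a few lines of plain Python. This is a minimal sketch for illustration, not this product's implementation; the helper name `one_hot_encode` is hypothetical, and in practice libraries such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` are commonly used instead.

```python
from typing import Dict, List, Sequence


def one_hot_encode(values: Sequence[str], categories: Sequence[str]) -> List[Dict[str, int]]:
    """Return one row per input value, with a 0/1 column for every category.

    Hypothetical helper for illustration only.
    """
    rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f"unknown category: {value!r}")
        # Exactly one column is 1: the one named after this row's value.
        rows.append({category: int(category == value) for category in categories})
    return rows


colors = ["Yellow", "Green", "Red"]
for row in one_hot_encode(colors, categories=colors):
    print(row)
# {'Yellow': 1, 'Green': 0, 'Red': 0}
# {'Yellow': 0, 'Green': 1, 'Red': 0}
# {'Yellow': 0, 'Green': 0, 'Red': 1}
```

Passing `categories` explicitly keeps the column order stable and makes an unseen category an error rather than a silently added column.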

Label encoder

Label encoding converts the labels in a column into numeric form so that they are machine-readable. A label encoder can be used to normalize labels, and it can also transform non-numerical labels into numerical labels, as long as the non-numerical labels are hashable and comparable.

Before and after label encoding:

Color (before) | Color (after)
Yellow         | 1
Green          | 2
Red            | 3
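A minimal sketch of label encoding in plain Python, assuming codes are assigned in order of first appearance starting at 1 so that the output matches the table above. The helpers `label_encode` and `label_decode` are hypothetical. Note that scikit-learn's `LabelEncoder` behaves differently: it sorts the distinct labels and assigns codes from 0 to n_classes - 1, which is why it requires labels to be comparable, while this sketch only needs them to be hashable.

```python
from collections.abc import Hashable
from typing import Dict, List, Sequence, Tuple


def label_encode(labels: Sequence[Hashable], start: int = 1) -> Tuple[List[int], Dict[Hashable, int]]:
    """Map each distinct label to an integer code, in order of first appearance.

    Hypothetical helper for illustration only.
    """
    mapping: Dict[Hashable, int] = {}
    for label in labels:
        if label not in mapping:
            # The next unused code goes to the newly seen label.
            mapping[label] = start + len(mapping)
    codes = [mapping[label] for label in labels]
    return codes, mapping


def label_decode(codes: Sequence[int], mapping: Dict[Hashable, int]) -> List[Hashable]:
    """Invert the mapping to recover the original labels."""
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[code] for code in codes]


codes, mapping = label_encode(["Yellow", "Green", "Red", "Green"])
print(codes)                         # [1, 2, 3, 2]
print(mapping)                       # {'Yellow': 1, 'Green': 2, 'Red': 3}
print(label_decode(codes, mapping))  # ['Yellow', 'Green', 'Red', 'Green']
```

Keeping the mapping alongside the codes is what makes the encoding reversible.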

Run-length encoder

Run-length encoding (RLE) is a form of data compression that replaces each run of identical consecutive values with a code indicating the value and the number of times it repeats. RLE is lossless: decoding the compressed form recovers all of the original data. For example, FFFQQQC is encoded as 3F3Q1C.
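The FFFQQQC example can be sketched in Python with `itertools.groupby`, which yields runs of consecutive equal elements. This is a minimal illustration, assuming the `<count><character>` string format from the example; the helper names are hypothetical. Because counts are written as digits, the sketch rejects input that itself contains digits, since the count/value boundary would otherwise be ambiguous.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of identical characters with <count><character>."""
    if any(ch.isdigit() for ch in text):
        # Digits would make the count/value boundary ambiguous in this format.
        raise ValueError("this sketch does not support digit characters")
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand every <count><character> pair back into the original run."""
    return "".join(char * int(count) for count, char in re.findall(r"(\d+)(\D)", encoded))


encoded = rle_encode("FFFQQQC")
print(encoded)              # 3F3Q1C
print(rle_decode(encoded))  # FFFQQQC
```

Decoding the encoded string returns the exact input, which is the lossless property described above.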


Classification tasks

The following classification tasks are supported:

Note: To learn which problem types support one, two, or all of the supported classification tasks, see Supported problem types.

Binary

Binary classification refers to a task that has two class labels. A single class label is predicted for each example. In other words, the target is a single column of 0/1 values.

Multi-class

Multi-class classification refers to a task that has more than two class labels. A single class label is predicted for each example. In other words, the target spans multiple columns, and exactly one of them is 1 for each example.

Multi-label

Multi-label classification refers to a task with two or more class labels, where one or more class labels may be predicted for each example. In other words, the target spans multiple columns, and each column can independently be 0 or 1.
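The column layouts described for the three tasks can be checked mechanically. The following is a minimal sketch, assuming the target is stored as a list of rows (one per example) of 0/1 values; the helper `matches_task` is hypothetical and is not how this product validates datasets. For multi-label it only enforces the "any column can be 0/1" layout and does not require at least one label per row.

```python
from typing import Sequence

TASKS = ("binary", "multi-class", "multi-label")


def matches_task(targets: Sequence[Sequence[int]], task: str) -> bool:
    """Check whether a 0/1 target matrix (one row per example) fits a task's layout.

    Hypothetical helper for illustration only.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if not targets:
        return False
    width = len(targets[0])
    if any(len(row) != width for row in targets):
        return False  # ragged rows never fit any layout
    if any(value not in (0, 1) for row in targets for value in row):
        return False  # every layout uses 0/1 values only
    if task == "binary":
        return width == 1  # a single 0/1 column
    if task == "multi-class":
        # More than two class columns, exactly one of which is 1 per row.
        return width > 2 and all(sum(row) == 1 for row in targets)
    # Multi-label: two or more class columns, any of which can be 0 or 1.
    return width >= 2


print(matches_task([[0], [1], [1]], "binary"))              # True
print(matches_task([[1, 0, 0], [0, 0, 1]], "multi-class"))  # True
print(matches_task([[1, 1, 0], [0, 0, 1]], "multi-class"))  # False
print(matches_task([[1, 1, 0], [0, 0, 1]], "multi-label"))  # True
```

The only difference between the multi-class and multi-label checks is the per-row constraint: exactly one 1 versus any combination of 0s and 1s.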
