Task 2: Explore dataset
Let's explore the dataset to understand each column.
- In the DATASETS page, observe the two datasets we will use for this tutorial.
- Click the UCI_Credit_card.csv dataset.
- Select DETAILS.
- Let’s take a quick overview of the columns of the training dataset:
- The dataset consists of 25 columns, which the table below describes in 10 entries (related monthly columns are grouped together):
Column | Description |
---|---|
ID | This column represents a unique identifier for each individual in the dataset, allowing for distinction between different persons, typically assigned sequentially. |
LIMIT_BAL | This column denotes the credit limit assigned to each individual, representing the maximum amount of credit they are permitted to borrow from the credit card issuer. |
SEX | This column indicates the gender of each individual, with values representing male (1) or female (2). |
EDUCATION | This column represents the education level of each individual, categorized into different levels such as graduate school (1), university (2), high school (3), or others (4). |
MARRIAGE | This column denotes the marital status of each individual, categorized as married (1), single (2), or others (3). |
AGE | This column represents the age of each individual in years. |
PAY_1 to PAY_6 | These columns record each individual's repayment status over six consecutive months, from April 2005 (PAY_6) to September 2005 (PAY_1). A value of -1 indicates that the payment was made duly, and values from 1 to 9 indicate a payment delay of one month up to nine months or more. |
BILL_AMT1 to BILL_AMT6 | These columns indicate the amount of bill statements for each individual over six consecutive months, from April 2005 (BILL_AMT6) to September 2005 (BILL_AMT1). |
PAY_AMT1 to PAY_AMT6 | These columns represent the amount of previous payments made by each individual over six consecutive months, from April 2005 (PAY_AMT6) to September 2005 (PAY_AMT1). |
default payment next month | This column is the target feature: it indicates whether a client defaulted on the payment due in the following month. It can be thought of as the default status for a hypothetical PAY_7 (PAY_7 is not a column in our dataset). As specified in the Introduction section, our aim is to develop a classification model that predicts this value. A value of 1 signifies default, meaning the individual failed to pay their credit card bill on time, while a value of 0 signifies no default, meaning the payment was made on time. |
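
To make the coding scheme in the table concrete, here is a minimal Python sketch that is not part of the Driverless AI workflow. It expands the grouped entries (PAY_1 to PAY_6, BILL_AMT1 to BILL_AMT6, PAY_AMT1 to PAY_AMT6) into individual column names and decodes the categorical codes for SEX, EDUCATION, and MARRIAGE. The helper names (`expected_columns`, `decode_row`) and the three sample rows are hypothetical and invented for illustration; they are not taken from UCI_Credit_card.csv.

```python
# Code-to-label mappings transcribed from the column descriptions in the table.
SEX_LABELS = {1: "male", 2: "female"}
EDUCATION_LABELS = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
MARRIAGE_LABELS = {1: "married", 2: "single", 3: "others"}
TARGET = "default payment next month"


def expected_columns():
    """Expand the grouped table entries into individual column names."""
    columns = ["ID", "LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]
    for prefix in ("PAY_", "BILL_AMT", "PAY_AMT"):
        # Index 1 is September 2005 and index 6 is April 2005.
        columns += [f"{prefix}{month}" for month in range(1, 7)]
    columns.append(TARGET)
    return columns


def decode_row(row):
    """Return a copy of the row with categorical codes replaced by labels."""
    decoded = dict(row)
    decoded["SEX"] = SEX_LABELS.get(row["SEX"], "unknown")
    decoded["EDUCATION"] = EDUCATION_LABELS.get(row["EDUCATION"], "unknown")
    decoded["MARRIAGE"] = MARRIAGE_LABELS.get(row["MARRIAGE"], "unknown")
    return decoded


# Hypothetical rows (invented values, only a subset of columns shown).
sample_rows = [
    {"ID": 1, "SEX": 2, "EDUCATION": 2, "MARRIAGE": 1, TARGET: 1},
    {"ID": 2, "SEX": 2, "EDUCATION": 1, "MARRIAGE": 2, TARGET: 0},
    {"ID": 3, "SEX": 1, "EDUCATION": 3, "MARRIAGE": 3, TARGET: 0},
]

decoded_rows = [decode_row(row) for row in sample_rows]
# Share of rows in the sample where the target equals 1 (default).
default_rate = sum(row[TARGET] for row in sample_rows) / len(sample_rows)

print(len(expected_columns()), "columns expected")
for row in decoded_rows:
    print(row["ID"], row["SEX"], row["EDUCATION"], row["MARRIAGE"], row[TARGET])
print(f"Default rate in the sample: {default_rate:.2f}")
```

Codes that the table does not describe are mapped to "unknown" rather than guessed, which is a design choice of this sketch and not a statement about the real data.
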
Now that you understand each column of the dataset, in Task 3, we will learn how to set up the experiment from scratch.