
Task 2: Explore dataset

Let's explore the dataset to understand each column.

  1. On the DATASETS page, observe the two datasets we will use for this tutorial.
  2. Click the UCI_Credit_card.csv dataset.
  3. Select DETAILS.
  4. Let's review the columns of the training dataset:
  • The dataset consists of 25 columns, described in the following 10 table rows:
| Column | Description |
|---|---|
| ID | A unique identifier for each individual in the dataset, typically assigned sequentially. |
| LIMIT_BAL | The credit limit assigned to each individual: the maximum amount of credit they are permitted to borrow from the credit card issuer. |
| SEX | The gender of each individual: male (1) or female (2). |
| EDUCATION | The education level of each individual: graduate school (1), university (2), high school (3), or others (4). |
| MARRIAGE | The marital status of each individual: married (1), single (2), or others (3). |
| AGE | The age of each individual in years. |
| PAY_1 to PAY_6 | Each individual's repayment status over six consecutive months, from April 2005 (PAY_6) to September 2005 (PAY_1). A value of -1 indicates that the payment was made duly; values from 1 to 9 indicate a payment delay of one month up to nine months or more. |
| BILL_AMT1 to BILL_AMT6 | The amount of the bill statement for each individual over six consecutive months, from April 2005 (BILL_AMT6) to September 2005 (BILL_AMT1). |
| PAY_AMT1 to PAY_AMT6 | The amount of the previous payment made by each individual over six consecutive months, from April 2005 (PAY_AMT6) to September 2005 (PAY_AMT1). |
| default payment next month | The target column: whether a client defaulted on the payment for the following month. It corresponds to the outcome of a hypothetical PAY_7, which is not a column in our dataset. As specified in the Introduction section, our aim is to develop a classification model that predicts this value. A value of 1 signifies default (the individual failed to pay their credit card bill on time), while a value of 0 signifies no default (timely payment). |
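
The column descriptions above can also be checked outside the UI. The following is a minimal sketch using only the Python standard library; it is not part of the tutorial's workflow. The helper names (`expected_columns`, `missing_columns`, `describe_repayment`, `default_rate`, `load_rows`) are hypothetical, and the column names and code meanings are taken from the table as written, so a local copy of the file may use slightly different header names.

```python
import csv

# Target column name as written in the table above.
TARGET = "default payment next month"

# Code-to-label mappings copied from the column descriptions.
SEX = {1: "male", 2: "female"}
EDUCATION = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
MARRIAGE = {1: "married", 2: "single", 3: "others"}


def expected_columns():
    """Return the 25 column names described in the table, in table order."""
    cols = ["ID", "LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]
    for prefix in ("PAY_", "BILL_AMT", "PAY_AMT"):
        # Suffix 1 is September 2005, suffix 6 is April 2005.
        cols += [f"{prefix}{month}" for month in range(1, 7)]
    cols.append(TARGET)
    return cols


def missing_columns(header):
    """List the table's columns that are absent from a CSV header row."""
    present = set(header)
    return [name for name in expected_columns() if name not in present]


def describe_repayment(status):
    """Translate a PAY_n code into the wording used by the table."""
    if status == -1:
        return "paid duly"
    if 1 <= status <= 8:
        return f"delayed {status} month(s)"
    if status == 9:
        return "delayed nine months or more"
    # Any other code is outside what the table documents.
    return "not described in the table"


def default_rate(rows):
    """Return the share of rows whose target value is 1 (default)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(int(row[TARGET]) == 1 for row in rows) / len(rows)


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by header name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

With a local copy of the file, `rows = load_rows("UCI_Credit_card.csv")` followed by `missing_columns(rows[0].keys())` would report any header names that differ from the table, and `default_rate(rows)` would give the fraction of clients labeled as defaulting.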

Now that you understand each column of the dataset, in Task 3, we will learn how to set up the experiment from scratch.

