
Task 1: Experiment setup

Import and configure experiment

  1. Click + ADD DATASET (OR DRAG & DROP).

  2. Select AMAZON S3.

  3. In the AMAZON S3 URL box, paste the following URL:

    https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/loan_level_500k.csv

  4. Select loan_level_500k.csv.

  5. After selecting the file, select CLICK TO IMPORT SELECTION.

  6. Once the file is uploaded, select DETAILS from the drop-down menu.

  7. Browse through the available columns and view their corresponding information.

  8. Return to the Datasets overview page by selecting the DATASETS option from the tab above or by clicking on the arrow.

  9. Click on the loan_level_500k.csv file and go to the DATA PREP menu.

  10. Click on SPLIT.

  11. Split the dataset into two sets, freddie_mac_500_train and freddie_mac_500_test, using the Dataset Splitter configuration described in the note below.

note

Dataset Splitter configuration

a. Type freddie_mac_500_train for OUTPUT NAME 1. This will serve as the training set.

b. Type freddie_mac_500_test for OUTPUT NAME 2. This will serve as the test set.

c. For CLASSIFICATION TARGET COLUMN select DELINQUENT.

d. You can set the RANDOM SEED to any number. Fixing the seed makes the split reproducible: the same seed produces the same split each time.

e. Change the split value to 0.75 by adjusting the slider to 75% or entering 0.75 in the SELECT SPLIT RATIO (BY ROWS) section.

f. Click SAVE.
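
To make the effect of these settings concrete, here is a minimal Python sketch of a seeded, stratified 75/25 split. It is an illustration only and does not claim to reproduce how Driverless AI splits data internally. The function name `stratified_split`, the `ROW_ID` column, and the toy rows are hypothetical; only the DELINQUENT column name and the 0.75 ratio come from the steps above.

```python
import random
from collections import defaultdict


def stratified_split(rows, target, ratio=0.75, seed=42):
    """Split rows into (train, test), keeping the target class mix similar in both."""
    rng = random.Random(seed)  # fixed seed -> the same split on every run
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    train, test = [], []
    for label in sorted(by_class, key=str):  # deterministic class order
        group = by_class[label][:]
        rng.shuffle(group)
        cut = round(len(group) * ratio)  # share of this class sent to train
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Toy data standing in for the loan file: 10% of the rows are delinquent.
rows = [{"ROW_ID": i, "DELINQUENT": i % 10 == 0} for i in range(1000)]
train, test = stratified_split(rows, target="DELINQUENT", ratio=0.75, seed=42)
print(len(train), len(test))  # 750 250
```

Splitting each class separately is what keeps the proportion of delinquent loans roughly equal in the training and test sets, which is the reason a classification target column is chosen in step c.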

  12. Verify that there are three datasets: freddie_mac_500_test, freddie_mac_500_train, and loan_level_500k.csv.

  13. Click on the freddie_mac_500_train file, then select PREDICT.

  14. Select Not Now on the "First time Driverless AI, Click Yes to get a tour!" prompt.

  15. Name your experiment Freddie Mac Classification Tutorial by typing it under DISPLAY NAME.

  16. Select DROPPED COLUMNS, and drop the following 2 columns:
  • PREPAYMENT_PENALTY_MORTGAGE_FLAG
  • PREPAID

Click DONE after selecting the 2 columns.

info

These two columns will be dropped as they strongly correlate with loan delinquency and will therefore cause data leakage.
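
Outside the UI, excluding these columns amounts to removing them from every row before training. The following is a minimal Python sketch under stated assumptions: the in-memory CSV holds made-up values, the `CREDIT_SCORE` column and the helper `drop_columns` are hypothetical, and only the names DELINQUENT, PREPAID, and PREPAYMENT_PENALTY_MORTGAGE_FLAG come from this tutorial.

```python
import csv
import io

LEAKY_COLUMNS = {"PREPAYMENT_PENALTY_MORTGAGE_FLAG", "PREPAID"}

# Made-up sample rows for illustration; the real file has 27 columns.
SAMPLE_CSV = """CREDIT_SCORE,PREPAYMENT_PENALTY_MORTGAGE_FLAG,PREPAID,DELINQUENT
700,N,TRUE,FALSE
640,N,FALSE,TRUE
"""


def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = drop_columns(rows, LEAKY_COLUMNS)
print(list(cleaned[0]))  # ['CREDIT_SCORE', 'DELINQUENT']
```

The target column stays in the data; only the features that would reveal the outcome are removed.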

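  17. Select TARGET COLUMN, then select DELINQUENT.

  18. Select Test Dataset, then select freddie_mac_500_test.

  19. Your experiment page should now show the settings chosen above: the display name, the two dropped columns, DELINQUENT as the target column, and freddie_mac_500_test as the test dataset.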

About the dataset

For this experiment, we used a subset of the Freddie Mac Single-Family loan-level dataset, which contains 27 columns and 500,137 rows. It holds loan-level credit performance data for a portion of the fully amortizing, fixed-rate mortgages that Freddie Mac purchased between 1999 and 2017. The features include demographic factors, monthly loan performance, and credit performance details such as property disposition, voluntary prepayments, MI (mortgage insurance) and non-MI recoveries, expenses, current deferred UPB (unpaid principal balance), and the due date of the last paid installment.

You have successfully configured part of your experiment. In Task 2, you will learn to explore Experiment and Expert settings.

