Task 1: Experiment setup
Import and configure experiment
Click + ADD DATASET (OR DRAG & DROP).
Select AMAZON S3.
In the AMAZON S3 URL box, paste the following URL:
https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/loan_level_500k.csv
Select loan_level_500k.csv.
After selecting the file, select CLICK TO IMPORT SELECTION.
Once the file is uploaded, select DETAILS from the drop-down menu.
You can browse through all the available columns and view their corresponding information.
Return to the Datasets overview page by selecting the DATASETS option from the tab above or by clicking on the arrow.
Click on the loan_level_500k.csv file and go to the DATA PREP menu. Click on SPLIT.
Split the dataset into two sets, freddie_mac_500_train and freddie_mac_500_test, as shown:
Dataset Splitter configuration
a. Type freddie_mac_500_train for OUTPUT NAME 1; this will serve as the training set.
b. Type freddie_mac_500_test for OUTPUT NAME 2; this will serve as the test set.
c. For CLASSIFICATION TARGET COLUMN select DELINQUENT.
d. Set the RANDOM SEED to any number. Fixing the seed makes the split reproducible: rerunning the split with the same seed yields the same training and test sets.
e. Change the split value to 0.75 by adjusting the slider to 75% or by entering 0.75 in the SELECT SPLIT RATIO (BY ROWS) section.
f. Click SAVE.
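To make the splitter settings concrete, here is a minimal sketch of a seeded, stratified 75/25 split in plain Python. It is not Driverless AI's implementation. The function name `stratified_split`, the toy rows, and the `LOAN_ID` field are hypothetical and exist only for illustration. The sketch assumes that choosing a classification target column means each output set keeps roughly the same proportion of DELINQUENT values as the original.

```python
import random
from collections import defaultdict


def stratified_split(rows, target, ratio=0.75, seed=42):
    """Split a list of dict rows into (train, test), keeping the
    proportion of each target class roughly equal in both sets."""
    rng = random.Random(seed)  # fixed seed -> the same split on every run

    # Group rows by their target value (for example True / False).
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    train, test = [], []
    # Iterate over classes in a stable order so the result is deterministic.
    for label in sorted(by_class, key=str):
        group = list(by_class[label])
        rng.shuffle(group)
        cut = int(len(group) * ratio)  # rows of this class going to train
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Toy stand-in for the loan file: 2,000 rows, 5% flagged as delinquent.
rows = [{"LOAN_ID": i, "DELINQUENT": i % 20 == 0} for i in range(2000)]

train, test = stratified_split(rows, "DELINQUENT", ratio=0.75, seed=42)
print(len(train), len(test))
```

Calling the function twice with the same seed returns identical sets, which is the behavior the RANDOM SEED field is meant to provide.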
- Verify that there are three datasets: freddie_mac_500_test, freddie_mac_500_train, and loan_level_500k.csv.
Click on the freddie_mac_500_train file, then select PREDICT.
Select Not Now on the "First time Driverless AI, Click Yes to get a tour!" prompt.
Name your experiment Freddie Mac Classification Tutorial by typing it under DISPLAY NAME.
- Select DROPPED COLUMNS, and drop the following 2 columns:
- PREPAYMENT_PENALTY_MORTGAGE_FLAG
- PREPAID
Click DONE after selecting the 2 columns.
These two columns will be dropped as they strongly correlate with loan delinquency and will therefore cause data leakage.
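Outside the UI, the same idea amounts to removing the leaky fields from every record before a model sees them. The sketch below is illustrative only: `drop_columns` is a hypothetical helper, and the sample record uses made-up values, with `CREDIT_SCORE` as a placeholder feature name.

```python
# The two columns the tutorial drops to avoid data leakage.
LEAKY_COLUMNS = {"PREPAYMENT_PENALTY_MORTGAGE_FLAG", "PREPAID"}


def drop_columns(rows, columns):
    """Return new row dicts without the given columns (input is not modified)."""
    return [
        {name: value for name, value in row.items() if name not in columns}
        for row in rows
    ]


# Made-up record for illustration; real rows have many more fields.
sample = [
    {
        "CREDIT_SCORE": 712,
        "PREPAYMENT_PENALTY_MORTGAGE_FLAG": "N",
        "PREPAID": True,
        "DELINQUENT": False,
    }
]

cleaned = drop_columns(sample, LEAKY_COLUMNS)
print(sorted(cleaned[0]))
```

The target column DELINQUENT stays in place; only the two leaky predictors are removed.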
Select TARGET COLUMN, then select DELINQUENT:
Select Test Dataset, then select freddie_mac_500_test.
Your experiment page should now look like this:
About the dataset
For this experiment, we used a subset of the Freddie Mac Single-Family loan-level dataset, which contains 27 columns and 500,137 rows. It provides loan-level credit performance data for a portion of the fully amortizing, fixed-rate mortgages that Freddie Mac purchased between 1999 and 2017. The features include demographic factors, monthly loan performance, and credit performance details such as property disposition, voluntary prepayments, MI and non-MI recoveries, expenses, current deferred UPB, and the due date of the last paid installment.
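If you have a local copy of the file, you can check these dimensions yourself. The sketch below assumes the file is comma-delimited with a single header row, which is not confirmed here. `csv_shape` is a hypothetical helper, and the demo uses a tiny in-memory CSV rather than the real data.

```python
import csv
import io


def csv_shape(handle):
    """Return (data_rows, columns) for a CSV with one header row."""
    reader = csv.reader(handle)
    header = next(reader)              # first line holds the column names
    n_rows = sum(1 for _ in reader)    # remaining lines are data rows
    return n_rows, len(header)


# Tiny in-memory stand-in for the real file.
demo = io.StringIO("A,B,C\n1,2,3\n4,5,6\n")
shape = csv_shape(demo)
print(shape)  # (2, 3)

# With a downloaded copy (the local path is an assumption):
# with open("loan_level_500k.csv", newline="") as f:
#     print(csv_shape(f))
```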
You have successfully configured part of your experiment. In Task 2, you will learn to explore Experiment and Expert settings.