Task 1: Experiment setup
Import and configure experiment
- Click + ADD DATASET (OR DRAG & DROP).
- Select AMAZON S3. 
- In the AMAZON S3 URL box, paste the following URL: https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/loan_level_500k.csv
- Select loan_level_500k.csv.
- After selecting the file, select CLICK TO IMPORT SELECTION. 
- Once the file is uploaded, select DETAILS from the drop-down menu. 
- You can browse through all the available columns and view their corresponding information.
- Return to the Datasets overview page by selecting the DATASETS option from the tab above or by clicking on the arrow. 
- Click on the loan_level_500k.csv file and go to the DATA PREP menu.
- Click on SPLIT.  
- Split the dataset into two sets, freddie_mac_500_train and freddie_mac_500_test, using the following Dataset Splitter configuration:
a. Type freddie_mac_500_train for OUTPUT NAME 1. This will serve as the training set.
b. Type freddie_mac_500_test for OUTPUT NAME 2. This will serve as the test set.
c. For CLASSIFICATION TARGET COLUMN select DELINQUENT.
d. Set the RANDOM SEED to any number. Fixing the seed makes the split reproducible: the same seed yields the same training and test sets each time.
e. Change the split value to 0.75, either by adjusting the slider to 75% or by entering 0.75 in the SELECT SPLIT RATIO (BY ROWS) section.
f. Click SAVE.
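The effect of the settings above (a 0.75 ratio, a fixed seed, and DELINQUENT as the classification target) can be illustrated outside the UI. The following is a minimal standard-library sketch of a seeded, stratified split. It assumes that naming a classification target makes the splitter preserve class proportions, and it is not Driverless AI's own implementation. The function name stratified_split, the LOAN_ID column, and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, target, ratio=0.75, seed=42):
    """Split rows into (train, test), keeping each target class's share."""
    rng = random.Random(seed)  # fixed seed: the same split on every run
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    train, test = [], []
    # Iterate classes in a stable order so the result depends only on the seed.
    for label in sorted(by_class, key=str):
        group = by_class[label]
        rng.shuffle(group)
        cut = round(len(group) * ratio)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical toy data: 100 loans, 20% of them delinquent.
rows = [{"LOAN_ID": i, "DELINQUENT": i % 5 == 0} for i in range(100)]
train, test = stratified_split(rows, "DELINQUENT", ratio=0.75, seed=42)
print(len(train), len(test))
```

Running the function twice with the same seed returns identical training and test sets, which is the property the RANDOM SEED field provides.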
- Verify that there are three datasets: freddie_mac_500_test, freddie_mac_500_train, and loan_level_500k.csv.
- Click on the freddie_mac_500_train file, then select PREDICT.
- Select Not Now on the "First time Driverless AI, Click Yes to get a tour!" prompt. 
- Name your experiment Freddie Mac Classification Tutorial by typing it under DISPLAY NAME.
- Select DROPPED COLUMNS, and drop the following 2 columns:
- PREPAYMENT_PENALTY_MORTGAGE_FLAG
- PREPAID
Click DONE after selecting the 2 columns.
These two columns are dropped because they correlate strongly with loan delinquency and would therefore cause data leakage.
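For readers who prepare data outside the UI, the same idea can be sketched in a few lines of standard-library Python: remove the two leaky columns before any model sees the rows. This is only an illustration of what DROPPED COLUMNS accomplishes. The helper drop_columns, the CREDIT_SCORE column, and the sample values are hypothetical and are not taken from the dataset.

```python
import csv
import io

# The two columns the tutorial drops to avoid data leakage.
LEAKY_COLUMNS = {"PREPAYMENT_PENALTY_MORTGAGE_FLAG", "PREPAID"}


def drop_columns(rows, to_drop):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


# Hypothetical two-row sample standing in for the real CSV file.
sample_csv = """CREDIT_SCORE,PREPAID,PREPAYMENT_PENALTY_MORTGAGE_FLAG,DELINQUENT
700,TRUE,N,FALSE
650,FALSE,N,TRUE
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
features = drop_columns(rows, LEAKY_COLUMNS)
print(list(features[0].keys()))
```

The target column DELINQUENT stays in place; only the columns that would reveal the outcome are removed.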
- Select TARGET COLUMN, then select DELINQUENT: 
- Select Test Dataset, then select freddie_mac_500_test.
- Your experiment page should now look like this:  
About the dataset
This experiment uses a subset of the Freddie Mac Single-Family loan-level dataset, which contains 27 columns and 500,137 rows. The dataset holds loan-level credit performance data for a portion of the fully amortizing, fixed-rate mortgages that Freddie Mac purchased between 1999 and 2017. The features include demographic factors, monthly loan performance, and credit performance details such as property disposition, voluntary prepayments, MI and non-MI recoveries, expenses, current deferred UPB, and the due date of the last paid installment.
You have successfully configured part of your experiment. In Task 2, you will learn to explore Experiment and Expert settings.