Task 3: Build and launch the experiment
Now that we have a better understanding of the dataset, let's build the experiment from scratch.
In the H2O Driverless AI navigation menu, click DATASETS.
In the Datasets table, click loan_prediction_train.csv.
Select PREDICT.
As soon as you select the Predict option, you are asked if you want to take a tour of the Driverless AI environment. Skip it for now by clicking Not Now.
Enter the following information into H2O Driverless AI:
a. Display Name - Name the current experiment:
tutorial-4a
.b. Target Column - Select LoanAmount as the target variable for the regression task.
c. Test Dataset - The Test dataset is a dataset used to provide an unbiased evaluation of a
final
model fit on the training dataset. It is not used during training of the model. Select theloan_prediction_test.csv
dataset for the test dataset option.Adjust the following settings accurately to run an effective regression model.
a. Accuracy -
This setting controls the relative accuracy of the experiment. Higher values generally lead to greater confidence in model performance. The Accuracy setting determines feature evolution, model tuning (including the number of training models, types of cross-validation, and the number of cross-validation repetitions), and the final pipeline.
Feature evolution: represents the algorithms used to create the experiment. Feature evolution uses a genetic algorithm to find the best set of model parameters and feature transformation to be used in the final model.
Final pipeline: represents the number of models and the validation method used in the final pipeline.
noteKeep the default Accuracy value provided by H2O Driverless AI.
b. Time -
This setting specifies the relative time required to complete the experiment. Higher values will result in longer experiments, allowing H2O Driverless AI more time to perform additional work and find the best model.
Feature evolution: indicates the number of models (individuals) and the maximum number of iterations that will be run in the experiment.
Early stopping: a technique that halts the experiment before it reaches the maximum number of iterations if no improvement in the score is observed after a specified number of iterations. In this experiment, training will stop early if the model's performance does not improve after 5 consecutive iterations.
- If the Time setting is set to 1, early stopping is disabled.
- For any Time value greater than 1, early stopping will be activated.
noteKeep the default Time value provided by H2O Driverless AI.
c. Interpretability -
This setting controls the complexity of the models and features used in the experiments. For example, a higher interpretability setting generally restricts the use of complex features, feature engineering, and models. If there is a strong need for a model to be easily explainable, a higher interpretability value should be used. Conversely, a lower value allows H2O Driverless AI to explore more creative solutions, such as using complex feature engineering techniques. While these features might not always be easily understood, they can lead to more accurate models.
The choice between high and low interpretability settings
The choice between high and low interpretability depends on the specific business use case.
High interpretability settings
In the banking industry, local regulators may impose restrictions that prevent the use of complex feature engineering and require explanations of feature derivatives. In such cases, the interpretability setting should be set to high.
For example, when predicting loan amounts for customers, the predictive model may rely on features such as credit history, income, education, and more. With a high interpretability setting, the model will use simple feature transformations, such as,
- Direct use of these features without complex interactions.
- Clear feature engineering steps that make it easy to identify the impact of each feature on the predicted loan amount.
This approach enables the bank to explain, for instance, that an increase in income would likely lead to a higher loan amount, making the model more interpretable to both regulators and customers.
Low interpretability settings
If the primary goal is to maximize predictive accuracy and the model’s inner workings are less important to stakeholders, a lower interpretability setting can be used.
For example, in a fraud detection system, the model must quickly predict and identify whether a transaction is fraudulent. If a transaction is flagged as fraudulent, immediate action is required to prevent it from occurring. In this case, even if the complexity of feature engineering is not easily understood, the model’s accuracy takes priority. A lower interpretability setting allows the model to use more complex transformations of features which can lead to higher performance in detecting fraud.
noteKeep the default Interpretability value provided by H2O Driverless AI.
noteThe default settings in H2O Driverless AI provide an ideal benchmark for evaluating your machine learning task. After reviewing the resulting metrics, you can make informed decisions to adjust the accuracy, time, and interpretability settings to ensure they align with your specific requirements.
d. Scorer - the scorer is the metric used to evaluate the machine learning algorithm. H2O Driverless AI automatically chooses the optimal scorer, considering the problem type and other factors. You can manually select an alternative scorer if necessary. For regression tasks, the available scorers are,
- GINI (Gini Coefficient): measures the inequality among values of a frequency distribution, and can be used to measure the quality of a binary classifier.
- MAE (Mean Absolute Error): The average of the absolute errors. The MAE units are the same as the predicted target, which is useful for understanding whether the size of the error is of concern or not. The smaller the MAE the better the model’s performance.
- MAPE (Mean Absolute Percentage Error): measures the size of the error in percentage terms. It is calculated as the average of the unsigned percentage error.
- MER (Median Error Rate or Median Absolute Percentage Error): measures the median size of the error in percentage terms. It is calculated as the median of the unsigned percentage error.
- MSE (Mean Squared Error): measures the average of the squares of the errors or deviations. It incorporates both the variance and the bias of the predictor.
- R2 (R Squared): represents the degree that the predicted value and the actual value move in unison. The R2 value varies between 0 and 1 where 0 represents no correlation between the predicted and actual value and 1 represents complete correlation.
- R2COD:
- RMSE (Root Mean Squared Error): evaluates how well a model can predict a continuous value.
- RMSLE (Root Mean Squared Logarithmic Error): measures the ratio between actual values and predicted values and takes the log of the predictions and actual values.
- RMSPE (Root Mean Square Percentage Error): the RMSE (Root Mean Squared Error) expressed as a percentage. The smaller the RMSPE, the better the model performance.
- SMAPE (Symmetric Mean Absolute Percentage Error): divides the absolute errors by the mean of the absolute actual and the absolute predicted values. This is important when the actual values can be 0 or near 0. Since SMAPE includes both the actual and the predicted values, the SMAPE value can never be greater than 200%.
For detailed descriptions of each regression scorer, see Regression scorers.
noteKeep the default Scorer value provided by H2O Driverless AI.
Additionally, there are three more buttons located beneath the experimental settings knob which stand for the following:
e. Model type - H2O Driverless AI automatically identifies the type of machine learning problem as classification or regression by analyzing the data type and distribution of the selected target column.
- Make sure the model type is set to Regression.
f. Reproducible: This button allows you to build an experiment with a random seed and get reproducible results. If this is disabled (default), the results will vary between runs.
- Set this option to
Off
.
g. GPUS Enable: Specify whether to enable GPUs. (Note that this option is ignored on CPU-only systems).
- Set this option to
Off
.
After configuring the above settings click Launch experiment.
Congratulations, you have successfully configured the training settings and launched your experiment. In Task 4, you will explore the experiment results.
- Submit and view feedback for this page
- Send feedback about H2O Driverless AI | Tutorials to cloud-feedback@h2o.ai