Driverless AI provides a Project Workspace for managing datasets and experiments related to a specific business problem or use case. Whether you are trying to detect fraud or predict user retention, datasets and experiments can be stored and saved in the individual projects. A Leaderboard on the Projects page lets you easily compare performance and results and identify the best solution for your problem.
The following sections describe how to create and manage projects.
Note: For information on how to export Driverless AI experiments to H2O MLOps from the Projects page, see https://docs.h2o.ai/mlops-release/latest-stable/docs/userguide/using.html#exporting-experiments-from-driverless-ai-into-mlops.
Creating a Project Workspace¶
To create a Project Workspace:
Click the Projects option on the top menu.
Click New Project.
Specify a name for the project and provide a description.
Click Create Project. This creates an empty Project page.
From the Projects page, you can link datasets and/or experiments, run new experiments, and score experiments on a scoring dataset. When you link an existing experiment to a Project, the datasets used for the experiment are automatically linked to the project (if not already linked).
When attempting to solve a business problem, a typical workflow involves running multiple experiments, either with different or new data or with a variety of settings, and the optimal solution can vary across users and business problems. For some users, the model with the highest accuracy on validation and test data is the optimal one. Others might accept a compromise on accuracy in exchange for a model with better runtime performance (faster predictions). For still others, what matters most is how quickly a model can be trained to an acceptable level of accuracy. The Experiments list helps you find the best solution for your business problem.
By default, the list is sorted by experiment name. You can change the sort order by clicking the up/down arrows beside a column heading in the experiment menu.
Hover over the right menu of an experiment to view additional information about the experiment, including the problem type, datasets used, and the target column.
Finished experiments linked to the project show their validation and test scores. You can also score experiments on other datasets. To do this, first link a dataset by clicking the Link Dataset button and choosing Testing from the drop-down menu. After the test dataset has been added, click the Score on Scoring Data button and select the experiment(s) you want to score and the test dataset to apply. This triggers a diagnostics job, whose results are available on the Diagnostics page. (Refer to Diagnosing a Model for more information.) After scoring completes, the results appear in the Score and Scoring Time columns. The Score column shows results for the scorer specified by the Show Results for Scorer picker.
If an experiment has already been scored on a dataset, Driverless AI does not score it again. The scoring step is deterministic, so for a particular test dataset and experiment combination, the score will be the same regardless of how many times you repeat it.
The test dataset must contain all the columns that are expected by the experiments you are scoring it on. However, the columns of the test dataset need not exactly match the input features expected by each experiment: the test dataset can contain additional columns, and any columns that were not used for training are ignored. This lets you train experiments on different training datasets (i.e., with different features) and, if you have an “uber test dataset” that includes all of those feature columns, use that single dataset to score all of the experiments.
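The column-matching behavior described above can be sketched in plain Python. This is an illustrative sketch only, not Driverless AI code; the function name and the column names are hypothetical.

```python
# Hypothetical sketch: how a shared "uber" scoring dataset can serve
# experiments trained on different feature subsets. Extra columns in the
# scoring dataset are ignored, but every training feature must be present.

def columns_for_scoring(test_columns, training_features):
    """Return the subset of test-set columns an experiment would actually use."""
    missing = [c for c in training_features if c not in test_columns]
    if missing:
        # The scoring dataset lacks a feature the experiment was trained on.
        raise ValueError(f"scoring dataset is missing required columns: {missing}")
    # Columns not used for training (e.g., the target or unrelated fields)
    # are simply dropped.
    return [c for c in test_columns if c in training_features]

# One "uber" dataset covering the features of two differently trained experiments:
uber_columns = ["age", "income", "region", "tenure", "churn"]
exp_a_features = ["age", "income"]
exp_b_features = ["income", "region", "tenure"]

print(columns_for_scoring(uber_columns, exp_a_features))  # ['age', 'income']
print(columns_for_scoring(uber_columns, exp_b_features))  # ['income', 'region', 'tenure']
```

The same dataset satisfies both experiments because each one selects only the columns it was trained on.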
A Test Time column is available in the Experiments Leaderboard. This value shows the total time (in seconds) it took to calculate the experiment's scores for all applicable scorers for the experiment type. This is valuable for users who need to estimate the runtime performance of an experiment.
You can compare two or three experiments and view side-by-side detailed information about each.
Select the two or three experiments that you want to compare. (You cannot compare more than three experiments.)
Click the Compare n Items button.
This opens the Compare Experiments page. This page includes the experiment summary and metric plots for each experiment. The metric plots vary depending on whether this is a classification or regression experiment.
For classification experiments, this page includes:
Variable Importance list
Precision Recall Curve
For regression experiments, this page includes:
Variable Importance list
Actual vs. Predicted Graph
To delete a project, click the Projects option on the top menu to open the main Projects page. Click the dotted menu in the right-most column, and then select Delete. You will be prompted to confirm the deletion.
Note that deleting projects does not delete datasets and experiments from Driverless AI. Any datasets and experiments from deleted projects will still be available on the Datasets and Experiments pages.
Leaderboard Wizard: Business value calculator¶
From the Project page, you can access a business value calculator wizard by clicking the Analyze Results button. This wizard makes it simple to perform a business value analysis for all models in a given project. Note that this feature is only supported for classification experiments.
The Leaderboard Wizard lets you assign the business value of correctly and incorrectly predicted outcomes. By default, a correct classification is worth 1, and an incorrect classification is worth -1 (in arbitrary units), but you can edit these values as needed.
For leaderboards of experiments, experiments are ranked by the net business value they provide on the test set.
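The ranking described above can be illustrated with a short sketch. This is not Driverless AI code; the function name and the confusion counts are hypothetical, while the default unit values (+1 for a correct classification, -1 for an incorrect one) follow the defaults stated above.

```python
# Hypothetical sketch of net-business-value ranking. Each experiment's
# performance on the test set is summarized as (correct, incorrect) counts.

def net_business_value(n_correct, n_incorrect,
                       value_correct=1.0, value_incorrect=-1.0):
    """Net value of a model's predictions, in arbitrary units."""
    return n_correct * value_correct + n_incorrect * value_incorrect

# Illustrative test-set results for two experiments:
results = {
    "experiment_a": (900, 100),   # 900 correct, 100 incorrect -> net 800
    "experiment_b": (870, 130),   # 870 correct, 130 incorrect -> net 740
}

# Rank experiments by the net business value they provide on the test set.
ranked = sorted(results, key=lambda e: net_business_value(*results[e]),
                reverse=True)
print(ranked)  # ['experiment_a', 'experiment_b']
```

Editing the unit values (for example, penalizing an incorrect prediction more heavily than a correct one is rewarded) can change which model ranks highest, which is the point of making these values configurable in the wizard.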