Task 3: Load data
H2O Driverless AI workflow
The typical Driverless AI workflow is as follows:
1. Load data
2. Visualize data
3. Run an experiment
4. Interpret the model
5. Deploy the scoring pipeline
Beyond these steps, you can also diagnose a model, transform another dataset, score a model against another dataset, and manage your data in Projects. This tutorial focuses on steps 1 through 4 of the typical workflow; other aspects of Driverless AI (DAI) are covered in other tutorials in the Driverless AI learning path.
In this task, we begin with the first step of the DAI workflow: loading the data.
About the Dataset
The dataset used for this experiment is a version of the Titanic Kaggle dataset. It lists the estimated passengers aboard the RMS Titanic. The lack of lifeboats, among other factors, resulted in a significant loss of life when the Titanic struck an iceberg and sank. Although some element of luck was involved in surviving the sinking, some groups of people were more likely to survive than others.
The table below outlines the data included within this dataset.
Attribute | Definition | Key |
---|---|---|
passenger_id | Randomly generated Id | - |
pclass | Passenger class | 1 = 1st, 2 = 2nd, 3 = 3rd |
survived | Survival | 0 = No, 1 = Yes |
name_with_salutations | Passenger name | - |
name_without_salutations | Passenger name without salutations | - |
sex | Sex | Female, Male |
age | Age in years | - |
sibsp | Number of siblings/spouses aboard | - |
parch | Number of parents/children aboard | - |
ticket | Ticket number | - |
fare | Passenger fare | - |
cabin | Cabin number | - |
embarked | Port of embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
boat | Boat number | - |
body | Body number | - |
home.dest | Home destination | - |
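If you want a quick look at these attributes before importing the file into Driverless AI, you can preview the CSV with any tool of your choice. The snippet below is a minimal sketch using pandas (not part of the DAI workflow itself); it reads the same public S3 file used in the import steps that follow and assumes the column names match the table above.

```python
# Optional sanity check: preview the Titanic CSV outside of Driverless AI.
# Assumes pandas is installed and the machine has network access to the public S3 file.
import pandas as pd

URL = "https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/titanic.csv"

df = pd.read_csv(URL)

print(df.shape)                                     # number of rows and columns
print(df.dtypes)                                    # numeric vs. categorical attributes
print(df["survived"].value_counts(normalize=True))  # class balance of the target column
```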
Import the dataset
1. Navigate to the H2O Driverless AI Datasets page.
2. Click + ADD DATASET (OR DRAG & DROP).
   Note: The ADD DATASET button has a series of supported data connectors and ingestion methods. With these data connectors, you can load datasets from various sources.
3. Select AMAZON S3.
4. In the AMAZON S3 URL box, paste the following URL:
   https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/titanic.csv
5. Select titanic.csv. Once you have selected the file, click CLICK TO IMPORT SELECTION.
Once the dataset is imported, you will see it listed on the Datasets page.
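If you prefer to script this step instead of using the UI, the same import can be done with the Driverless AI Python client. The sketch below is illustrative only: it assumes the `driverlessai` package is installed, the address and credentials are placeholders for your own instance, and the exact argument and attribute names (such as `data_source` and `shape`) may vary slightly between client versions.

```python
# Minimal sketch: import the Titanic dataset via the Driverless AI Python client.
# The address, username, and password are placeholders; replace them with your instance details.
import driverlessai

dai = driverlessai.Client(
    address="http://localhost:12345",
    username="your_username",
    password="your_password",
)

# Same public S3 file used in the UI steps above.
ds = dai.datasets.create(
    data="https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/titanic.csv",
    data_source="s3",
    name="titanic.csv",
)

print(ds.name, ds.shape)  # the dataset should now also appear on the Datasets page
```

After the call returns, refreshing the Datasets page in the UI should show the same titanic.csv entry as the manual import.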
- For more information about the dataset file formats that can be uploaded, see the H2O Driverless AI documentation.
- For more datasets, check out Kaggle Datasets.
Now that the dataset has been imported, let's proceed to Task 4 to see how DAI helps you understand a selected dataset. This takes us into the second step of the Driverless AI workflow: visualize data.