Task 3: Load data
H2O Driverless AI workflow
The typical Driverless AI workflow is as follows:
1. Load data
2. Visualize data
3. Run an experiment
4. Interpret the model
5. Deploy the scoring pipeline
Beyond these steps, you can also diagnose a model, transform another dataset, score a model against another dataset, and manage your data in Projects. This tutorial focuses on steps 1 through 4 of the typical workflow; other aspects of Driverless AI (DAI) are covered in other tutorials in the Driverless AI learning path.
In this task, we begin with the first step of the DAI workflow: loading the data.
About the Dataset
The dataset used for this experiment is a version of the Titanic Kaggle dataset. It lists the estimated passengers aboard the RMS Titanic. The lack of lifeboats, among other factors, resulted in a significant loss of life when the Titanic struck an iceberg and sank. Although some element of luck was involved in surviving the sinking, some groups of people were more likely to survive than others.
The table below outlines the data included within this dataset.
Attribute | Definition | Key |
---|---|---|
passenger_id | Randomly generated Id | - |
pclass | Passenger class | 1 = 1st, 2 = 2nd, 3 = 3rd |
survived | Survival | 0 = No, 1 = Yes |
name_with_salutations | Passenger name | - |
name_without_salutations | Passenger name without salutations | - |
sex | Sex | Female, Male |
age | Age in years | - |
sibsp | Number of siblings/spouses aboard | - |
parch | Number of parents/children aboard | - |
ticket | Ticket number | - |
fare | Passenger fare | - |
cabin | Cabin number | - |
embarked | Port of embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
boat | Boat number | - |
body | Body number | - |
home.dest | Home destination | - |
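If you want a quick look at these attributes before importing the file into Driverless AI, you can preview the CSV with any tool of your choice. The snippet below is a minimal sketch using pandas (not part of the DAI workflow itself); it reads the same public S3 file used in the import steps that follow and assumes the column names match the table above.

```python
# Optional sanity check: preview the Titanic CSV outside of Driverless AI.
# Assumes pandas is installed and the machine has network access to the public S3 file.
import pandas as pd

URL = "https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/titanic.csv"

df = pd.read_csv(URL)

print(df.shape)                                     # number of rows and columns
print(df.dtypes)                                    # numeric vs. categorical attributes
print(df["survived"].value_counts(normalize=True))  # class balance of the target column
```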
Import the dataset
1. Navigate to the H2O Driverless AI Datasets page.
2. Click + ADD DATASET (OR DRAG & DROP).
   Note: The ADD DATASET button has a series of supported data connectors and ingestion methods. With these data connectors, you can load datasets from various sources.
3. Select AMAZON S3.
4. In the AMAZON S3 URL box, paste the following URL:
   https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/titanic.csv
5. Select titanic.csv. Once you have selected the file, click CLICK TO IMPORT SELECTION.
Once the dataset is imported, you will see it listed on the Datasets page.
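If you prefer to script this step instead of using the UI, the same import can be done with the Driverless AI Python client. The sketch below is illustrative only: it assumes the `driverlessai` package is installed, the address and credentials are placeholders for your own instance, and the exact argument and attribute names (such as `data_source` and `shape`) may vary slightly between client versions.

```python
# Minimal sketch: import the Titanic dataset via the Driverless AI Python client.
# The address, username, and password are placeholders; replace them with your instance details.
import driverlessai

dai = driverlessai.Client(
    address="http://localhost:12345",
    username="your_username",
    password="your_password",
)

# Same public S3 file used in the UI steps above.
ds = dai.datasets.create(
    data="https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/titanic.csv",
    data_source="s3",
    name="titanic.csv",
)

print(ds.name, ds.shape)  # the dataset should now also appear on the Datasets page
```

After the call returns, refreshing the Datasets page in the UI should show the same titanic.csv entry as the manual import.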
- For more information about the dataset file formats that can be uploaded, see the H2O Driverless AI documentation.
- For more datasets, check out Kaggle Datasets.
Now that the dataset has been imported, let's proceed to Task 4 to see how DAI helps you understand a selected dataset. This takes us into the second step of the Driverless AI workflow: visualize data.