Task 3: Load data

H2O Driverless AI workflow

The typical Driverless AI workflow is as follows:

  1. Load data
  2. Visualize data
  3. Run an experiment
  4. Interpret the model
  5. Deploy the scoring pipeline

Beyond these steps, you can also diagnose a model, transform another dataset, score a model against another dataset, and manage your data in Projects. This tutorial focuses on steps 1 through 4 of the typical workflow; other aspects of Driverless AI (DAI) are covered in other tutorials in the Driverless AI learning path.

In this task, we begin with the first step of the DAI workflow: loading the data.

About the Dataset

The dataset used for this experiment is a version of the Titanic Kaggle dataset. It lists the estimated passengers aboard the RMS Titanic. The lack of lifeboats, amongst other things, was one of the factors that resulted in a significant loss of life when the ship hit an iceberg and sank. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others.

The table below outlines the data included within this dataset.

| Attribute | Definition | Key |
| --- | --- | --- |
| passenger Id | Id randomly generated | - |
| pclass | Passenger class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| survived | Survival | 0 = No, 1 = Yes |
| name_with_salutations | Passenger name | - |
| name_without_salutations | Passenger name without salutations | - |
| sex | Sex | Female, Male |
| age | Age in years | - |
| sibsp | Number of siblings/spouses aboard | - |
| parch | Number of parents/children aboard | - |
| ticket | Ticket number | - |
| fare | Passenger fare | - |
| cabin | Cabin number | - |
| embarked | Port of embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
| boat | Boat number | - |
| body | Body number | - |
| home.dest | Home destination | - |
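
If you want a quick look at the raw file before importing it into Driverless AI, you can read the same CSV locally. The sketch below is optional and assumes you have pandas installed and that the file remains publicly readable at the URL used later in this task.

```python
# Optional local peek at the Titanic CSV before importing it into Driverless AI.
# Assumes pandas is installed and the file is publicly readable at this URL.
import pandas as pd

url = "https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/titanic.csv"
titanic = pd.read_csv(url)

print(titanic.shape)             # number of rows and columns
print(titanic.columns.tolist())  # attribute names (compare with the table above)
print(titanic.head())            # first few passenger records
```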

Import the dataset

  1. Navigate to the H2O Driverless AI Datasets page.

  2. Click + ADD DATASET (OR DRAG & DROP).

    info

    The ADD DATASET button exposes a set of supported data connectors and ingestion methods. With these connectors, you can load datasets from a variety of sources.

  3. Select AMAZON S3.

  4. In the AMAZON S3 URL box, paste the following URL:

    https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/titanic.csv
  5. Select titanic.csv.

  6. Once you have selected the file, click CLICK TO IMPORT SELECTION.

Once the dataset is imported, you will see it listed on the Datasets page.
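
The steps above use the UI. If you prefer to script the import, H2O also provides a Python client for Driverless AI (the driverlessai package). The sketch below is only an outline under assumptions: the server address, credentials, and dataset name are placeholders, it assumes the client is installed, and it assumes the S3 connector is enabled on your instance just as it is in the UI steps above.

```python
# A minimal sketch of the same import done with the Driverless AI Python client.
# The address and credentials are placeholders; adjust them for your instance.
import driverlessai

dai = driverlessai.Client(
    address="http://localhost:12345",  # placeholder: your Driverless AI URL
    username="user",                   # placeholder credentials
    password="password",
)

# Import the Titanic CSV through the S3 connector (assumed to be enabled,
# as in the UI steps above).
titanic = dai.datasets.create(
    data="https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/titanic.csv",
    data_source="s3",
    name="titanic.csv",
)

# The dataset should now be listed on the Datasets page.
print(titanic.name, titanic.key)
```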

Now that the dataset has been imported, let's proceed to Task 4 to see how DAI helps you understand a selected dataset. This covers the second step of the Driverless AI workflow: visualizing the data.

