Version: Next

Tutorial 1: Introduction to H2O AutoInsights

This tutorial uses H2O AutoInsights to discover insights in the Superstore Orders dataset. In a matter of minutes, you will uncover insights that would take much longer to find manually. Completing this tutorial should improve your understanding of H2O AutoInsights and of the general flow to follow when analyzing a dataset.

Prerequisites

Overview

The Superstore Orders dataset contains details of orders made in a superstore in the United States. There are 20 columns and 9,994 rows. The first row of the dataset is as follows:

Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit
CA-2013-152156 | 11/9/13 | 11/12/13 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.96 | 2 | 0 | 41.9136
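
Before uploading, it can help to confirm the shape of the file yourself. The following is a minimal sketch using only Python's standard library; it parses the header and first row quoted above from an in-memory string (the comma-separated layout and the variable names are assumptions for illustration, not part of H2O AutoInsights).

```python
import csv
import io

# Header and first row as quoted in this tutorial. In practice you would
# open the dataset's CSV file instead of this in-memory sample.
SAMPLE_CSV = (
    "Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,"
    "Segment,Country,City,State,Postal Code,Region,Product ID,Category,"
    "Sub-Category,Product Name,Sales,Quantity,Discount,Profit\n"
    "CA-2013-152156,11/9/13,11/12/13,Second Class,CG-12520,Claire Gute,"
    "Consumer,United States,Henderson,Kentucky,42420,South,FUR-BO-10001798,"
    "Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0,41.9136\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)            # one dict per data row, keyed by column name
columns = reader.fieldnames    # column names taken from the header line

print(len(columns), "columns")
print(len(rows), "row(s) in this sample")
print("Profit of first row:", rows[0]["Profit"])
```

Run against the full file, the same check should report the 20 columns and 9,994 rows mentioned above.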

The size of the dataset can easily overwhelm someone trying to understand it, especially when insights must be found quickly. As a result, the likelihood of missing important insights increases.

With the above in mind, let's explore how H2O AutoInsights can discover insights in the Superstore Orders dataset in a matter of minutes while decreasing the likelihood of missing important insights.

Step 1: Upload dataset

Let's upload the Superstore Orders dataset:

  1. In the Select a datasource list (in the H2O AutoInsights home), click Upload file.
  2. Click Browse... and select the Superstore Orders dataset file.
  3. Click Upload.

Step 2: Explore dataset

Now that you have uploaded the dataset, let's explore it. To begin with, let's observe the columns and a few rows of the Superstore Orders dataset.

  1. Click superstore_orders.
  2. On the What would you like to do? card, click Preview.

Besides being able to explore the actual dataset rows and columns, H2O AutoInsights enables you to generate helpful summary points about the dataset in view. Let's summarize the dataset.

  1. Click Summary.
  2. H2O AutoInsights generates a dataset summary by analyzing and providing summaries about each column of the dataset.
    • For low-cardinality categorical columns, a pie chart is displayed with percentages for all categories, plus a list of all unique values and top/bottom counts.
    • For numeric columns, common descriptive statistics such as min, max, mean, median, quantiles, standard deviation, kurtosis, and skewness are displayed.
  3. Click Close.
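
To make the summary statistics above concrete, here is a minimal sketch of how such per-column summaries could be computed with Python's standard library. It is not how H2O AutoInsights computes them: the function names and sample values are hypothetical, and skewness and kurtosis use the simple population-moment definitions (excess kurtosis), which may differ from the tool's.

```python
import statistics
from collections import Counter


def describe_numeric(values):
    """Return common descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population standard deviation, used to standardize the moments below.
    sd = statistics.pstdev(values)
    # Skewness and excess kurtosis as standardized 3rd and 4th moments.
    skewness = sum(((v - mean) / sd) ** 3 for v in values) / n
    kurtosis = sum(((v - mean) / sd) ** 4 for v in values) / n - 3
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "quartiles": statistics.quantiles(values, n=4),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


def describe_categorical(values):
    """Return the percentage share of each category, most common first."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total
            for category, count in counts.most_common()}


# Hypothetical sample values, for illustration only.
sample_numbers = [1, 2, 3, 4, 5]
sample_modes = ["Second Class", "Standard Class", "Standard Class", "First Class"]

print(describe_numeric(sample_numbers))
print(describe_categorical(sample_modes))
```

The categorical helper mirrors the pie-chart percentages, and the numeric helper mirrors the list of descriptive statistics shown in the Summary view.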

Observing the columns and rows of the dataset shows only what is presented on the surface. To understand the dataset more deeply, H2O AutoInsights can help, as we will observe in step 4.

Step 3: Analyze dataset

Now that we better understand the dataset, let's analyze the dataset to obtain helpful insights beyond a dataset summary.

  1. Click Analyze.
  2. In the Enter a name for your analysis box, enter superstore orders.
  3. Click Save.
  4. Click Skip/Next.
    • For this tutorial, let's skip this step. In this step, H2O AutoInsights lets you control how numerical (measure) and categorical (dimension) columns are treated. If you skip this step, the auto transformation engine (H2O AutoInsights) handles the numeric-to-categorical conversion.
  5. After you upload your dataset and specify whether to transform specific columns, H2O AutoInsights lets you select the columns to analyze. For this tutorial, let's analyze all columns in the dataset:
    note

    H2O AutoInsights divides the dataset columns into the following three categories: measure, dimension, and temporal.

    1. In the Measures tab, click Select All.
    2. In the Dimensions tab, click Select All.
    3. In the Temporal tab, click Select All.
  6. Click Next.
  7. After selecting all the columns in the dataset to analyze, H2O AutoInsights:
    • Selects the analysis types for the overall H2O AutoInsights analysis of the dataset
      • Analysis types are selected based on the data type of the selected columns
        • To learn more about the available analysis types, see Supported analysis types
        • For the most part, H2O AutoInsights offers you the ability to customize the settings for a particular analysis type. To learn more, see Analyses settings.
          • To access the customizable settings for a particular analysis type, click Customize....
    • Provides the reasons why H2O AutoInsights dropped or transformed specific columns for the overall H2O AutoInsights analysis of the dataset
  8. Click Analyze.
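
The measure, dimension, and temporal split used in the steps above can be pictured with a small heuristic. The sketch below is an assumption for illustration, not the rules H2O AutoInsights applies: the `classify_column` function is hypothetical, and the date format matches only the `11/9/13` style seen in this dataset.

```python
from datetime import datetime


def classify_column(values):
    """Guess 'measure', 'temporal', or 'dimension' from string values."""

    def all_parse(parser):
        # True when every value is accepted by the given parser.
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "measure"    # every value is numeric
    if all_parse(lambda v: datetime.strptime(v, "%m/%d/%y")):
        return "temporal"   # every value is a month/day/year date
    return "dimension"      # anything else is treated as categorical


# Values taken from the first row shown in the Overview.
first_row = {
    "Order Date": ["11/9/13"],
    "Ship Mode": ["Second Class"],
    "Sales": ["261.96"],
    "Postal Code": ["42420"],
}

for name, values in first_row.items():
    print(name, "->", classify_column(values))
```

Note that this naive rule labels Postal Code as a measure because its values are numeric, even though it behaves like a category. Cases like this are why the optional transformation step lets you control numeric-to-categorical conversion.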

Step 4: Review analysis

Now that our analysis is complete, the storyboard summary contains the 37 insights that the analysis generated. These insights reveal new information about the Superstore Orders dataset. Let's view the insights.

Generated insights

As an example, and to highlight the power of H2O AutoInsights, let's look at the insights discovered from the seemingly shallow Profit column. Recall that the Profit column contains numerical values that offer little to no insight at first sight.

  1. In the Insight types navigation menu, click Geo spatial analysis.
  2. In the Geo spatial analysis category, scroll down to the Aggregation of profit by state card.
    • According to the Aggregation of profit by state map:
      • Average profit ranges from -36.19 to 204.09
      • The mean of Average profit is 54.05, while the median of Average profit is 56.81
      • Ohio has the lowest Average profit at -36.19
      • Vermont has the highest Average profit at 204.09
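
The figures in this card come from grouping Profit by State and averaging within each group. The following is a minimal sketch of that kind of aggregation with Python's standard library; the function name and the sample rows are hypothetical, so its output will not match the numbers above, and it is not the tool's own implementation.

```python
import statistics
from collections import defaultdict


def average_profit_by_state(rows):
    """Map each state to the mean of its Profit values."""
    profits = defaultdict(list)
    for row in rows:
        profits[row["State"]].append(float(row["Profit"]))
    return {state: statistics.fmean(values) for state, values in profits.items()}


# Hypothetical rows for illustration only; not taken from the dataset.
rows = [
    {"State": "State A", "Profit": "10.0"},
    {"State": "State A", "Profit": "30.0"},
    {"State": "State B", "Profit": "-5.0"},
    {"State": "State C", "Profit": "50.0"},
]

averages = average_profit_by_state(rows)
lowest = min(averages, key=averages.get)    # state with the lowest average
highest = max(averages, key=averages.get)   # state with the highest average

print("Range:", averages[lowest], "to", averages[highest])
print("Mean of averages:", statistics.fmean(averages.values()))
print("Median of averages:", statistics.median(averages.values()))
print("Lowest:", lowest, "Highest:", highest)
```

The four printed lines correspond to the four narrative points on the card: the range, the mean and median of the per-state averages, and the lowest and highest states.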


Besides the insight above, 36 other insights are available. You can understand the insights through interactive visuals and simple narrative summaries that clarify and elevate the value of the Superstore Orders dataset.

Summary

In this tutorial, we learned how to generate insights for a dataset. In particular, we discovered how H2O AutoInsights follows a specific flow of analysis to quickly discover hidden insights in a dataset. We also learned that H2O AutoInsights provides several analysis types to understand a dataset.

Resources

