Tutorial 1: Introduction to H2O AutoInsights
This tutorial uses H2O AutoInsights to discover insights in the Superstore Orders dataset. In a matter of minutes, you will surface insights that would take considerably longer to find manually. Completing this tutorial should improve your understanding of H2O AutoInsights and of the general flow to follow when analyzing a dataset.
Prerequisites
- H2O AutoInsights v0.8.2
- A copy of the Superstore Orders dataset (download the superstore_orders.csv file)
Overview
The Superstore Orders dataset contains details of orders made in a superstore in the United States. There are 20 columns and 9,994 rows. The first row of the dataset is as follows:
Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CA-2013-152156 | 11/9/13 | 11/12/13 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.96 | 2 | 0 | 41.9136 |
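If you would like to check the shape of the file yourself before uploading it, the following is a minimal sketch using pandas, which is not part of H2O AutoInsights. The helper name `load_orders` and the choice to read Postal Code as text are assumptions made for illustration; the date format is inferred from the sample row above.

```python
import io

import pandas as pd

DATE_COLUMNS = ["Order Date", "Ship Date"]


def load_orders(source):
    """Load the orders CSV from a file path or a file-like object."""
    # Assumption: read Postal Code as text so leading zeros are not lost.
    orders = pd.read_csv(source, dtype={"Postal Code": str})
    for column in DATE_COLUMNS:
        # The sample row uses month/day/two-digit-year dates such as 11/9/13.
        orders[column] = pd.to_datetime(orders[column], format="%m/%d/%y")
    return orders


# The header and first row shown above, rewritten as CSV for a quick check.
SAMPLE_CSV = (
    "Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,"
    "Segment,Country,City,State,Postal Code,Region,Product ID,Category,"
    "Sub-Category,Product Name,Sales,Quantity,Discount,Profit\n"
    "CA-2013-152156,11/9/13,11/12/13,Second Class,CG-12520,Claire Gute,"
    "Consumer,United States,Henderson,Kentucky,42420,South,FUR-BO-10001798,"
    "Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0,41.9136\n"
)

sample = load_orders(io.StringIO(SAMPLE_CSV))
print(sample.shape)   # (rows, columns) of the one-row sample
print(sample.dtypes)  # column types as pandas inferred them
```

Calling `load_orders("superstore_orders.csv")` on the downloaded file and printing `.shape` should let you compare the result against the row and column counts stated above.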
A dataset of this size can overwhelm someone trying to understand it, especially when insights are needed quickly. As a result, the likelihood of missing important insights increases.
With the above in mind, let's explore how H2O AutoInsights can discover insights in the Superstore Orders dataset in a matter of minutes while decreasing the likelihood of missing important insights.
Step 1: Upload dataset
Let's upload the Superstore Orders dataset:
- In the Select a datasource list (in the H2O AutoInsights home), click Upload file.
- Click Browse....
- Click Upload.
Step 2: Explore dataset
Now that you have uploaded the dataset, let's explore it. To begin with, let's observe the columns and a few rows of the Superstore Orders dataset.
- Click superstore_orders.
- On the What would you like to do? card, click Preview.
Besides being able to explore the actual dataset rows and columns, H2O AutoInsights enables you to generate helpful summary points about the dataset in view. Let's summarize the dataset.
- Click Summary.
- H2O AutoInsights generates a dataset summary by analyzing and providing summaries about each column of the dataset.
- For low-cardinality categorical columns, a pie chart is displayed with percentages for all categories, plus a list of all unique values and top/bottom counts.
- For numeric columns, common descriptive statistics such as min, max, mean, median, quantiles, standard deviation, kurtosis, and skewness are displayed.
- Click Close.
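To make the summary statistics listed above more concrete, here is a rough pandas sketch that computes the same kinds of values. It is not how H2O AutoInsights computes its summary; the function names and the small input series are made up for illustration.

```python
import pandas as pd


def summarize_numeric(values: pd.Series) -> pd.Series:
    """Descriptive statistics of the kind shown for numeric columns."""
    return pd.Series(
        {
            "min": values.min(),
            "max": values.max(),
            "mean": values.mean(),
            "median": values.median(),
            "q25": values.quantile(0.25),
            "q75": values.quantile(0.75),
            "std": values.std(),        # sample standard deviation
            "kurtosis": values.kurt(),  # excess kurtosis
            "skewness": values.skew(),
        }
    )


def summarize_categorical(values: pd.Series) -> pd.DataFrame:
    """Counts and percentages per category, most frequent first."""
    counts = values.value_counts()
    percent = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": percent})


# Made-up values, not taken from the Superstore Orders dataset.
toy_sales = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])
toy_segment = pd.Series(["Consumer", "Consumer", "Corporate", "Home Office"])

numeric_summary = summarize_numeric(toy_sales)
category_summary = summarize_categorical(toy_segment)
print(numeric_summary)
print(category_summary)
```

The percentages in `category_summary` correspond to the slices of the pie chart, and the `count` column gives the top and bottom counts once sorted.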
Observing the columns and rows of the dataset only tells you what is explicitly presented. To understand the dataset more deeply, H2O AutoInsights can help, as we will observe in step 4.
Step 3: Analyze dataset
Now that we better understand the dataset, let's analyze the dataset to obtain helpful insights beyond a dataset summary.
- Click Analyze.
- In the Enter a name for your analysis box, enter superstore orders.
- Click Save.
- Click Skip/Next.
- For purposes of this tutorial, let's skip this settings step. Note that in this step, H2O AutoInsights lets you control the treatment of numerical (measure) and categorical (dimension) columns. If you skip this step, the auto transformation engine (H2O AutoInsights) handles the numeric-to-categorical conversion.
- For purposes of this tutorial, let's analyze all columns in the dataset. After you upload your dataset and specify whether to transform specific data columns, H2O AutoInsights enables you to select the columns to analyze.
- In the Measures tab, click Select All.
- In the Dimensions tab, click Select All.
- In the Temporal tab, click Select All.
- Click Next.
- After selecting all the columns in the dataset to analyze, H2O AutoInsights:
- Selects the analysis types for the overall H2O AutoInsights analysis of the dataset
- Analysis types are selected based on the data type of the selected columns
- To learn more about the available analysis types, see Supported analysis types
- For the most part, H2O AutoInsights offers you the ability to customize the settings for a particular analysis type. To learn more, see Analyses settings.
- To access the customizable settings for a particular analysis type, click Customize....
- Provides the reasons why H2O AutoInsights dropped or transformed specific columns for the overall H2O AutoInsights analysis of the dataset
- Click Analyze.
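To build intuition for the Measures, Dimensions, and Temporal tabs used in this step, the sketch below groups the columns of a pandas DataFrame by data type. This is only an illustration under simple assumptions (numeric columns are measures, date columns are temporal, everything else is a dimension); the function `classify_columns` is hypothetical and does not reproduce the H2O AutoInsights transformation engine.

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_columns(frame: pd.DataFrame) -> dict:
    """Group column names into measures, dimensions, and temporal columns."""
    roles = {"measures": [], "dimensions": [], "temporal": []}
    for name in frame.columns:
        column = frame[name]
        if ptypes.is_datetime64_any_dtype(column):
            roles["temporal"].append(name)
        elif ptypes.is_numeric_dtype(column) and not ptypes.is_bool_dtype(column):
            # Booleans count as numeric in pandas, so they are excluded here.
            roles["measures"].append(name)
        else:
            roles["dimensions"].append(name)
    return roles


# A tiny hand-made frame that reuses four of the dataset's column names.
toy_orders = pd.DataFrame(
    {
        "Order Date": pd.to_datetime(["2013-11-09", "2013-11-10"]),
        "Segment": ["Consumer", "Corporate"],
        "Sales": [261.96, 100.0],
        "Quantity": [2, 3],
    }
)

roles = classify_columns(toy_orders)
print(roles)
```

A rule this simple would treat a numeric identifier such as a postal code as a measure, which is one reason a settings step for converting numeric columns to categorical ones is useful.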
Step 4: Review analysis
Now that our analysis is complete, the storyboard summary contains the 37 insights that the analysis generated. These insights reveal new information about the Superstore Orders dataset. Let's view the insights.
As an example, and to highlight the power of H2O AutoInsights, let's see the insights discovered from the Profit column. Recall that the Profit column contains numerical values that, at first sight, provide little to no insight.
- In the Insight types navigation menu, click Geo spatial analysis.
- In the Geo spatial analysis category, scroll down to the Aggregation of profit by state card.
- According to the Aggregation of profit by state map:
- Average profit ranges from -36.19 to 204.09
- The mean of Average profit is 54.05, while the median of Average profit is 56.81
- Ohio has the lowest Average profit at -36.19
- Vermont has the highest Average profit at 204.09
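The narrative above describes an average of Profit per State, followed by the range, mean, and median of those averages. A minimal pandas sketch of that kind of aggregation follows. The function names and the toy rows are assumptions for illustration, and the toy numbers are unrelated to the figures reported by H2O AutoInsights.

```python
import pandas as pd


def average_profit_by_state(orders: pd.DataFrame) -> pd.Series:
    """Mean of Profit for each State, sorted from lowest to highest."""
    return orders.groupby("State")["Profit"].mean().sort_values()


def describe_averages(averages: pd.Series) -> dict:
    """Summarize the per-state averages the way the narrative does."""
    return {
        "lowest_state": averages.idxmin(),
        "lowest": averages.min(),
        "highest_state": averages.idxmax(),
        "highest": averages.max(),
        "mean": averages.mean(),
        "median": averages.median(),
    }


# Made-up rows with placeholder state names, not real Superstore data.
toy_orders = pd.DataFrame(
    {
        "State": ["State A", "State A", "State B", "State B", "State C"],
        "Profit": [10.0, 20.0, -5.0, -15.0, 30.0],
    }
)

averages = average_profit_by_state(toy_orders)
profit_summary = describe_averages(averages)
print(averages)
print(profit_summary)
```

Running the same two functions on the full dataset gives a way to cross-check the values shown on the Aggregation of profit by state card, assuming the card uses a plain per-state mean.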
Similar to the above insight, 36 other insights are available. You can understand the insights through interactive visuals and simple narrative summaries that clarify and elevate the value of the Superstore Orders dataset.
Summary
In this tutorial, we learned how to generate insights for a dataset. In particular, we discovered how H2O AutoInsights follows a specific flow of analysis to quickly discover hidden insights in a dataset. We also learned that H2O AutoInsights provides several analysis types to understand a dataset.
Resources
- To learn about supported analysis types, see Supported analysis types
- To learn how to import a dataset, see Import a dataset
- To learn how to analyze a dataset, see Analyze a dataset
- Send feedback about H2O AutoInsights to cloud-feedback@h2o.ai