Visualizations¶
Connect to a server¶
Initialize a client with your server credentials and store it in the variable dai.
import driverlessai
dai = driverlessai.Client(address='http://localhost:12345', username='py', password='py')
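Hardcoded credentials are fine for a local demo, but a shared notebook may prefer to read them from the environment. The following is a minimal sketch of that idea. The helper client_kwargs_from_env and the DAI_ADDRESS, DAI_USERNAME, and DAI_PASSWORD variable names are hypothetical and not part of the driverlessai package; only the address, username, and password keyword arguments come from the example above.

```python
import os


def client_kwargs_from_env(env=None):
    """Collect Client keyword arguments from environment variables.

    The DAI_* variable names are hypothetical; the fallbacks mirror the
    values used in the example above.
    """
    env = os.environ if env is None else env
    return {
        "address": env.get("DAI_ADDRESS", "http://localhost:12345"),
        "username": env.get("DAI_USERNAME", "py"),
        "password": env.get("DAI_PASSWORD", "py"),
    }


# A plain dict stands in for os.environ so the sketch runs without a server.
kwargs = client_kwargs_from_env({"DAI_USERNAME": "alice"})
print(kwargs)

# With a reachable server, the dict can be unpacked into the client:
# dai = driverlessai.Client(**kwargs)
```

Keeping the lookup in a small function makes it easy to test without touching the real environment.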
Load data¶
Import the file CreditCard_Cat-train.csv from S3 to the Driverless AI server.
dataset = dai.datasets.create(
    data='s3://h2o-public-test-data/smalldata/creditcard/CreditCard_Cat-train.csv',
    data_source='s3',
    name='creditcard_cat-train.csv'
)
Complete 100.00% - [4/4] Computed stats for column DEFAULT_PAYMENT_NEXT_MONTH
This creates a dataset object that's stored in the variable dataset, which contains the following columns.
dataset.columns
['ID', 'LIMIT_BAL', 'SEX', 'EDUCATION', 'MARRIAGE', 'AGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6', 'DEFAULT_PAYMENT_NEXT_MONTH']
Create visualizations¶
Create the Visualization object and store it in the variable visualization.
visualization = dai.autoviz.create_async(dataset).result()
Complete 100.00% - Visualization ready
You can use the visualization variable to access the graphs in the generated visualization. All of the returned plots are in Vega-Lite (v3) format. For more information, see https://vega.github.io/vega-lite-v3/.
from vega import Vega
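Because the plots are Vega-Lite specifications, they can also be saved for use outside the notebook. The sketch below assumes that a plot is a JSON-serializable dict; the helper save_spec and the sample specification are hypothetical stand-ins, so the code runs without a Driverless AI server.

```python
import json
import tempfile
from pathlib import Path


def save_spec(spec, path):
    """Write a Vega-Lite specification (a plain dict) to a JSON file."""
    path = Path(path)
    path.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return path


# Stand-in spec for illustration; a real one would come from `visualization`.
sample_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
    "data": {"values": [{"AGE": 24}, {"AGE": 35}, {"AGE": 35}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "AGE", "type": "quantitative", "bin": True},
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}

out_file = save_spec(sample_spec, Path(tempfile.mkdtemp()) / "histogram.vl.json")

# Reading the file back confirms the round trip preserved the spec.
reloaded = json.loads(out_file.read_text(encoding="utf-8"))
print(out_file.name, reloaded["mark"])
```

A saved specification can then be opened in any tool that understands Vega-Lite v3.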
Visualizing box plots¶
Depending on the dataset, a visualization may contain disparate box plots, heteroscedastic box plots, or both. To display a particular graph, specify the box plot type and select one graph from the returned list.
heteroscedastic_boxplot = visualization.box_plots['heteroscedastic'][0]
Vega(heteroscedastic_boxplot)
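Since the available box plot types depend on the dataset, indexing a type that was not generated may fail. The following sketch shows one defensive way to select a plot. It assumes box_plots behaves like a dict that maps a type name to a list of plots, as the indexing above suggests; the helper pick_plot, the stand-in data, and the 'disparate' key name are assumptions made for illustration.

```python
def pick_plot(plots_by_type, plot_type, index=0):
    """Return one plot of the given type, or None if it is unavailable."""
    plots = plots_by_type.get(plot_type, [])
    if 0 <= index < len(plots):
        return plots[index]
    return None


# Stand-in for visualization.box_plots (assumed to be a dict of lists).
box_plots = {
    "heteroscedastic": [{"mark": "boxplot", "title": "first"}],
    "disparate": [],
}

chosen = pick_plot(box_plots, "heteroscedastic")
missing = pick_plot(box_plots, "disparate")
print(chosen is not None, missing is None)
```

Returning None instead of raising lets a notebook skip a plot type that the dataset did not produce.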
Visualizing histograms¶
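A visualization may contain spikey, skewed, or gaps histograms. As with box plots, specify the histogram type and select one histogram from the returned list.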
histogram = visualization.histograms['gaps'][1]
Vega(histogram)
Visualizing the parallel coordinates plot¶
parallel_coordinates_plot = visualization.parallel_coordinates_plot
Vega(parallel_coordinates_plot)
Complete 100.00% -
Get recommendations¶
The following code shows how to view the recommended transformations and deletions for the dataset.
visualization.recommendations
{'transforms': {'BILL_AMT5': 'yeo_johnson_square_root', 'BILL_AMT4': 'yeo_johnson_square_root', 'BILL_AMT6': 'yeo_johnson_square_root', 'BILL_AMT1': 'yeo_johnson_log', 'BILL_AMT3': 'yeo_johnson_square_root', 'BILL_AMT2': 'yeo_johnson_log'}, 'deletions': {}}
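The recommendations are returned as nested dictionaries, so they can be summarized with ordinary Python. The sketch below groups columns by their recommended transformation, using the output shown above as literal input; the helper columns_by_transform is hypothetical and not part of the client.

```python
from collections import defaultdict

# Copied from the output above so the sketch runs without a server.
recommendations = {
    "transforms": {
        "BILL_AMT5": "yeo_johnson_square_root",
        "BILL_AMT4": "yeo_johnson_square_root",
        "BILL_AMT6": "yeo_johnson_square_root",
        "BILL_AMT1": "yeo_johnson_log",
        "BILL_AMT3": "yeo_johnson_square_root",
        "BILL_AMT2": "yeo_johnson_log",
    },
    "deletions": {},
}


def columns_by_transform(recs):
    """Map each recommended transform to a sorted list of its columns."""
    grouped = defaultdict(list)
    for column, transform in recs.get("transforms", {}).items():
        grouped[transform].append(column)
    return {name: sorted(cols) for name, cols in grouped.items()}


grouped = columns_by_transform(recommendations)
for transform, columns in sorted(grouped.items()):
    print(f"{transform}: {', '.join(columns)}")
```

For this dataset, the grouping shows two columns with a log recommendation and four with a square-root recommendation.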
Visualizing the scatter plot¶
scatter_plot = visualization.scatter_plot
Vega(scatter_plot)
Add a custom plot¶
Add a bar chart¶
bar_chart = visualization.add_bar_chart(x_variable_name = 'EDUCATION', y_variable_name = 'AGE', transpose = False, mark = 'bar')
Vega(bar_chart.plot_data)
bar_chart.name
'bar chart of EDUCATION, AGE'
Add a box plot¶
box_plot = visualization.add_box_plot(variable_name = 'AGE', transpose = False)
Vega(box_plot.plot_data)
box_plot.name
'boxplot of AGE'
Add a dot plot¶
dot_plot = visualization.add_dot_plot(variable_name = 'AGE', mark = 'point')
Vega(dot_plot.plot_data)
dot_plot.name
'dotplot of AGE'
Add a grouped box plot¶
grouped_box_plot = visualization.add_grouped_box_plot(
    variable_name = 'AGE',
    group_variable_name = 'EDUCATION',
    transpose = False
)
Vega(grouped_box_plot.plot_data)
grouped_box_plot.name
'grouped boxplot of AGE, EDUCATION'
Add a heatmap¶
heatmap = visualization.add_heatmap(
    variable_names = ['EDUCATION', 'AGE'],
    permute = False,
    transpose = False,
    matrix_type = 'rectangular'
)
Vega(heatmap.plot_data)
heatmap.name
'heatmap of EDUCATION, AGE'
Add a histogram¶
histogram = visualization.add_histogram(
    variable_name = 'AGE',
    number_of_bars = 10,
    transformation = 'none',
    mark = 'bar'
)
Vega(histogram.plot_data)
histogram.name
'histogram of AGE'
Add a linear regression plot¶
linear_regression_plot = visualization.add_linear_regression(
    x_variable_name = 'ID',
    y_variable_name = 'AGE',
    mark = 'point'
)
Vega(linear_regression_plot.plot_data)
linear_regression_plot.name
'linear regression of ID, AGE'
Add a loess regression plot¶
loess_regression_plot = visualization.add_loess_regression(
    x_variable_name = 'AGE',
    y_variable_name = 'LIMIT_BAL',
    mark = 'point',
    bandwidth = 0.5
)
Vega(loess_regression_plot.plot_data)
loess_regression_plot.name
'loess regression of AGE, LIMIT_BAL'
Add a parallel coordinates plot¶
parallel_coordinates_plot = visualization.add_parallel_coordinates_plot(
    variable_names = ['EDUCATION', 'AGE'],
    permute = False,
    transpose = False,
    cluster = False
)
Vega(parallel_coordinates_plot.plot_data)
parallel_coordinates_plot.name
'parallel coordinates plot of EDUCATION, AGE'
Add a probability plot¶
probability_plot = visualization.add_probability_plot(
    x_variable_name = 'AGE',
    distribution = 'normal',
    mark = 'point',
    transpose = False
)
Vega(probability_plot.plot_data)
probability_plot.name
'probability plot of AGE'
Add a quantile plot¶
quantile_plot = visualization.add_quantile_plot(
    x_variable_name = 'AGE',
    y_variable_name = 'LIMIT_BAL',
    distribution = 'normal',
    mark = 'point',
    transpose = False
)
Vega(quantile_plot.plot_data)
quantile_plot.name
'quantile plot of AGE, LIMIT_BAL'
Add a scatter plot¶
scatter_plot = visualization.add_scatter_plot(
    x_variable_name = 'AGE',
    y_variable_name = 'LIMIT_BAL',
    mark = 'point'
)
Vega(scatter_plot.plot_data)
scatter_plot.name
'scatterplot of AGE, LIMIT_BAL'
View a custom plot¶
All of the custom plots that you added to the visualization can be accessed as follows.
custom_plot = visualization.custom_plots[0]
Vega(custom_plot.plot_data)
custom_plot.name
'bar chart of EDUCATION, AGE'
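Selecting by position works, but once several custom plots exist it may be easier to look one up by its name. The sketch below assumes only what the examples above show, namely that each custom plot exposes name and plot_data attributes; the helper find_custom_plot is hypothetical, and SimpleNamespace objects stand in for real plots so the code runs without a server.

```python
from types import SimpleNamespace


def find_custom_plot(custom_plots, name):
    """Return the first custom plot whose name matches, or None."""
    for plot in custom_plots:
        if plot.name == name:
            return plot
    return None


# Stand-ins for visualization.custom_plots, using names shown above.
custom_plots = [
    SimpleNamespace(name="bar chart of EDUCATION, AGE", plot_data={"mark": "bar"}),
    SimpleNamespace(name="boxplot of AGE", plot_data={"mark": "boxplot"}),
]

match = find_custom_plot(custom_plots, "boxplot of AGE")
absent = find_custom_plot(custom_plots, "histogram of AGE")
print(match.name, absent)
```

With a real visualization, the first argument would be visualization.custom_plots and the result could be passed to Vega through its plot_data attribute.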
Remove a custom plot¶
The following example assumes that you want to remove the scatter plot you created previously.
visualization.remove_custom_plot(scatter_plot)