Using H2O AutoDoc with Scikit-learn

Introduction

H2O AutoDoc can generate documentation for trained scikit-learn models. (See also: Scikit-learn.)

Creating & Configuring H2O AutoDoc for Scikit Models

This section provides code examples for setting up a model, along with basic and advanced H2O AutoDoc configurations.

The H2O AutoDoc setup requires:

  • a license key (see H2O AutoDoc License Key for an example of setting your license)

  • the scikit-learn library

  • a trained scikit-learn model
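Before running the examples, it can help to confirm that the required packages are importable. The sketch below is a minimal check that uses only the standard library. It assumes the import names `sklearn` and `h2o_autodoc` (the latter taken from the import statements used in the examples in this section), and the helper name `missing_packages` is hypothetical. It does not validate the license key.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    # "sklearn" is scikit-learn's import name; "h2o_autodoc" is assumed from
    # the imports used in the AutoDoc examples.
    missing = missing_packages(["sklearn", "h2o_autodoc"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```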

Scikit Model Setup:

Basic configurations:

Prepare Data for Scikit Model

Prepare data for model building.

import pandas as pd

url = (
    "https://s3.amazonaws.com/h2o-training/events/ibm_index/CreditCard_Cat-train.csv"
)

df_joint = pd.read_csv(url)
df_joint = df_joint.drop(columns=["ID"])

# number of rows for train data
num_train = int(df_joint.shape[0] * 0.8)

# apply one hot encoding for categorical variables in dataset
joint_ohe = pd.get_dummies(df_joint, prefix_sep=".", drop_first=True)

# split dataset into train and test
train_ohe = joint_ohe.iloc[:num_train]
test_ohe = joint_ohe.iloc[num_train:]

# target column label
target = "DEFAULT_PAYMENT_NEXT_MONTH"

# specify the target
train_y = train_ohe[target]
test_y = test_ohe[target]

# drop target column from train and test dataframes
train_ohe = train_ohe.drop(columns=[target])
test_ohe = test_ohe.drop(columns=[target])
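To see what the encoding and split steps above produce, the same operations can be run on a small in-memory frame. This is an illustrative sketch only: the toy columns and values are made up for the example and are not taken from the credit card dataset.

```python
import pandas as pd

# Toy data standing in for the real dataset (values are illustrative only).
toy = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "AGE": [24, 35, 41, 29, 52],
        "EDUCATION": ["graduate", "university", "high_school", "university", "graduate"],
        "DEFAULT_PAYMENT_NEXT_MONTH": [0, 1, 0, 1, 0],
    }
).drop(columns=["ID"])

# One-hot encode the categorical column; drop_first=True removes the first
# category level ("graduate") so the remaining columns are not redundant.
toy_ohe = pd.get_dummies(toy, prefix_sep=".", drop_first=True)

# Positional 80/20 split: the first 4 rows are train, the last row is test.
num_train = int(toy_ohe.shape[0] * 0.8)
toy_train = toy_ohe.iloc[:num_train]
toy_test = toy_ohe.iloc[num_train:]

print(list(toy_ohe.columns))
print(toy_train.shape, toy_test.shape)
```

Because the split is positional, rows are not shuffled. If the source file is ordered in a meaningful way, shuffling before splitting (for example with `DataFrame.sample(frac=1, random_state=...)`) may be preferable.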

Build Scikit Model

Build your classification model.

# import classifier
from sklearn.ensemble import GradientBoostingClassifier

# train model
model = GradientBoostingClassifier()
model.fit(train_ohe, train_y)

Generate a Default H2O AutoDoc

Generate an AutoDoc using the default template.

from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc

# Parameters the User Must Set: output_file_path
# specify the full path to where you want your AutoDoc saved
# replace the path below with your own path
output_file_path = "full/path/to/your/autodoc/autodoc_report.docx"

# set your AutoDoc configurations
config = Config(output_path=output_file_path)

# render your AutoDoc
render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)
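Since the example asks for a full path, one way to build it is with `pathlib`. The sketch below is a hedged helper, not part of H2O AutoDoc: the name `prepare_output_path` is hypothetical, and creating the directory up front is a defensive assumption, because the text does not say whether AutoDoc creates missing directories.

```python
from pathlib import Path


def prepare_output_path(directory, filename="autodoc_report.docx"):
    """Create `directory` if needed and return the absolute report path as a string."""
    target_dir = Path(directory).expanduser().resolve()
    target_dir.mkdir(parents=True, exist_ok=True)
    return str(target_dir / filename)


# Hypothetical usage (directory name is an example):
#   output_file_path = prepare_output_path("~/autodoc_reports")
#   config = Config(output_path=output_file_path)
```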

Push Generated AutoDoc to S3 Bucket

import os

from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc

# Parameters the User Must Set: output_file_path
# specify the S3 URI/URL where you want your AutoDoc saved
# the following patterns are supported:
    # s3://<bucket>/<key>
    # https://<bucket>.s3.amazonaws.com/<key>
    # https://s3.amazonaws.com/<bucket>/<key>
# if <key> points to a directory in the bucket, an auto-generated filename is used
    # e.g., if the output path is s3://<bucket>/experiment_test/,
    # then the generated report will be s3://<bucket>/experiment_test/Experiment_Report_2021-08-31-13-37-12.docx

# either configure ~/.aws/credentials or set the environment variables below
os.environ["AWS_ACCESS_KEY_ID"] = "your_aws_access_key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your_aws_secret"

output_file_path = "s3://h2o-datasets/autodoc-examples/autodoc_report.docx"

# set your AutoDoc configurations
config = Config(output_path=output_file_path)

# render your AutoDoc
render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)

Note: The local copy is deleted after a successful upload to the S3 bucket.
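The three supported S3 patterns all encode the same two pieces of information, a bucket and a key. The sketch below shows one way to split them with the standard library so a path can be checked before rendering. It is an illustration only and is not how AutoDoc itself parses the path; the function name `split_s3_location` is hypothetical.

```python
from urllib.parse import urlparse

S3_HOST = "s3.amazonaws.com"


def split_s3_location(location):
    """Split a supported S3 URI/URL into a (bucket, key) tuple."""
    parsed = urlparse(location)
    path = parsed.path.lstrip("/")
    if parsed.scheme == "s3":
        # s3://<bucket>/<key>
        return parsed.netloc, path
    if parsed.scheme == "https":
        if parsed.netloc == S3_HOST:
            # https://s3.amazonaws.com/<bucket>/<key>
            bucket, _, key = path.partition("/")
            return bucket, key
        if parsed.netloc.endswith("." + S3_HOST):
            # https://<bucket>.s3.amazonaws.com/<key>
            return parsed.netloc[: -len("." + S3_HOST)], path
    raise ValueError(f"Unsupported S3 location: {location!r}")


def is_directory_key(key):
    """Treat an empty key or a key ending in '/' as a directory."""
    return key == "" or key.endswith("/")


bucket, key = split_s3_location("s3://h2o-datasets/autodoc-examples/autodoc_report.docx")
print(bucket, key, is_directory_key(key))
```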

Push Generated AutoDoc to GitHub Repository

import os

from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc

# Parameters the User Must Set: output_file_path
# specify the GitHub URL where you want your AutoDoc saved
# the following pattern is supported:
    # https://github.com/<organization or username>/<repo name>/tree/<branch name>/<path>
# if <path> points to a directory in the repository, an auto-generated filename is used
    # e.g., if the output path is https://github.com/<organization or username>/<repo name>/tree/docs,
    # then the generated report will be https://github.com/<organization or username>/<repo name>/tree/docs/Experiment_Report_2021-08-31-13-37-12.docx

# either configure ~/.git_autodoc/credentials or set the environment variable below
os.environ["GITHUB_PAT"] = "your_github_personal_access_token"

output_file_path = "https://github.com/h2oai/h2o-autodoc/tree/master/tests/autodoc_report.docx"

# set your AutoDoc configurations
config = Config(output_path=output_file_path)

# render your AutoDoc
render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)

Note: The local copy is deleted after a successful upload to the GitHub repository.
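Because the upload needs either the `GITHUB_PAT` environment variable or a `~/.git_autodoc/credentials` file, a quick pre-flight check can catch a missing credential before the report is rendered. The sketch below is an assumption-laden helper, not an AutoDoc API: the function name is hypothetical, and it only checks that a credential is present, not that it is valid.

```python
import os
from pathlib import Path


def github_credentials_available(environ=None, credentials_file=None):
    """Return True if GITHUB_PAT is set or the credentials file exists."""
    environ = os.environ if environ is None else environ
    if credentials_file is None:
        # Default location named in the example above.
        credentials_file = Path.home() / ".git_autodoc" / "credentials"
    return bool(environ.get("GITHUB_PAT")) or Path(credentials_file).is_file()


if not github_credentials_available():
    print("No GitHub credentials found; set GITHUB_PAT or create the credentials file.")
```

Reading the token from the environment, rather than assigning it in the script, also keeps it out of version control.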

Set the H2O AutoDoc File Type

H2O AutoDoc can generate a Word document or a Markdown file. The default report is a Word document (.docx).
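One way to keep the file extension and the template type in sync is to derive the `Config` arguments from the output path. The sketch below is a hypothetical convenience helper (the name `autodoc_config_kwargs` is not part of H2O AutoDoc). It relies only on the two facts stated in this section: Word is the default, and `main_template_type="md"` selects Markdown.

```python
from pathlib import Path


def autodoc_config_kwargs(output_path):
    """Build keyword arguments for Config based on the report's file extension."""
    kwargs = {"output_path": str(output_path)}
    # Only ".md" needs an explicit template type; Word is the default.
    if Path(output_path).suffix.lower() == ".md":
        kwargs["main_template_type"] = "md"
    return kwargs


print(autodoc_config_kwargs("path/to/your/autodoc/my_markdown_report.md"))
print(autodoc_config_kwargs("path/to/your/autodoc/my_word_report.docx"))
```

The result could then be passed on as `Config(**autodoc_config_kwargs(output_file_path))`.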

Word Document

from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc

# Parameters the User Must Set: output_file_path
# specify the full path to where you want your AutoDoc saved
# replace the path below with your own path
output_file_path = "path/to/your/autodoc/my_word_report.docx"

# only output_path is required, because the default AutoDoc is a Word document
config = Config(output_path=output_file_path)

# render your AutoDoc
render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)

Markdown File

Note: When main_template_type is set to "md", a zip file is returned. This zip file contains the Markdown file and any images linked from it.

from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc

# Parameters the User Must Set: output_file_path
# specify the full path to where you want your AutoDoc saved
# replace the path below with your own path (make sure to keep the '.md' file extension)
output_file_path = "path/to/your/autodoc/my_markdown_report.md"

# set the exported AutoDoc to markdown ('md')
main_template_type = "md"
config = Config(output_path=output_file_path, main_template_type=main_template_type)

# render your AutoDoc
render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)
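Since the Markdown output arrives as a zip file, the report and its images need to be unpacked together so the image links keep working. The sketch below uses the standard `zipfile` module. The helper name and the example paths are hypothetical, and the exact name and location of the zip file AutoDoc writes are not specified here, so adjust them to match your output.

```python
import zipfile
from pathlib import Path


def extract_markdown_bundle(zip_path, dest_dir):
    """Extract the archive into dest_dir and return the Markdown files found there, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Extract everything so relative image links in the Markdown still resolve.
        archive.extractall(dest)
    return sorted(dest.rglob("*.md"))


# Hypothetical usage (paths are examples, not AutoDoc defaults):
#   reports = extract_markdown_bundle("path/to/your/autodoc/report.zip", "unpacked_report")
#   print(reports[0].read_text())
```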