Using H2O AutoDoc with Scikit-learn
Introduction
H2O AutoDoc can generate documentation for trained scikit-learn models. (See also Scikit-learn.)
Creating & Configuring H2O AutoDoc for Scikit Models
This section includes the code examples for setting up a model, along with basic and advanced H2O AutoDoc configurations.
The H2O AutoDoc setup requires:
a license key (see H2O AutoDoc License Key for an example of setting your license)
the scikit-learn library
a trained scikit model
Scikit Model Setup:
Basic configurations:
Prepare Data for Scikit Model
Prepare data for model building.
import pandas as pd
url = (
"https://s3.amazonaws.com/h2o-training/events/ibm_index/CreditCard_Cat-train.csv"
)
df_joint = pd.read_csv(url)
df_joint = df_joint.drop(columns=["ID"])
# number of rows for train data
num_train = int(df_joint.shape[0] * 0.8)
# apply one hot encoding for categorical variables in dataset
joint_ohe = pd.get_dummies(df_joint, prefix_sep=".", drop_first=True)
# split dataset into train and test
train_ohe = joint_ohe.iloc[:num_train]
test_ohe = joint_ohe.iloc[num_train:]
# target column label
target = "DEFAULT_PAYMENT_NEXT_MONTH"
# specify the target
train_y = train_ohe[target]
test_y = test_ohe[target]
# drop target column from train dataframes
train_ohe = train_ohe.drop(columns=[target])
test_ohe = test_ohe.drop(columns=[target])
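To make the encoding step concrete, here is a minimal sketch of what `pd.get_dummies` with `prefix_sep="."` and `drop_first=True` does to a categorical column. The toy frame and its `EDUCATION` and `LIMIT_BAL` columns and values are made up for illustration; only the target column name comes from the example above.

```python
import pandas as pd

# Toy frame standing in for the credit card data (columns and values are illustrative)
toy = pd.DataFrame(
    {
        "EDUCATION": ["graduate", "university", "high_school", "university"],
        "LIMIT_BAL": [20000, 120000, 90000, 50000],
        "DEFAULT_PAYMENT_NEXT_MONTH": [1, 0, 0, 1],
    }
)

# Same call as in the data preparation code: string columns become indicator
# columns named <column>.<level>, and the first level of each column is dropped
toy_ohe = pd.get_dummies(toy, prefix_sep=".", drop_first=True)
encoded_columns = list(toy_ohe.columns)
print(encoded_columns)
```

Dropping the first level ("graduate" here, the first in sorted order) avoids a redundant indicator column, since a row with all remaining indicators set to zero already identifies it.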
Build Scikit Model
Build your classification model.
# import classifier
from sklearn.ensemble import GradientBoostingClassifier
# train model
model = GradientBoostingClassifier()
model.fit(train_ohe, train_y)
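Because AutoDoc requires a trained model, it can help to confirm that the model is fitted and behaves sensibly on hold-out data before rendering a report. The sketch below is not part of H2O AutoDoc; it uses synthetic data from `make_classification` as a stand-in for `train_ohe`, `train_y`, `test_ohe`, and `test_y` so that it runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.utils.validation import check_is_fitted

# Synthetic stand-in for the one-hot encoded credit card data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
num_train = int(X.shape[0] * 0.8)
X_train, X_test = X[:num_train], X[num_train:]
y_train, y_test = y[:num_train], y[num_train:]

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Raises sklearn.exceptions.NotFittedError if fit() has not been called
check_is_fitted(model)

# Predicted probability of the positive class for each hold-out row
test_scores = model.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_scores)
print(f"hold-out AUC: {test_auc:.3f}")
```

With the real data, the same two checks apply unchanged to `model`, `test_ohe`, and `test_y`.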
Generate a Default H2O AutoDoc
Generate an AutoDoc using the default template.
from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc
# Parameters the User Must Set: output_file_path
# specify the full path to where you want your AutoDoc saved
# replace the path below with your own path
output_file_path = "full/path/to/your/autodoc/autodoc_report.docx"
# set your AutoDoc configurations
config = Config(output_path=output_file_path)
# render your AutoDoc
render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)
Push Generated AutoDoc to S3 Bucket
import os
from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc
# Parameters the User Must Set: output_file_path
# specify the s3 URI/URL to where you want your AutoDoc saved
# the following patterns are supported
# s3://<bucket>/<key>
# https://<bucket>.s3.amazonaws.com/<key>
# https://s3.amazonaws.com/<bucket>/<key>
# if <key> points to a directory in the bucket, an auto-generated filename is used
# e.g., if the location is s3://<bucket>/experiment_test/,
# the generated report will be s3://<bucket>/experiment_test/Experiment_Report_2021-08-31-13-37-12.docx
# either configure ~/.aws/credentials or set the environment variables below
os.environ["AWS_ACCESS_KEY_ID"] = "your_aws_access_key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your_aws_secret"
output_file_path = "s3://h2o-datasets/autodoc-examples/autodoc_report.docx"
# set your AutoDoc configurations
config = Config(output_path=output_file_path)
# render your AutoDoc
render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)
Note: The local copy is deleted after a successful upload to the S3 bucket.
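The three S3 location patterns listed in the comments above all resolve to a bucket and a key. As a rough illustration of how they relate, the sketch below splits each pattern with the standard library. `split_s3_location` is a hypothetical helper written for this example; it is not an H2O AutoDoc function and does not describe how AutoDoc parses the path internally.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_location(location: str) -> Tuple[str, str]:
    """Return (bucket, key) for the three documented S3 location patterns."""
    parsed = urlparse(location)
    path = parsed.path.lstrip("/")

    if parsed.scheme == "s3":
        # s3://<bucket>/<key>
        return parsed.netloc, path

    if parsed.scheme == "https" and parsed.netloc == "s3.amazonaws.com":
        # https://s3.amazonaws.com/<bucket>/<key>
        bucket, _, key = path.partition("/")
        return bucket, key

    suffix = ".s3.amazonaws.com"
    if parsed.scheme == "https" and parsed.netloc.endswith(suffix):
        # https://<bucket>.s3.amazonaws.com/<key>
        return parsed.netloc[: -len(suffix)], path

    raise ValueError(f"Unrecognized S3 location: {location}")


print(split_s3_location("s3://h2o-datasets/autodoc-examples/autodoc_report.docx"))
```

A check like this can catch a mistyped bucket or key before a report is rendered and uploaded.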
Push Generated AutoDoc to GitHub Repository
import os
from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc
# Parameters the User Must Set: output_file_path
# specify the GitHub URL to where you want your AutoDoc saved
# the following pattern is supported
# https://github.com/<organization or username>/<repo name>/tree/<branch name>/<path>
# if <path> points to a directory in the repository, an auto-generated filename is used
# e.g., if the location is https://github.com/<organization or username>/<repo name>/tree/docs,
# the generated report will be https://github.com/<organization or username>/<repo name>/tree/docs/Experiment_Report_2021-08-31-13-37-12.docx
# either configure ~/.git_autodoc/credentials or set the environment variable below
os.environ["GITHUB_PAT"] = "your_github_personal_access_token"
output_file_path = "https://github.com/h2oai/h2o-autodoc/tree/master/tests/autodoc_report.docx"
# set your AutoDoc configurations
config = Config(output_path=output_file_path)
# render your AutoDoc
render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)
Note: The local copy is deleted after a successful upload to the GitHub repository.
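The GitHub URL pattern in the comments above packs the owner, repository, branch, and file path into a single string. The sketch below pulls those parts out so they can be checked before rendering. `split_github_url` and `GitHubTarget` are hypothetical names introduced here, not part of H2O AutoDoc, and the sketch assumes the branch name contains no slash.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class GitHubTarget(NamedTuple):
    owner: str
    repo: str
    branch: str
    path: str


def split_github_url(url: str) -> GitHubTarget:
    """Split https://github.com/<owner>/<repo>/tree/<branch>/<path> into its parts."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/tree/<branch>[/<path...>]
    if parsed.netloc != "github.com" or len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"Unrecognized GitHub URL: {url}")
    owner, repo, _, branch = parts[:4]
    # Assumption: the branch is a single path segment; the rest is the file path
    return GitHubTarget(owner, repo, branch, "/".join(parts[4:]))


target = split_github_url(
    "https://github.com/h2oai/h2o-autodoc/tree/master/tests/autodoc_report.docx"
)
print(target)
```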
Set the H2O AutoDoc File Type
H2O AutoDoc can generate a Word document or a Markdown file. The default report is a Word document (.docx).
Word Document
from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc
# Parameters the User Must Set: output_file_path
# specify the full path to where you want your AutoDoc saved
# replace the path below with your own path
output_file_path = "path/to/your/autodoc/my_word_report.docx"
# only your output_path is required, as the default AutoDoc is a Word document
config = Config(output_path=output_file_path)
# render your AutoDoc
render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)
Markdown File
Note: When main_template_type is set to "md", a zip file is returned. This zip file contains the Markdown file and any images linked from it.
from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc
# Parameters the User Must Set: output_file_path
# specify the full path to where you want your AutoDoc saved
# replace the path below with your own path (make sure to keep the '.md' file extension)
output_file_path = "path/to/your/autodoc/my_markdown_report.md"
# set the exported AutoDoc to markdown ('md')
main_template_type = "md"
config = Config(output_path=output_file_path, main_template_type=main_template_type)
# render your AutoDoc
render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)
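Since the Markdown output arrives as a zip file, the report has to be unpacked before it can be viewed with its images. The sketch below shows one way to do that with the standard library. The archive name and internal layout are assumptions made for illustration (the documentation only states that the zip contains the Markdown file and its linked images), so a stand-in archive is built first to keep the example runnable without AutoDoc.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def extract_markdown_report(zip_path: Path, dest_dir: Path) -> List[Path]:
    """Unpack the archive and return the Markdown files found inside it."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return sorted(dest_dir.rglob("*.md"))


# Stand-in archive with made-up file names, in place of a real AutoDoc export
work_dir = Path(tempfile.mkdtemp())
demo_zip = work_dir / "my_markdown_report.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("my_markdown_report.md", "# Report\n\n![plot](images/plot.png)\n")
    archive.writestr("images/plot.png", b"placeholder bytes, not a real image")

markdown_files = extract_markdown_report(demo_zip, work_dir / "unzipped")
print([p.name for p in markdown_files])
```

Extracting everything into one directory keeps the relative image links in the Markdown file valid.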