.. _scikit-autodoc-usage-ref:

Using H2O AutoDoc with Scikit-learn
===================================

Introduction
------------

You can use H2O AutoDoc to generate documentation for your scikit-learn models (see also :ref:`Scikit-learn`).

Creating & Configuring H2O AutoDoc for Scikit Models
----------------------------------------------------

This section provides code examples for setting up a model, along with basic and advanced H2O AutoDoc configurations.

The H2O AutoDoc setup requires:

* a license key (see :ref:`autodoc-license-key-example-ref` for an example of setting your license)
* the scikit-learn library
* a trained scikit-learn model

**Scikit Model Setup:**

- :ref:`scikit-prepare-data-ref`
- :ref:`scikit-build-model-ref`

**Basic configurations:**

- :ref:`scikit-generate-default-autodoc-ref`
- :ref:`scikit-save-autodoc-to-s3-ref`
- :ref:`scikit-save-autodoc-to-github-ref`
- :ref:`scikit-specify-file-type-ref`

.. _scikit-prepare-data-ref:

Prepare Data for Scikit Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prepare the data for model building.

.. tabs::

    .. code-tab:: python Steam Python

        import pandas as pd

        url = (
            "https://s3.amazonaws.com/h2o-training/events/ibm_index/CreditCard_Cat-train.csv"
        )
        df_joint = pd.read_csv(url)
        df_joint = df_joint.drop(columns=["ID"])

        # number of rows for train data
        num_train = int(df_joint.shape[0] * 0.8)

        # apply one hot encoding for categorical variables in dataset
        joint_ohe = pd.get_dummies(df_joint, prefix_sep=".", drop_first=True)

        # split dataset into train and test
        train_ohe = joint_ohe.iloc[:num_train]
        test_ohe = joint_ohe.iloc[num_train:]

        # target column label
        target = "DEFAULT_PAYMENT_NEXT_MONTH"

        # specify the target
        train_y = train_ohe[target]
        test_y = test_ohe[target]

        # drop target column from train and test dataframes
        train_ohe = train_ohe.drop(columns=[target])
        test_ohe = test_ohe.drop(columns=[target])

.. _scikit-build-model-ref:

Build Scikit Model
~~~~~~~~~~~~~~~~~~

Build your classification model.

.. tabs::

    .. code-tab:: python Steam Python

        # import classifier
        from sklearn.ensemble import GradientBoostingClassifier

        # train model
        model = GradientBoostingClassifier()
        model.fit(train_ohe, train_y)

.. _scikit-generate-default-autodoc-ref:

Generate a Default H2O AutoDoc
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generate an AutoDoc using the default template.

.. tabs::

    .. code-tab:: python Steam Python

        from h2o_autodoc import Config
        from h2o_autodoc.scikit.autodoc import render_autodoc

        # Parameters the User Must Set: output_file_path
        # specify the full path to where you want your AutoDoc saved
        # replace the path below with your own path
        output_file_path = "full/path/to/your/autodoc/autodoc_report.docx"

        # set your AutoDoc configurations
        config = Config(output_path=output_file_path)

        # render your AutoDoc
        # (train_ohe, train_y, test_ohe, test_y come from the data preparation step)
        render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)

.. _scikit-save-autodoc-to-s3-ref:

Push Generated AutoDoc to S3 Bucket
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. tabs::

    .. code-tab:: python Steam Python

        import os

        from h2o_autodoc import Config
        from h2o_autodoc.scikit.autodoc import render_autodoc

        # Parameters the User Must Set: output_file_path
        # specify the S3 URI/URL to where you want your AutoDoc saved
        # the following patterns are supported:
        # s3:///
        # https://.s3.amazonaws.com/
        # https://s3.amazonaws.com//
        # if the path points to a directory in the bucket, an auto-generated filename is used
        # e.g., if the path is s3:///experiment_test/,
        # then the generated report will be s3:///experiment_test/Experiment_Report_2021-08-31-13-37-12.docx

        # either configure ~/.aws/credentials or set the environment variables below
        os.environ["AWS_ACCESS_KEY_ID"] = "your_aws_access_key"
        os.environ["AWS_SECRET_ACCESS_KEY"] = "your_aws_secret"

        output_file_path = "s3://h2o-datasets/autodoc-examples/autodoc_report.docx"

        # set your AutoDoc configurations
        config = Config(output_path=output_file_path)

        # render your AutoDoc
        render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)

Note: The local copy is erased after a successful upload to the S3 bucket.

.. _scikit-save-autodoc-to-github-ref:

Push Generated AutoDoc to GitHub Repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. tabs::

    .. code-tab:: python Steam Python

        import os

        from h2o_autodoc import Config
        from h2o_autodoc.scikit.autodoc import render_autodoc

        # Parameters the User Must Set: output_file_path
        # specify the GitHub URL to where you want your AutoDoc saved
        # the following pattern is supported:
        # https://github.com///tree//
        # if the path points to a directory in the repository, an auto-generated filename is used
        # e.g., if the path is https://github.com///tree/docs,
        # then the generated report will be https://github.com///tree/docs/Experiment_Report_2021-08-31-13-37-12.docx

        # either configure ~/.git_autodoc/credentials or set the environment variable below
        os.environ["GITHUB_PAT"] = "your_github_personal_access_token"

        output_file_path = "https://github.com/h2oai/h2o-autodoc/tree/master/tests/autodoc_report.docx"

        # set your AutoDoc configurations
        config = Config(output_path=output_file_path)

        # render your AutoDoc
        render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)

Note: The local copy is erased after a successful upload to the GitHub repository.

.. _scikit-specify-file-type-ref:

Set the H2O AutoDoc File Type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

H2O AutoDoc can generate a Word document or a markdown file. The default report is a Word document (i.e., docx).

**Word Document**

.. tabs::

    .. code-tab:: python Steam Python

        from h2o_autodoc import Config
        from h2o_autodoc.scikit.autodoc import render_autodoc

        # Parameters the User Must Set: output_file_path
        # specify the full path to where you want your AutoDoc saved
        # replace the path below with your own path
        output_file_path = "path/to/your/autodoc/my_word_report.docx"

        # only your output_path is required, as the default AutoDoc is a Word document
        config = Config(output_path=output_file_path)

        # render your AutoDoc
        render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)

**Markdown File**

Note: When **main_template_type** is set to **"md"**, a zip file is returned. This zip file contains the markdown file and any images that are linked in the markdown file.

.. tabs::

    .. code-tab:: python Steam Python

        from h2o_autodoc import Config
        from h2o_autodoc.scikit.autodoc import render_autodoc

        # Parameters the User Must Set: output_file_path
        # specify the full path to where you want your AutoDoc saved
        # replace the path below with your own path (make sure to keep the '.md' file extension)
        output_file_path = "path/to/your/autodoc/my_markdown_report.md"

        # set the exported AutoDoc to markdown ('md')
        main_template_type = "md"

        config = Config(output_path=output_file_path, main_template_type=main_template_type)

        # render your AutoDoc
        render_autodoc(config, model, train_ohe, train_y, test=test_ohe, test_label=test_y)
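
Because the markdown output is delivered as a zip file, you will usually want to unpack it before viewing the report. The following is a minimal sketch using only the Python standard library. The helper name ``unpack_markdown_report``, the archive name, and the file layout inside the archive are hypothetical and used for illustration only; the actual name and layout of the zip produced by H2O AutoDoc may differ. A stand-in archive is created so the example runs without AutoDoc installed.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack_markdown_report(zip_path, dest_dir):
    """Extract a zipped markdown report; return (markdown_files, other_files)."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # skip directory entries, keep only real files
        names = [n for n in archive.namelist() if not n.endswith("/")]
    markdown_files = sorted(n for n in names if n.lower().endswith(".md"))
    other_files = sorted(n for n in names if not n.lower().endswith(".md"))
    return markdown_files, other_files


# Stand-in archive (hypothetical names and layout, not AutoDoc's real output)
workdir = Path(tempfile.mkdtemp())
demo_zip = workdir / "my_markdown_report.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("my_markdown_report.md", "# Report\n\n![plot](images/plot.png)\n")
    archive.writestr("images/plot.png", b"placeholder bytes, not a real image")

markdown_files, other_files = unpack_markdown_report(demo_zip, workdir / "unpacked")
print(markdown_files)
print(other_files)
```

To use this with a real report, replace ``demo_zip`` with the path of the zip file written by AutoDoc. Keeping the extracted images next to the markdown file preserves the relative image links inside the report.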