Snowflake Integration

Overview

This document describes how to use the external function feature of Snowflake to invoke Driverless AI models as HTTP REST API endpoints.

Using the external function requires some setup and configuration in Snowflake and Amazon. For more information, refer to the Snowflake documentation on external functions.

注解

Downloads:

The setup process for the Java UDF is typically easier than for the external function.

Snowflake integration overview

Requirements

  1. Snowflake login credentials

  2. Amazon EC2 login credentials

  3. Driverless AI MOJO (pipelineSF.mojo)

  • Included in the demo files

  1. DAIMojoRestServer

  • Included in the demo files

  1. Driverless AI license

  • Provided through the partnership portal

  • Copy the license to the Snowflake_H2Oai directory. Name the file license.sig.

  1. Java JDK 1.8

  • An open source JDK is included in the demo zip file and the demo scripts use that as the default. To use a different JVM, see the following steps:

    • Execute “java -version” to verify the available JVM on your platform has JDK 1.8 installed. The output of the command should indicate JDK 1.8, for example:

    java -version
    openjdk version "1.8.0_121"
    

Security

When using the external function, a call is made from Snowflake to the AWS API Gateway. This requires the configuration of trust relationships in AWS so that the call can be made.

The H2O REST Server only accepts calls from the AWS Gateway endpoint. When the parameter -DSecureModelAllowAgent=”AmazonAPIGateway.*|snowflake.*” is added to the command line, it’s even possible to further limit this to a specific AWS function.

Enabling -DModelSecureEndPoints=/** protects the Rest Server by requiring full authentication, effectivity blocking requests.

Installation

The following installation includes steps in Snowflake, AWS, and an EC2 instance where the H2O REST server is installed.

The following steps outline the REST server installation:

  1. Create an EC2 Instance, a demo system should have the following minimum specification:

    • Operating System: Linux

    • CPU: 2

    • Memory: 16GB

    • Disk: 500MB

Note: For demos where larger datasets may be used, consider using more cores and larger memory sizes. For example, using 1 million rows, 48 cores, and 64GB results in much lower latency that a 2 core system.

  1. Copy the distribution to the EC2 instance and extract the file.

unzip Snowflake_H2Oai_v1.zip
  1. Create the database.

  2. Populate the table with the sample data.

  3. Verify that the data is available.

Starting the REST Server

Use the following steps to start the H2O REST server on the EC2 instance.

  1. Ensure the current working directory is Snowflake-H2Oai/Function.

cd Snowflake-H2Oai/Function
./goServer.sh
  1. Press ENTER to background the program. The log is written to nohup.log.

  2. The REST server initiates after several seconds have passed. Check for a ready message similar to the following:

Started DaiMojoRestServer4Application in 4.778 seconds (JVM running for 6.22)

Verify REST Server Installation

To verify that the REST server and its model components were installed successfully and that the server initialized correctly:

  1. Ensure that the current working directory is Snowflake-H2Oai/Function

  2. Run the following script from a separate terminal window:

./goVerifyEndPoint.sh
{ "data":[[1,0.7779838]]}

If the response is similar to the output from this example, the REST server, model, and license have been installed correctly.

Stopping the REST Server

To stop the H2O REST server on the EC2 instance, run the following commands:

cd Snowflake-H2Oai/Function
./stopServer.sh

External Function Example

The following is an example of an external function:

create or replace api integration demonstration_external_api_integration_01
api_provider=aws_api_gateway
api_aws_role_arn='arn:aws:iam::nnnnnnnn:role/snowflake'
api_allowed_prefixes=('https://aaaaaaaa.execute-api.us-east-1.amazonaws.com/MojoTest')
enabled=true;

create or replace external function H2OPredict(v varchar, v0 number, v1 varchar, v2 number, v3 number, v4 number, v5 number, v6 varchar, v7 varchar, v8 number, v9 number, v10 number, v11 number)
returns variant
api_integration = demonstration_external_api_integration_01
as 'https://aaaaaaaa.execute-api.us-east-1.amazonaws.com/MojoTest';

Function Data Types

The preceding function passes 13 parameters (v to V11).

  • The number of parameters matches the number of parameters that the model was created with.

  • If the data in the table is a float and the function uses the

SQL Examples

Once the Snowflake and AWS Gateway has been configured, the following example SQL statements return predictions:

select H2OPredict('Modelname=pipelineSF.mojo’, LOAN_AMNT, TERM, INT_RATE, INSTALLMENT, EMP_LENGTH, ANNUAL_INC, VERIFICATION_STATUS, ADDR_STATE, DTI, DELINQ_2YRS, REVOL_BAL, REVOL_UTIL ) from LENDINGCLUB where ADDR_STATE='NJ' order by ID;

Passing Runtime Parameters

The following is a list of parameters used to pass specific values to the REST server:

  • Modelname: The name of the Driverless AI MOJO file that exists in the REST server ModelDirectory. This is pipeline.mojo by default.

  • Prediction: The numeric prediction to use. This is 0 by default.

Sample parameter usage:

select *, H2OPredict('Modelname=pipelineSF.mojo Prediction=0',LOAN_AMNT, TERM, INT_RATE, INSTALLMENT, EMP_LENGTH, ANNUAL_INC, VERIFICATION_STATUS,
              ADDR_STATE, DTI, DELINQ_2YRS, REVOL_BAL, REVOL_UTIL ) from LENDINGCLUB;

Request: 10625, 36 months,6.62,326.23,4,33000,VERIFIED - income,WA,27.38,0,6290,46.3
Response: ["bad_loan.0 : 0.917305","bad_loan.1 : 0.08269503"]

0.917305

Advanced Setup

The Snowflake External Function allows custom HTTP headers to be defined. In the following example, the HTTP header modelname is added to the function and is defined as pipeline.mojo for all calls to this function.

create or replace external function H2OPredictHDR(v0 number, v1 varchar, v2 number, v3 number, v4 number, v5 number, v6 varchar, v7 varchar, v8 number, v9 number, v10 number, v11 number)
returns variant
HEADERS=('modelname' = 'pipelineSF.mojo')
api_integration = demonstration_external_api_integration_01
as 'https://aaaaaaaa.execute-api.us-east-1.amazonaws.com/production';

This allows function calls to not require any parameters. A function by itself is enough for each model:

select id, H2OPredictHDR(LOAN_AMNT, TERM, INT_RATE, INSTALLMENT, EMP_LENGTH, ANNUAL_INC, VERIFICATION_STATUS,
              ADDR_STATE, DTI, DELINQ_2YRS, REVOL_BAL, REVOL_UTIL ) from LENDINGCLUB;

The prediction can also be passed if required. Otherwise, a probability of 0 is returned.

Building Models

The Snowflake external function feature lets you build Driverless AI models from a Snowflake worksheet. When requesting Driverless AI to build a model from a worksheet, the build status is updated in a table called MODELBUILD so that the build can be monitored.

MODELBUILD table

This shows the different stages for the riskmodel being built, from requesting the specific table in Snowflake to final deployment for scoring in the REST Server.

Note: When the build finishes, the build experiment UUID is reported for auditability purposes.

Define build function example:

create or replace external function H2OBuild(v varchar)
returns variant
api_integration = demonstration_external_api_integration_01
as 'https://bbbbb.execute-api.us-east-1.amazonaws.com/production';

Define Snowflake Table

A Snowflake table is used to track the status of the model build that

Requesting a Build Example

Use the function H2OBuild to change the requesting parameters:

select H2OBuild('Build --Table=LENDINGCLUB2 --Target=BAD_LOAN --Modelname=custchurn.mojo') ;

For more information on the parameters to the build request, see the following table:

Build Parameters

Parameter

Optional

Description

Table

no

Defines which Snowflake table to use for the model build

Target

no

The column (feature) name to use as the models target from training

Modelname

no

The name the model will have when deployed

Accuracy

yes

Model accuracy setting

Time

yes

Model experiment time

Interpretability

yes

Model interpretability setting

User

yes

Username required to access Snowflake table

Password

yes

Password required to access Snowflake table

Warehouse

yes

Snowflake warehouse

Database

yes

Snowflake database

Schema

yes

Snowflake schema

Deployment

Once the model has finished building, it is copied to the REST server and becomes available for the H2OPredict scoring function.

Configuration

The script buildmodel.py must be copied from the SnowFlake Function directory to the location where the REST server executes. By default, this is /home/ec2-user/Snowflake-H2Oai/Function.

Note: The script code must be updated based on the environment you are using.

Driverless AI Snowflake Configuration

The Driverless AI configuration uses the standard default settings except for settings related to user security. Use the authentication method that is best suited to the environment that you are using. For more information, see Using the config.toml File and Authentication Methods.

   authentication_method = "local"
   local_htpasswd_file = "/home/ec2-user/dai-1.8.5.1-linux-x86_64/.htpasswd"

This resource must be secured from unauthorized access and use.

To create a username and password using local authentication:

sudo htpasswd -B -c .htpasswd snowflake
Password yourpassword

Requirements

The build functionality invokes a Python program that uses the Driverless AI Python Client to create an experiment. The following packages must be available:

  • sudo yum install httpd

  • sudo yum install python3

  • sudo pip3 install driverlessai

  • sudo pip3 install –upgrade snowflake-connector-python

Sample Workbook

The following example shows how to use the functions once the initial setup has been completed.

Note: H2OBuild and H2OPredict must be defined for the following example to work.

  1. Use the updates to the Snowflake table MODELBUILD to track each stage of the build.

SELECT * FROM MODELBUILD;
  1. Using the Snowflake table LENDINGCLUB2, build a model to predict the BAD_LOAN column. Name the model riskmodel.mojo.

SELECT H2OBuild('Build --Table=LENDINGCLUB2 --Target=BAD_LOAN --Modelname=riskmodel.mojo') ;
  1. Once the model is deployed, retrieve the suggest Snowflake function definition.

SELECT H2OBuild('Modelname= riskmodel.mojo Function');
  1. The preceding function provides a suggested Snowflake SQL statement to call the model.

SELECT H2OBuild('Modelname= riskmodel.mojo SQL') ;
  1. Use the Snowflake external function.

SELECT H2OPredict(LOAN_AMNT,  TERM,  INT_RATE,  INSTALLMENT,  EMP_LENGTH,  HOME_OWNERSHIP,  ANNUAL_INC,  VERIFICATION_STATUS,  ADDR_STATE,  DTI,  DELINQ_2YRS,  INQ_LAST_6MNTHS,  PUB_REC,  REVOL_BAL,  REVOL_UTIL,  TOTAL_ACC ) from LENDINGCLUB;