Import Dataset with Snowflake Connector¶
First, we'll initialize a client with our server credentials and store it in the variable dai
.
In [1]:
Copied!
import driverlessai
dai = driverlessai.Client(address='http://localhost:12345', username="py", password="py")
import driverlessai
dai = driverlessai.Client(address='http://localhost:12345', username="py", password="py")
We can check that the Snowflake connector has been enabled on the Driverless AI server.
In [2]:
Copied!
dai.connectors.list()
dai.connectors.list()
Out[2]:
['upload', 'file', 'hdfs', 's3', 'recipe_file', 'recipe_url', 'snow']
The Snowflake connector is considered an advanced connector. Thus, the create methods require a data_source_config
argument to use them.
User Defined Snowflake Configuration¶
Here we manually specify the Snowflake URL, username, and the password.
In [3]:
Copied!
dataset_from_sf = dai.datasets.create(
data_source="snow",
name="credit-cards",
data="SELECT ID, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5 FROM CREDIT_CARD",
data_source_config=dict(
snowflake_url="https://my_org.snowflakecomputing.com",
snowflake_username="myusername",
snowflake_password="mypassword",
snowflake_warehouse="DEMO_WH",
snowflake_database="CREDIT_CARD",
snowflake_schema="PUBLIC",
),
force=True,
)
dataset_from_sf.head()
dataset_from_sf = dai.datasets.create(
data_source="snow",
name="credit-cards",
data="SELECT ID, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5 FROM CREDIT_CARD",
data_source_config=dict(
snowflake_url="https://my_org.snowflakecomputing.com",
snowflake_username="myusername",
snowflake_password="mypassword",
snowflake_warehouse="DEMO_WH",
snowflake_database="CREDIT_CARD",
snowflake_schema="PUBLIC",
),
force=True,
)
dataset_from_sf.head()
Complete 100.00% - [4/4] Computed stats for column BILL_AMT5
Out[3]:
ID | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 |
---|---|---|---|---|---|
1 | 3913 | 3102 | 689 | 0 | 0 |
2 | 2682 | 1725 | 2682 | 3272 | 3455 |
3 | 29239 | 14027 | 13559 | 14331 | 14948 |
4 | 46990 | 48233 | 49291 | 28314 | 28959 |
5 | 8617 | 5670 | 35835 | 20940 | 19146 |
Predefined Snowflake Configuration¶
Here we use a predefined configuration that was setup on the Driverless AI server. We only need to specify the Snowflake warehouse, database, and the schema.
In [4]:
Copied!
dataset_from_sf = dai.datasets.create(
data_source="snow",
name="credit-cards",
data="SELECT ID, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5 FROM CREDIT_CARD",
data_source_config=dict(
snowflake_warehouse="DEMO_WH",
snowflake_database="CREDIT_CARD",
snowflake_schema="PUBLIC",
),
force=True,
)
dataset_from_sf.head()
dataset_from_sf = dai.datasets.create(
data_source="snow",
name="credit-cards",
data="SELECT ID, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5 FROM CREDIT_CARD",
data_source_config=dict(
snowflake_warehouse="DEMO_WH",
snowflake_database="CREDIT_CARD",
snowflake_schema="PUBLIC",
),
force=True,
)
dataset_from_sf.head()
Complete 100.00% - [4/4] Computed stats for column BILL_AMT5
Out[4]:
ID | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 |
---|---|---|---|---|---|
1 | 3913 | 3102 | 689 | 0 | 0 |
2 | 2682 | 1725 | 2682 | 3272 | 3455 |
3 | 29239 | 14027 | 13559 | 14331 | 14948 |
4 | 46990 | 48233 | 49291 | 28314 | 28959 |
5 | 8617 | 5670 | 35835 | 20940 | 19146 |