Import Dataset with Google BigQuery Connector
First, we'll initialize a client with our server credentials and store it in the variable dai.
In [1]:
import driverlessai
dai = driverlessai.Client(address='http://localhost:12345', username="py", password="py")
We can check that the Google BigQuery (GBQ) connector has been enabled on the Driverless AI server.
In [2]:
dai.connectors.list()
Out[2]:
['upload', 'file', 'hdfs', 's3', 'recipe_file', 'recipe_url', 'gbq']
The GBQ connector is considered an advanced connector, so the dataset creation methods require a data_source_config
argument in order to use it.
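Before importing, it can help to fail fast if the connector is missing. The minimal sketch below only relies on dai.connectors.list(), shown above; the error message is illustrative.

# Minimal sketch: stop early if the GBQ connector is not enabled on this server.
if "gbq" not in dai.connectors.list():
    raise RuntimeError("The 'gbq' connector is not enabled on this Driverless AI server.")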
Predefined GBQ Configuration
Here we use a predefined configuration that was set up on the Driverless AI server. We only need to specify the BigQuery dataset and the Google Cloud Storage destination bucket.
In [3]:
dataset_from_gbq = dai.datasets.create(
data_source="gbq",
name="credit-cards",
data="SELECT ID, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5 FROM creditcard.creditcardfull",
data_source_config=dict(
gbq_dataset_name="creditcard",
gbq_bucket_name="my-bucket",
),
force=True,
)
dataset_from_gbq.head()
Complete 100.00% - [4/4] Computed stats for column BILL_AMT5
Out[3]:
| ID    | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 |
|-------|-----------|-----------|-----------|-----------|-----------|
| 15777 | 64927     | 58840     | 42859     | 0         | 0         |
| 1773  | 32023     | 20233     | 0         | 0         | 0         |
| 5525  | 6544      | 4924      | 1990      | 0         | 0         |
| 8010  | 19849     | 20650     | 0         | 0         | 0         |
| 11946 | 651       | 651       | 651       | 0         | 651       |
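As a quick sanity check after the import, the returned dataset object can be inspected. This is a sketch under the assumption that the driverlessai Dataset object exposes name, shape, and columns attributes; attribute names may vary between client versions.

# Sketch of a post-import check (assumes the Dataset object has name, shape, and columns).
print(dataset_from_gbq.name)     # "credit-cards", the name passed to datasets.create()
print(dataset_from_gbq.shape)    # (rows, columns) of the imported table
print(dataset_from_gbq.columns)  # the columns selected in the query, e.g. ['ID', 'BILL_AMT1', ...]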