Import Dataset with Google BigQuery Connector
First, we'll initialize a client with our server credentials and store it in the variable dai.
In [1]:
import driverlessai
dai = driverlessai.Client(address='http://localhost:12345', username="py", password="py")
We can check that the Google BigQuery (GBQ) connector has been enabled on the Driverless AI server.
In [2]:
dai.connectors.list()
Out[2]:
['upload', 'file', 'hdfs', 's3', 'recipe_file', 'recipe_url', 'gbq']
The GBQ connector is considered an advanced connector, so the dataset creation methods require a data_source_config
argument in order to use it.
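Before importing, it can help to fail fast if the connector is missing. The minimal sketch below only relies on dai.connectors.list(), shown above; the error message is illustrative.

# Minimal sketch: stop early if the GBQ connector is not enabled on this server.
if "gbq" not in dai.connectors.list():
    raise RuntimeError("The 'gbq' connector is not enabled on this Driverless AI server.")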
Predefined GBQ Configuration
Here we use a predefined configuration that was set up on the Driverless AI server. We only need to specify the BigQuery dataset and the Google Cloud Storage destination bucket.
In [3]:
dataset_from_gbq = dai.datasets.create(
data_source="gbq",
name="credit-cards",
data="SELECT ID, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5 FROM creditcard.creditcardfull",
data_source_config=dict(
gbq_dataset_name="creditcard",
gbq_bucket_name="my-bucket",
),
force=True,
)
dataset_from_gbq.head()
Complete 100.00% - [4/4] Computed stats for column BILL_AMT5
Out[3]:
| ID    | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 |
|-------|-----------|-----------|-----------|-----------|-----------|
| 15777 | 64927     | 58840     | 42859     | 0         | 0         |
| 1773  | 32023     | 20233     | 0         | 0         | 0         |
| 5525  | 6544      | 4924      | 1990      | 0         | 0         |
| 8010  | 19849     | 20650     | 0         | 0         | 0         |
| 11946 | 651       | 651       | 651       | 0         | 651       |
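As a quick sanity check after the import, the returned dataset object can be inspected. This is a sketch under the assumption that the driverlessai Dataset object exposes name, shape, and columns attributes; attribute names may vary between client versions.

# Sketch of a post-import check (assumes the Dataset object has name, shape, and columns).
print(dataset_from_gbq.name)     # "credit-cards", the name passed to datasets.create()
print(dataset_from_gbq.shape)    # (rows, columns) of the imported table
print(dataset_from_gbq.columns)  # the columns selected in the query, e.g. ['ID', 'BILL_AMT1', ...]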