Import Dataset with Delta Table Connector
Note: The Delta Table connector is supported in Driverless AI 2.1.0 and later.
First, we'll initialize a client with our server credentials and store it in the variable dai.
In [6]:
import driverlessai
dai = driverlessai.Client(address='http://localhost:12345', username="py", password="py")
We can check that the Delta Table connector has been enabled on the Driverless AI server.
In [7]:
dai.connectors.list()
Out[7]:
['upload', 'file', 'hdfs', 's3', 'recipe_file', 'recipe_url', 'delta_table']
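When later cells depend on the connector, it can be convenient to fail early with a clear message instead of inspecting the list by eye. The following is a minimal sketch: `require_connector` is a hypothetical helper, not part of the `driverlessai` package, and it only examines a list such as the one returned by `dai.connectors.list()`.

```python
def require_connector(enabled_connectors, name="delta_table"):
    """Raise a clear error if `name` is missing from the enabled connectors.

    Hypothetical helper for illustration; not part of the driverlessai package.
    """
    if name not in enabled_connectors:
        raise RuntimeError(
            f"Connector {name!r} is not enabled on this server; "
            f"enabled connectors: {sorted(enabled_connectors)}"
        )
    return name


# Example using the list shown in the output above.
enabled = ['upload', 'file', 'hdfs', 's3', 'recipe_file', 'recipe_url', 'delta_table']
require_connector(enabled)
```

With a live client, the same check would be `require_connector(dai.connectors.list())`.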
Use the Delta Table path specified in the query
In [10]:
dataset_from_delta_table = dai.datasets.create(
    data="SELECT * FROM airlines_train.`abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/path/to/airlines_train`",
    data_source="delta_table",
    name="From Delta Table user defined config",
    force=True
)
dataset_from_delta_table.head()
Complete 100.00% - [4/4] Computed stats for column DEFAULT_PAYMENT_NEXT_MONTH
Out[10]:
| ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | DEFAULT_PAYMENT_NEXT_MONTH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 20000 | female | university | married | 24 | -2 | 2 | -1 | -1 | -2 | -2 | 3913 | 3102 | 689 | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | 1 |
| 2 | 120000 | female | university | single | 26 | -1 | 2 | 0 | 0 | 0 | 2 | 2682 | 1725 | 2682 | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | 1 |
| 3 | 90000 | female | university | single | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 29239 | 14027 | 13559 | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | 0 |
| 4 | 50000 | female | university | married | 37 | 1 | 0 | 0 | 0 | 0 | 0 | 46990 | 48233 | 49291 | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | 0 |
| 5 | 50000 | male | university | married | 57 | 2 | 0 | -1 | 0 | 0 | 0 | 8617 | 5670 | 35835 | 20940 | 19146 | 19131 | 2000 | 36681 | 10000 | 9000 | 689 | 679 | 0 |
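The query string in the cell above embeds the table location in backticks, which is easy to mistype by hand. As a rough sketch, the string can be assembled from its parts. Both `abfss_uri` and `delta_query` are hypothetical helpers introduced here for illustration, and the container, account, and path values are placeholders.

```python
def abfss_uri(container, storage_account, path):
    """Build an abfss:// URI with the same shape as the one in the cell above."""
    return (
        f"abfss://{container}@{storage_account}"
        f".dfs.core.windows.net/{path.lstrip('/')}"
    )


def delta_query(table_alias, table_uri):
    """Return a SELECT statement that embeds the table path in backticks."""
    if "`" in table_uri:
        # A backtick inside the path would end the quoted section early.
        raise ValueError("table URI must not contain a backtick")
    return f"SELECT * FROM {table_alias}.`{table_uri}`"


# Placeholder values; substitute your own container, account, and path.
query = delta_query(
    "airlines_train",
    abfss_uri("my-container", "mystorageaccount", "path/to/airlines_train"),
)
print(query)
```

The resulting string can then be passed as the `data` argument of `dai.datasets.create`, as in the cell above.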
Use a user-defined Delta Table path
Here we specify the Delta Table path manually through the data_source_config argument, so the query only needs to reference the table name.
In [13]:
dataset_from_delta_table = dai.datasets.create(
    data="SELECT * FROM airlines_train",
    data_source="delta_table",
    data_source_config=dict(
        table_path="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/path/to/airlines_train/",
    ),
    name="From Delta Table user defined config",
    force=True
)
dataset_from_delta_table.head()
Complete 100.00% - [4/4] Computed stats for column DEFAULT_PAYMENT_NEXT_MONTH
Out[13]:
| ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | DEFAULT_PAYMENT_NEXT_MONTH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 20000 | female | university | married | 24 | -2 | 2 | -1 | -1 | -2 | -2 | 3913 | 3102 | 689 | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | 1 |
| 2 | 120000 | female | university | single | 26 | -1 | 2 | 0 | 0 | 0 | 2 | 2682 | 1725 | 2682 | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | 1 |
| 3 | 90000 | female | university | single | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 29239 | 14027 | 13559 | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | 0 |
| 4 | 50000 | female | university | married | 37 | 1 | 0 | 0 | 0 | 0 | 0 | 46990 | 48233 | 49291 | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | 0 |
| 5 | 50000 | male | university | married | 57 | 2 | 0 | -1 | 0 | 0 | 0 | 8617 | 5670 | 35835 | 20940 | 19146 | 19131 | 2000 | 36681 | 10000 | 9000 | 689 | 679 | 0 |
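If several tables are imported this way, one option is to keep the argument assembly separate from the client call, so the arguments can be inspected without a running server. This is only a sketch: `delta_table_create_kwargs` is a hypothetical helper, and the argument names are taken from the cell above rather than from any additional API.

```python
def delta_table_create_kwargs(table_name, table_path, dataset_name):
    """Assemble keyword arguments matching the dai.datasets.create call above.

    Hypothetical helper; only the argument names come from the cell above.
    """
    return {
        "data": f"SELECT * FROM {table_name}",
        "data_source": "delta_table",
        "data_source_config": {"table_path": table_path},
        "name": dataset_name,
        "force": True,
    }


kwargs = delta_table_create_kwargs(
    table_name="airlines_train",
    # Placeholder path; replace the bracketed parts with your own values.
    table_path="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/path/to/airlines_train/",
    dataset_name="From Delta Table user defined config",
)
# With a live client: dataset = dai.datasets.create(**kwargs)
```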