Task 2: Load the UCI Credit Card dataset

In this task, you will import a dataset into H2O Driverless AI (DAI) from an external source. This tutorial loads the UCI Credit Card dataset from an Amazon S3 URL. The dataset is commonly used for credit default prediction, which makes it a convenient example for working with data in H2O DAI.

To load the UCI Credit Card dataset into DAI, run the following code snippet:

credit_card_default = dai.datasets.create(
    data_source="s3",
    name="uci_credit_card",
    data="https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/TestDrive-Datasets/UCI_Credit_Card.csv",
    force=True,
)

The call to dai.datasets.create sets the following parameters:

  • data_source: The type of source the dataset is read from. This tutorial uses s3.
  • name: The name assigned to the dataset within H2O DAI. This tutorial uses uci_credit_card.
  • data: The location of the dataset. Here, it is an S3 URL pointing to the UCI Credit Card CSV file.
  • force: Set this parameter to True to overwrite an existing dataset with the same name in your H2O DAI instance.

After the code runs, the UCI Credit Card dataset is loaded into H2O DAI, and the credit_card_default variable refers to it as a Dataset object.
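
To confirm that the import worked, it can help to wrap the call in a small helper and print a summary of the returned object. The following is a minimal sketch, not part of the tutorial itself: the function names load_dataset and describe_dataset are hypothetical, and the sketch assumes that the returned Dataset object exposes name and shape attributes, with shape given as (rows, columns). It also assumes dai is the client you connected earlier in this tutorial.

```python
def load_dataset(client, url, name, data_source="s3", force=True):
    """Hypothetical helper: import a dataset through a connected DAI client.

    `client` is assumed to be the `dai` object created earlier in this
    tutorial. The keyword arguments mirror the call shown above.
    """
    return client.datasets.create(
        data_source=data_source,
        name=name,
        data=url,
        force=force,
    )


def describe_dataset(dataset):
    """Hypothetical helper: build a one-line summary of a Dataset object.

    Assumes the object has `name` and `shape` attributes, where `shape`
    is a (rows, columns) tuple.
    """
    rows, cols = dataset.shape
    return f"{dataset.name}: {rows} rows x {cols} columns"


# Example usage (requires a connected client named `dai`):
# credit_card_default = load_dataset(
#     dai,
#     "https://s3.amazonaws.com/data.h2o.ai/DAI-Tutorials/TestDrive-Datasets/UCI_Credit_Card.csv",
#     "uci_credit_card",
# )
# print(describe_dataset(credit_card_default))
```

Keeping force as an explicit argument makes the overwrite behavior visible at the call site, so you can pass force=False when you do not want to replace an existing dataset of the same name.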

