Task 4: Split the dataset into training and testing sets

In this task, you will split the UCI Credit Card dataset into training and testing sets to prepare for model training and evaluation. Stratified sampling keeps the proportion of each target class the same in the train and test sets, so the test set reflects the class distribution of the full dataset and gives a more reliable evaluation.

Run the following code to split the dataset into training and testing sets:

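# Split 70/30, stratifying on the target column
splits = credit_card_default.split_to_train_test(train_size=0.7, target_column="default payment next month")
# The result is a dictionary holding the two new datasets
train = splits["train_dataset"]
test = splits["test_dataset"]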

train_size: The proportion of the dataset to be used for training. For this tutorial, we use 0.7 (70%).
target_column: The column used for stratification, so that the target variable has the same distribution in both the train and test sets. In this case, it is set to "default payment next month".
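To make the effect of these two parameters concrete, the following is a minimal plain-Python sketch of a stratified split. It is an illustration only and does not show how H2O Driverless AI implements split_to_train_test internally. The helper names (stratified_split, positive_rate), the seed argument, and the toy data with a 22% positive rate are all assumptions made for this example.

```python
import random
from collections import defaultdict

TARGET = "default payment next month"


def stratified_split(rows, target, train_size=0.7, seed=1234):
    """Split rows so each target class keeps about the same share in both parts.

    Hypothetical helper for illustration; not part of any H2O API.
    """
    rng = random.Random(seed)

    # Group the rows by their target value
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    # Shuffle each class separately and cut it at the train_size proportion
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_size)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def positive_rate(rows, target):
    """Fraction of rows whose target value is 1."""
    return sum(row[target] for row in rows) / len(rows)


# Toy data (made-up numbers): 1,000 rows, 220 of them labelled 1
rows = [{"id": i, TARGET: 1 if i < 220 else 0} for i in range(1000)]
train_rows, test_rows = stratified_split(rows, TARGET, train_size=0.7)

print(len(train_rows), len(test_rows))
print(positive_rate(train_rows, TARGET), positive_rate(test_rows, TARGET))
```

Because each class is cut at the same proportion, the share of positive rows in the two subsets matches the share in the full toy dataset. A purely random split would only approximate it.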

After executing the code, the dataset will be split into two subsets:

train: Contains 70% of the dataset for training the model.
test: Contains the remaining 30% of the dataset for evaluating the model's performance.

These subsets will be accessible as Dataset objects within H2O DAI, ready for use in the next steps.
