Combining Rows from Two Datasets
--------------------------------

You can use the ``rbind`` function to combine two similar datasets into a single large dataset. This can be used, for example, to create a larger dataset by combining data from a validation dataset with its training or testing dataset.

Note that when using ``rbind``, the two datasets must have the same set of columns.

.. tabs::
   .. code-tab:: r R

        library(h2o)
        h2o.init()

        # Import an existing training dataset
        ecg1_path <- "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_train.csv"
        ecg1 <- h2o.importFile(path = ecg1_path)
        print(dim(ecg1))
        [1] 20 210

        # Import an existing testing dataset
        ecg2_path <- "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_test.csv"
        ecg2 <- h2o.importFile(path = ecg2_path)
        print(dim(ecg2))
        [1] 23 210

        # Combine the two datasets into a single, larger dataset
        ecg_combine <- h2o.rbind(ecg1, ecg2)
        print(dim(ecg_combine))
        [1] 43 210

   .. code-tab:: python

        import h2o
        import numpy as np
        h2o.init()

        # Generate a random dataset with 100 rows and 4 columns.
        # Label the columns A, B, C, and D.
        df1 = h2o.H2OFrame.from_python(np.random.randn(100,4).tolist(), column_names=list('ABCD'))

        # Display the first rows of the frame
        df1
                A           B          C           D
        ---------  ----------  ---------  ----------
         0.412228   -0.991376   -1.44374   -0.276455
         0.348039   -0.193704  -0.370882    0.162211
         0.125303    -1.24546  -0.916738     1.08088
         0.293062    0.516151   0.739798   -0.430679
        -0.363344   0.0558051   -1.43888     1.13882
         -1.17492   -0.332647   -1.18689    0.533313
         0.154774     1.46559   0.373058   -0.915895
         0.555835  -0.0891554   -1.19151    0.623667
         -1.13092    0.843549  -0.532341  -0.0739869
         0.752855   -0.168504  -0.750161    -2.46084

        [100 rows x 4 columns]

        # Generate a second random dataset with 100 rows and 4 columns.
        # Again, label the columns A, B, C, and D.
        df2 = h2o.H2OFrame.from_python(np.random.randn(100,4).tolist(), column_names=list('ABCD'))

        # Display the first rows of the frame
        df2
                  A          B          C          D
        -----------  ---------  ---------  ---------
         0.00118227  -0.835817    1.06634    1.81794
          -0.542678  -0.494483   0.109813   0.714271
          -0.365611  -0.679095   0.891982   -1.93362
         -0.0533568    0.86035   -2.28902     -1.287
          -0.572775    1.30954    0.27412  -0.287373
           0.310976  -0.594283  -0.566955   0.221888
            1.34778   -1.02348   0.243686   0.319585
           0.383136  -0.113979  -0.901779  -0.383478
          -0.968212  -0.606603  -0.828677   0.699539
           0.491119  -0.629774  -0.632143     0.2898

        [100 rows x 4 columns]

        # Append the rows of the second dataset to the first dataset.
        df1.rbind(df2)
                A           B          C           D
        ---------  ----------  ---------  ----------
         0.412228   -0.991376   -1.44374   -0.276455
         0.348039   -0.193704  -0.370882    0.162211
         0.125303    -1.24546  -0.916738     1.08088
         0.293062    0.516151   0.739798   -0.430679
        -0.363344   0.0558051   -1.43888     1.13882
         -1.17492   -0.332647   -1.18689    0.533313
         0.154774     1.46559   0.373058   -0.915895
         0.555835  -0.0891554   -1.19151    0.623667
         -1.13092    0.843549  -0.532341  -0.0739869
         0.752855   -0.168504  -0.750161    -2.46084

        [200 rows x 4 columns]
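
Because ``rbind`` requires both datasets to have the same set of columns, it can help to compare the column names before binding. The following is a minimal sketch of such a check in plain Python. The helper name ``check_rbind_compatible`` is hypothetical and not part of the H2O API. It works on plain lists of column names, so it runs without an H2O cluster; with H2O frames, the names could be taken from ``df1.columns`` and ``df2.columns``.

```python
def check_rbind_compatible(cols_a, cols_b):
    """Compare two lists of column names and describe any mismatch.

    Hypothetical helper for illustration only (not an H2O function).
    Returns an empty list when both datasets have the same set of columns.
    """
    problems = []
    # Columns present in the first dataset but absent from the second
    missing = [c for c in cols_a if c not in cols_b]
    # Columns present in the second dataset but absent from the first
    extra = [c for c in cols_b if c not in cols_a]
    if missing:
        problems.append("missing from second dataset: " + ", ".join(missing))
    if extra:
        problems.append("only in second dataset: " + ", ".join(extra))
    return problems


# Matching column sets: no problems are reported
print(check_rbind_compatible(list("ABCD"), list("ABCD")))
# prints: []

# Mismatched column sets: each difference is reported
print(check_rbind_compatible(list("ABCD"), list("ABCE")))
# prints: ['missing from second dataset: D', 'only in second dataset: E']
```

An empty result only indicates that the column names match. It does not check column types, which this sketch leaves out.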