R Client Tutorial

This tutorial describes how to use the Driverless AI R client package to use and control the Driverless AI platform. It covers the main predictive data-science workflow, including:

  1. Data load

  2. Automated feature engineering and model tuning

  3. Model inspection

  4. Predicting on new data

  5. Managing the datasets and models

Note: These steps assume that you have entered your license key in the Driverless AI UI.

Loading the Data

Before we can start working with the Driverless AI platform (DAI), we have to load the package and initialize the connection:

library(dai)
dai.connect(uri = 'http://localhost:12345', username = 'h2oai', password = 'h2oai')

creditcard <- dai.create_dataset('/data/smalldata/kaggle/CreditCard/creditcard_train_cat.csv')
#>
  |
  |                                                                 |   0%
  |
  |================                                                 |  24%
  |
  |=================================================================| 100%

The function dai.create_dataset() loads data that is located on the machine that hosts DAI. The above command assumes that creditcard_train_cat.csv is available under the /data folder on the machine running Driverless AI. This file is available at https://s3.amazonaws.com/h2o-public-test-data/smalldata/kaggle/CreditCard/creditcard_train_cat.csv.

If you want to upload data that is located on your workstation, use dai.upload_dataset() instead.
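As a minimal sketch of the workstation case, the snippet below writes a local CSV file and passes its path to dai.upload_dataset(). It assumes that the function takes the local file path as its first argument, which this tutorial does not show, so check the function's documentation for the exact signature. The exists() guard only skips the upload when the dai package has not been attached.

```r
# Write a data.frame to a CSV file on the workstation.
csv_path <- file.path(tempdir(), "iris_local.csv")
write.csv(iris, csv_path, row.names = FALSE)

# Assumption: dai.upload_dataset() accepts the local path as its first argument.
# The guard skips the call when library(dai) has not been attached;
# an active dai.connect() session is also required for the upload to succeed.
if (exists("dai.upload_dataset")) {
  iris.uploaded <- dai.upload_dataset(csv_path)
  print(iris.uploaded)
}
```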

If you already have the data loaded into an R data.frame, you can convert it into a DAIFrame. For example:

iris.dai <- as.DAIFrame(iris)
#>
  |
  |                                                                 |   0%
  |
  |=================================================================| 100%

print(iris.dai)
#> DAI frame '7c38cb84-5baa-11e9-a50b-b938de969cdb': 150 obs. of 5 variables
#> File path: ./tmp/7c38cb84-5baa-11e9-a50b-b938de969cdb/iris9e1f15d2df00.csv.1554912339.9424415.bin

You can switch off the progress bar whenever it is displayed by setting progress = FALSE.

Once the dataset has been created, you can display its basic information and summary statistics by calling the generics print and summary:

print(creditcard)
#> DAI frame '7abe28b2-5baa-11e9-a50b-b938de969cdb': 23999 obs. of 25 variables
#> File path: tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv

summary(creditcard)
#>                      variable num_classes is_numeric count
#> 1                          ID           0       TRUE 23999
#> 2                   LIMIT_BAL          79       TRUE 23999
#> 3                         SEX           2      FALSE 23999
#> 4                   EDUCATION           4      FALSE 23999
#> 5                    MARRIAGE           4      FALSE 23999
#> 6                         AGE          55       TRUE 23999
#> 7                       PAY_1          11       TRUE 23999
#> 8                       PAY_2          11       TRUE 23999
#> 9                       PAY_3          11       TRUE 23999
#> 10                      PAY_4          11       TRUE 23999
#> 11                      PAY_5          10       TRUE 23999
#> 12                      PAY_6          10       TRUE 23999
#> 13                  BILL_AMT1           0       TRUE 23999
#> 14                  BILL_AMT2           0       TRUE 23999
#> 15                  BILL_AMT3           0       TRUE 23999
#> 16                  BILL_AMT4           0       TRUE 23999
#> 17                  BILL_AMT5           0       TRUE 23999
#> 18                  BILL_AMT6           0       TRUE 23999
#> 19                   PAY_AMT1           0       TRUE 23999
#> 20                   PAY_AMT2           0       TRUE 23999
#> 21                   PAY_AMT3           0       TRUE 23999
#> 22                   PAY_AMT4           0       TRUE 23999
#> 23                   PAY_AMT5           0       TRUE 23999
#> 24                   PAY_AMT6           0       TRUE 23999
#> 25 DEFAULT_PAYMENT_NEXT_MONTH           2       TRUE 23999
#>                    mean              std     min     max unique  freq
#> 1                 12000 6928.05889120466       1   23999  23999     1
#> 2      165498.715779824 129130.743065318   10000 1000000     79  2740
#> 3                                                             2  8921
#> 4                                                             4 11360
#> 5                                                             4 12876
#> 6      35.3808492020501  9.2710457493384      21      79     55  1284
#> 7  -0.00312513021375891 1.12344874325651      -2       8     11 11738
#> 8    -0.123463477644902 1.20059118344043      -2       8     11 12543
#> 9    -0.154756448185341 1.20405796618856      -2       8     11 12576
#> 10   -0.211675486478603 1.16657279943005      -2       8     11 13250
#> 11   -0.252885536897371    1.13700672904      -2       8     10 13520
#> 12   -0.278011583815992  1.1581916495226      -2       8     10 12876
#> 13     50598.9286636943 72650.1978092856 -165580  964511  18717  1607
#> 14     48648.0474186424 70365.3956426641  -69777  983931  18367  2049
#> 15     46368.9035376474 68194.7195202748 -157264 1664089  18131  2325
#> 16     42369.8728280345 63071.4551670874 -170000  891586  17719  2547
#> 17     40002.3330972124 60345.7282797424  -81334  927171  17284  2840
#> 18     38565.2666361098 59156.5011434754 -339603  961664  16906  3258
#> 19     5543.09804575191   15068.86272958       0  505000   6918  4270
#> 20     5815.52852202175  20797.443884891       0 1684259   6839  4362
#> 21     4969.43139297471 16095.9292948255       0  896040   6424  4853
#> 22     4743.65686070253 14883.5548720259       0  497000   6028  5200
#> 23     4783.64369348723 15270.7039035392       0  417990   5984  5407
#> 24     5189.57360723363 17630.7185745277       0  528666   5988  5846
#> 25    0.223717654902288 0.41674368928609   FALSE    TRUE      2  5369
#>                                                                                                                                                             num_hist_ticks
#> 1                                               1.0, 2400.8, 4800.6, 7200.400000000001, 9600.2, 12000.0, 14399.800000000001, 16799.600000000002, 19199.4, 21599.2, 23999.0
#> 2                                                             10000.0, 109000.0, 208000.0, 307000.0, 406000.0, 505000.0, 604000.0, 703000.0, 802000.0, 901000.0, 1000000.0
#> 3
#> 4
#> 5
#> 6                                                                                            21.0, 26.8, 32.6, 38.4, 44.2, 50.0, 55.8, 61.6, 67.4, 73.19999999999999, 79.0
#> 7                                                                                                                                        -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8
#> 8                                                                                                                                        -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8
#> 9                                                                                                                                        -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8
#> 10                                                                                                                                       -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8
#> 11                                                                                                                                          -2, -1, 0, 2, 3, 4, 5, 6, 7, 8
#> 12                                                                                                                                          -2, -1, 0, 2, 3, 4, 5, 6, 7, 8
#> 13           -165580.0, -52570.899999999994, 60438.20000000001, 173447.30000000005, 286456.4, 399465.5, 512474.6000000001, 625483.7000000001, 738492.8, 851501.9, 964511.0
#> 14                                          -69777.0, 35593.8, 140964.6, 246335.40000000002, 351706.2, 457077.0, 562447.8, 667818.6, 773189.4, 878560.2000000001, 983931.0
#> 15         -157264.0, 24871.29999999999, 207006.59999999998, 389141.8999999999, 571277.2, 753412.5, 935547.7999999998, 1117683.0999999999, 1299818.4, 1481953.7, 1664089.0
#> 16 -170000.0, -63841.399999999994, 42317.20000000001, 148475.80000000005, 254634.40000000002, 360793.0, 466951.6000000001, 573110.2000000001, 679268.8, 785427.4, 891586.0
#> 17                                                             -81334.0, 19516.5, 120367.0, 221217.5, 322068.0, 422918.5, 523769.0, 624619.5, 725470.0, 826320.5, 927171.0
#> 18                                       -339603.0, -209476.3, -79349.6, 50777.09999999998, 180903.8, 311030.5, 441157.19999999995, 571283.9, 701410.6, 831537.3, 961664.0
#> 19                                                                  0.0, 50500.0, 101000.0, 151500.0, 202000.0, 252500.0, 303000.0, 353500.0, 404000.0, 454500.0, 505000.0
#> 20                                0.0, 168425.9, 336851.8, 505277.69999999995, 673703.6, 842129.5, 1010555.3999999999, 1178981.3, 1347407.2, 1515833.0999999999, 1684259.0
#> 21                                                                  0.0, 89604.0, 179208.0, 268812.0, 358416.0, 448020.0, 537624.0, 627228.0, 716832.0, 806436.0, 896040.0
#> 22                                                                   0.0, 49700.0, 99400.0, 149100.0, 198800.0, 248500.0, 298200.0, 347900.0, 397600.0, 447300.0, 497000.0
#> 23                                                                   0.0, 41799.0, 83598.0, 125397.0, 167196.0, 208995.0, 250794.0, 292593.0, 334392.0, 376191.0, 417990.0
#> 24                                                        0.0, 52866.6, 105733.2, 158599.8, 211466.4, 264333.0, 317199.6, 370066.2, 422932.8, 475799.39999999997, 528666.0
#> 25                                                                                                                                                             False, True
#>                                               num_hist_counts        top
#> 1  2400, 2400, 2400, 2400, 2399, 2400, 2400, 2400, 2400, 2400
#> 2             10151, 6327, 3965, 2149, 1251, 96, 44, 15, 0, 1
#> 3                                                                 female
#> 4                                                             university
#> 5                                                                 single
#> 6         4285, 6546, 5187, 3780, 2048, 1469, 501, 147, 34, 2
#> 7        2086, 4625, 11738, 2994, 2185, 254, 66, 17, 9, 7, 18
#> 8          2953, 4886, 12543, 20, 3204, 268, 76, 21, 9, 18, 1
#> 9          3197, 4787, 12576, 4, 3121, 183, 64, 17, 21, 27, 2
#> 10          3382, 4555, 13250, 2, 2515, 158, 55, 29, 5, 46, 2
#> 11             3539, 4482, 13520, 2178, 147, 71, 11, 3, 47, 1
#> 12             3818, 4722, 12876, 2324, 158, 37, 9, 16, 37, 2
#> 13                2, 17603, 4754, 1193, 316, 111, 18, 1, 0, 1
#> 14                14571, 7214, 1578, 429, 155, 43, 7, 1, 0, 1
#> 15                    12977, 10150, 767, 99, 5, 0, 0, 0, 0, 1
#> 16                 2, 16619, 5775, 1181, 311, 89, 20, 1, 0, 1
#> 17                12722, 9033, 1720, 374, 113, 31, 4, 0, 1, 1
#> 18                   1, 1, 18312, 4788, 745, 131, 19, 1, 0, 1
#> 19                      23643, 249, 56, 26, 14, 8, 0, 1, 1, 1
#> 20                         23936, 50, 11, 1, 0, 0, 0, 0, 0, 1
#> 21                        23836, 130, 20, 9, 3, 0, 0, 0, 0, 1
#> 22                      23647, 235, 65, 29, 11, 5, 4, 0, 2, 1
#> 23                      23588, 234, 94, 40, 22, 7, 3, 8, 0, 3
#> 24                      23605, 235, 77, 56, 15, 5, 1, 3, 0, 2
#> 25                                                18630, 5369
#>              nonnum_hist_ticks nonnum_hist_counts
#> 1
#> 2
#> 3          female, male, Other     15078, 8921, 0
#> 4  university, graduate, Other  11360, 8442, 4197
#> 5       single, married, Other  12876, 10813, 310
#> 6
#> 7
#> 8
#> 9
#> 10
#> 11
#> 12
#> 13
#> 14
#> 15
#> 16
#> 17
#> 18
#> 19
#> 20
#> 21
#> 22
#> 23
#> 24
#> 25

A few other generics work as usual on a DAIFrame: dim, head, and format.

dim(creditcard)
#> [1] 23999    25

head(creditcard, 10)
#>    ID LIMIT_BAL    SEX  EDUCATION MARRIAGE AGE PAY_1 PAY_2 PAY_3 PAY_4
#> 1   1     20000 female university  married  24     2     2    -1    -1
#> 2   2    120000 female university   single  26    -1     2     0     0
#> 3   3     90000 female university   single  34     0     0     0     0
#> 4   4     50000 female university  married  37     0     0     0     0
#> 5   5     50000   male university  married  57    -1     0    -1     0
#> 6   6     50000   male   graduate   single  37     0     0     0     0
#> 7   7    500000   male   graduate   single  29     0     0     0     0
#> 8   8    100000 female university   single  23     0    -1    -1     0
#> 9   9    140000 female highschool  married  28     0     0     2     0
#> 10 10     20000   male highschool   single  35    -2    -2    -2    -2
#>    PAY_5 PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6
#> 1     -2    -2      3913      3102       689         0         0         0
#> 2      0     2      2682      1725      2682      3272      3455      3261
#> 3      0     0     29239     14027     13559     14331     14948     15549
#> 4      0     0     46990     48233     49291     28314     28959     29547
#> 5      0     0      8617      5670     35835     20940     19146     19131
#> 6      0     0     64400     57069     57608     19394     19619     20024
#> 7      0     0    367965    412023    445007    542653    483003    473944
#> 8      0    -1     11876       380       601       221      -159       567
#> 9      0     0     11285     14096     12108     12211     11793      3719
#> 10    -1    -1         0         0         0         0     13007     13912
#>    PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6
#> 1         0      689        0        0        0        0
#> 2         0     1000     1000     1000        0     2000
#> 3      1518     1500     1000     1000     1000     5000
#> 4      2000     2019     1200     1100     1069     1000
#> 5      2000    36681    10000     9000      689      679
#> 6      2500     1815      657     1000     1000      800
#> 7     55000    40000    38000    20239    13750    13770
#> 8       380      601        0      581     1687     1542
#> 9      3329        0      432     1000     1000     1000
#> 10        0        0        0    13007     1122        0
#>    DEFAULT_PAYMENT_NEXT_MONTH
#> 1                        TRUE
#> 2                        TRUE
#> 3                       FALSE
#> 4                       FALSE
#> 5                       FALSE
#> 6                       FALSE
#> 7                       FALSE
#> 8                       FALSE
#> 9                       FALSE
#> 10                      FALSE

You cannot, however, use a DAIFrame to access all of its data, nor can you use it to modify the data. It only represents the dataset loaded into the DAI platform. The head function gives access only to example data:

creditcard$example_data[1:10, ]
#>    ID LIMIT_BAL    SEX  EDUCATION MARRIAGE AGE PAY_1 PAY_2 PAY_3 PAY_4
#> 1   1     20000 female university  married  24     2     2    -1    -1
#> 2   2    120000 female university   single  26    -1     2     0     0
#> 3   3     90000 female university   single  34     0     0     0     0
#> 4   4     50000 female university  married  37     0     0     0     0
#> 5   5     50000   male university  married  57    -1     0    -1     0
#> 6   6     50000   male   graduate   single  37     0     0     0     0
#> 7   7    500000   male   graduate   single  29     0     0     0     0
#> 8   8    100000 female university   single  23     0    -1    -1     0
#> 9   9    140000 female highschool  married  28     0     0     2     0
#> 10 10     20000   male highschool   single  35    -2    -2    -2    -2
#>    PAY_5 PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6
#> 1     -2    -2      3913      3102       689         0         0         0
#> 2      0     2      2682      1725      2682      3272      3455      3261
#> 3      0     0     29239     14027     13559     14331     14948     15549
#> 4      0     0     46990     48233     49291     28314     28959     29547
#> 5      0     0      8617      5670     35835     20940     19146     19131
#> 6      0     0     64400     57069     57608     19394     19619     20024
#> 7      0     0    367965    412023    445007    542653    483003    473944
#> 8      0    -1     11876       380       601       221      -159       567
#> 9      0     0     11285     14096     12108     12211     11793      3719
#> 10    -1    -1         0         0         0         0     13007     13912
#>    PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6
#> 1         0      689        0        0        0        0
#> 2         0     1000     1000     1000        0     2000
#> 3      1518     1500     1000     1000     1000     5000
#> 4      2000     2019     1200     1100     1069     1000
#> 5      2000    36681    10000     9000      689      679
#> 6      2500     1815      657     1000     1000      800
#> 7     55000    40000    38000    20239    13750    13770
#> 8       380      601        0      581     1687     1542
#> 9      3329        0      432     1000     1000     1000
#> 10        0        0        0    13007     1122        0
#>    DEFAULT_PAYMENT_NEXT_MONTH
#> 1                        TRUE
#> 2                        TRUE
#> 3                       FALSE
#> 4                       FALSE
#> 5                       FALSE
#> 6                       FALSE
#> 7                       FALSE
#> 8                       FALSE
#> 9                       FALSE
#> 10                      FALSE

A dataset can be split into, for example, training and test sets directly from R:

creditcard.splits <- dai.split_dataset(creditcard,
                                       output_name1 = 'train',
                                       output_name2 = 'test',
                                       ratio = .8,
                                       seed = 25,
                                       progress = FALSE)

In this case, creditcard.splits is a list with two elements named "train" and "test", where 80% of the data went into train and 20% of the data went into test.

creditcard.splits$train
#> DAI frame '7cf3024c-5baa-11e9-a50b-b938de969cdb': 19199 obs. of 25 variables
#> File path: ./tmp/7cf3024c-5baa-11e9-a50b-b938de969cdb/train.1554912341.0864356.bin

creditcard.splits$test
#> DAI frame '7cf613a6-5baa-11e9-a50b-b938de969cdb': 4800 obs. of 25 variables
#> File path: ./tmp/7cf613a6-5baa-11e9-a50b-b938de969cdb/test.1554912341.0966916.bin

By default it yields a random sample, but you can do stratified or time-based splits as well. See the function’s documentation for more details.
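The row counts printed above are consistent with the requested ratio. The following base R sketch reproduces the arithmetic and shows a local analogue of a seeded random split on an ordinary data.frame. It only illustrates the idea and is not how DAI performs the split internally, so the same seed should not be expected to select the same rows.

```r
# Row counts implied by ratio = 0.8 on 23999 rows.
n_rows  <- 23999
ratio   <- 0.8
n_train <- floor(n_rows * ratio)   # 19199
n_test  <- n_rows - n_train        # 4800

# Local analogue on an ordinary data.frame (not a DAIFrame).
set.seed(25)
train_idx  <- sample(nrow(iris), floor(nrow(iris) * ratio))
iris_train <- iris[train_idx, ]
iris_test  <- iris[-train_idx, ]
c(train = nrow(iris_train), test = nrow(iris_test))
```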

Automated Feature Engineering and Model Tuning

One of the main strengths of Driverless AI is the fully automated feature engineering along with hyperparameter tuning, model selection and ensembling. The function dai.train() executes the experiment that results in a DAIModel instance that represents the model.

model <- dai.train(training_frame = creditcard.splits$train,
                   testing_frame = creditcard.splits$test,
                   target_col = 'DEFAULT_PAYMENT_NEXT_MONTH',
                   is_classification = TRUE,
                   is_timeseries = FALSE,
                   accuracy = 1, time = 1, interpretability = 10,
                   seed = 25)
#>
  |
  |                                                                 |   0%
  |
  |==========================                                       |  40%
  |
  |===============================================                  |  73%
  |
  |===========================================================      |  91%
  |
  |=================================================================| 100%

If you do not specify the accuracy, time, or interpretability, they will be suggested by the DAI platform. (See dai.suggest_model_params.)

Model Inspection

As with DAIFrame, generic methods such as print, format, summary, and predict work with DAIModel:

print(model)
#> Status: Complete
#> Experiment: 7e2b70ae-5baa-11e9-a50b-b938de969cdb, 2019-04-10 18:06, 1.7.0+local_0c7d019-dirty
#>   Settings: 1/1/10, seed=25, GPUs enabled
#>   Train data: train (19199, 25)
#>   Validation data: N/A
#>   Test data: test (4800, 24)
#>   Target column: DEFAULT_PAYMENT_NEXT_MONTH (binary, 22.366% target class)
#> System specs: Linux, 126 GB, 40 CPU cores, 2/2 GPUs
#>   Max memory usage: 0.406 GB, 0.167 GB GPU
#> Recipe: AutoDL (2 iterations, 2 individuals)
#>   Validation scheme: stratified, 1 internal holdout
#>   Feature engineering: 33 features scored (18 selected)
#> Timing:
#>   Data preparation: 4.94 secs
#>   Model and feature tuning: 10.13 secs (3 models trained)
#>   Feature evolution: 5.54 secs (1 of 3 model trained)
#>   Final pipeline training: 7.85 secs (1 model trained)
#>   Python / MOJO scorer building: 42.05 secs / 0.00 secs
#> Validation score: AUC = 0.77802 +/- 0.0077539 (baseline)
#> Validation score: AUC = 0.77802 +/- 0.0077539 (final pipeline)
#> Test score:       AUC = 0.7861 +/- 0.0064711 (final pipeline)

summary(model)$score
#> [1] 0.7780229

Predicting on New Data

New data can be scored in two different ways:

  • Call predict() directly on the model in the R session.

  • Download a scoring pipeline and embed that into your Python or Java workflow.

Predicting in R

The generic predict() returns an R data.frame with the results by default. With return_df=FALSE, it instead returns the location of a CSV file with the results, which may be useful when you predict on a large dataset.

predictions <- predict(model, newdata = creditcard.splits$test)
#>
  |
  |                                                                 |   0%
  |
  |=================================================================| 100%
#> Loading required package: bitops

head(predictions)
#>   DEFAULT_PAYMENT_NEXT_MONTH.0 DEFAULT_PAYMENT_NEXT_MONTH.1
#> 1                    0.8879988                   0.11200116
#> 2                    0.9289870                   0.07101299
#> 3                    0.9550328                   0.04496716
#> 4                    0.3513577                   0.64864230
#> 5                    0.9183724                   0.08162758
#> 6                    0.9154425                   0.08455751
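The two columns hold the predicted probability of each class, so every row sums to roughly 1. As a sketch, the base R code below re-enters the six rows printed above and derives a class label from them. The 0.5 cutoff is an illustrative assumption, not a threshold recommended by DAI.

```r
# The six rows shown by head(predictions), re-entered by hand.
preds <- data.frame(
  DEFAULT_PAYMENT_NEXT_MONTH.0 = c(0.8879988, 0.9289870, 0.9550328,
                                   0.3513577, 0.9183724, 0.9154425),
  DEFAULT_PAYMENT_NEXT_MONTH.1 = c(0.11200116, 0.07101299, 0.04496716,
                                   0.64864230, 0.08162758, 0.08455751)
)

# Each row's class probabilities should add up to about 1.
stopifnot(all(abs(rowSums(preds) - 1) < 1e-6))

# Hypothetical cutoff: flag a default when P(class 1) exceeds 0.5.
cutoff <- 0.5
preds$predicted_default <- preds$DEFAULT_PAYMENT_NEXT_MONTH.1 > cutoff
which(preds$predicted_default)   # row 4 only
```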

predict(model, newdata = creditcard.splits$test, return_df = FALSE)
#>
  |
  |                                                                 |   0%
  |
  |=================================================================| 100%
#> [1] "h2oai_experiment_7e2b70ae-5baa-11e9-a50b-b938de969cdb/7e2b70ae-5baa-11e9-a50b-b938de969cdb_preds_f854b49f.csv"

Downloading Python or MOJO Scoring Pipelines

To put your model into production in Python or Java, you can download the full Python or MOJO scoring pipeline, respectively. For more information about how to use the pipelines, refer to the R Client documentation.

dai.download_mojo(model, path = tempdir(), force = TRUE)
#>
  |
  |                                                                 |   0%
  |
  |=================================================================| 100%
#> Downloading the pipeline:
#> [1] "/tmp/RtmppsLTZ9/mojo-7e2b70ae-5baa-11e9-a50b-b938de969cdb.zip"

dai.download_python_pipeline(model, path = tempdir(), force = TRUE)
#>
  |
  |                                                                 |   0%
  |
  |=================================================================| 100%
#> Downloading the pipeline:
#> [1] "/tmp/RtmppsLTZ9/python-pipeline-7e2b70ae-5baa-11e9-a50b-b938de969cdb.zip"

Managing the Datasets and Models

After some time, you may have multiple datasets and models on your DAI server. The dai package offers a few utility functions to find, reuse, and remove the existing datasets and models.

If you already have the dataset loaded into DAI, you can get the DAIFrame object with either dai.get_frame (if you know the frame’s key) or dai.find_dataset (if you know the original path or at least a part of it):

dai.get_frame(creditcard$key)
#> DAI frame '7abe28b2-5baa-11e9-a50b-b938de969cdb': 23999 obs. of 25 variables
#> File path: tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv

dai.find_dataset('creditcard')
#> DAI frame '7abe28b2-5baa-11e9-a50b-b938de969cdb': 23999 obs. of 25 variables
#> File path: tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv

The latter returns the frame directly if there is only one match. Otherwise, it lets you select which frame to return from all the matching candidates.

Furthermore, you can get a list of datasets or models:

datasets <- dai.list_datasets()
head(datasets)
#>                                    key                     name
#> 1 7cf613a6-5baa-11e9-a50b-b938de969cdb                     test
#> 2 7cf3024c-5baa-11e9-a50b-b938de969cdb                    train
#> 3 7c38cb84-5baa-11e9-a50b-b938de969cdb     iris9e1f15d2df00.csv
#> 4 7abe28b2-5baa-11e9-a50b-b938de969cdb creditcard_train_cat.csv
#>                                                                                file_path
#> 1                 ./tmp/7cf613a6-5baa-11e9-a50b-b938de969cdb/test.1554912341.0966916.bin
#> 2                ./tmp/7cf3024c-5baa-11e9-a50b-b938de969cdb/train.1554912341.0864356.bin
#> 3 ./tmp/7c38cb84-5baa-11e9-a50b-b938de969cdb/iris9e1f15d2df00.csv.1554912339.9424415.bin
#> 4                             tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv
#>   file_size data_source row_count column_count import_status import_error
#> 1    567584      upload      4800           25             0
#> 2   2265952      upload     19199           25             0
#> 3      7064      upload       150            5             0
#> 4   2832040        file     23999           25             0
#>   aggregation_status aggregation_error aggregated_frame mapping_frame
#> 1                 -1
#> 2                 -1
#> 3                 -1
#> 4                 -1
#>   uploaded
#> 1     TRUE
#> 2     TRUE
#> 3     TRUE
#> 4    FALSE
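Because the listing prints like an ordinary R data.frame, it can presumably be filtered with standard subsetting. The sketch below uses a hand-built data.frame carrying a few of the columns printed above as a stand-in for a live dai.list_datasets() result.

```r
# Stand-in for dai.list_datasets(), limited to a few of the printed columns.
datasets_demo <- data.frame(
  key = c("7cf613a6-5baa-11e9-a50b-b938de969cdb",
          "7cf3024c-5baa-11e9-a50b-b938de969cdb",
          "7c38cb84-5baa-11e9-a50b-b938de969cdb",
          "7abe28b2-5baa-11e9-a50b-b938de969cdb"),
  name = c("test", "train", "iris9e1f15d2df00.csv", "creditcard_train_cat.csv"),
  data_source = c("upload", "upload", "upload", "file"),
  row_count = c(4800, 19199, 150, 23999),
  stringsAsFactors = FALSE
)

# Rows whose data_source is "upload".
uploaded <- datasets_demo[datasets_demo$data_source == "upload", ]

# Key of the dataset whose name contains "creditcard".
creditcard_key <- datasets_demo$key[grepl("creditcard", datasets_demo$name)]
```

A key selected this way could then be passed to dai.get_frame to obtain the corresponding DAIFrame.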

models <- dai.list_models()
head(models)
#>                                    key description
#> 1 7e2b70ae-5baa-11e9-a50b-b938de969cdb    mupulori
#>                   dataset_name               parameters.dataset_key
#> 1 train.1554912341.0864356.bin 7cf3024c-5baa-11e9-a50b-b938de969cdb
#>   parameters.resumed_model_key      parameters.target_col
#> 1                              DEFAULT_PAYMENT_NEXT_MONTH
#>   parameters.weight_col parameters.fold_col parameters.orig_time_col
#> 1
#>   parameters.time_col parameters.is_classification parameters.cols_to_drop
#> 1               [OFF]                         TRUE                    NULL
#>   parameters.validset_key               parameters.testset_key
#> 1                         7cf613a6-5baa-11e9-a50b-b938de969cdb
#>   parameters.enable_gpus parameters.seed parameters.accuracy
#> 1                   TRUE              25                   1
#>   parameters.time parameters.interpretability parameters.scorer
#> 1               1                          10               AUC
#>   parameters.time_groups_columns parameters.time_period_in_seconds
#> 1                           NULL                                NA
#>   parameters.num_prediction_periods parameters.num_gap_periods
#> 1                                NA                         NA
#>   parameters.is_timeseries parameters.config_overrides
#> 1                    FALSE                          NA
#>                                                                                                          log_file_path
#> 1 h2oai_experiment_7e2b70ae-5baa-11e9-a50b-b938de969cdb/h2oai_experiment_logs_7e2b70ae-5baa-11e9-a50b-b938de969cdb.zip
#>                                                                    pickle_path
#> 1 h2oai_experiment_7e2b70ae-5baa-11e9-a50b-b938de969cdb/best_individual.pickle
#>                                                                                                              summary_path
#> 1 h2oai_experiment_7e2b70ae-5baa-11e9-a50b-b938de969cdb/h2oai_experiment_summary_7e2b70ae-5baa-11e9-a50b-b938de969cdb.zip
#>   train_predictions_path valid_predictions_path
#> 1
#>                                                  test_predictions_path
#> 1 h2oai_experiment_7e2b70ae-5baa-11e9-a50b-b938de969cdb/test_preds.csv
#>   progress status training_duration scorer     score test_score deprecated
#> 1        1      0          71.43582    AUC 0.7780229     0.7861      FALSE
#>   model_file_size diagnostic_keys
#> 1       695996094            NULL

If you know the key of a dataset or model, you can obtain the corresponding DAIFrame or DAIModel instance with dai.get_frame or dai.get_model, respectively:

dai.get_model(models$key[1])
#> Status: Complete
#> Experiment: 7e2b70ae-5baa-11e9-a50b-b938de969cdb, 2019-04-10 18:06, 1.7.0+local_0c7d019-dirty
#>   Settings: 1/1/10, seed=25, GPUs enabled
#>   Train data: train (19199, 25)
#>   Validation data: N/A
#>   Test data: test (4800, 24)
#>   Target column: DEFAULT_PAYMENT_NEXT_MONTH (binary, 22.366% target class)
#> System specs: Linux, 126 GB, 40 CPU cores, 2/2 GPUs
#>   Max memory usage: 0.406 GB, 0.167 GB GPU
#> Recipe: AutoDL (2 iterations, 2 individuals)
#>   Validation scheme: stratified, 1 internal holdout
#>   Feature engineering: 33 features scored (18 selected)
#> Timing:
#>   Data preparation: 4.94 secs
#>   Model and feature tuning: 10.13 secs (3 models trained)
#>   Feature evolution: 5.54 secs (1 of 3 model trained)
#>   Final pipeline training: 7.85 secs (1 model trained)
#>   Python / MOJO scorer building: 42.05 secs / 0.00 secs
#> Validation score: AUC = 0.77802 +/- 0.0077539 (baseline)
#> Validation score: AUC = 0.77802 +/- 0.0077539 (final pipeline)
#> Test score:       AUC = 0.7861 +/- 0.0064711 (final pipeline)

dai.get_frame(datasets$key[1])
#> DAI frame '7cf613a6-5baa-11e9-a50b-b938de969cdb': 4800 obs. of 25 variables
#> File path: ./tmp/7cf613a6-5baa-11e9-a50b-b938de969cdb/test.1554912341.0966916.bin

Finally, datasets and models can be removed with dai.rm:

dai.rm(model, creditcard, creditcard.splits$train, creditcard.splits$test)
#> Model 7e2b70ae-5baa-11e9-a50b-b938de969cdb removed
#> Dataset 7abe28b2-5baa-11e9-a50b-b938de969cdb removed
#> Dataset 7cf3024c-5baa-11e9-a50b-b938de969cdb removed
#> Dataset 7cf613a6-5baa-11e9-a50b-b938de969cdb removed

By default, the function dai.rm deletes the objects from both the server and the R session. If you wish to remove them only from the server, set from_session=FALSE. Note that only objects referred to by a plain variable name can be removed from the session. In the example above, creditcard.splits$train and creditcard.splits$test will not be removed from the R session because they are actually function calls (recall that $ is a function).
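The distinction between a variable name and a function call can be checked in base R without a DAI connection. The sketch below uses a plain list, named splits_demo for illustration, as a stand-in for creditcard.splits.

```r
# Plain list standing in for the result of dai.split_dataset().
splits_demo <- list(train = "train frame", test = "test frame")

is.name(quote(splits_demo))         # TRUE: a bare variable name
is.call(quote(splits_demo$train))   # TRUE: `$` applied to two arguments

# The function being called in that expression is `$`.
called_fn <- as.character(quote(splits_demo$train)[[1]])

# The same extraction written as an explicit function call.
same_value <- identical(`$`(splits_demo, "train"), splits_demo$train)
```

Assigning an element to its own variable first, for example train <- creditcard.splits$train, gives dai.rm a plain name to work with. Whether it then also clears that variable follows from the rule above, but it is worth confirming against the function's documentation.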