R Client Tutorial¶
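This tutorial describes how to use the Driverless AI R client package to work with and control the Driverless AI platform. It covers the main predictive data-science workflow, including:
Loading data
Automated feature engineering and model tuning
Inspecting the model
Predicting on new data
Managing datasets and models
Note: These steps assume that you have already entered your license key in the Driverless AI UI.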
Loading data¶
Before we can start working with the Driverless AI platform (DAI), we have to import the package and initialize the connection:
library(dai)
dai.connect(uri = 'http://localhost:12345', username = 'h2oai', password = 'h2oai')
creditcard <- dai.create_dataset('/data/smalldata/kaggle/CreditCard/creditcard_train_cat.csv')
#>
|
| | 0%
|
|================ | 24%
|
|=================================================================| 100%
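The function dai.create_dataset() loads data located on the machine that hosts DAI. The command above assumes that creditcard_train_cat.csv is under the /data folder on the machine running Driverless AI. This file can be downloaded from https://s3.amazonaws.com/h2o-public-test-data/smalldata/kaggle/CreditCard/creditcard_train_cat.csv.
If you want to upload data from your workstation, use dai.upload_dataset() instead.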
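A local file has to exist on the workstation's disk before it can be uploaded. The following is a minimal sketch, assuming that dai.upload_dataset() takes the local file path as its first argument (check the package documentation before relying on this). The upload call is commented out because it needs a running DAI server; the file name iris_local.csv is arbitrary.

```r
# Write a local data.frame to a CSV file on the workstation.
csv_path <- file.path(tempdir(), "iris_local.csv")
write.csv(iris, csv_path, row.names = FALSE)

# Confirm that the file exists and reads back with the expected shape.
local_copy <- read.csv(csv_path)
dim(local_copy)

# Assumed call shape; requires library(dai) and an open connection:
# iris.uploaded <- dai.upload_dataset(csv_path)
```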
If you already have the data loaded in an R data.frame, you can convert it into a DAIFrame. For example:
iris.dai <- as.DAIFrame(iris)
#>
|
| | 0%
|
|=================================================================| 100%
print(iris.dai)
#> DAI frame '7c38cb84-5baa-11e9-a50b-b938de969cdb': 150 obs. of 5 variables
#> File path: ./tmp/7c38cb84-5baa-11e9-a50b-b938de969cdb/iris9e1f15d2df00.csv.1554912339.9424415.bin
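Wherever a progress bar is displayed, you can switch it off by setting progress = FALSE.
Once a dataset is created, you can immediately display basic information and summary statistics by calling the print and summary generics: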
print(creditcard)
#> DAI frame '7abe28b2-5baa-11e9-a50b-b938de969cdb': 23999 obs. of 25 variables
#> File path: tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv
summary(creditcard)
#> variable num_classes is_numeric count
#> 1 ID 0 TRUE 23999
#> 2 LIMIT_BAL 79 TRUE 23999
#> 3 SEX 2 FALSE 23999
#> 4 EDUCATION 4 FALSE 23999
#> 5 MARRIAGE 4 FALSE 23999
#> 6 AGE 55 TRUE 23999
#> 7 PAY_1 11 TRUE 23999
#> 8 PAY_2 11 TRUE 23999
#> 9 PAY_3 11 TRUE 23999
#> 10 PAY_4 11 TRUE 23999
#> 11 PAY_5 10 TRUE 23999
#> 12 PAY_6 10 TRUE 23999
#> 13 BILL_AMT1 0 TRUE 23999
#> 14 BILL_AMT2 0 TRUE 23999
#> 15 BILL_AMT3 0 TRUE 23999
#> 16 BILL_AMT4 0 TRUE 23999
#> 17 BILL_AMT5 0 TRUE 23999
#> 18 BILL_AMT6 0 TRUE 23999
#> 19 PAY_AMT1 0 TRUE 23999
#> 20 PAY_AMT2 0 TRUE 23999
#> 21 PAY_AMT3 0 TRUE 23999
#> 22 PAY_AMT4 0 TRUE 23999
#> 23 PAY_AMT5 0 TRUE 23999
#> 24 PAY_AMT6 0 TRUE 23999
#> 25 DEFAULT_PAYMENT_NEXT_MONTH 2 TRUE 23999
#> mean std min max unique freq
#> 1 12000 6928.05889120466 1 23999 23999 1
#> 2 165498.715779824 129130.743065318 10000 1000000 79 2740
#> 3 2 8921
#> 4 4 11360
#> 5 4 12876
#> 6 35.3808492020501 9.2710457493384 21 79 55 1284
#> 7 -0.00312513021375891 1.12344874325651 -2 8 11 11738
#> 8 -0.123463477644902 1.20059118344043 -2 8 11 12543
#> 9 -0.154756448185341 1.20405796618856 -2 8 11 12576
#> 10 -0.211675486478603 1.16657279943005 -2 8 11 13250
#> 11 -0.252885536897371 1.13700672904 -2 8 10 13520
#> 12 -0.278011583815992 1.1581916495226 -2 8 10 12876
#> 13 50598.9286636943 72650.1978092856 -165580 964511 18717 1607
#> 14 48648.0474186424 70365.3956426641 -69777 983931 18367 2049
#> 15 46368.9035376474 68194.7195202748 -157264 1664089 18131 2325
#> 16 42369.8728280345 63071.4551670874 -170000 891586 17719 2547
#> 17 40002.3330972124 60345.7282797424 -81334 927171 17284 2840
#> 18 38565.2666361098 59156.5011434754 -339603 961664 16906 3258
#> 19 5543.09804575191 15068.86272958 0 505000 6918 4270
#> 20 5815.52852202175 20797.443884891 0 1684259 6839 4362
#> 21 4969.43139297471 16095.9292948255 0 896040 6424 4853
#> 22 4743.65686070253 14883.5548720259 0 497000 6028 5200
#> 23 4783.64369348723 15270.7039035392 0 417990 5984 5407
#> 24 5189.57360723363 17630.7185745277 0 528666 5988 5846
#> 25 0.223717654902288 0.41674368928609 FALSE TRUE 2 5369
#> num_hist_ticks
#> 1 1.0, 2400.8, 4800.6, 7200.400000000001, 9600.2, 12000.0, 14399.800000000001, 16799.600000000002, 19199.4, 21599.2, 23999.0
#> 2 10000.0, 109000.0, 208000.0, 307000.0, 406000.0, 505000.0, 604000.0, 703000.0, 802000.0, 901000.0, 1000000.0
#> 3
#> 4
#> 5
#> 6 21.0, 26.8, 32.6, 38.4, 44.2, 50.0, 55.8, 61.6, 67.4, 73.19999999999999, 79.0
#> 7 -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8
#> 8 -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8
#> 9 -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8
#> 10 -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8
#> 11 -2, -1, 0, 2, 3, 4, 5, 6, 7, 8
#> 12 -2, -1, 0, 2, 3, 4, 5, 6, 7, 8
#> 13 -165580.0, -52570.899999999994, 60438.20000000001, 173447.30000000005, 286456.4, 399465.5, 512474.6000000001, 625483.7000000001, 738492.8, 851501.9, 964511.0
#> 14 -69777.0, 35593.8, 140964.6, 246335.40000000002, 351706.2, 457077.0, 562447.8, 667818.6, 773189.4, 878560.2000000001, 983931.0
#> 15 -157264.0, 24871.29999999999, 207006.59999999998, 389141.8999999999, 571277.2, 753412.5, 935547.7999999998, 1117683.0999999999, 1299818.4, 1481953.7, 1664089.0
#> 16 -170000.0, -63841.399999999994, 42317.20000000001, 148475.80000000005, 254634.40000000002, 360793.0, 466951.6000000001, 573110.2000000001, 679268.8, 785427.4, 891586.0
#> 17 -81334.0, 19516.5, 120367.0, 221217.5, 322068.0, 422918.5, 523769.0, 624619.5, 725470.0, 826320.5, 927171.0
#> 18 -339603.0, -209476.3, -79349.6, 50777.09999999998, 180903.8, 311030.5, 441157.19999999995, 571283.9, 701410.6, 831537.3, 961664.0
#> 19 0.0, 50500.0, 101000.0, 151500.0, 202000.0, 252500.0, 303000.0, 353500.0, 404000.0, 454500.0, 505000.0
#> 20 0.0, 168425.9, 336851.8, 505277.69999999995, 673703.6, 842129.5, 1010555.3999999999, 1178981.3, 1347407.2, 1515833.0999999999, 1684259.0
#> 21 0.0, 89604.0, 179208.0, 268812.0, 358416.0, 448020.0, 537624.0, 627228.0, 716832.0, 806436.0, 896040.0
#> 22 0.0, 49700.0, 99400.0, 149100.0, 198800.0, 248500.0, 298200.0, 347900.0, 397600.0, 447300.0, 497000.0
#> 23 0.0, 41799.0, 83598.0, 125397.0, 167196.0, 208995.0, 250794.0, 292593.0, 334392.0, 376191.0, 417990.0
#> 24 0.0, 52866.6, 105733.2, 158599.8, 211466.4, 264333.0, 317199.6, 370066.2, 422932.8, 475799.39999999997, 528666.0
#> 25 False, True
#> num_hist_counts top
#> 1 2400, 2400, 2400, 2400, 2399, 2400, 2400, 2400, 2400, 2400
#> 2 10151, 6327, 3965, 2149, 1251, 96, 44, 15, 0, 1
#> 3 female
#> 4 university
#> 5 single
#> 6 4285, 6546, 5187, 3780, 2048, 1469, 501, 147, 34, 2
#> 7 2086, 4625, 11738, 2994, 2185, 254, 66, 17, 9, 7, 18
#> 8 2953, 4886, 12543, 20, 3204, 268, 76, 21, 9, 18, 1
#> 9 3197, 4787, 12576, 4, 3121, 183, 64, 17, 21, 27, 2
#> 10 3382, 4555, 13250, 2, 2515, 158, 55, 29, 5, 46, 2
#> 11 3539, 4482, 13520, 2178, 147, 71, 11, 3, 47, 1
#> 12 3818, 4722, 12876, 2324, 158, 37, 9, 16, 37, 2
#> 13 2, 17603, 4754, 1193, 316, 111, 18, 1, 0, 1
#> 14 14571, 7214, 1578, 429, 155, 43, 7, 1, 0, 1
#> 15 12977, 10150, 767, 99, 5, 0, 0, 0, 0, 1
#> 16 2, 16619, 5775, 1181, 311, 89, 20, 1, 0, 1
#> 17 12722, 9033, 1720, 374, 113, 31, 4, 0, 1, 1
#> 18 1, 1, 18312, 4788, 745, 131, 19, 1, 0, 1
#> 19 23643, 249, 56, 26, 14, 8, 0, 1, 1, 1
#> 20 23936, 50, 11, 1, 0, 0, 0, 0, 0, 1
#> 21 23836, 130, 20, 9, 3, 0, 0, 0, 0, 1
#> 22 23647, 235, 65, 29, 11, 5, 4, 0, 2, 1
#> 23 23588, 234, 94, 40, 22, 7, 3, 8, 0, 3
#> 24 23605, 235, 77, 56, 15, 5, 1, 3, 0, 2
#> 25 18630, 5369
#> nonnum_hist_ticks nonnum_hist_counts
#> 1
#> 2
#> 3 female, male, Other 15078, 8921, 0
#> 4 university, graduate, Other 11360, 8442, 4197
#> 5 single, married, Other 12876, 10813, 310
#> 6
#> 7
#> 8
#> 9
#> 10
#> 11
#> 12
#> 13
#> 14
#> 15
#> 16
#> 17
#> 18
#> 19
#> 20
#> 21
#> 22
#> 23
#> 24
#> 25
Several other generics also work on a DAIFrame: dim, head, and format.
dim(creditcard)
#> [1] 23999 25
head(creditcard, 10)
#> ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_1 PAY_2 PAY_3 PAY_4
#> 1 1 20000 female university married 24 2 2 -1 -1
#> 2 2 120000 female university single 26 -1 2 0 0
#> 3 3 90000 female university single 34 0 0 0 0
#> 4 4 50000 female university married 37 0 0 0 0
#> 5 5 50000 male university married 57 -1 0 -1 0
#> 6 6 50000 male graduate single 37 0 0 0 0
#> 7 7 500000 male graduate single 29 0 0 0 0
#> 8 8 100000 female university single 23 0 -1 -1 0
#> 9 9 140000 female highschool married 28 0 0 2 0
#> 10 10 20000 male highschool single 35 -2 -2 -2 -2
#> PAY_5 PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6
#> 1 -2 -2 3913 3102 689 0 0 0
#> 2 0 2 2682 1725 2682 3272 3455 3261
#> 3 0 0 29239 14027 13559 14331 14948 15549
#> 4 0 0 46990 48233 49291 28314 28959 29547
#> 5 0 0 8617 5670 35835 20940 19146 19131
#> 6 0 0 64400 57069 57608 19394 19619 20024
#> 7 0 0 367965 412023 445007 542653 483003 473944
#> 8 0 -1 11876 380 601 221 -159 567
#> 9 0 0 11285 14096 12108 12211 11793 3719
#> 10 -1 -1 0 0 0 0 13007 13912
#> PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6
#> 1 0 689 0 0 0 0
#> 2 0 1000 1000 1000 0 2000
#> 3 1518 1500 1000 1000 1000 5000
#> 4 2000 2019 1200 1100 1069 1000
#> 5 2000 36681 10000 9000 689 679
#> 6 2500 1815 657 1000 1000 800
#> 7 55000 40000 38000 20239 13750 13770
#> 8 380 601 0 581 1687 1542
#> 9 3329 0 432 1000 1000 1000
#> 10 0 0 0 13007 1122 0
#> DEFAULT_PAYMENT_NEXT_MONTH
#> 1 TRUE
#> 2 TRUE
#> 3 FALSE
#> 4 FALSE
#> 5 FALSE
#> 6 FALSE
#> 7 FALSE
#> 8 FALSE
#> 9 FALSE
#> 10 FALSE
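However, you cannot use a DAIFrame to access all of its data, nor can you use it to modify the data. It only represents the dataset loaded into the DAI platform. The head function gives access only to the example data: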
creditcard$example_data[1:10, ]
#> ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_1 PAY_2 PAY_3 PAY_4
#> 1 1 20000 female university married 24 2 2 -1 -1
#> 2 2 120000 female university single 26 -1 2 0 0
#> 3 3 90000 female university single 34 0 0 0 0
#> 4 4 50000 female university married 37 0 0 0 0
#> 5 5 50000 male university married 57 -1 0 -1 0
#> 6 6 50000 male graduate single 37 0 0 0 0
#> 7 7 500000 male graduate single 29 0 0 0 0
#> 8 8 100000 female university single 23 0 -1 -1 0
#> 9 9 140000 female highschool married 28 0 0 2 0
#> 10 10 20000 male highschool single 35 -2 -2 -2 -2
#> PAY_5 PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6
#> 1 -2 -2 3913 3102 689 0 0 0
#> 2 0 2 2682 1725 2682 3272 3455 3261
#> 3 0 0 29239 14027 13559 14331 14948 15549
#> 4 0 0 46990 48233 49291 28314 28959 29547
#> 5 0 0 8617 5670 35835 20940 19146 19131
#> 6 0 0 64400 57069 57608 19394 19619 20024
#> 7 0 0 367965 412023 445007 542653 483003 473944
#> 8 0 -1 11876 380 601 221 -159 567
#> 9 0 0 11285 14096 12108 12211 11793 3719
#> 10 -1 -1 0 0 0 0 13007 13912
#> PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6
#> 1 0 689 0 0 0 0
#> 2 0 1000 1000 1000 0 2000
#> 3 1518 1500 1000 1000 1000 5000
#> 4 2000 2019 1200 1100 1069 1000
#> 5 2000 36681 10000 9000 689 679
#> 6 2500 1815 657 1000 1000 800
#> 7 55000 40000 38000 20239 13750 13770
#> 8 380 601 0 581 1687 1542
#> 9 3329 0 432 1000 1000 1000
#> 10 0 0 0 13007 1122 0
#> DEFAULT_PAYMENT_NEXT_MONTH
#> 1 TRUE
#> 2 TRUE
#> 3 FALSE
#> 4 FALSE
#> 5 FALSE
#> 6 FALSE
#> 7 FALSE
#> 8 FALSE
#> 9 FALSE
#> 10 FALSE
A dataset can be split into a training set and a test set directly from R:
creditcard.splits <- dai.split_dataset(creditcard,
                                       output_name1 = 'train',
                                       output_name2 = 'test',
                                       ratio = .8,
                                       seed = 25,
                                       progress = FALSE)
In this example, creditcard.splits is a list with two elements named "train" and "test", where 80% of the data went into the training set and 20% into the test set.
creditcard.splits$train
#> DAI frame '7cf3024c-5baa-11e9-a50b-b938de969cdb': 19199 obs. of 25 variables
#> File path: ./tmp/7cf3024c-5baa-11e9-a50b-b938de969cdb/train.1554912341.0864356.bin
creditcard.splits$test
#> DAI frame '7cf613a6-5baa-11e9-a50b-b938de969cdb': 4800 obs. of 25 variables
#> File path: ./tmp/7cf613a6-5baa-11e9-a50b-b938de969cdb/test.1554912341.0966916.bin
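By default, the split is a random sample, but you can also split in a stratified or time-based way. See the function's documentation for more details.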
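To make the roles of ratio and seed concrete, here is a base R sketch of a plain random 80/20 split on a local data.frame. It only illustrates the idea and is not how dai.split_dataset() is implemented on the server; the helper name split_frame is hypothetical.

```r
# Hypothetical helper: randomly split a data.frame into two parts.
split_frame <- function(df, ratio = 0.8, seed = 25) {
  set.seed(seed)                          # same seed gives the same split
  all_idx   <- seq_len(nrow(df))
  n_train   <- floor(ratio * nrow(df))    # rows assigned to the first part
  train_idx <- sample(all_idx, size = n_train)
  test_idx  <- setdiff(all_idx, train_idx)
  list(train = df[train_idx, , drop = FALSE],
       test  = df[test_idx, , drop = FALSE])
}

iris.splits <- split_frame(iris, ratio = 0.8, seed = 25)
nrow(iris.splits$train)
nrow(iris.splits$test)
```

With the 150-row iris data this gives 120 training rows and 30 test rows, and no row appears in both parts.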
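Automated feature engineering and model tuning¶
One of the main strengths of Driverless AI is fully automated feature engineering, together with hyperparameter tuning, model selection, and ensembling. The function dai.train() runs the experiment and produces a DAIModel instance that represents the model.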
model <- dai.train(training_frame = creditcard.splits$train,
                   testing_frame = creditcard.splits$test,
                   target_col = 'DEFAULT_PAYMENT_NEXT_MONTH',
                   is_classification = TRUE,
                   is_timeseries = FALSE,
                   accuracy = 1, time = 1, interpretability = 10,
                   seed = 25)
#>
|
| | 0%
|
|========================== | 40%
|
|=============================================== | 73%
|
|=========================================================== | 91%
|
|=================================================================| 100%
If you do not specify accuracy, time, or interpretability, the DAI platform suggests values for them. (See dai.suggest_model_params.)
Inspecting the model¶
As with DAIFrame, generic methods such as print, format, summary, and predict also work with DAIModel:
print(model)
#> Status: Complete
#> Experiment: 7e2b70ae-5baa-11e9-a50b-b938de969cdb, 2019-04-10 18:06, 1.7.0+local_0c7d019-dirty
#> Settings: 1/1/10, seed=25, GPUs enabled
#> Train data: train (19199, 25)
#> Validation data: N/A
#> Test data: test (4800, 24)
#> Target column: DEFAULT_PAYMENT_NEXT_MONTH (binary, 22.366% target class)
#> System specs: Linux, 126 GB, 40 CPU cores, 2/2 GPUs
#> Max memory usage: 0.406 GB, 0.167 GB GPU
#> Recipe: AutoDL (2 iterations, 2 individuals)
#> Validation scheme: stratified, 1 internal holdout
#> Feature engineering: 33 features scored (18 selected)
#> Timing:
#> Data preparation: 4.94 secs
#> Model and feature tuning: 10.13 secs (3 models trained)
#> Feature evolution: 5.54 secs (1 of 3 model trained)
#> Final pipeline training: 7.85 secs (1 model trained)
#> Python / MOJO scorer building: 42.05 secs / 0.00 secs
#> Validation score: AUC = 0.77802 +/- 0.0077539 (baseline)
#> Validation score: AUC = 0.77802 +/- 0.0077539 (final pipeline)
#> Test score: AUC = 0.7861 +/- 0.0064711 (final pipeline)
summary(model)$score
#> [1] 0.7780229
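The scores reported above are AUC values. As a reminder of what that metric measures, the following base R sketch computes AUC with the rank-sum (Mann-Whitney) formulation on a tiny made-up example. The function name auc_rank and the toy vectors are illustrative only; Driverless AI computes its own scores on the server.

```r
# Illustrative AUC via the rank-sum formulation (not DAI's scorer).
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores))
  pos   <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)                       # ties receive average ranks
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc_rank(labels, scores)                  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```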
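Predicting on new data¶
New data can be scored in two different ways:
Call predict() directly on the model in the R session.
Download a scoring pipeline and embed it in your Python or Java workflow.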
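Predicting in R¶
The generic predict() either returns an R data.frame with the results directly (the default) or returns a URL pointing to a CSV file that contains the results (return_df=FALSE). The latter option can be useful when you predict on a large dataset.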
predictions <- predict(model, newdata = creditcard.splits$test)
#>
|
| | 0%
|
|=================================================================| 100%
#> Loading required package: bitops
head(predictions)
#> DEFAULT_PAYMENT_NEXT_MONTH.0 DEFAULT_PAYMENT_NEXT_MONTH.1
#> 1 0.8879988 0.11200116
#> 2 0.9289870 0.07101299
#> 3 0.9550328 0.04496716
#> 4 0.3513577 0.64864230
#> 5 0.9183724 0.08162758
#> 6 0.9154425 0.08455751
predict(model, newdata = creditcard.splits$test, return_df = FALSE)
#>
|
| | 0%
|
|=================================================================| 100%
#> [1] "h2oai_experiment_7e2b70ae-5baa-11e9-a50b-b938de969cdb/7e2b70ae-5baa-11e9-a50b-b938de969cdb_preds_f854b49f.csv"
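predict() returns one probability column per class. The sketch below turns those probabilities into class labels, using the six rows printed by head(predictions) above. The 0.5 cutoff is an arbitrary assumption for illustration, not a DAI default, and the column name predicted_default is made up.

```r
# The six rows shown by head(predictions), re-entered as a local data.frame.
preds <- data.frame(
  DEFAULT_PAYMENT_NEXT_MONTH.0 = c(0.8879988, 0.9289870, 0.9550328,
                                   0.3513577, 0.9183724, 0.9154425),
  DEFAULT_PAYMENT_NEXT_MONTH.1 = c(0.11200116, 0.07101299, 0.04496716,
                                   0.64864230, 0.08162758, 0.08455751)
)

threshold <- 0.5   # assumed cutoff; choose one that fits your use case
preds$predicted_default <- preds$DEFAULT_PAYMENT_NEXT_MONTH.1 > threshold

# Rows flagged as likely defaults under this cutoff.
which(preds$predicted_default)
```

Only the fourth row exceeds the cutoff here. A different threshold trades false positives against false negatives.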
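Downloading the Python or MOJO scoring pipeline¶
To productionize the model with Python or Java, you can download the full Python pipeline or the MOJO pipeline, respectively. For more information on how to use the pipelines, see the R client documentation.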
dai.download_mojo(model, path = tempdir(), force = TRUE)
#>
|
| | 0%
|
|=================================================================| 100%
#> Downloading the pipeline:
#> [1] "/tmp/RtmppsLTZ9/mojo-7e2b70ae-5baa-11e9-a50b-b938de969cdb.zip"
dai.download_python_pipeline(model, path = tempdir(), force = TRUE)
#>
|
| | 0%
|
|=================================================================| 100%
#> Downloading the pipeline:
#> [1] "/tmp/RtmppsLTZ9/python-pipeline-7e2b70ae-5baa-11e9-a50b-b938de969cdb.zip"
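Managing datasets and models¶
After some time, you may have several datasets and models on your DAI server. The dai package provides utility functions for finding, reusing, and removing existing datasets and models.
If you have already loaded a dataset into DAI, you can get the DAIFrame object with dai.get_frame (if you know the frame's key) or dai.find_dataset (if you know the original path or at least part of it).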
dai.get_frame(creditcard$key)
#> DAI frame '7abe28b2-5baa-11e9-a50b-b938de969cdb': 23999 obs. of 25 variables
#> File path: tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv
dai.find_dataset('creditcard')
#> DAI frame '7abe28b2-5baa-11e9-a50b-b938de969cdb': 23999 obs. of 25 variables
#> File path: tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv
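If there is exactly one match, the latter returns the frame directly. Otherwise, it lets you choose which of the matching frames to return.
You can also get a list of datasets or models: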
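The idea of matching on part of a path can be pictured with a base R sketch that searches a local table of file paths for a substring. The names and paths are copied from the dataset listing in this tutorial; the helper find_local and the use of grepl() with fixed = TRUE are assumptions for illustration, and the actual matching rules of dai.find_dataset may differ.

```r
# Local stand-in for a dataset listing (names and paths from this tutorial).
datasets_local <- data.frame(
  name = c("test", "train", "iris9e1f15d2df00.csv", "creditcard_train_cat.csv"),
  file_path = c(
    "./tmp/7cf613a6-5baa-11e9-a50b-b938de969cdb/test.1554912341.0966916.bin",
    "./tmp/7cf3024c-5baa-11e9-a50b-b938de969cdb/train.1554912341.0864356.bin",
    "./tmp/7c38cb84-5baa-11e9-a50b-b938de969cdb/iris9e1f15d2df00.csv.1554912339.9424415.bin",
    "tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose path contains the given substring.
find_local <- function(pattern, listing) {
  listing[grepl(pattern, listing$file_path, fixed = TRUE), , drop = FALSE]
}

matches <- find_local("creditcard", datasets_local)
nrow(matches)                            # a single match
nrow(find_local("tmp", datasets_local))  # several matches: a choice is needed
```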
datasets <- dai.list_datasets()
head(datasets)
#> key name
#> 1 7cf613a6-5baa-11e9-a50b-b938de969cdb test
#> 2 7cf3024c-5baa-11e9-a50b-b938de969cdb train
#> 3 7c38cb84-5baa-11e9-a50b-b938de969cdb iris9e1f15d2df00.csv
#> 4 7abe28b2-5baa-11e9-a50b-b938de969cdb creditcard_train_cat.csv
#> file_path
#> 1 ./tmp/7cf613a6-5baa-11e9-a50b-b938de969cdb/test.1554912341.0966916.bin
#> 2 ./tmp/7cf3024c-5baa-11e9-a50b-b938de969cdb/train.1554912341.0864356.bin
#> 3 ./tmp/7c38cb84-5baa-11e9-a50b-b938de969cdb/iris9e1f15d2df00.csv.1554912339.9424415.bin
#> 4 tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv
#> file_size data_source row_count column_count import_status import_error
#> 1 567584 upload 4800 25 0
#> 2 2265952 upload 19199 25 0
#> 3 7064 upload 150 5 0
#> 4 2832040 file 23999 25 0
#> aggregation_status aggregation_error aggregated_frame mapping_frame
#> 1 -1
#> 2 -1
#> 3 -1
#> 4 -1
#> uploaded
#> 1 TRUE
#> 2 TRUE
#> 3 TRUE
#> 4 FALSE
models <- dai.list_models()
head(models)
#> key description
#> 1 7e2b70ae-5baa-11e9-a50b-b938de969cdb mupulori
#> dataset_name parameters.dataset_key
#> 1 train.1554912341.0864356.bin 7cf3024c-5baa-11e9-a50b-b938de969cdb
#> parameters.resumed_model_key parameters.target_col
#> 1 DEFAULT_PAYMENT_NEXT_MONTH
#> parameters.weight_col parameters.fold_col parameters.orig_time_col
#> 1
#> parameters.time_col parameters.is_classification parameters.cols_to_drop
#> 1 [OFF] TRUE NULL
#> parameters.validset_key parameters.testset_key
#> 1 7cf613a6-5baa-11e9-a50b-b938de969cdb
#> parameters.enable_gpus parameters.seed parameters.accuracy
#> 1 TRUE 25 1
#> parameters.time parameters.interpretability parameters.scorer
#> 1 1 10 AUC
#> parameters.time_groups_columns parameters.time_period_in_seconds
#> 1 NULL NA
#> parameters.num_prediction_periods parameters.num_gap_periods
#> 1 NA NA
#> parameters.is_timeseries parameters.config_overrides
#> 1 FALSE NA
#> log_file_path
#> 1 h2oai_experiment_7e2b70ae-5baa-11e9-a50b-b938de969cdb/h2oai_experiment_logs_7e2b70ae-5baa-11e9-a50b-b938de969cdb.zip
#> pickle_path
#> 1 h2oai_experiment_7e2b70ae-5baa-11e9-a50b-b938de969cdb/best_individual.pickle
#> summary_path
#> 1 h2oai_experiment_7e2b70ae-5baa-11e9-a50b-b938de969cdb/h2oai_experiment_summary_7e2b70ae-5baa-11e9-a50b-b938de969cdb.zip
#> train_predictions_path valid_predictions_path
#> 1
#> test_predictions_path
#> 1 h2oai_experiment_7e2b70ae-5baa-11e9-a50b-b938de969cdb/test_preds.csv
#> progress status training_duration scorer score test_score deprecated
#> 1 1 0 71.43582 AUC 0.7780229 0.7861 FALSE
#> model_file_size diagnostic_keys
#> 1 695996094 NULL
If you know the key of a dataset or model, you can get an instance of DAIFrame or DAIModel with dai.get_frame or dai.get_model, respectively:
dai.get_model(models$key[1])
#> Status: Complete
#> Experiment: 7e2b70ae-5baa-11e9-a50b-b938de969cdb, 2019-04-10 18:06, 1.7.0+local_0c7d019-dirty
#> Settings: 1/1/10, seed=25, GPUs enabled
#> Train data: train (19199, 25)
#> Validation data: N/A
#> Test data: test (4800, 24)
#> Target column: DEFAULT_PAYMENT_NEXT_MONTH (binary, 22.366% target class)
#> System specs: Linux, 126 GB, 40 CPU cores, 2/2 GPUs
#> Max memory usage: 0.406 GB, 0.167 GB GPU
#> Recipe: AutoDL (2 iterations, 2 individuals)
#> Validation scheme: stratified, 1 internal holdout
#> Feature engineering: 33 features scored (18 selected)
#> Timing:
#> Data preparation: 4.94 secs
#> Model and feature tuning: 10.13 secs (3 models trained)
#> Feature evolution: 5.54 secs (1 of 3 model trained)
#> Final pipeline training: 7.85 secs (1 model trained)
#> Python / MOJO scorer building: 42.05 secs / 0.00 secs
#> Validation score: AUC = 0.77802 +/- 0.0077539 (baseline)
#> Validation score: AUC = 0.77802 +/- 0.0077539 (final pipeline)
#> Test score: AUC = 0.7861 +/- 0.0064711 (final pipeline)
dai.get_frame(datasets$key[1])
#> DAI frame '7cf613a6-5baa-11e9-a50b-b938de969cdb': 4800 obs. of 25 variables
#> File path: ./tmp/7cf613a6-5baa-11e9-a50b-b938de969cdb/test.1554912341.0966916.bin
Finally, datasets and models can be removed with dai.rm:
dai.rm(model, creditcard, creditcard.splits$train, creditcard.splits$test)
#> Model 7e2b70ae-5baa-11e9-a50b-b938de969cdb removed
#> Dataset 7abe28b2-5baa-11e9-a50b-b938de969cdb removed
#> Dataset 7cf3024c-5baa-11e9-a50b-b938de969cdb removed
#> Dataset 7cf613a6-5baa-11e9-a50b-b938de969cdb removed
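By default, the function dai.rm removes objects from both the server and the R session. If you want to remove them only from the server, set from_session=FALSE. Note that only objects can be removed from the session. In the example above, creditcard.splits$train and creditcard.splits$test will not be removed from the R session because they are actually function calls (recall that $ is a function).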
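The last point can be checked in plain R: an expression such as x$train is parsed as a call to the `$` function, whereas a bare variable name is a symbol. The list x below is a local stand-in for creditcard.splits.

```r
x <- list(train = 1:3, test = 4:6)

# `x$train` is syntactic sugar for calling the function `$`.
identical(x$train, `$`(x, "train"))

# The unevaluated expression is a call, while a bare name is a symbol.
is.call(quote(x$train))
is.name(quote(x))
```

All three expressions evaluate to TRUE, which is why a function that removes variables by name has nothing to remove for x$train.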