- I want to score multiple models on a huge dataset. Is it possible to score these models in parallel?
The best way to score models in parallel is to use the in-H2O binary models. To do this:
- Import the previously exported binary (non-POJO) model into an H2O cluster.
- Import the datasets into H2O as well.
- Call the predict endpoint from R, Python, or Flow, or directly through the REST API.
- Export the predictions to a file or download them from the server.
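The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not an official recipe: the function name `score_models`, the output file naming, and the assumption that every model is scored against the same dataset are all hypothetical choices made for the example. It relies on `h2o.init`, `h2o.import_file`, `h2o.load_model`, `predict`, and `h2o.export_file` from the H2O Python client.

```python
import os


def score_models(model_paths, data_path, out_dir):
    """Score each saved binary model on one dataset and export the predictions.

    All names and paths here are illustrative assumptions, not part of H2O.
    """
    # Imported inside the function so this file can be loaded even when
    # the h2o package or a running cluster is not available.
    import h2o

    # Start or attach to a local cluster; for a remote cluster,
    # h2o.connect(...) would be the likely replacement.
    h2o.init()

    # Parse the dataset once; every model is scored against the same frame.
    frame = h2o.import_file(data_path)

    written = []
    for model_path in model_paths:
        # Load a binary model that was previously saved with h2o.save_model.
        model = h2o.load_model(model_path)
        preds = model.predict(frame)

        # Hypothetical naming scheme: <model file name>_predictions.csv
        out_path = os.path.join(
            out_dir, os.path.basename(model_path) + "_predictions.csv"
        )
        h2o.export_file(preds, out_path, force=True)
        written.append(out_path)
    return written
```

A call such as `score_models(["/models/gbm_1", "/models/drf_2"], "/data/big.csv", "/out")` would return the list of exported prediction files. The loop itself is sequential on the client side; the expectation is that the cluster distributes each prediction job across its nodes.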
You can also score models in parallel by downloading a POJO or MOJO for each model and embedding it in a Hive UDF to score the large dataset stored on Hadoop. Tutorials on this process can be found here (POJO) and here (MOJO).
- Which parameters are used with or for scoring?