H2O Model Validation is an application that lets you assess the robustness and stability of trained H2O Driverless AI (DAI) models. You can use H2O Model Validation to avoid model degradation by performing various validation tests that reveal weaknesses and vulnerabilities in your datasets and models.
The sections below provide answers to frequently asked questions. If you have additional questions, please send them to email@example.com.
Which specific validation tests are available to validate an experiment or dataset?
H2O Model Validation offers an array of validation tests for assessing the robustness and stability of Driverless AI (DAI) experiments and datasets. Which tests are available depends on whether you are analyzing (validating) an experiment or a dataset. To learn more, see Supported validation tests.
For time series models (experiments), the "size dependency" and "backtesting" tests seem similar. How do they differ?
At first glance, a size dependency test and a backtesting test appear very similar when a time variable is provided. Both tests behave the same for time series models and for independent and identically distributed (IID) models with date/time records. However, key differences emerge when you consider the tests' purpose and methodology.
- Purpose
  - A size dependency test checks whether the model's performance degrades as the dataset size increases or decreases
  - A backtesting test evaluates the performance of a model on historical data by simulating how it would have performed in the past
- Methodology
  - A size dependency test for a time series model utilizes a fixed test dataset. In contrast, a backtesting test (during each split) utilizes a rolling subsample as test data to imitate applying the model at the corresponding time point
  - A backtesting test for a time series model starts its iterations with the oldest data and incrementally adds more recent data; a size dependency test does the opposite
  - A backtesting test for a time series model has an option (setting) not to grow the training and test datasets but to utilize a specified rolling training and test window (for example, use only the last 12 months of data relative to the simulation time point)
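The splitting schemes described above can be sketched as follows. This is a minimal illustration only: the function names, the representation of data as month indices, and the split arithmetic are assumptions for the sake of the example, not H2O Model Validation's actual implementation.

```python
def backtesting_splits(n_months, n_splits, test_months=1, window=None):
    """Backtesting: start with the oldest data and incrementally add more
    recent data; the test subsample rolls forward with each split.
    Passing `window` mimics the rolling-window setting (e.g. window=12 keeps
    only the last 12 months of training data relative to the split point)."""
    splits = []
    for i in range(n_splits):
        train_end = n_months - (n_splits - i) * test_months
        # Expanding window by default; capped rolling window if requested.
        train_start = 0 if window is None else max(0, train_end - window)
        train = list(range(train_start, train_end))
        test = list(range(train_end, train_end + test_months))  # rolls forward
        splits.append((train, test))
    return splits

def size_dependency_splits(n_months, n_splits, test_months=1):
    """Size dependency: the test dataset is fixed (the most recent months);
    the training set varies in size by extending backward into older data --
    the opposite iteration direction of backtesting."""
    test = list(range(n_months - test_months, n_months))  # fixed test set
    train_end = n_months - test_months
    splits = []
    for i in range(1, n_splits + 1):
        size = train_end * i // n_splits
        train = list(range(train_end - size, train_end))  # grows into the past
        splits.append((train, test))
    return splits
```

For example, with 12 months of data and 3 splits, backtesting trains on months 0-8 and tests on month 9, then trains on 0-9 and tests on 10, and so on, while size dependency always tests on month 11 and only varies how far back the training window reaches.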