H2O Driverless AI Release Notes

H2O Driverless AI is a high-performance, GPU-enabled, client-server application for the rapid development and deployment of state-of-the-art predictive analytics models. It reads tabular data from various sources and automates data visualization, grandmaster-level feature engineering, model validation (overfitting and leakage prevention), model parameter tuning, model interpretability, and model deployment. H2O Driverless AI currently targets common regression, binomial classification, and multinomial classification applications, including loss-given-default, probability of default, customer churn, campaign response, fraud detection, anti-money-laundering, and predictive asset maintenance models. It also handles time-series problems for individual or grouped time series, such as weekly sales predictions per store and department, with time-causal feature engineering and validation schemes. The ability to model unstructured data is coming soon.

High-level capabilities:

Client/server application for rapid experimentation and deployment of state-of-the-art supervised machine learning models
User-friendly GUI
Python and R client API
Automatically creates machine learning modeling pipelines for highest predictive accuracy
Automates data cleaning, feature selection, feature engineering, model selection, model tuning, ensembling
Automatically creates stand-alone batch scoring pipeline for in-process scoring or client/server scoring via HTTP or TCP protocols in Python
Automatically creates stand-alone (MOJO) low latency scoring pipeline for in-process scoring or client/server scoring via HTTP or TCP protocols, in C++ (with R and Python runtimes) and Java (runs anywhere)
Multi-GPU and multi-CPU support for powerful workstations and NVIDIA DGX supercomputers
Machine Learning model interpretation module with global and local model interpretation
Automatic Visualization module
Multi-user support
Backward compatibility

Problem types supported:

Regression (continuous target variable like age, income, price or loss prediction, time-series forecasting)
Binary classification (0/1 or “N”/“Y”, for fraud prediction, churn prediction, failure prediction, etc.)
Multinomial classification (“negative”/“neutral”/“positive” or 0/1/2/3 or 0.5/1.0/2.0 for categorical target variables, for prediction of membership type, next-action, product recommendation, sentiment analysis, etc.)

Data types supported:

Tabular structured data, rows are observations, columns are fields/features/variables
Numeric, categorical and textual fields
Missing values are allowed
i.i.d. (independent and identically distributed) data
Time-series data with a single time-series (time flows across the entire dataset, not per block of data)
Grouped time-series (e.g., sales per store per department per week, all in one file, with 3 columns for store, dept, week)
Time-series problems with a gap between training and testing (i.e., the time to deploy), and a known forecast horizon (after which model has to be retrained)
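The gap and forecast horizon described above can be illustrated with a small time-causal split. This is a minimal sketch, not Driverless AI's validation scheme: the function name `time_causal_split`, the row layout, and the example dates are all hypothetical, chosen only to show how rows inside the gap are excluded and how the test window is bounded by the horizon.

```python
from datetime import date, timedelta


def time_causal_split(rows, time_key, train_end, gap, horizon):
    """Split rows so that the test window starts only after a gap.

    train: time <= train_end
    test:  train_end + gap < time <= train_end + gap + horizon
    Rows that fall inside the gap are dropped from both sets.
    """
    test_start = train_end + gap
    test_end = test_start + horizon
    train = [r for r in rows if r[time_key] <= train_end]
    test = [r for r in rows if test_start < r[time_key] <= test_end]
    return train, test


# Hypothetical grouped series: weekly sales for two stores in one table.
start = date(2020, 1, 6)
rows = [
    {"store": s, "dept": 1, "week": start + timedelta(weeks=w), "sales": 100 + w}
    for s in ("A", "B")
    for w in range(10)
]

train, test = time_causal_split(
    rows,
    time_key="week",
    train_end=start + timedelta(weeks=5),  # last week available for training
    gap=timedelta(weeks=1),                # assumed time to deploy
    horizon=timedelta(weeks=2),            # assumed forecast horizon
)
```

Because the split is made on the time column alone, every group (store and department) is cut at the same point in time, so no test row is older than any training row.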

Data types supported via custom recipes:

Image
Video
Audio
Graphs

Data sources supported:

Local file system or NFS
File upload from browser or Python client
S3 (Amazon)
Hadoop (HDFS)
Azure Blob storage
BlueData DataTap
Google BigQuery
Google Cloud storage
kdb+
MinIO
Snowflake
JDBC
Custom Data Recipe BYOR (Python, bring your own recipe)

File formats supported:

Plain text formats of columnar data (.csv, .tsv, .txt)
Compressed archives (.zip, .gz, .bz2)
Excel
Parquet
Feather
Python datatable (.jay)

Architecture

[Figure: DAI architecture]

Roadmap

[Figure: DAI roadmap]

Change Log

Version 1.8.10 LTS (Feb 23, 2021)

New Features:
- Exposed new Python client for download in resources menu
- Added support for .avro file format
- Added option to use a custom AutoDoc template to generate one or more AutoReports for every experiment with the autodoc_template config.toml setting
MOJO updates:
- Upgraded MOJO runtime dependency to 2.5.10
- Added MOJO support to compute Shapley values for tree-based and linear boosting models. Note that support for Shapley values for ensemble GLM models is currently in beta
Improvements:
- Added more verbosity to MLI logs
Bug Fixes:
- Fixed stall detected in LightGBM models on P2.8x Amazon EC2 instances
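As background for the Shapley support mentioned above: for a purely linear model, and treating features as independent, the Shapley value of a feature reduces to its coefficient times the feature's deviation from a background mean. The sketch below illustrates that identity only. It is not the MOJO runtime's implementation, and the coefficients, background means, and function names are made up for the example.

```python
def linear_shapley(coefs, x, background_means):
    """Per-feature contributions for f(x) = intercept + sum(coef_i * x_i).

    Assumes independent features, so phi_i = coef_i * (x_i - mean_i).
    """
    return [c * (xi - m) for c, xi, m in zip(coefs, x, background_means)]


# Hypothetical linear model and background statistics.
intercept = 0.5
coefs = [2.0, -1.0, 0.25]
background_means = [1.0, 4.0, 8.0]
x = [3.0, 2.0, 8.0]


def predict(row):
    """Evaluate the linear model on one row."""
    return intercept + sum(c * v for c, v in zip(coefs, row))


phi = linear_shapley(coefs, x, background_means)
baseline = predict(background_means)  # prediction at the background means
```

The contributions are additive: the baseline prediction plus the sum of `phi` equals the prediction for `x`, which is the property that makes per-row Shapley values easy to audit.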

Version 1.8.9 LTS (Oct 19, 2020)

New Features:
- Add configurable CSRF (Cross-site request forgery) protection on API endpoints
- Add protection against concurrent sessions
Improvements:
- Hide webserver technology info from all API endpoints
- Improved BYOR security by introducing configurable static analysis of the code
- Improved session verification and authenticity
- Improved security for internal API handlers via encryption
Bug Fixes:
- Fix user session autologout after session expiration
- Fix for properly cleaning closed sessions
- Fix invalid redirection to static artifacts when using reverse proxy and URL prefix
- Fix import of files without extension
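For readers unfamiliar with CSRF protection, one common pattern is to bind a token to the user's session with an HMAC and reject requests whose token does not match. The sketch below shows that general technique with the Python standard library. It is an assumption-laden illustration, not a description of how Driverless AI implements or configures its CSRF protection, and the function names and session identifier are hypothetical.

```python
import hashlib
import hmac
import secrets


def issue_csrf_token(secret_key: bytes, session_id: str) -> str:
    """Derive a token tied to one session from a server-side secret."""
    return hmac.new(secret_key, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid_csrf_token(secret_key: bytes, session_id: str, token: str) -> bool:
    """Check a submitted token using a constant-time comparison."""
    expected = issue_csrf_token(secret_key, session_id)
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


# Hypothetical server secret and session identifier.
secret_key = secrets.token_bytes(32)
token = issue_csrf_token(secret_key, "session-123")
```

A request carrying `token` is accepted only for `session-123`. A token replayed against another session, or altered in transit, fails the check.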

Version 1.8.8 LTS (Sep 30, 2020)

New Features:
- Give user control over the number of saved variable importances via max_varimp_to_save in the config.toml file (i.e., Python and R clients can get more than 14 values back)
- Added holiday calendars for 24 more countries; users can select the list of countries for which is-holiday features are created in time series experiments
- Enable GPU support for LightGBM models on IBM Power
- Expose k-LIME MOJO in MLI
- Support RHEL 8-like systems
- Add an option to log in using the JWT token injected by the reverse proxy
- Allow user to specify the data delimiter/separator from configuration (datatable_separator in the config.toml file)
- Add support for an encrypted keystore for sensitive config.toml values
- Save transformed column names for Shapley value computation in MOJO
Improvements:
- Add more consistency in handling files without an extension
- Improve web server request handling and disallow redirection outside of application
- Improve log file formatting to facilitate parsing
- Improve logging for connectors
- Improve air-gapped support for custom recipes
- Allow Snowflake Stage tables to be optional
Bug Fixes:
- Fix OpenID and TLS login redirection when deploying behind reverse proxy
- Fix cgroup memory detection on IBM Power
- Various MLI fixes
- Various UI fixes
Documentation updates:
- Added configuration security document
- Added details about MOJO Scoring Command-Line Options
- Added details about exporting artifacts to Bitbucket
- Added details about GPU usage in Driverless AI
- Improved install docs (RHEL 8 support)
- Improved description of transformed feature names
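The two config.toml settings named in this release, max_varimp_to_save and datatable_separator, are plain key/value entries. As a rough sketch, the helper below renders such overrides as TOML-style lines. The helper name `render_toml_overrides` and the values 50 and ";" are illustrative assumptions, not documented defaults or recommended values, so check the shipped config.toml for the expected types.

```python
def render_toml_overrides(overrides):
    """Render a flat dict of settings as TOML-style 'key = value' lines."""
    lines = []
    for key, value in overrides.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Quote strings and escape backslashes and double quotes.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines)


# Setting names come from the release notes; the values are hypothetical.
overrides = {"max_varimp_to_save": 50, "datatable_separator": ";"}
fragment = render_toml_overrides(overrides)
print(fragment)
```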

Version 1.8.7.1 LTS (June 23, 2020)

New Features:
- Add ability to push artifacts to a Bitbucket server
- Add per-feature user control for monotonicity constraints for XGBoostGBM, LightGBM and DecisionTree models
Bug Fixes:
- Fix Hive Kerberos impersonation
- Fix a DTap connector issue by using the proper login username for impersonation
- Fix monotonicity constraints for XGBoostGBM, LightGBM and DecisionTree models
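Per-feature monotonicity constraints are conventionally expressed as one value per column: +1 for non-decreasing, -1 for non-increasing, and 0 for unconstrained, which is the form taken by, for example, LightGBM's `monotone_constraints` parameter. The sketch below builds such a vector from a per-feature mapping. How Driverless AI exposes this control is not shown here, and the function and feature names are hypothetical.

```python
def build_monotone_constraints(feature_names, user_constraints):
    """Return a -1/0/+1 constraint per feature, in column order.

    Features missing from user_constraints are left unconstrained (0).
    """
    unknown = set(user_constraints) - set(feature_names)
    if unknown:
        raise ValueError(f"constraints given for unknown features: {sorted(unknown)}")
    vector = []
    for name in feature_names:
        direction = user_constraints.get(name, 0)
        if direction not in (-1, 0, 1):
            raise ValueError(f"constraint for {name!r} must be -1, 0, or 1")
        vector.append(direction)
    return vector


# Hypothetical columns: the prediction should fall with age and rise with income.
features = ["age", "income", "tenure"]
constraints = build_monotone_constraints(features, {"income": 1, "age": -1})
```

Validating the mapping up front surfaces misspelled feature names as errors, rather than silently leaving a column unconstrained.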

Version 1.8.7 LTS (June 15, 2020)

New Features:
- Add intercept term to k-LIME csv
- Add control of default categorical & numeric feature rendering in DAI PD/ICE
- Add ability to restrict custom recipe upload to a specific git repository and branch
- Add translations for Korean and Chinese
- Add ability to use multiple authentication methods simultaneously
Improvements:
- Improve behavior of systemctl when Driverless AI fails to start
- Improve logging behavior for JDBC and Hive connectors
- Improve behavior of C++ scorer so that fewer unnecessary files are saved in the tmp directory
- Improve Docker image behavior in Kubernetes
- Improve LDAP authentication to allow for anonymous binding
- Improve speed of feature selection for experiments on large, wide, imbalanced datasets
- Improve speed of data import on a busy system
Bug Fixes:
- Fix automatic Kaggle submission and score retrieval
- Fix intermittent Java exception seen by surrogate DRF model in MLI when several MLI jobs are run concurrently
- Fix issue with deleting Deployments if linked Experiment was deleted
- Fix issue causing Jupyter Notebooks to not work properly in Docker Image
- Fix custom recipe scorers not being displayed on Diagnostics page
- Fix issue with AWS Lambda Deployment not handling dropped columns properly
- Fix issue with not being able to limit number of GPUs for specific experiment
- Fix in-server scoring inaccuracies for certain models built in 1.7.1 and 1.8.0 (standalone scoring not affected)
- Fix rare datatable type casting exception
Documentation updates:
- The “Maximum Number of Rows to Perform Permutation-Based Feature Selection” expert setting now has a default value of 500,000
- Improved Hive and Snowflake connector documentation
- Updated the Main.java example in the Java Scoring Pipeline chapter
- Added documentation describing how to change the language in the UI before starting the application
- Added information about how custom recipes are described and documented in the Autoreport
- Updated the LDAP authentication documentation
- Improved the Linux DEB and RPM installation instructions
- Improved the AWS Community AMI installation instructions
- Improved documentation for the Reproducible button

Version 1.8.6 LTS (Apr 30, 2020)

New Features:
- Add expert setting to reduce size of MOJO scoring pipelines (and hence reduce latency and memory usage for inference)
- Enable Lambda deployment for IBM Power
- Add restart button for Deployments
- Add automatic Kaggle submission for supported datasets, show private/public scores (requires Kaggle API Username/Key)
- Show warning if single final model is worse on back-testing splits (for time series) or cross-validation folds (for IID) than the fold models (indicates issue with signal or fit)
- Update R client API to include autodoc, experiment preview, dataset download, autovis functions
- Add button in expert settings that toggles some effective settings to make a small MOJO production pipeline
- Add an option to upload artifacts to S3 or a Git repository
Improvements:
- Improve experiment restart/refit robustness if model type is changed
- Extra protection against dropping features
- Improve implementation of Hive connector
Bug Fixes:
- Upgrade datatable to fix endless loop during stats calculation at file import
- Web server and UI now respect dynamic base URL suffix
- Fix incorrect min_rows in MLI when providing weight column with small values
- Fix segfault in MOJO for TensorFlow/PyTorch models
- Fix elapsed time for MLI
- Enable GPU by default for R client
- Fix Python scoring h2oai ModuleNotFound error
- Update no_drop_features toml and expert button to more generally avoid dropping features
- Fix datatable mmap strategy
Documentation updates:
- Add documentation for enabling the Hive data connector
- Add documentation for updating expired DAI licenses on AWS Lambda deployments using a script
- Documentation for uploading artifacts now includes support for S3 and Git in the artifacts store
- Improve documentation for one-hot encoding
- Improve documentation for systemd logs/journalctl
- Improve documentation for time series ‘unavailable columns at prediction time’
- Improve documentation for Azure blob storage
- Improve documentation for MOJO scoring pipeline
- Add information about reducing the size of a MOJO using a new expert setting

Version 1.8.5 LTS (Mar 09, 2020)

New Features:
- Handle large (up to 10k) multiclass problems, including GUI improvements in such cases
- Detect class imbalance for binary problems where target class is non-rare
- Add feature count to iteration panel
- Add experiment lineage pdf in experiment summary zip file
- Issue warnings if final pipeline scores are unstable across (cross-)validation folds
- Issue warning if Constant Model is improving quality of final pipeline (sign of bad signal)
- Report origin of leakage detection as from model fit (AUC/R2), GINI, or correlation
Improvements:
- Improve handling of ID columns
- Improve exception handling so that raising Python exceptions is more stable
- Improve exception handling when any individual transformer or model throws an exception or segfaults
- Improve robustness of restart and refit experiment to changes in experiment choices
- Improve handling of missing values when transforming dataset
- Improve robustness of custom recipe importing of modules
- Improve documentation for installation instructions
- Improve selection of initial lag sizes for time series
- Improve LightGBM stability for regression problems for certain mutation parameters
Documentation updates:
- Improved documentation for time-series experiments
- Added topics describing how to re-enable the Data Recipe URL and Data Recipe File connectors
- For users running older versions of the Standalone Python Scoring Pipeline, added information describing how to install upgraded versions of outdated dependencies
- Improved the description for the “Sampling Method for Imbalanced Binary Classification Problems” expert setting
- Added constraints related to the REST server deployments
- Noted required vs optional parameters in the HDFS connector topics
- Added an FAQ indicating that MOJOs are thread safe
- Noted that on Windows 10, only Docker installs are supported
- Added information about the Recommendations AutoViz graph
- Added information to the Before you Begin Installing topic that master.db files are not backward compatible with earlier Driverless AI versions
Bug Fixes:
- Update LightGBM for bug fixes, including hangs and avoiding hard-coded library paths
- Stabilize use of psutil package
- Fix time-series experiments when test set has missing target values
- Fix Python scoring to not depend upon original data_directory
- Fix preview for custom time series validation splits and low accuracy
- Fix ignored minimum lag size setting for single time series
- Fix parsing of Excel files with datetime columns
- Fix column type detection for columns with mostly missing values
- Fix invalid display of 0.0000 score in iteration scores
- Various MLI fixes (don’t show invalid graphs, fix PDP sort order, overlapping labels)
- Various bug fixes
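The warning about unstable scores across (cross-)validation folds can be pictured as a check on the spread of fold scores relative to their mean. The sketch below is only an illustration of that idea. The actual criterion used by Driverless AI is not stated in these notes, and the function name and the 10% threshold are hypothetical.

```python
from statistics import mean, pstdev


def fold_scores_unstable(scores, max_relative_spread=0.1):
    """Flag fold scores whose standard deviation is large relative to the mean."""
    avg = mean(scores)
    if avg == 0:
        # A relative spread is undefined at zero; treat it as unstable.
        return True
    return pstdev(scores) / abs(avg) > max_relative_spread


# Hypothetical AUC values from three folds of two experiments.
stable = fold_scores_unstable([0.80, 0.81, 0.79])
unstable = fold_scores_unstable([0.90, 0.60, 0.75])
```

A relative measure is used so that the same threshold applies to scorers on different scales, though a scorer whose values sit near zero would need a different rule.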

Earlier releases:

Version 1.8.4.1 LTS (Feb 4, 2020)
Version 1.8.4 LTS (Jan 31, 2020)
Version 1.8.3 LTS (Jan 22, 2020)
Version 1.8.2 LTS (Jan 17, 2020)
Version 1.8.1.1 (Dec 21, 2019)
Version 1.8.1 (Dec 10, 2019)
Version 1.8.0 (Oct 3, 2019)
Version 1.7.1 (Aug 19, 2019)
Version 1.6.4 LTS (Aug 19, 2019)
Version 1.7.0 (Jul 7, 2019)
Version 1.6.3 LTS (June 14, 2019)
Version 1.6.2 LTS (May 10, 2019)
Version 1.6.1.1 LTS (Apr 24, 2019)
Version 1.6.1 LTS (Apr 18, 2019)
Version 1.6.0 LTS (Apr 5, 2019)
Version 1.5.4 (Feb 24, 2019)
Version 1.5.3 (Feb 8, 2019)
Version 1.5.2 (Feb 2, 2019)
Version 1.5.1 (Jan 22, 2019)
Version 1.5.0 (Jan 18, 2019)
Version 1.4.2 (Dec 3, 2018)
Version 1.4.1 (Nov 11, 2018)
Version 1.4.0 (Oct 27, 2018)
Version 1.3.1 (Sep 12, 2018)
Version 1.3.0 (Sep 4, 2018)
Version 1.2.2 (July 5, 2018)
Version 1.2.1 (June 26, 2018)
Version 1.2.0 (June 11, 2018)
Version 1.1.6 (May 29, 2018)
Version 1.1.4 (May 17, 2018)
Version 1.1.3 (May 16, 2018)
Version 1.1.2 (May 8, 2018)
Version 1.1.1 (April 23, 2018)
Version 1.1.0 (April 19, 2018)
Version 1.0.30 (April 5, 2018)
Version 1.0.29 (April 4, 2018)
Version 1.0.28 (April 3, 2018)
Version 1.0.27 (March 31, 2018)
Version 1.0.26 (March 28, 2018)
Version 1.0.25 (March 22, 2018)
Version 1.0.24 (March 8, 2018)
Version 1.0.23 (March 7, 2018)
Version 1.0.22 (Feb 23, 2018)
Version 1.0.21 (Feb 21, 2018)
Version 1.0.20 (Feb 17, 2018)
Version 1.0.19 (Jan 28, 2018)
Version 1.0.18 (Jan 24, 2018)
Version 1.0.17 (Jan 23, 2018)
Version 1.0.16 (Jan 22, 2018)
Version 1.0.15 (Jan 11, 2018)
Version 1.0.14 (Jan 11, 2018)
Version 1.0.13 (Jan 10, 2018)
Version 1.0.11 (Dec 12, 2017)
Version 1.0.10 (Dec 4, 2017)
Version 1.0.9 (Nov 29, 2017)
Version 1.0.8 (Nov 21, 2017)
Version 1.0.7 (Nov 17, 2017)
Version 1.0.5 (Oct 24, 2017)
Version 1.0.4 (Oct 19, 2017)
Version 1.0.3 (Oct 9, 2017)
Version 1.0.2 (Oct 5, 2017)
Version 1.0.1 (Oct 4, 2017)
Version 1.0.0 (Sep 24, 2017)