H2O Driverless AI Release Notes

H2O Driverless AI is a high-performance, GPU-enabled, client-server application for the rapid development and deployment of state-of-the-art predictive analytics models. It reads tabular data from plain text sources and automates data visualization, feature engineering, model training, and model explanation. H2O Driverless AI currently targets common regression, binomial classification, and multinomial classification applications, including loss-given-default, probability of default, customer churn, campaign response, fraud detection, anti-money-laundering, and predictive asset maintenance models. Support for reading data from HDFS and for modeling unstructured data is coming soon.

High-level capabilities:

  • Client/server application for rapid experimentation and deployment of state-of-the-art supervised machine learning models
  • Automatically creates machine learning modeling pipelines for highest predictive accuracy
  • Automatically creates a stand-alone scoring pipeline for in-process scoring or client/server scoring via HTTP or TCP
  • Python API or GUI
  • Multi-GPU and multi-CPU support for powerful workstations and NVIDIA DGX supercomputers
  • Machine Learning model interpretation module with global and local model interpretation
  • Automatic Visualization module

Problem types supported:

  • Regression (continuous target variable, for prediction of age, income, house price, loss, etc.)
  • Binary classification (0/1 or “N”/“Y”, for fraud prediction, churn prediction, failure prediction, etc.)
  • Multinomial classification (0/1/2/3 or “A”/“B”/“C”/“D” for categorical target variables, for prediction of membership type, next-action, product recommendation, etc.)

Data types supported:

  • Tabular structured data: rows are observations, columns are fields/features/variables
  • i.i.d. (independent and identically distributed) data
  • Numeric, categorical and textual fields
  • Missing values are allowed
  • Time-series data containing a single time series (time flows across the entire dataset, not per block of data)

Data types NOT supported:

  • Image/video/audio
  • Grouped time-series (e.g., sales per item per store per hour, all in one file)

File formats supported:

  • Plain text formats of columnar data (.csv, .tsv, .txt)
  • Compressed archives (.zip, .gz)

Architecture

DAI architecture

Roadmap

DAI roadmap

ChangeLog

Version 1.0.15 (Jan 11 2018)

  • Fix truncated per-experiment log file
  • Various bug fixes

Version 1.0.14 (Jan 11 2018)

  • Improved performance

Version 1.0.13 (Jan 10 2018)

  • Improved estimate of generalization performance for final ensemble by removing leakage from target encoding
  • Added API for re-fitting and applying feature engineering on new (potentially larger) data
  • Remove access to pre-transformed datasets to avoid unintended leakage issues downstream
  • Added mean absolute percentage error (MAPE) scorer
  • Enforce monotonicity constraints for binary classification and regression models if interpretability >= 6
  • Use squared Pearson correlation for R^2 metric (instead of coefficient of determination) to avoid negative values
  • Separated HTTP and TCP scoring pipeline examples
  • Reduced size of h2oai_client wheel
  • No longer require weight column for test data if it was provided for training data
  • Improved accuracy of final modeling pipeline
  • Include H2O-3 logs in downloadable logs.zip
  • Updated to H2O-3 3.16.0.2
  • Various bug fixes
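
To illustrate the scorer changes listed for 1.0.13, the following is a minimal sketch of MAPE and of the two R^2 definitions, written with the Python standard library only. It is not the Driverless AI implementation; the function names and the sample data are assumptions made for illustration. It shows why the coefficient of determination can go negative while the squared Pearson correlation stays between 0 and 1.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.
    Assumes no actual value is zero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r2_coefficient_of_determination(actual, predicted):
    """1 - SS_res / SS_tot; negative when the model is worse than the mean."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def r2_squared_pearson(actual, predicted):
    """Squared Pearson correlation; always in [0, 1].
    Assumes neither sequence is constant."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


# Hypothetical data: predictions follow the trend but are offset by +2.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [3.0, 4.0, 5.0, 6.0]

print(mape(actual, predicted))
print(r2_coefficient_of_determination(actual, predicted))  # negative
print(r2_squared_pearson(actual, predicted))               # 1.0
```

Note the trade-off visible in this example: the squared Pearson correlation never goes negative, but it also does not penalize a constant bias in the predictions, so it is best read together with an error metric such as RMSE or MAPE.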

Version 1.0.11 (Dec 12 2017)

  • Faster multi-GPU training, especially for small data
  • Increase default amount of exploration of genetic algorithm for systems with fewer than 4 GPUs
  • Improved accuracy of generalization performance estimate for models on small data (< 100k rows)
  • Faster abort of experiment
  • Improved final ensemble meta-learner
  • More robust date parsing
  • Various bug fixes

Version 1.0.10 (Dec 4 2017)

  • Tool tips and link to documentation in parameter settings screen
  • Faster training for multi-class problems with > 5 classes
  • Experiment summary displayed in GUI after experiment finishes
  • Python Client Library downloadable from the GUI
  • Speedup for Maxwell-based GPUs
  • Support for multinomial AUC and Gini scorers
  • Add MCC and F1 scorers for binomial and multinomial problems
  • Faster abort of experiment
  • Various bug fixes
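
For readers unfamiliar with the MCC and F1 scorers added in 1.0.10, here is a small sketch of both metrics computed from binary confusion-matrix counts. It covers the binomial case only, and the function names and sample labels are illustrative assumptions rather than the product's API.

```python
import math


def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels: 3 positives, 5 negatives.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(f1_score(tp, fp, fn))
print(mcc(tp, tn, fp, fn))
```

Unlike F1, MCC also takes true negatives into account, which makes it a useful complement on imbalanced data.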

Version 1.0.9 (Nov 29 2017)

  • Support for time column for causal train/validation splits in time-series datasets
  • Automatic detection of the time column from temporal correlations in data
  • MLI improvements: dedicated page, selection of datasets and models
  • Improved final ensemble meta-learner
  • Test set score now displayed in experiment listing
  • Original response is preserved in exported datasets
  • Various bug fixes
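
The idea behind a causal train/validation split on a time column can be sketched as follows: sort the rows by time and validate only on rows that come after every training row. This is a simplified illustration, not the splitting logic Driverless AI uses; `causal_split`, the column names, and the sample rows are hypothetical.

```python
def causal_split(rows, time_key, validation_fraction=0.25):
    """Split rows so that every validation row is later in time
    than every training row. Assumes sortable time values."""
    ordered = sorted(rows, key=lambda row: row[time_key])
    n_valid = max(1, int(len(ordered) * validation_fraction))
    cut = len(ordered) - n_valid
    return ordered[:cut], ordered[cut:]


# Hypothetical daily sales, deliberately out of order.
rows = [
    {"date": "2017-11-03", "sales": 12},
    {"date": "2017-11-01", "sales": 10},
    {"date": "2017-11-04", "sales": 15},
    {"date": "2017-11-02", "sales": 11},
    {"date": "2017-11-08", "sales": 20},
    {"date": "2017-11-06", "sales": 17},
    {"date": "2017-11-05", "sales": 16},
    {"date": "2017-11-07", "sales": 18},
]

train, valid = causal_split(rows, "date")
print([row["date"] for row in valid])
```

A random split would instead let the model train on rows from the future relative to its validation rows, which tends to overstate accuracy on time-ordered data.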

Version 1.0.8 (Nov 21 2017)

  • Various bug fixes

Version 1.0.7 (Nov 17 2017)

  • Sharing of GPUs between experiments - can run multiple experiments at the same time while sharing GPU resources
  • Persistence of experiments and data - can stop and restart the application without loss of data
  • Support for weight column for optional user-specified per-row observation weights
  • Support for fold column for user-specified grouping of rows in train/validation splits
  • Higher accuracy through model tuning
  • Faster training - overall improvements and optimization in model training speed
  • Separate log file for each experiment
  • Ability to delete experiments and datasets from the GUI
  • Improved accuracy for regression tasks with very large response values
  • Faster test set scoring - significant improvements in test set scoring in the GUI
  • Various bug fixes
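
As an illustration of what a fold column is for, the sketch below holds out one fold value at a time so that rows sharing a value never end up on both sides of a train/validation split. It is an assumption about the general technique, not a description of the product's internals; `folds_from_column` and the sample rows are hypothetical.

```python
def folds_from_column(rows, fold_key):
    """Yield (held_out_value, train_rows, valid_rows),
    holding out one distinct fold value at a time."""
    fold_values = sorted({row[fold_key] for row in rows})
    for held_out in fold_values:
        train = [row for row in rows if row[fold_key] != held_out]
        valid = [row for row in rows if row[fold_key] == held_out]
        yield held_out, train, valid


# Hypothetical data: several rows per customer, grouped by customer.
rows = [
    {"customer": "c1", "amount": 10.0},
    {"customer": "c1", "amount": 12.0},
    {"customer": "c2", "amount": 7.5},
    {"customer": "c3", "amount": 3.0},
    {"customer": "c3", "amount": 4.0},
]

splits = list(folds_from_column(rows, "customer"))
for held_out, train, valid in splits:
    print(held_out, len(train), len(valid))
```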

Version 1.0.5 (Oct 24 2017)

  • Only display scorers that are allowed
  • Various bug fixes

Version 1.0.4 (Oct 19 2017)

  • Improved automatic type detection logic
  • Improved final ensemble accuracy
  • Various bug fixes

Version 1.0.3 (Oct 9 2017)

  • Various speedups
  • Results are now reproducible
  • Various bug fixes

Version 1.0.2 (Oct 5 2017)

  • Improved final ensemble accuracy
  • Weight of Evidence features added
  • Various bug fixes
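
For context on the Weight of Evidence features mentioned above, here is a minimal sketch of one common definition for a binary target: the log of the ratio between a category's share of events and its share of non-events. The sign convention, the additive smoothing, and all names are assumptions for illustration; the release notes do not specify how Driverless AI computes these features.

```python
import math
from collections import Counter


def weight_of_evidence(categories, target, smoothing=0.5):
    """Map each category level to ln(share of events / share of non-events).
    `smoothing` is added to every count to avoid log(0) and division by zero."""
    events = Counter(c for c, y in zip(categories, target) if y == 1)
    non_events = Counter(c for c, y in zip(categories, target) if y == 0)
    levels = sorted(set(categories))
    total_events = sum(events.values()) + smoothing * len(levels)
    total_non_events = sum(non_events.values()) + smoothing * len(levels)
    woe = {}
    for level in levels:
        event_share = (events[level] + smoothing) / total_events
        non_event_share = (non_events[level] + smoothing) / total_non_events
        woe[level] = math.log(event_share / non_event_share)
    return woe


# Hypothetical data: level "a" is mostly events, level "b" mostly non-events.
categories = ["a", "a", "a", "b", "b", "b", "b", "b"]
target = [1, 1, 0, 1, 0, 0, 0, 0]

woe = weight_of_evidence(categories, target)
print(woe)  # positive for "a", negative for "b"
```

Because the encoding is derived from the target, it should be fitted on training rows only (or out-of-fold) to avoid leaking target information into validation scores.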

Version 1.0.1 (Oct 4 2017)

  • Improved speed of final ensemble
  • Various bug fixes

Version 1.0.0 (Sep 24 2017)

  • Initial stable release