H2O Driverless AI Release Notes¶
H2O Driverless AI is a high-performance, GPU-enabled, client-server application for the rapid development and deployment of state-of-the-art predictive analytics models. It reads tabular data from various sources and automates data visualization, grand-master-level automatic feature engineering, model validation (overfitting and leakage prevention), model parameter tuning, model interpretability, and model deployment. H2O Driverless AI currently targets common regression, binomial classification, and multinomial classification applications, including loss-given-default, probability of default, customer churn, campaign response, fraud detection, anti-money-laundering, and predictive asset maintenance models. It also handles time-series problems for individual or grouped time-series, such as weekly sales predictions per store and department, with time-causal feature engineering and validation schemes. The ability to model unstructured data is coming soon.
High-level capabilities:
Client/server application for rapid experimentation and deployment of state-of-the-art supervised machine learning models
User-friendly GUI
Python and R client API
Automatically creates machine learning modeling pipelines for highest predictive accuracy
Automates data cleaning, feature selection, feature engineering, model selection, model tuning, ensembling
Automatically creates stand-alone batch scoring pipeline for in-process scoring or client/server scoring via HTTP or TCP protocols in Python
Automatically creates stand-alone (MOJO) low latency scoring pipeline for in-process scoring or client/server scoring via HTTP or TCP protocols, in C++ (with R and Python runtimes) and Java (runs anywhere)
Multi-GPU and multi-CPU support for powerful workstations and NVIDIA DGX supercomputers
Machine Learning model interpretation module with global and local model interpretation
Automatic Visualization module
Multi-user support
Backward compatibility
Problem types supported:
Regression (continuous target variable like age, income, price or loss prediction, time-series forecasting)
Binary classification (0/1 or “N”/”Y”, for fraud prediction, churn prediction, failure prediction, etc.)
Multinomial classification (“negative”/”neutral”/”positive” or 0/1/2/3 or 0.5/1.0/2.0 for categorical target variables, for prediction of membership type, next-action, product recommendation, sentiment analysis, etc.)
Data types supported:
Tabular structured data, rows are observations, columns are fields/features/variables
Numeric, categorical and textual fields
Missing values are allowed
i.i.d. (independent and identically distributed) data
Time-series data with a single time-series (time flows across the entire dataset, not per block of data)
Grouped time-series (e.g., sales per store per department per week, all in one file, with 3 columns for store, dept, week; see the example after this list)
Time-series problems with a gap between training and testing (i.e., the time to deploy), and a known forecast horizon (after which model has to be retrained)
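As a quick illustration of the grouped time-series layout mentioned above, the following minimal sketch uses the Python datatable package to build such a flat table; all column names and values are illustrative, not a required schema:

    import datatable as dt

    # Illustrative grouped time-series layout: one row per store/dept/week,
    # with all groups stacked in a single flat table.
    sales = dt.Frame({
        "store": [1, 1, 1, 2, 2, 2],
        "dept": ["toys", "toys", "toys", "shoes", "shoes", "shoes"],
        "week": ["2020-01-06", "2020-01-13", "2020-01-20",
                 "2020-01-06", "2020-01-13", "2020-01-20"],
        "weekly_sales": [1200.0, 1350.5, 990.0, 640.0, 710.0, 805.5],
    })

    # Save as a regular CSV dataset that can then be imported for a time-series experiment
    sales.to_csv("weekly_sales.csv")
    print(sales)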
Data types supported via custom recipes:
Image
Video
Audio
Graphs
Data sources supported:
Local file system or NFS
File upload from browser or Python client
S3 (Amazon)
Hadoop (HDFS)
Azure Blob Storage
BlueData DataTap
Google BigQuery
Google Cloud Storage
kdb+
Minio
Snowflake
JDBC
Custom Data Recipe BYOR (Python, bring your own recipe)
File formats supported:
Plain text formats of columnar data (.csv, .tsv, .txt)
Compressed archives (.zip, .gz, .bz2)
Excel
Parquet
Feather
Python datatable (.jay)
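As a brief example for the last entry, the Python datatable package can convert columnar data to and from the binary .jay format; a minimal sketch with illustrative file names:

    import datatable as dt

    # Read a CSV file and save it in datatable's binary .jay format (file names are examples)
    frame = dt.fread("train.csv")
    frame.to_jay("train.jay")

    # .jay files load quickly because they are memory-mapped rather than re-parsed
    reloaded = dt.fread("train.jay")
    print(reloaded.shape)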
Architecture¶
Roadmap¶
Change Log¶
Version 1.8.10 LTS (Feb 23, 2021)¶
New Features:
Exposed new Python client for download in resources menu
Added support for .avro file format
Added option to use a custom AutoDoc template to generate one or more AutoReports for every experiment via the autodoc_template config.toml setting (see the example at the end of this list)
MOJO updates:
Upgraded MOJO runtime dependency to 2.5.10
Added MOJO support for computing Shapley values for tree- and linear-based boosting models. Note that support for Shapley values for ensemble GLM models is currently in beta
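The AutoDoc template option above is a plain config.toml entry; a minimal illustrative sketch follows (the template path is an example, not a shipped default):

    # config.toml (illustrative): point AutoDoc at a custom Word template
    autodoc_template = "/opt/h2oai/templates/custom_autodoc_template.docx"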
Improvements:
Added more verbosity to MLI logs
Bug Fixes:
Fixed a stall detected in LightGBM models on Amazon EC2 p2.8xlarge instances
Version 1.8.9 LTS (Oct 19, 2020)¶
New Features:
Add configurable CSRF (Cross-site request forgery) protection on API endpoints
Add protection against concurrent sessions
Improvements:
Hide webserver technology info from all API endpoints
Improved BYOR security by introducing configurable static analysis of the code
Improved session verification and authenticity
Improved security for internals API handlers via encryption
Bug Fixes:
Fix user session autologout after session expiration
Fix for properly cleaning closed sessions
Fix invalid redirection to static artifacts when using reverse proxy and URL prefix
Fix import of files without extension
Version 1.8.8 LTS (Sep 30, 2020)¶
New Features:
Give user control over the number of saved variable importances, i.e., Python and R clients can get more than 14 values back (max_varimp_to_save in the config.toml file)
Added holiday calendar for 24 more countries; allow user to select the list of countries to create is-holiday features for time series experiments
Enable GPU support for LightGBM models on IBM Power
Support RHEL 8-like systems
Introduce an option to log in using the JWT token injected by the reverse proxy
Allow user to specify data delimiter/separator from configuration (datatable_separator; see the config.toml file and the example after this list)
Add support for an encrypted keystore for sensitive config.toml values
Save transformed column names for Shapley value computation in MOJO
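As a brief illustration of the two configuration options above, both are plain config.toml entries; the values shown are examples only, not shipped defaults:

    # config.toml (illustrative values)
    max_varimp_to_save = 100      # save more variable importances than the default
    datatable_separator = ";"     # force a specific delimiter/separator for dataset import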
Improvements:
Add more consistency in handling files without an extension
Improve web server request handling and disallow redirection outside of application
Improve log file formatting to facilitate parsing
Improve logging for connectors
Improve air-gapped support for custom recipes
Allow Snowflake Stage tables to be optional
Bug Fixes:
Fix OpenID and TLS login redirection when deploying behind reverse proxy
Fix Cgroup memory detection on IBM Power
Various MLI fixes
Various UI fixes
Documentation updates:
Added configuration security document
Added details about MOJO Scoring Command-Line Options
Added details about exporting artifacts to Bitbucket
Added details about GPU usage in Driverless AI
Improved install docs (RHEL 8 support)
Improved description of transformed feature names
Version 1.8.7.1 LTS (June 23, 2020)¶
New Features:
Add ability to push artifacts to a Bitbucket server
Add per-feature user control for monotonicity constraints for XGBoostGBM, LightGBM and DecisionTree models
Bug Fixes:
Fix Hive Kerberos impersonation
Fix a DTap connector issue by using the proper login username for impersonation
Fix monotonicity constraints for XGBoostGBM, LightGBM and DecisionTree models
Version 1.8.7 LTS (June 15, 2020)¶
New Features:
Add intercept term to k-LIME CSV
Add control of default categorical & numeric feature rendering in DAI PD/ICE
Add ability to restrict custom recipe upload to a specific git repository and branch
Add translations for Korean and Chinese
Add ability to use multiple authentication methods simultaneously
Improvements:
Improve behavior of systemctl when Driverless AI fails to start
Improve logging behavior for JDBC and Hive connectors
Improve behavior of C++ scorer, fewer unnecessary files saved in tmp directory
Improve Docker image behavior in Kubernetes
Improve LDAP authentication to allow for anonymous binding
Improve speed of feature selection for experiments on large, wide, imbalanced datasets
Improve speed of data import on busy system
Bug fixes:
Fix automatic Kaggle submission and score retrieval
Fix intermittent Java exception seen by surrogate DRF model in MLI when several MLI jobs are run concurrently
Fix issue with deleting Deployments if linked Experiment was deleted
Fix issue causing Jupyter Notebooks to not work properly in Docker Image
Fix custom recipe scorers not being displayed on Diagnostics page
Fix issue with AWS Lambda Deployment not handling dropped columns properly
Fix issue with not being able to limit number of GPUs for specific experiment
Fix in-server scoring inaccuracies for certain models built in 1.7.1 and 1.8.0 (standalone scoring not affected)
Fix rare datatable type casting exception
Documentation updates:
The “Maximum Number of Rows to Perform Permutation-Based Feature Selection” expert setting now has a default value of 500,000
Improved Hive and Snowflake connector documentation
Updated the Main.java example in the Java Scoring Pipeline chapter
Added documentation describing how to change the language in the UI before starting the application
Added information about how custom recipes are described and documented in the Autoreport
Updated the LDAP authentication documentation
Improved the Linux DEB and RPM installation instructions
Improved the AWS Community AMI installation instructions
Improved documentation for the Reproducible button
Version 1.8.6 LTS (Apr 30, 2020)¶
New Features:
Add expert setting to reduce size of MOJO scoring pipelines (and hence reduce latency and memory usage for inference)
Enable Lambda deployment for IBM Power
Add restart button for Deployments
Add automatic Kaggle submission for supported datasets, show private/public scores (requires Kaggle API Username/Key)
Show warning if single final model is worse on back-testing splits (for time series) or cross-validation folds (for IID) than the fold models (indicates issue with signal or fit)
Update R client API to include autodoc, experiment preview, dataset download, autovis functions
Add button in expert settings that toggles some effective settings to make a small MOJO production pipeline
Add an option to upload artifacts to S3 or a Git repository
Improvements:
Improve experiment restart/refit robustness if model type is changed
Extra protection against dropping features
Improve implementation of Hive connector
Bug fixes:
Upgrade datatable to fix endless loop during stats calculation at file import
Web server and UI now respect dynamic base URL suffix
Fix incorrect min_rows in MLI when providing weight column with small values
Fix segfault in MOJO for TensorFlow/PyTorch models
Fix elapsed time for MLI
Enable GPU by default for R client
Fix h2oai ModuleNotFoundError in Python scoring
Update no_drop_features toml and expert button to more generally avoid dropping features
Fix datatable mmap strategy
Documentation updates:
Add documentation for enabling the Hive data connector
Add documentation for updating expired DAI licenses on AWS Lambda deployments using a script
Documentation for uploading artifacts now includes support for S3 and Git in the artifacts store
Improve documentation for one-hot encoding
Improve documentation for systemd logs/journalctl
Improve documentation for time series ‘unavailable columns at prediction time’
Improve documentation for Azure blob storage
Improve documentation for MOJO scoring pipeline
Add information about reducing the size of a MOJO using a new expert setting
Version 1.8.5 LTS (Mar 09, 2020)¶
New Features:
Handle large multiclass problems (up to 10k classes), including GUI improvements in such cases
Detect class imbalance for binary problems where target class is non-rare
Add feature count to iteration panel
Add experiment lineage pdf in experiment summary zip file
Issue warnings if final pipeline scores are unstable across (cross-)validation folds
Issue warning if Constant Model is improving quality of final pipeline (sign of bad signal)
Report origin of leakage detection as from model fit (AUC/R2), GINI, or correlation
Improvements:
Improve handling of ID columns
Improve exception handling to improve stability of raising python exceptions
Improve exception handling when any individual transformer or model throws an exception or segfaults
Improve robustness of restart and refit experiment to changes in experiment choices
Improve handling of missing values when transforming dataset
Improve robustness of custom recipe importing of modules
Improve documentation for installation instructions
Improve selection of initial lag sizes for time series
Improve LightGBM stability for regression problems for certain mutation parameters
Documentation updates:
Improved documentation for time-series experiments
Added topics describing how to re-enable the Data Recipe URL and Data Recipe File connectors
For users running older versions of the Standalone Python Scoring Pipeline, added information describing how to install upgraded versions of outdated dependencies
Improved the description for the “Sampling Method for Imbalanced Binary Classification Problems” expert setting
Added constraints related to the REST server deployments
Noted required vs optional parameters in the HDFS connector topics
Added an FAQ indicating that MOJOs are thread safe
Noted that on Windows 10, only Docker installs are supported
Added information about the Recommendations AutoViz graph
Added information to the Before you Begin Installing topic that master.db files are not backward compatible with earlier Driverless AI versions
Bug fixes:
Update LightGBM for bug fixes, including hangs and avoiding hard-coded library paths
Stabilize use of psutil package
Fix time-series experiments when test set has missing target values
Fix python scoring to not depend upon original data_directory
Fix preview for custom time series validation splits and low accuracy
Fix ignored minimum lag size setting for single time series
Fix parsing of Excel files with datetime columns
Fix column type detection for columns with mostly missing values
Fix invalid display of 0.0000 score in iteration scores
Various MLI fixes (don’t show invalid graphs, fix PDP sort order, overlapping labels)
Various bug fixes
Version 1.8.4.1 LTS (Feb 4, 2020)¶
Add option for dynamic port allocation
Documentation for AWS community AMI
Various bug fixes (MLI UI)
Version 1.8.4 LTS (Jan 31, 2020)¶
New Features:
Added ‘Scores’ tab in experiment page to show detailed tuning tables and scores for models and folds
Added Constant Model (constant predictions) and use it as reference model by default
Show score of global constant predictions in experiment summary as reference
Added support for setting up mutual TLS for Driverless AI
Added option to use client/personal certificate as an authentication method
Documentation Updates:
Added sections for enabling mTLS and Client Certificate authentication
The Constant Model is now included in the list of Supported Algorithms
Added a section describing the Model Scores page
Improved the C++ Scoring Pipeline documentation describing the process for importing datatable
Improved documentation for the Java Scoring Pipeline
Bug fixes:
Fix refitting of final pipeline when new features are added
Various bug fixes
Version 1.8.3 LTS (Jan 22, 2020)¶
Added option to upload experiment artifacts to a configured disk location
Various bug fixes (correct feature engineering from time column, migration for brain restart)
Version 1.8.2 LTS (Jan 17, 2020)¶
New Features:
Decision Tree model
Automatically enabled for accuracy <= 7 and interpretability >= 7
Supports all problem types: regression/binary/multiclass
Using LightGBM GPU/CPU backend with MOJO
Visualization of tree splits and leaf node decisions as part of pipeline visualization
Per-Column Imputation Scheme (experimental)
Select one of [const, mean, median, min, max, quantile] imputation schemes at the start of the experiment
Select method of calculation of imputation value: either on entire dataset or inside each pipeline’s training data split
Disabled by default and must be enabled at startup time to be effective
Show MOJO size and scoring latency (for C++/R/Python runtime) in experiment summary
Automatically prune low weight base models in final ensemble (based on interpretability setting) to reduce final model complexity
Automatically convert non-raw github URLs for custom recipes to raw source code URLs
Improvements:
Speed up feature evolution for time-series and low-accuracy experiments
Improved accuracy of feature evolution algorithm
Feature transformer interpretability, total count, and importance accounted for in genetic algorithm’s model and feature selection
Binary confusion matrix in ROC curve of experiment page is made consistent with Diagnostics (flipped positions of TP/TN)
Only include custom recipes in Python scoring pipeline if the experiment uses any custom recipes
Additional documentation (New OpenID config options, JDBC data connector syntax)
Improved AutoReport’s transformer descriptions
Improved progress reporting during Autoreport creation
Improved speed of automatic interaction search for imbalanced multiclass problems
Improved accuracy of single final model for GLM and FTRL
Allow config_overrides to be a list/vector of parameters for R client API
Disable early stopping for Random Forest models by default, and expose new ‘rf_early_stopping’ mode (optional)
Create identical example data (again, as in 1.8.0 and before) for all scoring pipelines
Upgraded versions of datatable and Java
Installed graphviz in the Docker image; the MOJO package and Autoreport now include a .png file of the pipeline visualization. Note: For RPM/DEB/TAR SH installs, users can install graphviz to get this optional functionality
Documentation Updates:
Added a simple example for modifying a dataset by recipe using live code
Added a section describing how to impute datasets (experimental)
Added Decision Trees to list of supported algorithms
Fixed examples for enabling JDBC connectors
Added information describing how to use a JDBC driver that is not tested in house
Updated the Missing Values Handling topic to include sections for “Clustering in Transformers” and “Isolation Forest Anomaly Score Transformer”
Improved the “Fold Column” description
Bug Fixes:
Fix various reasons why final model score was too far off from best feature evolution score
Delete temporary files created during test set scoring
Fixed target transformer tuning (was potentially mixing up target transformers between feature evolution and final model)
Fixed tensorflow_nlp_have_gpus_in_production=true mode
Fixed partial dependence plots for missing datetime values and no longer show them for text columns
Fixed time-series GUI for quarterly data
Feature transformer exploration limited to no more than 1000 new features (Small data on 10/10/1 would try too many features)
Fixed Kaggle pipeline building recipe to try more than 8 input features
Fixed cursor placement in live code editor for custom data recipe
Show correct number of cross-validation splits in pipeline visualization when there are more than 10 splits
Fixed parsing of datetime in MOJO for some datetime formats without ‘%d’ (day)
Various bug fixes
Backward/Forward compatibility:
Models built in 1.8.2 LTS will remain supported in upcoming versions 1.8.x LTS
Models built in 1.7.1/1.8.0/1.8.1 are not deprecated and should continue to work (best effort is made to preserve MOJO and Autoreport creation, MLI, scoring, etc.)
Models built in 1.7.0 or earlier will be deprecated
Version 1.8.1.1 (Dec 21, 2019)¶
Bugfix for time series experiments with quarterly data when launched from GUI
Version 1.8.1 (Dec 10, 2019)¶
New Features:
Full set of scoring metrics and corresponding downloadable holdout predictions for experiments with single final models (time-series or i.i.d.)
MLI Updates:
What-If (sensitivity) analysis
Interpretation of experiments on text data (NLP)
Custom Data Recipe BYOR:
BYOR (bring your own recipe) in Python: pandas, numpy, datatable, third-party libraries for fast prototyping of connectors and data preprocessing inside DAI
data connectors, cleaning, filtering, aggregation, augmentation, feature engineering, splits, etc.
can create one or multiple datasets from scratch or from existing datasets
interactive code editor with live preview
example code at https://github.com/h2oai/driverlessai-recipes/tree/rel-1.8.1/data
Visualization of final scoring pipeline (Experimental)
In-GUI display of graph of feature engineering, modeling and ensembling steps of entire machine learning pipeline
Addition to Autodoc
Time-Series:
Ability to specify which features will be unavailable at test time for time-series experiments
Custom user-provided train/validation splits (by start/end datetime for each split) for time-series experiments
Back-testing metrics for time-series experiments (regression and classification, with and without lags) based on rolling windows (configurable number of windows)
MOJO:
Java MOJO for FTRL
PyTorch MOJO (C++/Py/R) for custom recipes based on BERT/DistilBERT NLP models (available upon request)
Improvements:
Accuracy:
Automatic pairwise interaction search (+,-,*,/) for numeric features (“magic feature” finder)
Improved accuracy for time series experiments with low interpretability
Improved leakage detection logic
Improved genetic algorithm heuristics for feature evolution (more exploration)
Time-Series Recipes:
Re-enable Test-time augmentation in Python scoring pipeline for time-series experiments
Reduce default number of time-series rolling holdout predictions to same number as validation splits (but configurable)
Computation:
Faster feature evolution part for non-time-series experiments with single final model
Faster binary imbalanced models for very high class imbalance by limiting internal number of re-sampling bags
Faster feature selection
Enable GPU support for ImbalancedXGBoostGBMModel
Improved speed for importing multiple files at once
Faster automatic determination of time series properties
Enable use of XGBoost models on large datasets if low enough accuracy settings, expose dataset size limits in expert settings
Reduced memory usage for all experiments
Faster creation of holdout predictions for time-series experiments (Shapley values are now computed by MLI on demand by default)
UX Improvements:
Added ability to rename datasets
Added search bar for expert settings
Show traces for long-running experiments
All experiments create a MOJO (if possible, set to ‘auto’)
All experiments create a pipeline visualization
By default, all experiments (iid and time series) have holdout predictions on training data and a full set of metrics for final model
Documentation Updates:
Updated steps for enabling GPU persistence mode
Added information about deprecated NVIDIA functions
Improved documentation for enabling LDAP authentication
Added information about changing the column type in datasets
Updated list of experiment artifacts available in an experiment summary
Added steps describing how to expose ports on Docker for the REST service deployment within the Driverless AI Docker container
Added an example showing how to run an experiment with a custom transform recipe
Improved the FAQ for setting up TLS/SSL
Added FAQ describing issues that can occur when attempting Import Folder as File with a data connector on Windows
Bug Fixes:
Allow brain restart/refit to accept unscored previous pipelines
Fix actual vs predicted labeling for diagnostics of regression model
Fix MOJO for TensorFlow for target transformers other than identity
Fix column type detection for Excel files
Allow experiments with default expert settings to have a MOJO
Various bug fixes
Version 1.8.0 (Oct 3, 2019)¶
Improve speed and memory usage for feature engineering
Improve speed of leakage and shift detection, and improve accuracy
Improve speed of AutoVis under high system load
Improve speed for experiments with large user-given validation data
Improve accuracy of ensembles for regression problems
Improve creation of Autoreport (only one background job per experiment)
Improve sampling techniques for ImbalancedXGBoost and ImbalancedLightGBM models, and disable them by default since they can be slower
Add Python/R/C++ MOJO support for FTRL and RandomForest
Add native categorical handling for LightGBM in CPU mode
Add monotonicity constraints support for LightGBM
Add Isolation Forest Anomaly Score transformer (outlier detection)
Re-enable One-Hot-Encoding for GLM models
Add lexicographical label encoding (disabled by default)
Add ability to further train user-provided pretrained embeddings for TensorFlow NLP transformers, in addition to fine-tuning the rest of the neural network graph
Add timeout for BYOR acceptance tests
Add log and notifications for large shifts in final model variable importances compared to tuning model
Add more expert control over time series feature engineering
Add ability for recipes to be uploaded in bulk as an entire (or partial) GitHub repository or as links to Python files on the page
Allow missing values in fold column
Add support for feature brain when starting “New Model With Same Parameters” of a model that was previously restarted
Add support for toggling whether additional features are to be included in pipeline during “Retrain Final Pipeline”
Limit experiment runtime to one day by default (approximately enforced, can be configured in Expert Settings -> Experiment or config.toml ‘max_runtime_minutes’)
Add support for importing pickled Pandas frames (.pkl)
MLI updates:
Show holdout predictions and test set predictions (if applicable) in MLI TS for both metric and actual vs. predicted charts
Add ability to download group metrics in MLI TS
Add ability to zoom into charts in MLI TS
Add ability to use column not used in DAI model as a k-LIME cluster column in MLI
Add ability to view original and transformed DAI model-based feature importance in MLI
Add ability to view Shapley importance for original features
Add ability to view permutation importance for a DAI model when the config option autodoc_include_permutation_feature_importance is set to on
Fixed bug in binary Disparate Impact Analysis, which caused incorrect calculations amongst several metrics (ones using false positives and true negatives in the numerator)
Disable NLP TensorFlow transformers by default (enable in NLP expert settings by switching to “on”)
Reorganize expert settings, add tab for feature engineering
Experiment now informs if aborted by user, system or server restart
Reduce load of all tasks launched by server, giving priority to experiments to use cores
Add experiment summary files to aborted experiment logs
Add warning when ensemble has models that reach the maximum number of iterations despite early stopping, with learning rate controls in the expert panel
Improve progress reporting
Allow disabling of H2O recipe server for scoring if not using custom recipes (to avoid Java dependency)
Fix RMSPE scorer
Fix recipes error handling when uploading via URL
Fix Autoreport being spawned anytime GUI was on experiment page, overloading the system with forks from the server
Fix time-out for Autoreport PDP calculations, so completes more quickly
Fix certain config settings to be honored from GUI expert settings (woe_bin_list, ohe_bin_list, text_gene_max_ngram, text_gene_dim_reduction_choice, tensorflow_max_epochs_nlp, tensorflow_nlp_pretrained_embeddings_file_path, holiday_country), previously were only honored when provided at startup time
Fix column type for additional columns during scored test set download
Fix GUI incorrectly converting time for forecast horizon in TS experiments
Fix calculation of correlation for string columns in AutoVis
Fix download for R MOJO runtime
Fix parameters for LightGBM RF mode
Fix dart parameters for LightGBM and XGBoost
Documentation updates:
Included more information in the Before You Begin Installing or Upgrading topic to help make installations and upgrades go more smoothly
Added topic describing how to choose between the AWS Community and AWS Marketplace AMIs
Added information describing how to retrieve the MOJO2 Javadoc
Updated Python client examples to work with Driverless AI 1.7.x releases
Updated documentation for new features, expert settings, MLI plots, etc.
Backward/Forward compatibility:
Models built in 1.8.0 will remain supported in versions 1.8.x
Models built in 1.7.1 are not deprecated and should continue to work (best effort is made to preserve MOJO and Autoreport creation, MLI, scoring, etc.)
1.8.0 upgraded to scipy version 1.3.1 to support newer custom recipes. This might deprecate custom recipes that depend on scipy version 1.2.2 (and experiments using them) and might require re-import of those custom recipes. Previously built Python scoring pipelines will continue to work.
Models built in 1.7.0 or earlier will be deprecated
Various bug fixes
Version 1.7.1 (Aug 19, 2019)¶
Added two new models with internal sampling techniques for imbalanced binary classification problems: ImbalancedXGBoost and ImbalancedLightGBM
Added support for rolling-window based predictions for time-series experiments (2 options: test-time augmentation or re-fit)
Added support for setting logical column types for a dataset (to override type detection during experiments)
Added ability to set experiment name at start of experiment
Added leakage detection for time-series problems
Added JDBC connector
MOJO updates:
Added Python/R/C++ MOJO support for TensorFlow model
Added Python/R/C++ MOJO support for TensorFlow NLP transformers: TextCNN, CharCNN, BiGRU, including any pretrained embeddings if provided
Reduced memory usage for MOJO creation
Increased speed of MOJO creation
Configuration options for MOJO and Python scoring pipelines now have 3-way toggle: “on”/”off”/”auto”
MLI updates:
Added disparate impact analysis (DIA) for MLI
Allow MLI scoring pipeline to be built for datasets with column names that need to be sanitized
Date-aware binning for partial dependence and ICE in MLI
Improved generalization performance for time-series modeling with regularization techniques for lag-based features
Improved “predicted vs actual” plots for regression problems (using adaptive point sizes)
Fix bug in datatable for manipulations of string columns larger than 2GB
Fixed download of predictions on user-provided validation data
Fix bug in time-series test-time augmentation (work-around was to include entire training data in test set)
Honor the expert settings flag to enable detailed traces (disable again by default)
Various bug fixes
Version 1.6.4 LTS (Aug 19, 2019)¶
ML Core updates:
Speed up schema detection
DAI now drops rows with missing values when diagnosing regression problems
Speed up column type detection
Fixed growth of individuals
Fixed n_jobs for predict
Target column is no longer included in predictors for skewed datasets
Added an option to prevent users from downloading data files locally
Improved UI split functionality
A new “max_listing_items” config option to limit the number of items fetched in listing pages
Model Ops updates:
MOJO runtime upgraded to version 2.1.3 which supports perpetual MOJO pipeline
Upgraded deployment templates to version matching MOJO runtime version
MLI updates:
Fix to MLI schema builder
Fix parsing of categorical reason codes
Added ability to handle integer time column
Various bug fixes
Version 1.7.0 (Jul 7, 2019)¶
Support for Bring Your Own Recipe (BYOR) for transformers, models (algorithms) and scorers
Added protobuf-based MOJO scoring runtime libraries for Python, R and Java (standalone, low-latency)
Added local REST server as one-click deployment option for MOJO scoring pipeline, in addition to AWS Lambda endpoint
Added R client package, in addition to Python client
Added Project workspace to group datasets and experiments and to visually compare experiments and create leaderboards
Added download of imported datasets as .csv
Recommendations for columnar transformations in AutoViz
Improved scalability and performance
Ability to provide max. runtime for experiments
Create MOJO scoring pipeline by default if the experiment configuration allows (for convenience, enables local/cloud deployment options without user input)
Support for user provided pre-trained embeddings for TensorFlow NLP models
Support for holdout splits lacking some target classes (can happen when a fold column is provided)
MLI updates:
Added residual plot for regression problems (keeping all outliers intact)
Added confusion matrix as default metric display for multinomial problems
Added Partial Dependence (PD) and Individual Conditional Expectation (ICE) plots for Driverless AI models in MLI GUI
Added ability to search by ID column in MLI GUI
Added ability to run MLI PD/ICE on all features
Added ability to handle multiple observations for a single time column in MLI TS by taking the mean of the target and prediction where applicable
Added ability to handle integer time column in MLI TS
MLI TS will use train holdout predictions if there is no test set provided
Faster import of files with “%Y%m%d” and “%Y%m%d%H%M” time format strings, and files with lots of text strings
Fix units for RMSPE scorer to be a percentage (multiply by 100)
Allow non-positive outcomes for MAPE and SMAPE scorers
Improved listing in GUI
Allow zooming in GUI
Upgrade to TensorFlow 1.13.1 and CUDA 10 (and CUDA is part of the distribution now, to simplify installation)
Add CPU-support for TensorFlow on PPC
Documentation updates:
Added documentation for new features including
Projects
Custom Recipes
C++ MOJO Scoring Pipelines
R Client API
REST Server Deployment
Added information about variable importance values on the experiments page
Updated documentation for Expert Settings
Updated “Tips n Tricks” with new Scoring Pipeline tips
Various bug fixes
Version 1.6.3 LTS (June 14, 2019)¶
Included an Audit log feature
Fixed support for decimal types for parquet files in MOJO
Autodoc can order PDP/ICE by feature importance
Session Management updates
Upgraded datatable
Improved reproducibility
Model diagnostics now uses a weight column
MLI can build surrogate models on all the original features or on all the transformed features that DAI uses
Internal server cache now respects usernames
Fixed an issue with time series settings
Fixed an out of memory error when loading a MOJO
Fixed Python scoring package for TensorFlow
Added OpenID configurations
Documentation updates:
Updated the list of artifacts available in the Experiment Summary
Clarified language in the documentation for unsupported (but available) features
For the Terraform requirement in deployments, clarified that only Terraform versions in the 0.11.x release are supported, and specifically 0.11.10 or greater
Fixed link to the Miniconda installation instructions
Various bug fixes
Version 1.6.2 LTS (May 10, 2019)¶
This version provides PPC64le artifacts
Improved stability of datatable
Improved path filtering in the file browser
Fixed units for RMSPE scorer to be a percentage (multiply by 100)
Fixed segmentation fault on Ubuntu 18 with installed font package
Fixed IBM Spectrum Conductor authentication
Fixed handling of EC2 machine credentials
Fixed Lag transformer configuration
Fixed kdb+ and Snowflake error reporting
Gradually reduce the number of workers used for column statistics computation in case of failure
Hide the default Tornado header that exposes the Tornado version in use
Documentation updates:
Added instructions for installing via AWS Marketplace
Improved documentation for installing via Google Cloud
Improved FAQ documentation
Added Data Sampling documentation topic
Various bug fixes
Version 1.6.1 LTS (Apr 18, 2019)¶
Several fixes for MLI (partial dependence plots, Shapley values)
Improved documentation for model deployment, time-series scoring, AutoVis and FAQs
Version 1.6.0 LTS (Apr 5, 2019)¶
Private build only.
Fixed import of string columns larger than 2GB
Fixed AutoViz crashes on Windows
Fixed quantile binning in MLI
Plot global absolute mean Shapley values instead of global mean Shapley values in MLI
Improvements to PDP/ICE plots in MLI
Validated Terraform version in AWS Lambda deployment
Added support for NULL variable importance in AutoDoc
Made Variable Importance table size configurable in AutoDoc
Improved support for various combinations of data import options being enabled/disabled
CUDA is now part of distribution for easier installation
Security updates:
Enforced SSL settings to be honored for all h2oai_client calls
Added config option to prevent using LocalStorage in the browser to cache information
Upgraded Tornado server version to 5.1.1
Improved session expiration and autologout functionality
Disabled access to Driverless AI data folder in file browser
Provided an option to filter content that is shown in the file browser
Use login name for HDFS impersonation instead of predefined name
Disabled autocomplete in login form
Various bug fixes
Version 1.5.4 (Feb 24, 2019)¶
Speed up calculation of column statistics for date/datetime columns using certain formats (now uses ‘max_rows_col_stats’ parameter)
Added computation of standard deviation for variable importances in experiment summary files
Added computation of shift of variable importances between feature evolution and final pipeline
Fix link to MLI Time-Series experiment
Fix display bug for iteration scores for long experiments
Fix display bug for early finish of experiment for GLM models
Fix display bug for k-LIME when target is skewed
Fix display bug for forecast horizon in MLI for Time-Series
Fix MLI for Time-Series for single time group column
Fix in-server scoring of time-series experiments created in 1.5.0 and 1.5.1
Fix OpenBLAS dependency
Detect disabled GPU persistence mode in Docker
Reduce disk usage during TensorFlow NLP experiments
Reduce disk usage of aborted experiments
Refresh reported size of experiments during start of application
Disable TensorFlow NLP transformers by default to speed up experiments (can enable in expert settings)
Improved progress percentage shown during experiment
Improved documentation (upgrade on Windows, how to create the simplest model, DTap connectors, etc.)
Various bug fixes
Version 1.5.3 (Feb 8, 2019)¶
Added support for splitting datasets by time via time column containing date, datetime or integer values
Added option to disable file upload
Require authentication to download experiment artifacts
Automatically drop predictor columns from training frame if not found in validation or test frame and warn
Improved performance by using physical CPU cores only (configurable in config.toml)
Added option to not show inactive data connectors
Various bug fixes
Version 1.5.2 (Feb 2, 2019)¶
Added word-level bidirectional GRU TensorFlow models for NLP features
Added character-level CNN TensorFlow models for NLP features
Added support to import multiple individual datasets at once
Added support for holdout predictions for time-series experiments
Added support for regression and multinomial classification for FTRL (in addition to binomial classification)
Improved scoring for time-series when test data contains actual target values (missing target values will be predicted)
Reduced memory usage for LightGBM models
Improved performance for feature engineering
Improved speed for TensorFlow models
Improved MLI GUI for time-series problems
Fix final model fold splits when fold_column is provided
Various bug fixes
Version 1.5.1 (Jan 22, 2019)¶
Fix MOJO for GLM
Add back .csv file of experiment summary
Improve collection of pipeline timing artifacts
Clean up Docker tag
Version 1.5.0 (Jan 18, 2019)¶
Added model diagnostics (interactive model metrics on new test data incl. residual analysis for regression)
Added FTRL model (Follow The Regularized Leader)
Added Kolmogorov-Smirnov metric (degree of separation between positives and negatives)
Added ability to retrain (only) the final model on new data
Added one-hot encoding for low-cardinality categorical features, for GLM
Added choice between 32-bit (now default) and 64-bit precision
Added system information (CPU, GPU, disk, memory, experiments)
Added support for time-series data with many more time gaps, and with weekday-only data
Added one-click deployment to AWS Lambda
Added ability to split datasets randomly, with option to stratify by target column or group by fold column
Added support for OpenID authentication
Added connector for BlueData
Improved responsiveness of the GUI under heavy load situations
Improved speed and reduced memory footprint of feature engineering
Improved performance for RuleFit models and enabled GPU and multinomial support
Improved auto-detection of temporal frequency for time-series problems
Improved accuracy of final single model if external validation provided
Improved final pipeline if external validation data is provided (add ensembling)
Improved k-LIME in MLI by using original features deemed important by DAI instead of all original features
Improved MLI by using 3-fold CV by default for all surrogate models
Improved GUI for MLI time series (integrated help, better integration)
Added ability to view MLI time series logs while MLI time series experiment is running
PDF version of the Automatic Report (AutoDoc) is now replaced by a Word version
Various bug fixes (GLM accuracy, UI slowness, MLI UI, AutoVis)
Version 1.4.2 (Dec 3, 2018)¶
Support for IBM Power architecture
Speed up training and reduce size of final pipeline
Reduced resource utilization during training of final pipeline
Display test set metrics (ROC, ROCPR, Gains, Lift) in GUI in addition to validation metrics (if test set provided)
Show location of best threshold for Accuracy, MCC and F1 in ROC curves
Add relative point sizing for scatter plots in AutoVis
Fix file upload and add model checkpointing in Python client API
Various bug fixes
Version 1.4.1 (Nov 11, 2018)¶
Improved integration of MLI for time-series
Reduced disk and memory usage during final ensemble
Allow scoring and transformations on previously imported datasets
Enable checkpoint restart for unfinished models
Add startup checks for OpenCL platforms for LightGBM on GPUs
Improved feature importances for ensembles
Faster dataset statistics for date/datetime columns
Faster MOJO batch scoring
Fix potential hangs
Fix ‘not in list’ error in MOJO
Fix NullPointerException in MLI
Fix outlier detection in AutoVis
Various bug fixes
Version 1.4.0 (Oct 27, 2018)¶
Enable LightGBM by default (now with MOJO)
LightGBM tuned for GBM decision trees, Random Forest (rf), and Dropouts meet Multiple Additive Regression Trees (dart)
Add ‘isHoliday’ feature for time columns
Add ‘time’ column type for date/datetime columns in data preview
Add support for binary datatable file ingest in .jay format
Improved final ensemble (each model has its own feature pipeline)
Automatic smart checkpointing (feature brain) from prior experiments
Add kdb+ connector
Feature selection of original columns for data with many (well over 100) columns
Improved time-series recipe (multiple validation splits, better logic)
Improved performance of AutoVis
Improved date detection logic (now detects %Y%m%d and %Y-%m date formats)
Automatic fallback to CPU mode if GPU runs out of memory (for XGBoost, GLM and LightGBM)
No longer require header for validation and testing datasets if data types match
No longer include text columns for data shift detection
Add support for time-series models in MLI (including ability to select time-series groups)
Add ability to download MLI logs from MLI experiment page (includes both Python and Java logs)
Add ability to view MLI logs while MLI experiment is running (Python and Java logs)
Add ability to download LIME and Shapley reason codes from MLI page
Add ability to run MLI on transformed features
Display all variables for MLI variable importance for both DAI and surrogate models in MLI summary
Include variable definitions for DAI variable importance list in MLI summary
Fix Gains/Lift charts when observations weights are given
Various bug fixes
Version 1.3.1 (Sep 12, 2018)¶
Fix ‘Broken pipe’ failures for TensorFlow models
Fix time-series problems with categorical features and interpretability >= 8
Various bug fixes
Version 1.3.0 (Sep 4, 2018)¶
Added LightGBM models - now have [XGBoost, LightGBM, GLM, TensorFlow, RuleFit]
Added TensorFlow NLP recipe based on CNN Deeplearning models (sentiment analysis, document classification, etc.)
Added MOJO for GLM
Added detailed confusion matrix statistics
Added more expert settings
Improved data exploration (columnar statistics and row-based data preview)
Improved speed of feature evolution stage
Improved speed of GLM
Report single-pass score on external validation and test data (instead of bootstrap mean)
Reduced memory overhead for data processing
Reduced number of open files - fixes ‘Bad file descriptor’ error on Mac/Docker
Simplified Python client API
Query any data point in the MLI UI from the original dataset due to “on-demand” reason code generation
Enhanced k-means clustering in k-LIME by only using a subset of features. See The K-LIME Technique for more information.
Report k-means centers for k-LIME in MLI summary for better cluster interpretation
Improved MLI experiment listing details
Various bug fixes
Version 1.2.2 (July 5, 2018)¶
MOJO Java scoring pipeline for time-series problems
Multi-class confusion matrices
AUCMACRO Scorer: Multi-class AUC via macro-averaging (in addition to the default micro-averaging)
Expert settings (configuration override) for each experiment from GUI and client APIs.
Support for HTTPS
Improved downsampling logic for time-series problems (if enabled through accuracy knob settings)
LDAP readonly access to Active Directory
Snowflake data connector
Various bug fixes
Version 1.2.1 (June 26, 2018)¶
Added LIME-SUP (alpha) to MLI as alternative to k-LIME (local regions are defined by decision tree instead of k-means)
Added RuleFit model (alpha), now have [GBM, GLM, TensorFlow, RuleFit] - TensorFlow and RuleFit are disabled by default
Added Minio (private cloud storage) connector
Added support for importing folders from S3
Added ‘Upload File’ option to ‘Add Dataset’ (in addition to drag & drop)
Predictions for binary classification problems now have 2 columns (probabilities per class), for consistency with multi-class
Improved model parameter tuning
Improved feature engineering for time-series problems
Improved speed of MOJO generation and loading
Improved speed of time-series related automatic calculations in the GUI
Fixed potential rare hangs at end of experiment
No longer require internet to run MLI
Various bug fixes
Version 1.2.0 (June 11, 2018)¶
Time-Series recipe
Low-latency standalone MOJO Java scoring pipelines (now beta)
Enable Elastic Net Generalized Linear Modeling (GLM) with lambda search (and GPU support), for interpretability>=6 and accuracy<=5 by default (alpha)
Enable TensorFlow (TF) Deep Learning models (with GPU support) for interpretability=1 and/or multi-class models (alpha, enable via config.toml)
Support for pre-tuning of [GBM, GLM, TF] models for picking best feature evolution model parameters
Support for final ensemble consisting of mix of [GBM, GLM, TF] models
Automatic Report (AutoDoc) in PDF and Markdown format as part of summary zip file
Interactive tour (assistant) for first-time users
MLI now runs on experiments from previous releases
Surrogate models in MLI now use 3 folds by default
Improved small data recipe with up to 10 cross-validation folds
Improved accuracy for binary classification with imbalanced data
Additional time-series transformers for interactions and aggregations between lags and lagging of non-target columns
Faster creation of MOJOs
Progress report during data ingest
Normalize binarized multi-class confusion matrices by class count (global scaling factor)
Improved parsing of boolean environment variables for configuration
Various bug fixes
Version 1.1.6 (May 29, 2018)¶
Improved performance for large datasets
Improved speed and user interface for MLI
Improved accuracy for binary classification with imbalanced data
Improved generalization estimate for experiments with given validation data
Reduced size of experiment directories
Support for Parquet files
Support for bzip2 compressed files
Added Data preview in UI: ‘Describe’
No longer add ID column to holdout and test set predictions for simplicity
Various bug fixes
Version 1.1.4 (May 17, 2018)¶
Native builds (RPM/DEB) for 1.1.3
Version 1.1.3 (May 16, 2018)¶
Faster speed for systems with large CPU core counts
Faster and more robust handling of user-specified missing values for training and scoring
Same validation scheme for feature engineering and final ensemble for high enough accuracy
MOJO scoring pipeline for text transformers
Fixed single-row scoring in Python scoring pipeline (broken in 1.1.2)
Fixed default scorer when experiment is started too quickly
Improved responsiveness for time-series GUI
Improved responsiveness after experiment abort
Improved load balancing of memory usage for multi-GPU XGBoost
Improved UI for selection of columns to drop
Various bug fixes
Version 1.1.2 (May 8, 2018)¶
Support for automatic time-series recipe (alpha)
Now using Generalized Linear Model (GLM) instead of XGBoost (GBM) for interpretability 10
Added experiment preview with runtime and memory usage estimation
Added MER scorer (Median Error Rate, Median Abs. Percentage Error)
Added ability to use integer column as time column
Speed up type enforcement during scoring
Support for reading ARFF file format (alpha)
Quantile Binning for MLI
Various bug fixes
Version 1.1.1 (April 23, 2018)¶
Support string columns larger than 2GB
Version 1.1.0 (April 19, 2018)¶
AWS/Azure integration (hourly cloud usage)
Bug fixes for MOJO pipeline scoring (now beta)
Google Cloud storage and BigQuery (alpha)
Speed up categorical column stats computation during data import
Further improved memory management on GPUs
Improved accuracy for MAE scorer
Ability to build scoring pipelines on demand (if not enabled by default)
Additional target transformer for regression problems: sqrt(sqrt(x))
Add GLM models as candidates for interpretability=10 (alpha, disabled by default)
Improved performance of native builds (RPM/DEB)
Improved estimation of error bars
Various bug fixes
Version 1.0.30 (April 5, 2018)¶
Speed up MOJO pipeline creation and disable MOJO by default (still alpha)
Improved memory management on GPUs
Support for optional 32-bit floating-point precision for reduced memory footprint
Added logging of test set scoring and data transformations
Various bug fixes
Version 1.0.29 (April 4, 2018)¶
If MOJO fails to build, no MOJO will be available, but experiment can still succeed
Version 1.0.28 (April 3, 2018)¶
(Non-Docker) RPM installers for RHEL 7/CentOS 7/SLES 12 with systemd support
Version 1.0.27 (March 31, 2018)¶
MOJO scoring pipeline for Java standalone cross-platform low-latency scoring (alpha)
Various bug fixes
Version 1.0.26 (March 28, 2018)¶
Improved performance and reduced memory usage for large datasets
Improved performance for F0.5, F2 and accuracy
Improved performance of MLI
Distribution shift detection now also between validation and test data
Batch scoring example using datatable
Various enhancements for AutoVis (outliers, parallel coordinates, log file)
Various bug fixes
Version 1.0.25 (March 22, 2018)¶
New scorers for binary/multinomial classification: F0.5, F2 and accuracy
Precision-recall curve for binary/multinomial classification models
Plot of actual vs predicted values for regression problems
Support for excluding feature transformations by operation type
Support for reading binary file formats: datatable and Feather
Improved multi-GPU memory load balancing
Improved display of initial tuning results
Reduced memory usage during creation of final model
Fixed several bugs in creation of final scoring pipeline
Various UI improvements (e.g., zooming on iteration scoreboard)
Various bug fixes
Version 1.0.24 (March 8, 2018)¶
Fix test set scoring bug for data with an ID column (introduced in 1.0.23)
Allow renaming of MLI experiments
Ability to limit maximum number of cores used for datatable
Print validation scores and error bars across final ensemble model CV folds in logs
Various UI improvements
Various bug fixes
Version 1.0.23 (March 7, 2018)¶
Support for Gains and Lift curves for binomial and multinomial classification
Support for multi-GPU single-model training for large datasets
Improved recipes for large datasets (faster and less memory/disk usage)
Improved recipes for text features
Increased sensitivity of interpretability setting for feature engineering complexity
Disable automatic time column detection by default to avoid confusion
Automatic column type conversion for test and validation data, and during scoring
Improved speed of MLI
Improved feature importances for MLI on transformed features
Added ability to download each MLI plot as a PNG file
Added support for dropped columns and weight column to MLI stand-alone page
Fix serialization of bytes objects larger than 4 GiB
Fix failure to build scoring pipeline with ‘command not found’ error
Various UI improvements
Various bug fixes
Version 1.0.22 (Feb 23, 2018)¶
Fix CPU-only mode
Improved robustness of datatable CSV parser
Version 1.0.21 (Feb 21, 2018)¶
Fix MLI GUI scaling issue on Mac
Work-around segfault in truncated SVD scipy backend
Various bug fixes
Version 1.0.20 (Feb 17, 2018)¶
HDFS/S3/Excel data connectors
LDAP/PAM/Kerberos authentication
Automatic setting of default values for accuracy / time / interpretability
Interpretability: per-observation and per-feature (signed) contributions to predicted values in scoring pipeline
Interpretability setting now affects feature engineering complexity and final model complexity
Standalone MLI scoring pipeline for Python
Time setting of 1 now runs for only 1 iteration
Early stopping of experiments if convergence is detected
ROC curve display for binomial and multinomial classification, with confusion matrices and threshold/F1/MCC display
Training/Validation/Test data shift detectors
Added AUCPR scorer for multinomial classification
Improved handling of imbalanced binary classification problems
Configuration file for runtime limits such as cores/memory/harddrive (for admins)
Various GUI improvements (ability to rename experiments, re-run experiments, logs)
Various bug fixes
Version 1.0.19 (Jan 28, 2018)¶
Fix hang during final ensemble (accuracy >= 5) for larger datasets
Allow scoring of all models built in older versions (>= 1.0.13) in GUI
More detailed progress messages in the GUI during experiments
Fix scoring pipeline to only use relative paths
Error bars in model summary are now +/- 1*stddev (instead of 2*stddev)
Added RMSPE scorer (RMS Percentage Error)
Added SMAPE scorer (Symmetric Mean Abs. Percentage Error)
Added AUCPR scorer (Area under Precision-Recall Curve)
Gracefully handle inf/-inf in data
Various UI improvements
Various bug fixes
Version 1.0.18 (Jan 24, 2018)¶
Fix migration from version 1.0.15 and earlier
Confirmation dialog for experiment abort and data/experiment deletion
Various UI improvements
Various AutoVis improvements
Various bug fixes
Version 1.0.17 (Jan 23, 2018)¶
Fix migration from version 1.0.15 and earlier (partial, for experiments only)
Added model summary download from GUI
Restructured and renamed logs archive, and add model summary to it
Fix regression in AutoVis in 1.0.16 that led to slowdown
Various bug fixes
Version 1.0.16 (Jan 22, 2018)¶
Added support for validation dataset (optional, instead of internal validation on training data)
Standard deviation estimates for model scores (+/- 1 std.dev.)
Computation of all applicable scores for final models (in logs only for now)
Standard deviation estimates for MLI reason codes (+/- 1 std.dev.) when running in stand-alone mode
Added ability to abort MLI job
Improved final ensemble performance
Improved outlier visualization
Updated H2O-3 to version 3.16.0.4
More readable experiment names
Various speedups
Various bug fixes
Version 1.0.15 (Jan 11, 2018)¶
Fix truncated per-experiment log file
Various bug fixes
Version 1.0.14 (Jan 11, 2018)¶
Improved performance
Version 1.0.13 (Jan 10, 2018)¶
Improved estimate of generalization performance for final ensemble by removing leakage from target encoding
Added API for re-fitting and applying feature engineering on new (potentially larger) data
Remove access to pre-transformed datasets to avoid unintended leakage issues downstream
Added mean absolute percentage error (MAPE) scorer
Enforce monotonicity constraints for binary classification and regression models if interpretability >= 6
Use squared Pearson correlation for R^2 metric (instead of coefficient of determination) to avoid negative values
Separated HTTP and TCP scoring pipeline examples
Reduced size of h2oai_client wheel
No longer require weight column for test data if it was provided for training data
Improved accuracy of final modeling pipeline
Include H2O-3 logs in downloadable logs.zip
Updated H2O-3 to version 3.16.0.2
Various bug fixes
Version 1.0.11 (Dec 12, 2017)¶
Faster multi-GPU training, especially for small data
Increase default amount of exploration of genetic algorithm for systems with fewer than 4 GPUs
Improved accuracy of generalization performance estimate for models on small data (< 100k rows)
Faster abort of experiment
Improved final ensemble meta-learner
More robust date parsing
Various bug fixes
Version 1.0.10 (Dec 4, 2017)¶
Tool tips and link to documentation in parameter settings screen
Faster training for multi-class problems with > 5 classes
Experiment summary displayed in GUI after experiment finishes
Python Client Library downloadable from the GUI
Speedup for Maxwell-based GPUs
Support for multinomial AUC and Gini scorers
Add MCC and F1 scorers for binomial and multinomial problems
Faster abort of experiment
Various bug fixes
Version 1.0.9 (Nov 29, 2017)¶
Support for time column for causal train/validation splits in time-series datasets
Automatic detection of the time column from temporal correlations in data
MLI improvements, dedicated page, selection of datasets and models
Improved final ensemble meta-learner
Test set score now displayed in experiment listing
Original response is preserved in exported datasets
Various bug fixes
Version 1.0.8 (Nov 21, 2017)¶
Various bug fixes
Version 1.0.7 (Nov 17, 2017)¶
Sharing of GPUs between experiments - can run multiple experiments at the same time while sharing GPU resources
Persistence of experiments and data - can stop and restart the application without loss of data
Support for weight column for optional user-specified per-row observation weights
Support for fold column for user-specified grouping of rows in train/validation splits
Higher accuracy through model tuning
Faster training - overall improvements and optimization in model training speed
Separate log file for each experiment
Ability to delete experiments and datasets from the GUI
Improved accuracy for regression tasks with very large response values
Faster test set scoring - Significant improvements in test set scoring in the GUI
Various bug fixes
Version 1.0.5 (Oct 24, 2017)¶
Only display scorers that are allowed
Various bug fixes
Version 1.0.4 (Oct 19, 2017)¶
Improved automatic type detection logic
Improved final ensemble accuracy
Various bug fixes
Version 1.0.3 (Oct 9, 2017)¶
Various speedups
Results are now reproducible
Various bug fixes
Version 1.0.2 (Oct 5, 2017)¶
Improved final ensemble accuracy
Weight of Evidence features added
Various bug fixes
Version 1.0.1 (Oct 4, 2017)¶
Improved speed of final ensemble
Various bug fixes
Version 1.0.0 (Sep 24, 2017)¶
Initial stable release