Change Log

Version 2.4.4 (June 27, 2026)

Improvements:
- Migrated pretrained-model download URLs from S3 to CDN
- Upgraded PyTorch Lightning
- Upgraded AutoDoc to 1.2.5
- Upgraded dulwich to 1.2.5 (CVE-2026-42305, CVE-2026-47734). Bitbucket artifact uploads bypass .gitignore by default, preserving existing behaviour. To enforce .gitignore rules and fail loudly on ignored artifacts, set bitbucket_respect_gitignore=true.
Bug Fixes:
- Fixed AutoReport generation crash under setuptools 80.x.
- Upgraded AutoDoc to fix an intermittent AutoReport generation failure.
- Updated the public recipes repository to fix CalibratedClassifier calibration on float32 models and recipe dataset download and installation failures.
- Automatically include metadata.json as an experiment artifact when linking an experiment to a project.
- Upgraded AutoViz to fix CVEs in Apache Commons Lang and Log4j
- Upgraded Java based connectors to fix CVEs in Netty and Apache Commons Configuration
- Fixed the following security vulnerabilities (CVEs):
  CVE-2024-2398
  
  CVE-2025-0725
  
  CVE-2025-48924
  
  CVE-2025-69651
  
  CVE-2025-69652
  
  CVE-2026-5773
  
  CVE-2026-6276
  
  CVE-2026-6846
  
  CVE-2026-7383
  
  CVE-2026-9076
  
  CVE-2026-25087
  
  CVE-2026-32274
  
  CVE-2026-34180
  
  CVE-2026-34181
  
  CVE-2026-34182
  
  CVE-2026-34183
  
  CVE-2026-34993
  
  CVE-2026-35188
  
  CVE-2026-41417
  
  CVE-2026-42305
  
  CVE-2026-42577
  
  CVE-2026-42578
  
  CVE-2026-42579
  
  CVE-2026-42580
  
  CVE-2026-42581
  
  CVE-2026-42582
  
  CVE-2026-42583
  
  CVE-2026-42584
  
  CVE-2026-42585
  
  CVE-2026-42586
  
  CVE-2026-42587
  
  CVE-2026-42764
  
  CVE-2026-42765
  
  CVE-2026-42766
  
  CVE-2026-42767
  
  CVE-2026-42768
  
  CVE-2026-42769
  
  CVE-2026-42770
  
  CVE-2026-44248
  
  CVE-2026-44249
  
  CVE-2026-44250
  
  CVE-2026-44890
  
  CVE-2026-44892
  
  CVE-2026-44893
  
  CVE-2026-44894
  
  CVE-2026-45205
  
  CVE-2026-45409
  
  CVE-2026-45416
  
  CVE-2026-45445
  
  CVE-2026-45446
  
  CVE-2026-45447
  
  CVE-2026-45674
  
  CVE-2026-46340
  
  CVE-2026-47265
  
  CVE-2026-47691
  
  CVE-2026-47734
  
  CVE-2026-48006
  
  CVE-2026-48059
  
  CVE-2026-48748
  
  CVE-2026-50009
  
  CVE-2026-50010
  
  CVE-2026-50011
  
  CVE-2026-50219
  
  CVE-2026-53655
  
  CVE-2026-54273
  
  CVE-2026-54274
  
  CVE-2026-54276
  
  CVE-2026-54277
  
  CVE-2026-54278
  
  CVE-2026-56131
  
  CVE-2026-56132
  
  CVE-2026-56403
  
  CVE-2026-56404
  
  CVE-2026-56405
  
  CVE-2026-56406
  
  CVE-2026-56407
  
  CVE-2026-56408
  
  CVE-2026-56409
  
  CVE-2026-56410
  
  CVE-2026-56411
  
  CVE-2026-56412
  
  GHSA-537c-gmf6-5ccf

Version 2.4.3 (May 26, 2026)

Improvements:
- Display an error message in the UI when the test dataset used by Model Diagnostics does not contain the target column.
- Added deprecated use_legacy_mlops_projects config to keep legacy MLOps project support for HAIC deployments still on older MLOps / workspaces services. Has no effect unless h2o_storage_projects_enabled=True.
- Upgraded H2O3 to 3.46.0.11
Bug Fixes:
- Fixed broken AutoViz charts when a date column is selected as an axis variable.
- Fixed the Model Diagnostics UI download button not responding.
- Fixed the following security vulnerabilities (CVEs):
  CVE-2025-8869
  
  CVE-2025-71176
  
  CVE-2026-5450
  
  CVE-2026-5928
  
  CVE-2026-41066
  
  CVE-2026-42215
  
  CVE-2026-42284
  
  CVE-2026-42338
  
  CVE-2026-44243
  
  CVE-2026-44244
  
  CVE-2026-44431
  
  CVE-2026-44432
  
  CVE-2026-41409
  
  CVE-2026-41635
  
  CVE-2026-42778
  
  CVE-2026-42779

Version 2.4.2 (April 27, 2026)

Improvements:
- Added cross-workspace AuthZ enforcement: project actions (rename, link, unlink, tag) are now checked against the target project’s workspace permissions, and the GUI shows only the actions each user is allowed to perform on each project.
- Optimized project listing performance with batch AuthZ filtering, workspace ID caching, and page-only enrichment to reduce latency for large project collections.
- Extended cross-workspace AuthZ to dataset and tag create/update/delete operations; DAI now returns a clear authz error instead of a generic storage failure when the user lacks the required mlops permission on a linked project.
Bug Fixes:
- Fixed nondeterministic results when get_population_FS is enabled because the subprocess does not start from the same RNG position for a given random state. The random state is seeded from the experiment and can eventually pollute the global RNG state.
- Fixed tree count mismatch between the GA and final model for GLMModel by relaxing the convergence criterion for GLM models, which can sometimes have higher variance.
- Fixed wrong number of embedding words when pretrained embedding matrix is used in TextBiGRUV2Transformer and TextCNNV2Transformer
- Fixed TimeGroupsDetector failure on columns containing a mix of boolean and string values during time series group detection.
- Fixed an issue where users with usernames containing characters requiring sanitization (such as +) were unable to start Driverless AI, caused by comparing sanitized and unsanitized usernames during login.
- Fixed the following security vulnerabilities (CVEs):
  
  CVE-2026-40192
  
  CVE-2026-39892
  
  CVE-2026-34073
  
  CVE-2026-25645
  
  CVE-2026-22815
  
  CVE-2026-34513
  
  CVE-2026-34514
  
  CVE-2026-34515
  
  CVE-2026-34516
  
  CVE-2026-34517
  
  CVE-2026-34518
  
  CVE-2026-34519
  
  CVE-2026-34520
  
  CVE-2026-34525
  
  CVE-2026-4539
  
  CVE-2026-33750
  
  CVE-2026-4046
  
  CVE-2026-4437
  
  CVE-2026-4438
- Fixed the following CVEs in Java based connectors.
  CVE-2025-67721
  
  CVE-2026-34479
  
  CVE-2026-34477
  
  CVE-2026-34478
  
  CVE-2026-34480
  
  CVE-2026-34481
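
The get_population_FS fix in this release concerns a seeded computation that interacts with shared RNG state. The sketch below is a generic illustration of that hazard using Python's standard library, not Driverless AI's internal code; both function names are hypothetical.

```python
import random


def draw_with_local_rng(seed, n=3):
    """Draw n values from a generator that is private to this call."""
    rng = random.Random(seed)  # independent of the module-level generator
    return [rng.random() for _ in range(n)]


def draw_with_global_rng(seed, n=3):
    """Same draw, but reseeds the shared module-level generator."""
    random.seed(seed)  # side effect: every later random.* call is affected
    return [random.random() for _ in range(n)]
```

Both functions return the same values for the same seed, but only the first leaves the process-wide generator untouched, so results do not depend on what ran before or after.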

Version 2.4.1 (March 30, 2026)

New Features:
- Added aggregated deleted entity summary to UserPurgeJob for admin user purge, viewable via the Python client job message after purge completion.
- Added autodoc_s3_access_key, autodoc_s3_secret_key, and autodoc_s3_session_token config options for pulling custom AutoDoc templates from private S3 buckets.
- Added new batch TTA UI page.
Improvements:
- Enforced AuthZ permission checks in the DAI GUI for experiments, datasets, AutoViz, custom recipes, projects, MLI (interpretations), diagnostics, and deployments.
- Enforced AuthZ on the Administration menu and User Management page: menu visibility, entity tabs, user actions (delete, transfer), and license management are gated by fine-grained admin permissions.
- Renamed enable_h2o_authz to enable_dai_authz. AuthZ action names changed in 2.4.1 and are not backward compatible, so a new config name prevents unintentional AuthZ activation after upgrade. The old config name still works but logs a deprecation warning.
- Hidden deprecated TensorFlowModel from expert settings model type selection (removed in 2.4.0).
- Hidden deprecated CUMLTransformer and TextCNNTransformer from expert settings (dependencies removed in 2.4.0).
- Upgraded H2O-3 to 3.46.0.10.
- Upgraded HMLI to 1.10.38.
- Updated AutoDoc custom template cache directory to use /dai-data when available, falling back to /tmp.
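
The enable_h2o_authz to enable_dai_authz rename above follows a common pattern: read the new name first, and accept the old name with a deprecation warning. A minimal sketch of that pattern follows. The function name, the precedence of the new name over the old one, and the default of False are assumptions for illustration, not the product's actual resolution logic.

```python
import warnings


def resolve_dai_authz(config):
    """Return the AuthZ flag from a dict of config values (hypothetical helper)."""
    if "enable_dai_authz" in config:
        # Assumption: the new name wins when both names are present.
        return bool(config["enable_dai_authz"])
    if "enable_h2o_authz" in config:
        warnings.warn(
            "enable_h2o_authz is deprecated; use enable_dai_authz",
            DeprecationWarning,
            stacklevel=2,
        )
        return bool(config["enable_h2o_authz"])
    return False  # assumption: AuthZ is off unless configured
```
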
Bug Fixes:
- Fixed ANSI (CP932) encoded CSV upload causing Japanese character garbling. Added opt-in data_import_enable_encoding_detection config and charset-normalizer dependency for automatic encoding detection during CSV/TSV/TXT import. Automatically active when app_language is non-English.
- Fixed bug in pulling custom AutoDoc templates from public S3 buckets where credentials were not properly passed to the S3 helper.
- Fixed the following security vulnerabilities (CVEs):
  CVE-2026-29786
  
  CVE-2026-27205
  
  CVE-2025-69534
  
  CVE-2025-68161
  
  GHSA-72hv-8253-57qq
  
  CVE-2026-30922
  
  CVE-2026-27459
  
  CVE-2026-31958
  
  GHSA-78cv-mqj4-43f7
- Fixed the following CVEs in Java based connectors.
  CVE-2026-24308
  
  CVE-2026-24281
  
  CVE-2025-67735
  
  CVE-2025-11226
  
  CVE-2026-1225
  
  CVE-2025-68161
  
  GHSA-72hv-8253-57qq
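
The CP932 import fix above addresses Japanese text that is garbled when a file's bytes are decoded with the wrong codec. The sketch below shows the underlying problem with a simple try-in-order fallback using only the standard library. It is not the statistical detection that charset-normalizer performs, and the function name and candidate order are assumptions.

```python
def decode_csv_bytes(raw, candidates=("utf-8", "cp932")):
    """Decode raw file bytes with the first candidate codec that succeeds."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    # latin-1 maps every byte to a character, so this never raises,
    # but the result may be garbled.
    return raw.decode("latin-1"), "latin-1"
```

For example, bytes produced by `"名前,年齢\n".encode("cp932")` are not valid UTF-8, so the helper falls through to cp932 and recovers the original header text.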

Version 2.4.0 (March 9, 2026)

New Features:
- Added Population Stability Index (PSI) calculation for features across train, validation, and test datasets.
- Added a dropdown option to view system insights for experiments.
- Added an admin license management modal with the ability to update the license and a configurable threshold for license expiry UI notifications.
- Enforced authorization (AuthZ) in the DAI GUI (Phase 1).
- Added last_active timestamp tracking and an admin API to query historical active users.
- Added quantile statistics calculation for features, displayed in AutoDoc.
- Added saving of cross-validation fold indices used for final model evaluation.
- Added batch TTA (Test Time Augmentation) inference backend support for time series.
- Included subpipeline images and tree images in AutoDoc.
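
For readers unfamiliar with the Population Stability Index listed above, the standard definition sums (actual - expected) * ln(actual / expected) over bins of a feature's distribution. The sketch below is a minimal illustration of that formula. The equal-width binning, the bin count, and the epsilon floor are assumptions and may differ from what Driverless AI uses.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so out-of-range values fall into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give a PSI of zero, and the value grows as the second sample drifts away from the first.
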
Improvements:
- Upgraded MOJO runtime to 2.9.0.
  Added transformations to support ImageVectorizerV2 in DAI: ImgCenterCropOp and ImgResizeShorterOp.
  
  Added new category to ImageNetPreprocessingOp: IMAGENET_PREPROCESSNG_CUSTOM_TORCH, which preprocesses the image using custom mean and std values.
- Improved CoOccurrenceMatrixVectorizer with relaxed max_df and min_df thresholds and PMI vectorization.
- Upgraded AutoDoc to 1.2.3.
- Improved AuthZ action names and code reuse.
- Fixed AuthZ production blockers.
Bug Fixes:
- Fixed hashed scorer name not being taken into account.
- Fixed MOJO sub-pipeline image naming.
- Fixed crash when h2o4gpu.get_gpu_info_c returns None.
- Fixed experiment summary zip being incorrectly bundled into the experiment log zip, which should only contain log files.
- Fixed the following security vulnerabilities (CVEs):
  CVE-2026-26007
  
  CVE-2025-49844
  
  CVE-2026-25990
  
  CVE-2026-26960
  
  CVE-2026-26996
  
  CVE-2026-27199
  
  CVE-2025-15467
  
  CVE-2025-69419
  
  CVE-2025-69421
  
  CVE-2025-12060
  
  CVE-2025-9906
  
  CVE-2026-24842
  
  CVE-2026-23950
  
  CVE-2026-21226
  
  CVE-2026-23949
  
  CVE-2026-23490
  
  CVE-2026-1260
  
  CVE-2026-24049
  
  CVE-2026-0994
  
  CVE-2025-15281
  
  CVE-2025-21587
  
  CVE-2026-27903
  
  CVE-2026-27904
Documentation:
- Added license management documentation.
- Updated Snowflake SSO authentication instructions.
- Added documentation for Snowflake connector setup with Azure Entra ID and Securestore.
Removed:
- Removed TensorFlow and Keras dependencies (deprecated in 2.3.0).
- Removed the following TensorFlow/Keras-based transformers.
  TensorFlowModel
  
  ImageVectorizerTransformer
  
  TextCNNTransformer
  
  TextBiGRUTransformer
  
  TextCharCNNTransformer

Version 2.3.2.1 (February 17, 2026)

Improvements:
- Upgraded MOJO runtime to 2.8.11.

Version 2.3.2 (January 29, 2026)

Improvements:
- Added support for specifying schema for temporary Snowflake stages via the snowflake_staging_schema configuration option.
- Upgraded MOJO runtime to 2.8.11-master.61.
- Upgraded openai to 2.15.0, pillow to 12.1.0, pyarrow to 18.1.0, opencv to 4.12.0, and transformers to 4.57.3.
Bug Fixes:
- Fixed ORC import crashing issue.
- Fixed albumentations 1.4.24 compatibility in ClassificationDataset.
- Fixed the following security vulnerabilities (CVEs):
  CVE-2026-0861
  
  CVE-2026-0915
  
  CVE-2026-23745
  
  CVE-2026-24001
  
  CVE-2026-22701
  
  CVE-2026-21883
  
  CVE-2026-22702
  
  CVE-2025-66418
  
  CVE-2025-66471
  
  CVE-2025-50181
  
  CVE-2025-50182
  
  CVE-2025-68146
  
  CVE-2025-6176
  
  CVE-2025-5889
  
  CVE-2025-27516
Documentation:
- Updated H2O.ai address in booklet.
- Added documentation for Snowflake staging schema option.
- Updated Snowflake connector documentation with key pair authentication.

Version 2.3.1 (January 8, 2026)

New Features:
- Added authentication via the current OAuth session tokens for the JDBC connector.
Improvements:
- Upgraded H2O-3 to version 3.46.0.9.
- Upgraded the transformers package to the latest version.
- Added source information and timestamp to dataset overview table in AutoDoc.
- Updated base image to fix binutils for FedRAMP compliance.
Bug Fixes:
- Removed h2o_featurestore as a DAI dependency and upgraded to version 2.3.2, restoring the ability to update Feature Store libraries via custom recipes.
- Fixed MLI time-series experiments failing when time_series_causal_split_recipe is set to true in the config.toml.
- Fixed missing UFAPT columns for time series models which could cause TypeError in runtime prediction.
- Fixed AutoDoc generation for IsolationForest model.
- Fixed experiment deployment to MLOps failing with TypeError.
- Fixed unsupervised experiment target column handling in AutoDoc.
- Fixed AuthZ token access error after OIDC login.
- Fixed MLOps deployment wizard failing in single project mode when selecting a project.
- Fixed IRSA authentication to correctly use sts:AssumeRoleWithWebIdentity instead of sts:AssumeRole for AWS IAM role chaining.
- Fixed S3 browser Apply button to use the current input path instead of stale cached path.
- Fixed the following security vulnerabilities (CVEs):
  CVE-2024-7768
  
  CVE-2025-7545
  
  CVE-2025-7546
  
  CVE-2025-11082
  
  CVE-2025-11083
  
  CVE-2025-59419
  
  CVE-2025-64756
  
  CVE-2024-13009
Documentation:
- Added documentation for Delta Table connector.

Version 2.3.0 (October 30, 2025)

New Features:
- Added authentication via an IdP broker to the JDBC connector.
- Added an admin dashboard with user management functionality.
- Added support for cgroup v2 to pull accurate memory information when running inside Kubernetes pods.
- Added Binner transformation cutoff values to the experiment summary.
- Added the force_on_convert_incorrect_date_format configuration option to automatically convert datetime formats in predictions to match the training datetime format.
- Introduced experimental features to support new PI simulation algorithms:
  Error propagation.
  
  Bootstrap simulation.
  
  Monte Carlo simulation.
- Added integration with HAIC workspaces.
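
The cgroup v2 item above refers to reading a container's memory limit and usage from the unified cgroup hierarchy rather than from host-wide figures. A minimal sketch of such a read follows. It assumes the hierarchy is mounted at /sys/fs/cgroup and relies on the kernel's memory.max and memory.current files; the function name and return shape are hypothetical and not Driverless AI's implementation.

```python
from pathlib import Path


def cgroup_v2_memory(root="/sys/fs/cgroup"):
    """Return the memory limit and usage in bytes, or None if unavailable."""
    base = Path(root)

    def read(name):
        try:
            return (base / name).read_text().strip()
        except OSError:
            return None  # file missing: not cgroup v2, or no memory controller

    limit_raw, usage_raw = read("memory.max"), read("memory.current")
    if limit_raw is None or usage_raw is None:
        return None
    # The literal string "max" means no limit is set for this cgroup.
    limit = None if limit_raw == "max" else int(limit_raw)
    return {"limit_bytes": limit, "usage_bytes": int(usage_raw)}
```
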
Improvements:
- Updated H2O-3 to version 3.46.0.8.
- Upgraded the h2o-featurestore client to 2.2.0.
- Simplified AuthZ actions and queried permissions as a user instead of a service account.
- Enforced AuthZ when calling non-RPC functions and RPC functions outside the RPC handler.
- Upgraded the Triton runtime client to support version 2.59.0.
- Added AWS IAM role chaining with IRSA authentication for the Driverless AI AWS S3 connector.
- Improved Japanese localization.
- Upgraded the MOJO runtime to 2.8.10.
  
  This release upgrades core dependencies (torch → 2.6.0), adds support for torch image models, aligns Keras tokenizers with the Python implementation, fixes DeBERTa/BERT test failures, and bundles/obfuscates mojo2onnx proto code for distribution.
- Improved unlinking and deletion by removing the delete option for non-existent experiments or datasets, implementing conditional deletion in batch operations, and updating dialog messages for clarity.
Bug Fixes:
- Fixed an issue where making predictions or performing a dataset split when datasets_directory is defined would hang without workers assigned if enable_experiments_priority_queue is enabled.
- Fixed missing confusion matrix (CM) for multiclass experiments.
- Fixed the experiment launch button being disabled when refitting models.
- Fixed dataset split crashes when Audit Trail is enabled.
- Hid SUMMARIZE buttons for datasets and experiments when an OpenAI or GPT key isn’t configured.
- Removed the R Client download button from the RESOURCES menu in the UI because the R Client is deprecated.
- Fixed the Driverless AI Delta Table connector not writing logs to the delta_table_connector.log file.
- Fixed a JSON parsing issue in experiment insights for time-series experiments.
- Fixed the following CVEs in Java-based connectors:
  CVE-2025-55163
  
  CVE-2025-58057
  
  CVE-2025-58056
  
  CVE-2025-58457
- Fixed the following security vulnerabilities (CVEs):
  CVE-2025-27516
  
  CVE-2025-30167
  
  CVE-2025-3262
  
  CVE-2025-3933
  
  CVE-2025-3264
  
  CVE-2025-3263
  
  CVE-2025-3777
  
  CVE-2025-6052
  
  CVE-2025-53643
  
  CVE-2024-12254
  
  CVE-2024-47081
  
  CVE-2025-46394
  
  CVE-2024-58251
  
  CVE-2025-9230
  
  CVE-2025-9231
  
  CVE-2025-9232
Documentation:
- Improved logging and time-series documentation.
- Added comprehensive documentation for Expert Settings configuration options.
- Added documentation for HAIC workspace integration.
- Added documentation for the new admin dashboard and user management features.
Deprecation notice:
- The following transformers are deprecated and will be removed in the next minor release:
  TensorFlowModel
  
  ImageVectorizerTransformer
  
  TextCNNTransformer
  
  TextBiGRUTransformer
  
  TextCharCNNTransformer

Version 2.2.5 (October 10, 2025)

Bug Fixes:
- Hid the R CLIENT button from the RESOURCES dropdown list in the UI because DAI no longer releases the R client.
- Hid the SUMMARIZE button in the UI to avoid a missing OpenAI key error.
- Fixed dataset split/merge crashing when Audit Trail is enabled in cloud deployments.
- Fixed opening experiment insights failing with a JSON parsing error.
- Fixed DAI Delta Table connector not writing logs to the delta_table_connector.log file.
- Fixed Feature Store connector h2o_featurestore version mismatch error.
- Removed h2o_featurestore as a DAI dependency, restoring the ability to update Feature Store libraries via custom recipes.
- Fixed missing UFAPT columns for time series models, which could cause a TypeError in runtime prediction.
- Fixed the following CVEs in Java based connectors.
  CVE-2025-55163
  
  CVE-2025-58057
  
  CVE-2025-58056
  
  CVE-2025-58457

Version 2.2.4 (September 10, 2025)

New Features:
- Added support for IRSA authentication in the Driverless AI AWS S3 connector.

Version 2.2.3 (August 27, 2025)

Improvements:
- Restores project sharing and deployment to MLOps. The Share and Deploy options from the Project and Experiment listings are available again.
- Reduced logging from the Delta Lake connector to fix an out-of-memory issue when importing large datasets.
Bug Fixes:
- Fixed the launch experiment button being disabled when retraining or refitting a time series experiment.

Version 2.2.2 (August 18, 2025)

Improvements:
- Hide SUMMARIZE button when generative summary is not enabled.
Bug Fixes:
- Mask AWS Secret Access Key input field in AWS S3 connector UI to improve security.
- Show missing confusion matrix for multi-class experiments.
- Fix incorrect display of “with fine-tuning” for “Pre-trained image architectures” in experiment preview.
- Fix test dataset not being set when creating a new experiment with the same settings as an imported experiment.
- Fix experiment preview styles getting removed when changing the experiment settings.
- Fix DAI Snowflake connector when authenticating with Snowflake using H2O Secure Store.
Removed:
- Removed Share and Deploy options from the Project and Experiment listings.
  
  Note
  
  In MLOps 1.0.0, the API and deployment workflow have been updated. Each project is now treated as a workspace. To share, use the workspace management. To deploy, start MLOps in the workspace containing the experiments and deploy using the MLOps UI.

Version 2.2.1 (August 6, 2025)

Improvements:
- Upgraded LightGBM to version 4.6.0.
Bug Fixes:
- Fixed the following security vulnerabilities (CVEs).
  CVE-2022-3510
  
  CVE-2022-3509
  
  CVE-2022-3171
  
  CVE-2024-7254
  
  CVE-2021-22570
  
  CVE-2021-22569
  
  CVE-2025-53864
  
  CVE-2025-48734
  
  CVE-2024-47535
  
  CVE-2025-25193
  
  CVE-2025-24970
  
  CVE-2024-48924
  
  CVE-2024-43598
  
  CVE-2025-1153
  
  CVE-2025-3198
  
  CVE-2025-8058

Version 2.2.0 (July 21, 2025)

New Features:
- Added support for setting the file type and maximum file size for files created in a Snowflake stage using the Driverless AI Snowflake connector. Use the snowflake_stages_file_format and snowflake_stages_max_file_size configuration options.
- Added key-pair authentication support for the Snowflake connector.
- Added the pip_install_options configuration to the do_not_store_list.
- Added the disable_download_logs configuration to prevent normal users from downloading logs via the web UI and Python Client.
- Added the Co-Occurrence matrix vectorizer for word embeddings.
- Added the Audit trail integration to log key user actions.
- Added initial authorization checks for backend RPC calls, with full support planned for the next release.
- Added Shapley support for TensorFlow models (for more details, see arXiv:2010.12082)
- Added Workload Identity support for Azure Blob Storage connector.
Improvements:
- Use Instance Metadata Service Version 2 (IMDSv2) to read metadata about AWS instances.
- Added support for passing AWS credentials through the UI for the AWS S3 connector: Access Key ID, Secret Access Key, Role ARN, and Session Token.
- Added the option to disable dataset and experiment linking in Projects.
- Refactored NLP preprocessing encoders used by TextCNN transformers to remove dependency on Keras.
- Refactored ImageAuto model for better handling of larger datasets.
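
IMDSv2, mentioned in the first item above, is a session-oriented protocol: a client first obtains a token with a PUT request and then sends that token as a header on each metadata GET. The sketch below builds those two requests with the standard library. The endpoint and header names are the ones AWS documents for IMDSv2; the function names are hypothetical, and this is not Driverless AI's own client code.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def token_request(ttl_seconds=21600):
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Build a metadata GET request that carries the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path, timeout=2):
    """Fetch one metadata value; only works from inside an AWS instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()
```
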
Bug Fixes:
- Fixed Content Security Policy (CSP) compliance violations in the web UI. (Note: MLI plots and wizard are not yet CSP compliant)
- Fixed Cross-site scripting (XSS) vulnerability in modals in the web UI.
- Fixed the following security vulnerabilities (CVEs).
  CVE-2025-48734
  
  CVE-2024-13009
  
  CVE-2024-6763
  
  GHSA-58qw-p7qm-5rvh
  
  CVE-2025-1194
  
  CVE-2025-32434
  
  CVE-2025-26619
Documentation:
- Updated documentation to replace nvidia-docker with docker run --gpus all throughout.
- Added documentation for the new Co-Occurrence Vectorizer used by the TextTransformer in NLP-related documentation.
Deprecation Notice:
- Wizard is officially deprecated and is planned for removal in a future release.

Version 2.1.1 (August 26, 2025)

Improvements:
- Added pip_install_options configuration to the do_not_store_list.
- Reduced logging in the Delta Lake connector to prevent out-of-memory issues when importing large datasets.

Version 2.1.0 (May 29, 2025)

New Features:
- Added support for importing Delta Lake Delta Tables using the Delta Table connector.
- Added advanced Kerberos authentication with the JDBC connector.
Improvements:
- Enhanced S3 connector to accept AWS S3 credentials from the GUI.
- Introduced the application context in the create_custom_recipe function.
Bug Fixes:
- Upgraded the snowflake-connector-python to resolve a certificate revocation error in Snowflake.
- Fixed CVE-2025-43859 in h11.
- Fixed CVE-2025-2711, CVE-2025-32414, and CVE-2025-32415 in libxml2.
- Fixed CVE-2025-47273 in setuptools
- Fixed CVE-2025-47287 in tornado
- Fixed the following CVEs in Java based connectors.
  CVE-2022-40152
  
  CVE-2021-31684
  
  CVE-2023-1370
  
  CVE-2024-29131
  
  CVE-2024-12798
  
  CVE-2022-3171
  
  CVE-2021-22569
  
  CVE-2021-22570
  
  CVE-2024-29133
  
  CVE-2024-7254
  
  CVE-2022-3510
  
  CVE-2022-3509
  
  CVE-2024-51504
Documentation:
- Updated the documentation to revise the reference to nvidia-docker.
- Added documentation for running DAI using Podman on Windows.

Version 2.0.1 (July 9, 2025)

Improvements:
- Added pip_install_options configuration to the do_not_store_list.

Version 2.0.0 (March 21, 2025)

New Features:
- Added support for setting a description for an experiment.
- Added support for importing datasets using the Databricks connector.
- Added support for NVIDIA Multi-Instance GPU (MIG) devices.
- Option to set an experiment quota for users using default_experiments_quota_per_user and override_experiments_quota_for_users configuration options.
- Download predicted labels and raw probabilities of Model Diagnostics based on a threshold.
Improvements:
- Location of the internal database can be configured using the db_path configuration option.
- Use custom Autodoc templates from AWS S3.
- Driverless AI Docker images are now based on Chainguard wolfi-base to reduce security vulnerabilities.
- Support for migrating from 1.11.x versions.
Bug Fixes:
- Fixed being unable to view or download the logs of a visualization when there are multiple visualizations.
- Fix filtering of datasets in scoring dataset selector of project page.
- Fix model diagnostics listing page updating.
- Fix issues in deploying and scoring in remote NVIDIA Triton servers.
- Fix TextBert model reproducibility issue.
- Disable MOJO pipeline download option in GUI for image model.
- Numerous security vulnerability (CVE) fixes, bug fixes, and UI fixes.
Breaking changes:
- No migration support from DAI older than 1.11.0.
- Permanently removed local NVIDIA Triton scoring.
- Requires setting the --gpus option to enable GPU support when running on Docker.
- Custom recipes from older DAI versions may experience compatibility issues in Docker deployments due to the change of the base image.

Version 1.11.1.1 (November 26, 2024)

New Features:
- Added a migration guard to prevent unsupported upgrades from 1.10.x to 1.11.x, addressing potential issues with compatibility and stability.
Bug Fixes:
- Fixed an issue with the missing prune_entities table, which was caused by database migration from older versions.
- Fixed an issue where experiments did not complete when a small value is set for the max_runtime_minutes configuration option.
- Fixed reproducibility issues in TextBert related models.
- Fixed CVE-2024-8862 security vulnerability.
- Fixed various bugs caused by experiment migration.

Important Note: Version 1.11.1 introduces major breaking changes that significantly impact compatibility with earlier versions. Users upgrading from 1.10.x or older versions should be aware that direct upgrades are not supported. For more details, refer to the Version 1.11.1 release notes.

Version 1.11.1 (September 20, 2024)

New Features:
- Support using Autodoc custom templates from AWS S3.
Improvements:
- Keep all temporary directories within the experiment directory to avoid leaving them around after cleanup.
- Make Spectrum Conductor login asynchronous.
- Remove temporary directories created during dataset import after the import is completed.
- Snowflake connector integration with H2O Secure Store. This enables OAuth-based single sign on (SSO) authentication with Snowflake via a third-party Okta IdP.
- Added Multi-GPU support for Image and BERT models.
- Enhanced the admin API to allow listing, aborting, and deleting user entities for any user.
- Updated SpectrumConductor login to be asynchronous.
Bug Fixes:
- Enable splitting of the date column even when it is of integer type.
- Upgraded Keras & TensorFlow to address security vulnerabilities.
- Upgraded setuptools to address security vulnerabilities
- Fixed issue with LightGBM labels.
- Fixed test dataset being switched to original when creating a new experiment with the same settings of an existing experiment.
- Hide the WIZARD SETUP menu item when enable_experiment_wizard=false.
- Fixed various UI issues.
- Fixed SSL failure when connecting to OpenAI API.
Deprecation Notice:
- RAPIDS: RAPIDS is no longer supported in this release due to compatibility issues with Python 3.11. RAPIDS is an optional feature that is disabled by default.
- Triton Scoring: Triton scoring is no longer supported in this release due to incompatibility with Python 3.11.
Limitations:
- No direct upgrade path: There is no direct upgrade path from earlier versions to 1.11.1. Users must perform a fresh installation when upgrading to 1.11.1.
- Experiment Reproducibility: Upgrading to the new version of Torch with Python 3.11 results in a different randomness distribution compared to the previous release with Python 3.8. This means that the same seed/random state may produce different final models and scoreboards than before. Users may experience different outputs from the same input data when running experiments in this version.
Documentation:
- Added links to h2oGPTe documentation
- Updated the links in the MOJO Java runtime options to point to the new repository.
- Removed: “Diagnostics - not supported by Python client API”

Version 1.11.0 (August 9, 2024)

New Features:
- Built on top of the latest stable versions of all major open-source packages.
  Updated to Python 3.11.
  
  Updated to Java 17.
  
  Updated to Torch 2.3.0 and TensorFlow 2.12.1.
  
  Updated to CUDA 11.8. Requires NVIDIA CUDA Driver 470 or later.
  
  Updated h2o3, numpy, datatable, pytorch-lightning, Pandas, scikit-learn and many more dependencies.
  
  Fixed various security vulnerabilities.
- Added support for tagging datasets linked to projects.
Improvements:
- Show “No datasets were loaded” message when a data recipe returns no results.
- Turn REPRODUCIBLE ON when a seed is set in experiment expert settings.
- Lower the priority of the deletion task to prevent server hangs.
- Clean temporary directory created during dataset import.
- Add matching configuration option snowflake_keycloak_broker_token_endpoint for previously introduced environment variable DAI_SNOWFLAKE_KEYCLOAK_BROKER_TOKEN_ENDPOINT.
- Various UI improvements.
Bug Fixes:
- Fixed inconsistent state for OFF option for the time column in experiment settings.
- Enabled the following missing MLI explainers:
  Friedman’s H-statistic
  
  Absolute Permutation-Based Feature Importance
  
  Relative Permutation-Based Feature Importance
Temporarily Removed Features:
- Disabled RAPIDS because a compatible version with Python 3.11 was unavailable.
- Disabled local Triton scoring.
- Restricted image models to a single GPU due to an issue with combining results from multiple GPUs.

Version 1.10.7.5 (July 4, 2025)

New Features:
- Added support for AWS Instance Metadata Service Version 2 (IMDSv2) when connecting to AWS services such as AWS S3. IMDSv1 is no longer supported.
- Added support for setting the file type used for files created in a Snowflake stage when using stages with the Driverless AI Snowflake connector. Use the DRIVERLESS_AI_SNOWFLAKE_STAGES_FILE_FORMAT configuration environment variable to specify the file type. Valid values are CSV and PARQUET. The default is CSV.
- Added support for setting the maximum file size for files created in a Snowflake stage with the Driverless AI Snowflake connector. Use the DRIVERLESS_AI_SNOWFLAKE_STAGES_MAX_FILE_SIZE environment variable to specify the file size in bytes. The default is 16777216 (16 MB).
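
The two Snowflake stage settings above can be summarized as a small validation routine. The sketch below uses the environment variable names, valid values, and defaults stated in these notes. The function name, the case-insensitive handling of the format, and the positive-size check are assumptions for illustration.

```python
import os

VALID_FORMATS = {"CSV", "PARQUET"}
DEFAULT_MAX_FILE_SIZE = 16 * 1024 * 1024  # 16777216 bytes (16 MB)


def snowflake_stage_settings(env=None):
    """Read and validate the stage file format and maximum file size."""
    env = os.environ if env is None else env
    fmt = env.get("DRIVERLESS_AI_SNOWFLAKE_STAGES_FILE_FORMAT", "CSV").upper()
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unsupported stage file format: {fmt}")
    size = int(env.get("DRIVERLESS_AI_SNOWFLAKE_STAGES_MAX_FILE_SIZE",
                       DEFAULT_MAX_FILE_SIZE))
    if size <= 0:
        raise ValueError("maximum file size must be a positive number of bytes")
    return fmt, size
```
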
Bug Fixes:
- Fixed an issue where the Upload Your Keytab File button appeared in the Snowflake connector UI after using the JDBC connector with Kerberos authentication.

Version 1.10.7.4 (March 31, 2025)

New Features:
- Added an option to perform advanced Kerberos authentication with the JDBC connector.

Version 1.10.7.3 (October 11, 2024)

New Features:
- Support using Autodoc custom templates from AWS S3.
- Skip populating items listed in the do_not_log_list in AutoDoc report.
Improvements:
- Use the official release of H2O Secure Store from PyPI.
Bug Fixes:
- Fixed test dataset being switched to original when creating a new experiment with the same settings of an existing experiment.

Version 1.10.7.2 (September 10, 2024)

Improvements:
- Lowered the priority of the deletion task to prevent server hangs.
- Snowflake connector integration with H2O Secure Store, which enables OAuth based authentication with Snowflake via a third-party Okta IdP.
Bug Fixes:
- Enabled splitting of the date column even when it is of integer type.
- Fixed various UI issues.

Version 1.10.7.1 (August 13, 2024)

Bug Fixes:
- Fixed several security vulnerabilities:
  CVE-2023-47248, CVE-2023-6730, CVE-2023-37920, CVE-2023-4863, and CVE-2023-6569.
- Enabled the following missing MLI explainers:
  Friedman’s H-statistic
  
  Absolute Permutation-Based Feature Importance
  
  Relative Permutation-Based Feature Importance
- Fixed an issue with missing ufapt when doing refit.
Improvements:
- Added the matching configuration option snowflake_keycloak_broker_token_endpoint for the previously introduced environment variable DAI_SNOWFLAKE_KEYCLOAK_BROKER_TOKEN_ENDPOINT.
- Added snowflake_keycloak_broker_token_type configuration option for the Snowflake connector, along with error message improvements.
- Added snowflake_session_token_filepath configuration option for the Snowflake connector when running Driverless AI in Snowpark Container Services.

Version 1.10.7 (January 19, 2024)

Available here

New Features:
- Added support for configuring a host name and port for the Snowflake connector when running Driverless AI in Snowpark Container Services.
- Added recipe support for the Feature Store data connector.
- Added a new configuration that lets users change the timeout duration when importing data from Hive, HDFS, JDBC, and kdb+ connectors.
- Ability to navigate to the linked project from the experiment details page.
- Added support for Python 3.12 in h2oai_client.
- MLI:
  Added the 2-D Partial Dependence explainer.
  
  Added the Friedman H-statistic explainer.
  
  Added PDP percentile plot.
  
  Added ICE curves in Partial Dependence explainer at every decile of predicted probabilities. This gives an indication of local prediction behavior across the dataset.
Improvements:
- In cases where a given MLI explainer is not available for a particular experiment, the new interpretation page now indicates why that MLI explainer is not available.
- Expanded DAI Python Client features for MLI. The Python client now supports the generation of plots for the following MLI explainers:
  Original feature importance
  
  Transformed feature importance
  
  Original Shapley
  
  Transformed Shapley
  
  Absolute permutation-based feature importance
  
  Relative permutation-based feature importance
  
  Random forest feature importance
  
  Random forest LOCO
  
  Random forest PDP
  
  Kernel Shapley
  
  Shapley summary
  
  Decision Tree
  
  NLP Tokenizer
  
  NLP VLM
  
  NLP LOCO
  
  DAI PDP
  
  NLP PDP
  
  Friedman H-Statistic
Bug Fixes:
- Fixed an issue that could prevent GPUs from being detected in Google Cloud Platform.
- Fixed the import of binary datatable files with date columns.
- Fixed various UI issues across a subset of MLI explainers.
- Numerous bug fixes.

Version 1.10.6.3 (July 19, 2024)

Available here

Improvements:
- Added support for OAuth authentication in DAI Snowflake connector with a Keycloak identity broker. Use the DAI_SNOWFLAKE_KEYCLOAK_BROKER_TOKEN_ENDPOINT environment variable to set the Keycloak URL.
- Added support for changing the data format used to save queried data in the Hive and JDBC connectors. For more information, see Hive Setup and JDBC Setup.
Bug Fixes:
- Fixed an issue that could prevent GPUs from being detected in Google Cloud Platform.

Version 1.10.6.2 (October 31, 2023)

Available here

Improvements:
- Added a new configuration that lets users use their own service account when connecting to Google BigQuery (GBQ).
- Added a new configuration that lets users optionally select which service account to impersonate for the Google BigQuery (GBQ) data connector.
- Added a new configuration to control the experiment Leaderboard access globally for all users.
- DAI now reports why certain explainers are not enabled in the MLI explainer drop-down list. This is usually limited by the experiment problem type, e.g., image, multinomial, etc.
- Users are now warned when Shapley values are approximated in MLI.
Bug Fixes:
- Fixed the import of binary datatable files with date columns.
- Fixed improper handling of invalid IDs in MLI dashboard.
- Fixed missing prediction values for MLI time-series actual vs. predicted plot when target is null or nan.
- Fixed an issue where the download of a PNG file in MLI only captured the first page.
- Fixed an issue where zoom in MLI DT surrogate reset the row search number to undefined.
- Fixed handling of long y-axis values in MLI Shapley plots.
- Fixed text wrapping for various MLI explainer docstrings.
Documentation:
- Added a page on h2oGPT integration.

Version 1.10.6.1 (September 01, 2023)

Available here

Improvements:
- Improved license checks and notification when using an invalid license.
Bug Fixes:
- Fixed an issue where progress spinners were not properly disposed of after an export to storage completed.

Version 1.10.6 (August 18, 2023)

Available here

New Features:
- Added integration with h2oGPT that lets you optionally generate and view dataset or experiment summaries using a GPT model. For more information, see h2oGPT integration.
- Added support for Okta SSO authenticator on the Snowflake connector.
- Added disk usage quotas per user, defined by the new configuration users_disk_usage_quota.
- Added a time-series data splitter that automatically detects time column candidates.
- Added support for custom telemetry recipes.
- Added the ability for admins to see their user identity details when using OPENID authentication.
- Added storage capacity and memory information to the health API. For more information, see Health API attribute definitions.
- Added support for a PyTorch backend in Triton server.
- Added support for sharing image datasets to H2O Storage. For more information, see H2O Storage (remote storage) integration.
- You can now tag experiments inside Projects that are connected to H2O Storage (remote storage). These tags are also displayed in H2O MLOps. For more information, see Experiment tagging.
- You can now disable the logout functionality.
- The experiment setup comparison feature now allows for better side-by-side comparison of list and dictionary configs. For more information, see Experiment setup comparison.
- The Project page now displays the current number of successfully linked experiments during their upload to H2O Storage (remote storage).
- You can now browse to two unique Driverless AI instances from the same browser.
- MLI:
  DAI Model Dashboard: You can now view a new DAI model dashboard that provides comprehensive insights into the performance of models created using DAI. For more information, see MLI dashboard.
  
  Performance Charts: You can now view performance charts specifically designed for Decision Tree (DT) and Random Forest (RF) surrogate models. These charts let you evaluate the performance of these models. Note that this feature is not supported for multinomial experiments. For more information, see Surrogate Decision Tree Plot and Random Forest surrogate model plots.
  
  NLP LOCO Scoring Pipeline: Added the ability to construct an NLP LOCO scoring pipeline. Note that the NLP LOCO scoring pipeline is not built by default.
  
  Expanded DAI Python Client Features: The Python client now supports the generation of plots for individual explainers. Supported explainers include:
  
  Partial Dependence Plot.
  
  Shapley Summary Plot for Original Features (Naive Shapley Method).
  
  Shapley Values for Transformed Features.
  
  Shapley Values for Original Features (Naive Method).
  
  Surrogate Decision Tree.
  
  Dataset Naming in Sensitivity Analysis: You can now assign names to datasets before saving them in Sensitivity Analysis, enhancing organization and usability.
Improvements:
- Missing Value Handling in Partial Dependence: Enhanced partial dependence analysis by treating missing values as a distinct bin. This refinement contributes to more comprehensive insights into model behavior.
- Categorical Representation in DT Surrogate Model: Improved the display of categorical features in the DT surrogate model when categorical encoding is set to one-hot encoding. This enhances the readability of model representations.
- Enhanced Shapley Tooltip Documentation: Revamped tooltip documentation for Shapley plots to provide users with clearer and more informative explanations of the displayed information.
- Improved Shapley Plot Sorting: Shapley plots now order local importances by their absolute value. This ensures that the most important features are prominently displayed, regardless of their positive or negative impact.
- Run explainers sequentially: Added the ability to run explainers sequentially by setting the config mli_run_explainers_sequentially to True. The default value is False.
- Show actual and prediction values above the NLP LOCO, VLM, and Tokenizer plots.
- MLI Notification Center: Improved various UI components.
- Added the ability to download Detailed View in Summary Shapley plot.
- Added the ability to convert date features from categorical to numeric in Partial Dependence plot.
- Improved the ability to run specific explainers in DAI Python client.
- Improved view of the surrogate model dashboard.
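
The Shapley plot sorting change listed above (ordering local importances by absolute value) can be illustrated with a few lines of Python. This is only a sketch of the ordering rule; the function name and the feature names are hypothetical and are not taken from Driverless AI.

```python
def order_by_absolute_contribution(contributions):
    """Sort (feature, value) pairs so the largest magnitudes come first."""
    return sorted(
        contributions.items(),
        key=lambda item: abs(item[1]),
        reverse=True,
    )


# Hypothetical local contributions for one row.
example = {"age": 0.12, "income": -0.45, "tenure": 0.30}
ordered = order_by_absolute_contribution(example)
# ordered == [("income", -0.45), ("tenure", 0.30), ("age", 0.12)]
```

Sorting on the absolute value keeps a strongly negative contribution such as income at the top instead of pushing it to the bottom of the plot.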
Bug Fixes:
- No longer skip display of time-series properties during Experiment Wizards after making a time-based split of training data.
- Improved memory consumption when loading a large Parquet file from disk.
- Improved H2O Storage integration to allow linking of large datasets and experiments in the background.
- Improved OPENID authentication to add more resiliency in case of network issues.
- Improved the Wizard for air-gapped environments by including some third-party scripts as static resources.
- Fixed an issue where the count of original columns reflected in the experiment AutoDoc did not match the number of original columns in the training data when using arbitrary drop columns.
- Fixed a missing axis label in model diagnostics charts.
- Fixed several security vulnerabilities.
- Fixed display of categoricals in global and cluster reason code view in k-LIME.
- Fixed an issue that caused an MLI local search to block the DAI server.
- Fixed an issue that could prevent downloading multiple datasets at once using the HDFS connector.
- Fixed various UI issues across a subset of MLI explainers.
Documentation:
- Improved the section on running MLI surrogate models on residuals. For more information, see Running Surrogate Models on Residuals.

Version 1.10.5.1 (August 11, 2023)

Available here

Improvements:
- Added support for Okta SSO authenticator on the Snowflake connector.
- UI/UX improvements to datasets import panels.
Bug Fixes:
- Fixed an issue that prevented experiments from getting deleted if an Nvidia Triton inference server was not yet configured.
- Fixed a warning message that is displayed in the Microsoft Word AutoReport document.
- Broadened support for different H2O Feature Store backend versions.
- Fixed a Hive connector JSON validation that could lead to errors connecting to Hive.

Version 1.10.5 (April 27, 2023)

Available here

New Features:
- Added support for NVIDIA Triton Inference Server. Currently, only CPUs are supported. Deployments are not available in H2O MLOps. For more information, see Triton Inference Server deployment.
  Lets you make zero-click or one-click deployments of experiments to Triton.
  
  This feature is currently only available for models that support a C++ MOJO (that is, all models with a Java MOJO, as well as all default BERT NLP and TensorFlow image models).
  
  Conveniently lists the estimated P99 latency, throughput, memory and disk usage of the MOJO deployment.
  
  Supports both internal (built-in) and external Triton servers.
  
  Added a Deployments wizard with list, load, unload, delete, export, and query operations for models deployed to Triton from the Python client and the Wizard.
  
  The Deployments listing page now only displays Triton deployments. The previously available methods of creating local deployments have been removed.
- Experiment Wizard
  You can now use interactive plots created with Plotly.
  
  You can now split train and test data by a specific date or datetime. A visualization of the split is also displayed. For more information, see Dataset options.
  
  The Experiment Wizard now provides detailed control of time-series validation splits. Visualizations of training data, the potential temporal gap until production, and the forecast horizon are now also provided. For more information, see H2O Driverless AI Experiment Setup Wizard.
  
  Added the option to select specific leaderboards for IID and time-series experiments.
- Added support to transform a dataset with the experiment’s fitted pipeline (excluding any models). Note that this new option is distinct from the existing Fit & Transform option.
- Shift detection is now performed on the final model’s transformed features and target to check generalization.
- Added an Experiment Results Wizard (beta): Shows several details for a given finished experiment. For more information, see Completed Experiment Actions.
- Added the Experiment Comparison Wizard for easy comparison of expert settings and knobs. For more information, see Experiment setup comparison.
- Added training data column stats JSON file to scorers (MOJO and Python).
- Control runtime more accurately using runtime estimation.
- Added estimation of CPU memory usage during experiment preview, to help with instance sizing.
- Added CPU memory usage of C++ MOJO to experiment summary and to Deploy Wizard.
- Added a page that lets admin users view system logs.
- Experiments can be assigned to a project from the experiment page through the Deployments wizard.
- You can now navigate to an experiment detail in the H2O MLOps app from the experiment page.
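
The Experiment Wizard item above about splitting train and test data by a specific date or datetime can be pictured with the following sketch. It is not the Wizard's implementation: the function name, the row layout, and the choice to place rows at or after the cutoff in the test set are all assumptions made for illustration.

```python
from datetime import datetime


def split_by_datetime(rows, time_key, cutoff):
    """Split rows into (train, test) around a cutoff datetime.

    Assumption: rows strictly before the cutoff go to train,
    rows at or after the cutoff go to test.
    """
    train = [row for row in rows if row[time_key] < cutoff]
    test = [row for row in rows if row[time_key] >= cutoff]
    return train, test


# Hypothetical data with a "ts" time column.
rows = [
    {"ts": datetime(2023, 1, 1), "y": 10},
    {"ts": datetime(2023, 2, 1), "y": 12},
    {"ts": datetime(2023, 3, 1), "y": 11},
]
train, test = split_by_datetime(rows, "ts", datetime(2023, 2, 1))
```

With this cutoff, train holds the January row and test holds the February and March rows.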
MLI:
- Added the ability to filter input data to MLI. For more information, see Filter input data to MLI.
- Added the ability to save and download filtered dataset(s) in Sensitivity Analysis.
- Added notification center to MLI UI. For more information, see Understanding the Model Interpretation page.
- Added Python client.
- Enabled Shapley Summary plot for time-series, which displays transformed features instead of original.
- Added a link to a video tutorial for the following explainers:
  PDP/ICE.
  
  Shapley for transformed features.
  
  Shapley for original features.
  
  Shapley summary plot.
  
  Disparate Impact Analysis.
  
  Sensitivity analysis.
  
  Surrogate decision tree.
Improvements:
- The health API flag is_idle has been updated to account for large datasets being uploaded from a browser session.
- Python scorers for lag-based time-series models now keep the target column in the frame to allow test-time augmentation.
- Details about test-time augmentation are now provided in logs. (That is, the number of newly updated historic values for each time period.)
- Prediction frames now contain the original target column name in case the target column name contains special characters that require sanitization.
- Added the ability to ingest pandas sparse frames for pandas .pkl files.
- Automatically toggle GPU ON/OFF in the experiment setup page based on whether models and transformers perform better on (or must use) GPUs.
- Reduced memory usage when making test set predictions.
- Allow control over early_stopping_threshold (relative min_delta) for LightGBM.
- Added stronger overfit protection for recipe (more_overfit_protection).
- Added support for unsupervised recipes that handle text columns.
- Sped up BertModel and BertTransformer when data is text-dominated to avoid unnecessary validation repeats for small data.
- Sped up MOJO for TextTransformer.
- Improved C++ MOJO performance under high CPU load.
- Added a new log tab to allow admin users access to internal services log files.
- Added support for HTTPS SSL key file when encrypted with a passphrase.
- When importing an experiment on the project page, the user is automatically prompted to download its datasets.
- Added explanatory videos for several MLI explainers.
- Listing pages now retain previous values for search, pagination and sorting.
- Optimized the number of default explainers in MLI to reduce runtime and increase clarity.
- Added support for microseconds in MLI time-series explainer.
- Enabled Shapley values for MLI TS when only a training dataset is used.
- Improved categorical handling in the MLI decision tree surrogate model by using one-hot encoding to encode categoricals by default.
- Improved legends and axis labels in MLI graphs by showing how data is ordered and what data is shown.
- Added display of target transformation in relevant Shapley plots in MLI.
- Added better indication of re-rendering cells after changing thresholds in MLI DIA.
- Added new tile in MLI, 3rd Party Model, when DIA is calculated on an external dataset.
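
The early_stopping_threshold item above describes a relative min_delta. One common reading of that idea is sketched below: a round only counts as an improvement if the loss drops by more than threshold times the best loss so far. This is a generic illustration and an assumption about the semantics, not LightGBM or Driverless AI code; the function name and the patience parameter are hypothetical.

```python
def early_stopping_round(losses, threshold, patience):
    """Return the index of the round where training would stop, or None.

    losses: validation losses per round (lower is better).
    threshold: relative min_delta; an improvement must exceed
        threshold * abs(best loss so far) to reset the counter.
    patience: rounds without sufficient improvement before stopping.
    """
    best = losses[0]
    rounds_without_gain = 0
    for index, loss in enumerate(losses[1:], start=1):
        if best - loss > threshold * abs(best):
            best = loss
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= patience:
                return index
    return None
```

A larger threshold stops earlier on a flattening loss curve, while a threshold of 0 accepts any strict improvement.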
Bug Fixes:
- Enabled time-series test time augmentation (TTA) scoring in case target column name contains special characters that require sanitization.
- Fixed the MOJO acceptance test for time-series experiments with non-target lag features, which was accidentally marking the MOJO as incorrect.
- Fixed the training time and scoring time shown on the project page when many experiments are being trained or scored at once. Previously, the elapsed time since job submission (including wait time) was measured.
- Fixed creation of MOJO for BERTTransformer with xlm-roberta-base, roberta-base and camembert-base (non-default) variants.
- Fixed Java/C++ MOJO for raw integer time columns.
- Fixed Java MOJO Shapley prediction column names for multiclass problems when target column name contains “.” character.
- Fixed unavailable columns at prediction time not being used except as rare new feature during evolution.
- Fixed a race condition when the same user imports a zipped file multiple times simultaneously.
- Fixed fold ID reporting of frequencies.
- Fixed number of transformations for time-series leaderboard experiment that does target encoding.
- Avoid excessive core usage for clustering and truncated SVD.
- Fixed Parquet loading when the file includes an index.
- Projects can now list more than 100 experiments.
- Fixed handling of transformed feature names in MLI DT surrogate Python rules.
- Changed default display of boolean columns from numeric to categorical in MLI PDP.
- Updated MLI Morris Sensitivity recipe to work with the latest version of the interpret Python package.
- Fixed re-computation feature of MLI PDP by only showing features that are capable of data type conversions.
Documentation:
- Refactored the page on deploying DAI models. For more information, see Deploy Driverless AI models.
- Added details on support for H2O MLOps deployment. For more information, see H2O MLOps deployment.
- Added details on support for Triton Inference Server deployment. For more information, see Triton Inference Server deployment.
- Added details on the experiment setup comparison feature. For more information, see Experiment setup comparison.
- Added details on downloading server logs. For more information, see Download server logs. Note that this functionality is only available to admin users.

Version 1.10.4.3 (January 27, 2023)

Available here

Improvements:
- Added support for generating and using MapR tickets with different identities.
Bug Fixes:
- Fixed a small data leak in the TensorFlowModel for regression where the minimum value of the target in the validation data was learned.
- Fixed “ratio” time-series target transformer for zero-valued target column.
- Fixed an issue that could cause multiple start attempts when Driverless AI is used in Spectrum Conductor.
- Fixed excessive features made for target encode mode for TS leaderboard.
- Fixed leakage and shift when switching from GLM default model to LightGBM.
- Fixed case when included_models is not specified at all in config_overrides.

Version 1.10.4.2 (January 11, 2023)

Available here

Improvements:
- Added the ability to use a self-signed certificate or skip certificate validation for OPENID authentication.
Security updates:
- Upgraded Torch to version 1.13.1 to fix a security vulnerability.

Version 1.10.4.1 (December 19, 2022)

Available here

Improvements:
- Added a change to the health API is_idle flag to account for large datasets being uploaded from a browser session.
- Added a new routine to clean up old model predictions per user.
- Added more examples of text handling for expert control of unsupervised models.
- Added ability to ingest pandas sparse frames for pandas .pkl files.
- For time series problems, added an option to the H2O Driverless AI Experiment Setup Wizard to manually provide unavailable columns at prediction time.
Bug Fixes:
- Fixed erroneous warning that validation frame has 100% duplicate rows with training data.
- Fixed unsupervised transformer handling so that internal and recipe transformers can be used for unsupervised models.
- Fixed a race condition in the Experiment Wizard when choosing columns to drop too quickly.
- Fixed a path parsing issue on the GCS data connector.
- Fixed several security vulnerabilities.
- Added handling for “sklearn” dependency deprecation on Python scorers.

Version 1.10.4 (October 13, 2022)

Available here

New Features:
- (Experimental) GUI-based Wizards:
  Experiment Wizard. Configure and start experiments by clicking on a specific dataset and then clicking Predict Wizard. For more information, see H2O Driverless AI Experiment Setup Wizard.
  
  Dataset Join Wizard. Join two datasets by clicking on a specific dataset and then clicking Join Wizard. For more information, see Dataset Join Wizard.
  
  Leaderboard Wizard. Perform a business value analysis for all models in a project by clicking the Analyze button on the Project page (only for classification experiments). For more information, see Leaderboard Wizard: Business value calculator.
- Added repeated cross-validation for final ensembles for small data, resulting in improved accuracy.
- Added the BinnerTransformer for one-dimensional binning of numeric features. Uses tree splits (default) or quantiles to create bins, and can automatically reduce the number of bins based on their predictive power. Given bins, a numeric column is converted into multiple output columns by using either piece-wise linear encoding or binary encoding. For cases where target encoding isn’t allowed due to higher interpretability requirements, the highly interpretable BinnerTransformer can help create more accurate models. Only enabled by default for GLM/TensorFlow/TorchGrowNet and FTRL models at high interpretability.
- Added improved handling of duplicate rows in training data (after dropping columns to drop). Disabled by default. Option to either drop duplicate rows or convert them into single weighted rows.
- Added detection of joint rows in training, validation and testing datasets after data preparation, before modeling. If undetected, can mislead validation.
- Added support for prediction intervals for regression experiments in MOJO scoring (for both the C++ and Java MOJO runtimes).
- Redesigned Expert Settings page. For details, see expert-settings-navigate.
- Added user preferences section for per-user data connector setup. For details, see Driverless AI user settings.
- Added ‘feature_store_mojo’ recipe type to create a MOJO to be used as a feature engineering pipeline in the H2O Feature Store.
- Added the ability to run Disparate Impact Analysis on external datasets.
- Improved Shapley plots. For local queries, row and Shapley data are now displayed in tandem.
- Added the ability to navigate to dataset and experiment in MLI.
- Consolidated all MLI explainer logs into one zip file for download.
- Added support for bulk abort of multiple experiments in a project.
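
The BinnerTransformer item above says that, given bins, a numeric column is converted into multiple output columns by piece-wise linear encoding or binary encoding. The sketch below shows one plausible reading of both encodings for a single value. It is an illustration under stated assumptions, not the BinnerTransformer implementation: the function names are hypothetical, the bin edges are supplied by hand rather than derived from tree splits or quantiles, and "binary encoding" is assumed to mean a one-hot indicator of the containing bin.

```python
def piecewise_linear_encode(value, edges):
    """One output per bin: 0.0 below the bin, 1.0 above it, linear inside."""
    encoded = []
    for low, high in zip(edges[:-1], edges[1:]):
        fraction = (value - low) / (high - low)
        encoded.append(min(1.0, max(0.0, fraction)))
    return encoded


def binary_encode(value, edges):
    """One output per bin: 1 for the bin containing the value, else 0.

    Bins are half-open [low, high), except the last bin, which also
    includes its right edge.
    """
    last = len(edges) - 2
    return [
        1 if (low <= value < high) or (i == last and value == high) else 0
        for i, (low, high) in enumerate(zip(edges[:-1], edges[1:]))
    ]


# Hypothetical bin edges defining three bins.
edges = [0.0, 10.0, 20.0, 40.0]
linear = piecewise_linear_encode(15.0, edges)  # [1.0, 0.5, 0.0]
onehot = binary_encode(15.0, edges)            # [0, 1, 0]
```

The piece-wise linear form keeps the position of the value inside its bin, which the indicator form discards.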
Improvements:
- Significantly increased model accuracy for 1-hour runs using 39 datasets from the OpenML AutoML Benchmark, a variety of classification datasets with a range of columns, rows, and classes.
- Moved execution of Diagnostics, Transform Data and Autoreport to worker node(s) for multinode configurations.
- Reduced the number of features made during tuning and avoided duplication among tuning trials.
- To improve the usage of time, poorly performing models are now automatically pruned during tuning, evolution, and before making the final model.
- Improved usage of One-Hot Encoding, which was previously only used by GLM.
- Improved usage of TensorFlow model for low interpretability settings. Note: Neither Java MOJO nor Shapley values are currently supported. Only C++ MOJO and Python scoring are supported.
- Improved the accuracy of max_runtime_minutes by accounting for each model that is built and for the models that will be in the final model.
- Enabled blending in link space for ensembles that include LightGBM with ‘xentropy’ binary objective, so that sum(Shapley values) = logit(preds).
- Added mini acceptance tests for MOJOs for sum(Shapley values) = preds for regression, and sum(Shapley values) = logit(preds) for binary classification.
- Increased the amount of model tuning and ensembling for small data for default settings.
- Sped up small data handling by running more tasks in parallel with a single core each, to use all cores more efficiently.
- Sped up scoring in the final model by parallelizing over the selected metrics.
- Sped up XGBoost fitting and predictions by using faster data handling, especially for many rows and columns.
- Reduced the size of the MOJO zip file, which no longer includes TensorFlow unless needed by the pipeline.
- Better automatic values for worker_remote_processors and max_cores to improve many-cores (32+) DAI performance to allow more experiments to be run in parallel.
- Improved DAI startup speed.
- Improved the speed at which experiment previews are displayed when many custom recipes are present.
- Improved reliability of multinode configuration.
- Separated server logs into their own directory.
- Faster type detection for text columns.
- Sample for row duplication check to avoid using too much memory, set by config.toml value of detect_duplicate_rows_max_rows_x_cols.
- Shift detection is now always performed. Previously, shift detection was turned off for low accuracy settings.
- Improved stability of sampling for leakage detection.
- Fixed unsupervised expert setup.
- Reduced memory usage during MOJO creation.
- Improved experiment runtime and MOJO size estimation.
- Improved selection of leaderboard model parameters.
- Improved speed of C++ MOJO runtime for Shapley predictions for tree algorithms.
- The Recipe page code editor now shows the position of all instances of the searched string on the scroll bar.
- Dataset pickers now contain links to the dataset detail page (opened in new tab). The same applies to custom recipes in recipe pickers.
- When an experiment is created with the New with same settings option, Driverless AI now checks for activity of included recipes.
- Significant performance improvements to MLI.
- Disabled Shapley Summary plot for MLI Time Series. This plot only displays original features instead of engineered features, which are more useful for time series applications.
- MLI compatibility check results are now visible in explainer progress bars.
- Significantly sped up Shapley calculations when making predictions on the training dataset for MLI and AutoReport.
- Added logistic regression option for LightGBM and XGBoost.
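
The two Shapley identities mentioned in the improvements above, sum(Shapley values) = preds for regression and sum(Shapley values) = logit(preds) for binary classification, can be checked with a small helper. This is a minimal sketch of such a check, not the MOJO acceptance test itself; the function names and the tolerance are hypothetical, and the contributions are assumed to include the bias term.

```python
import math


def logit(probability):
    """Inverse of the logistic function, defined for 0 < probability < 1."""
    return math.log(probability / (1.0 - probability))


def shapley_sum_matches(contributions, prediction, binary, tol=1e-6):
    """Check that contributions (including bias) add up to the prediction.

    For binary classification the sum is compared in link space,
    that is, against logit(prediction).
    """
    total = sum(contributions)
    target = logit(prediction) if binary else prediction
    return math.isclose(total, target, abs_tol=tol)


# Hypothetical binary example: contributions sum to 0.5 in link space.
probability = 1.0 / (1.0 + math.exp(-0.5))
binary_ok = shapley_sum_matches([0.4, -0.1, 0.2], probability, binary=True)

# Hypothetical regression example: contributions sum to the prediction.
regression_ok = shapley_sum_matches([1.5, 2.0, -0.5], 3.0, binary=False)
```

Comparing in link space is what makes blended ensembles additive: probabilities do not add, but their logits do.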
Bug Fixes:
- Made Java MOJO Shapley for LightGBMDaskModel, XGBoostGBMDaskModel, XGBoostRFDaskModel and XGBoostRFModel consistent with new behavior in 1.10.2+, for unit_box, standardize and center target transformers, to be in target space for better interpretability.
- Fixed scaling of Shapley values for LightGBM model in ‘rf’ boosting mode.
- Avoid failing MOJO building when MOJO visualization times out.
- Fixed several security vulnerabilities.
- Gracefully handle case when dataset contains a column named ‘bias’, which conflicts with Shapley values.
- Fixed handling of parent data schema in new, restarted, or refitted experiments to avoid wrong data types.
- Allow avoidance of sampling of validation set via config.toml value of limit_validation_size.
- Time-Series: Feature values for TTA-scoring are no longer ignored if they are constant.
- Time-Series: TTA rows now always get a prediction value (these were previously NaN for scoring method “rolling”).
- Time-Series: Fixed a bug where the “ratio” target transformer was used if “difference” was selected in expert settings.
- Time-Series: Fixed a bug where downloaded test set predictions did not match scorer predictions on the same dataset (conditions: target values given for all rows, test set longer than 1 horizon, min lag size < horizon, fast_tta_test=true).
- Fixed an apparent hang near the end of an experiment caused by a NaN score being passed to the server when a custom recipe scorer produces a bad score.
- Fixed overly long AutoDoc appendix for certain time series experiments.
- Fixed unsupervised “new with same settings” workflow.
- Fixed unsupervised expert setup.
- Fixed imported individual recipes MODEL ACTIONS -> USE IN NEW EXPERIMENT workflow.
- Fixed several row querying scenarios in MLI.
- Fixed blocking of main server tasks (which would affect any UI or client use) by making more MLI tasks asynchronous.
- Fixed overuse of CPU cores by AutoReport, MLI, and other post-experiment actions that could leave the system and Driverless AI unusable.
- Fixed deletion of model in multinode to include the workers.
- Fixed deletion of dataset logs when deleting a dataset, to improve disk usage.
- Fixed preview when reproducible is set and doing restart/refit.
- Fixed duplicate features leading to gaps between features in UI variable importance panel.
- Fixed an issue when some actions were not allowed on imported datasets from Storage.
- No longer do test-time augmentation (TTA) for MLI and autoreport when predicting on training dataset. Now, significantly faster for Shapley plots.
Deprecation Notice:
- The Completed experiment -> Deploy functionality will be deprecated in the future.
Documentation:
- Added a page that describes the Experiment Wizard. For details, see H2O Driverless AI Experiment Setup Wizard.
- Added a page that describes the Dataset Join Wizard. For details, see Dataset Join Wizard.
- Added a page that describes the Leaderboard Wizard. For details, see Leaderboard Wizard: Business value calculator.
- Added a page that describes how to configure the Feature Store data connector. For details, see H2O AI Feature Store Setup.
- Added a page that describes how to configure the H2O Drive data connector. For details, see H2O Drive setup.
- Added a page that describes Driverless AI’s imbalanced modeling capabilities. For details, see Imbalanced modeling in Driverless AI.
- Updated the documentation to reflect changes to the Expert Settings window. For details, see expert-settings-navigate.

Version 1.10.3.1 (June 15, 2022)

Available here

Improvements:
- Temporary files created by Driverless AI (as, for example, during experiment export and import) are now cleaned up automatically.
- Introduced a new configuration option to control which attributes are set for HTTP Cookies issued by the Driverless AI web server.
Bug Fixes:
- Fixed a number of security vulnerabilities.
- Fixed an issue that caused image model scorers to not work when using PyTorch.
- Fixed an issue with MLI that caused user interface slowdowns when using very large data sets.

Version 1.10.3 (May 02, 2022)

Available here

New Features:
- Original Shapley support (per-feature contributions in original feature space) for C++ MOJO runtime and its Python wrapper. For a Python code example, see C++ MOJO runtime Shapley values support. Note that Original Shapley values are already supported by the Python and Java MOJO scoring runtimes.
- Fundamental change in how ensemble models are internally stored in memory and disk for fitting, predictions, MOJO injection, etc., resulting in overall lower memory footprints. This means that DAI can now handle larger models and/or data than previously possible on machines with the same amount of memory.
- A new option to terminate a running experiment if the DAI instance runs low on memory, instead of letting the system deal with a low-memory situation. For reference, see terminate_experiment_if_memory_low and memory_limit_gb_terminate. This option is off by default.
- A new option to control the data schema behavior for restart/refit of existing experiments (resume_data_schema in expert settings). This reuses the same data types for the columns as in the parent experiment. It is on for restart/refit/retrain of models but off for the "new experiment with same settings" option.
- Support for importing multiple files (folder import) into one dataset that is larger than memory. Assumes that each file can be parsed with available memory, otherwise see option below.
- Support for ingesting a single file with file size greater than memory by going through disk. For reference see: datatable_parse_max_memory_bytes. This option is off by default.
- Support for importing datasets with native time columns (previously raised AssertionError ltype.time).
- For Time series experiments, support for date columns that are unavailable at prediction time.
- Support to add custom recipes through the GUI code editor.
- Added a leaderboard to build all built-in unsupervised models on a dataset.
- Added a new Target Encode Time Series model to the Time series leaderboard (this model obtained first place in March 2022 Kaggle Playground competition).
- When integration with H2O MLOps is enabled, a new GO TO MLOPS button is now displayed for completed experiments. Clicking this button opens a new browser tab for the H2O MLOps app.
- New version of OpenID Connect authentication method with automatic provider discovery. For information, see Driverless AI OpenID Connect Authentication.
- Added ability to show queried/selected row in MLI across all relevant explainers.
- Added ability to run either default or custom MLI from the Completed Experiment page.
- Added ability to specify custom bins for MLI PDP.
Improvements:
- Performance:
  
  Reduced memory usage during various stages of experiments.
  
  Leaderboard now launches up to 10x faster when many recipes and experiments are present.
  
  Increased speed of final model building and improved performance of leak/shift detection for GrowNet.
- Accuracy:
  
  Improved numerical precision for BERT models for inference on Ampere GPUs.
  
  Improved numerical accuracy of scoring regression problems during feature evolution for cross-validated models.
- Leaderboard:
  
  Added other options in expert panel, such as ‘random’ and ‘line’ to run multiple experiments or scan through transformers.
- GUI:
  
  Added expert control over image auto pipeline building recipes.
  
  Recipe pickers now contain links to custom recipe details.
Bug Fixes:
- Fixed support of recipes using “all_cols” for min_cols or max_cols.
- Fixed performance issues for wide data with more than 20k columns.
- Fixed target transformer inverse transform operation for multiple folds-repeats in tuning-evolution.
Documentation:
- Added page on Sharing Experiments.
- Added page on Using Driverless AI configuration options.
- Added page on the new implementation of Driverless AI OpenID Connect Authentication.

Version 1.10.2 (February 17, 2022)

Available here

New Features:
- Auto-generated, editable Python code of the Best Models from any experiment pipeline.
- Shapley values for C++ MOJO runtime.
- Custom Recipe Management improvements.
Improvements:
- Improved Shapley values for linear target transformers.
- Improved accuracy for NLP models using the TextTransformer (default).
- Added support for exporting artifacts to HDFS.
Bug Fixes:
- Fixed pipeline visualization issues.
- Fixed target transformer bugs and performance issues.

Version 1.10.1.3 (January 7, 2022)

Available here

Bug Fixes:
- Upgraded log4j-2 in some bundled Java packages to version 2.17.1 to mitigate vulnerability discovered in CVE-2021-44832.

Version 1.10.1.2 (December 22, 2021)

Available here

Improvements:
- Allow notification scripts to inherit environment variables from main server using a new config option.
Bug Fixes:
- Upgraded log4j-2 in some bundled Java packages to version 2.17.0 to mitigate vulnerability discovered in CVE-2021-45105.

Version 1.10.1.1 (December 14, 2021)

Available here

Bug Fixes:
- Upgraded log4j-2 in some bundled Java packages to version 2.16.0 to fully mitigate the risk of arbitrary code execution vulnerability discovered in CVE-2021-44228.

Version 1.10.1 (November 10, 2021)

Available here

New Features:
- (Experimental) PyTorch based Deep Learning model for tabular data based on a boosting approach (GrowNet).
- Added the option to download NLP pretrained embeddings from S3.
- Added mojo size estimation to preview.
- Added ability to control the default knob settings for accuracy, time and interpretability via config.toml and expert settings.
- Added control over which target transformers to include for target transformer tuning.
- Added convergence-based early stopping for LightGBM-based models, which can reduce model size.
- AutoViz recommendations can now be used as feature transformations for experiment.
Improvements:
- Show low-cardinality categorical levels in transformed feature names for OneHotEncodingTransformer.
- The non-lag-based time-series recipe is now the same as the lag-based time-series recipe, except that all lag-based transformers are disabled. This allows support for gaps in validation splits, provides improved validation through moving windows, and adds holdout predictions.
- Automatically perform row sampling for SILHOUETTE scorer if dataset size is larger than configurable threshold to avoid slowness.
- Improved experiment runtime estimation for preview.
- Improved column type detection for preview to match type detection during the experiment, so that feature transformations are shown more accurately in the preview.
- Improved model-transformer detection during preview, so that it more accurately reflects what will happen in the experiment.
- Improved text detection for Chinese/Korean/Japanese and other languages that use UTF8 characters.
- Improved feature engineering and feature evolution for time series
- Allow dataset column type override to categorical (‘cat’) independent of cardinality.
- Improved LightGBM early stopping to stop earlier if validation score does not improve significantly (depending on accuracy dial).
- Allow disabled custom recipes to be loaded and edited in custom recipe management.
- Improved native support on RedHat and other platforms by being less dependent on system libraries.
- Added capture of non-Python errors, so sharing server logs with support is no longer required for supervised or unsupervised experiments.
- Added better small data support, by only target encoding features with strong signal.
- Added support for “more_overfit_protection” pipeline building recipe, which (for any data size) only target encodes features with strong signal.
- Improved custom recipe acceptance testing to check for disallowed global imports (e.g. of XGBoost, LightGBM, Torch, CuPy, cuDF, etc.).
- Support custom recipes via zip that contains base wrapper as main recipe, with support files in sub-folder (e.g. can import Torch globally if wrapper imports sub-folder locally).
- Avoid wrong date/datetime detection for columns containing strings and large integers.
- Validate toml and no longer allow ignoring errors.
- Prevent deletion of datasets used as test or validation sets.
- Upgraded Java to OpenJDK 10.
- Updated XGBoost to support NVIDIA K80+ again (i.e. CUDA compute capability 3.5+).
- Allow model recipes to be run in Python environment independent from DAI environment.
- Added ability to specify the dataset location when using the Google BigQuery (GBQ) connector.
- Fixed a number of common vulnerabilities (CVE and PRISMA).
Bug Fixes:
- Fixed Python scoring by limiting packages installed. Works on base Ubuntu, UBI-8, or CentOS systems. See documentation for additional steps.
- Fixed missing ImageVectorizer Transformer on systems with multiple GPUs.
- Fixed the MOJO cloud deployment to AWS Lambda.
- Fixed import of image archives created on Apple macOS systems.
- Fixed DATA logging level anonymization.
- Fixed custom recipe management activation choices for child experiments.
- Fixed an issue where the presence of custom recipes led to the absence of internal recipes.
- Fixed bootstrap sampling estimates.
- Fixed Python scoring for Dask-based models.
- Fixed OpenCL (for LightGBM) on native systems. See documentation for additional steps.
- Fixed the pre-transformer list so that it is not reset to all transformers during the experiment.
- Fixed LightGBMDask appearing when model should be LightGBM when wide rules triggered.
- Fixed Shapley values for ConstantModel in Java MOJO runtime.
- Fixed Date format %Y in C++ MOJO runtime.
- Fixed tabs clicks in Internet Explorer 11.
- Fixed pipeline visualization for tree models if features have numeric suffixes.
- Fixed a shape mismatch for binary custom scorers.
- Fixed ingestion of Parquet files with integer columns containing missing values.
Documentation:
- Added page describing AutoDoc Custom Template Placeholders.

Version 1.10.0 (September 29, 2021)

Available here

New Features:
- Built on top of the latest stable versions of all major open-source packages.
  Updated to Python 3.8, supporting faster pickle protocol 5.
  
  Updated to Torch 1.9.0 and TensorFlow 2.4.2.
  
  Updated to NVIDIA RAPIDS 21.08, supporting GPU-based target encoding, UMAP, TSNE, RF.
  
  Updated to CUDA 11.2.2, supporting Ampere-based NVIDIA GPUs. Requires NVIDIA CUDA Driver 470 or later.
  
  Updated XGBoost, LightGBM, datatable, Pandas, scikit-learn and many more.
  
  Support Ubuntu 20.04 for DEB/TAR-SH deployments and CentOS 8 for RPM.
- Custom Recipe Management
  Custom recipes are versioned.
  
  Lets you activate or deactivate custom recipes.
  
  Lets you add a note to each recipe.
  
  Adds a visual code editor.
  
  Makes previous recipe versions accessible.
- Experiment Export/Import
  Experiments can be downloaded and uploaded as binary files.
  
  Supports both new and migrated experiments from DAI 1.8.x and 1.9.x.
  
  Supports experiments with custom recipes.
- Support Shapley values for original features in Java MOJO runtime.
- (Experimental) Automatic Unsupervised Machine Learning. Supports clustering, dimensionality reduction, outlier detection and full support for custom recipes. Includes automatic hyper-parameter optimization and feature selection for clustering, and visualizations for centroids.
- Force-in-Feature control. Specific features can now be forced into the model without modification. For more information, see cols_to_force_in in feature expert settings.
- Added fast approximation for regular predictions (in addition to fast approximation for Shapley values). Enabled by default for MLI/AutoDoc, disabled by default for other clients. Extent of approximation can be fully configured/disabled. Can result in significant speedup for large prediction tasks like creation of partial dependence plots and MLI in general. For more information, refer to the FAQ question on fast approximation.
- Automatically create labels for predictions for classification problems. The predicted label column is appended at the end of the prediction frame.
- Added the Health API, providing system metrics and resource utilization overview.
- Added improved support for imbalanced multiclass problems with LightGBM. Can help when confusion-matrix based scorers are optimized (such as MacroF1).
- Added new metrics for classification: MacroF1, MacroMCC. Macro scorers average the per-class scores, while micro scorers average the per-row scores. Use MacroF1 by default for imbalanced multiclass problems. MacroF1/MCC is the same as F1/MCC for binary problems.
- Added more details about all models that were fitted during the experiment (in Scores tab and in experiment artifacts).
- MLI Features:
  Added new UI for MLI time series with a focus on UX.
  
  Enabled Sensitivity Analysis for time series models.
  
  Enabled Disparate Impact Analysis for time series models.
  
  Enabled Surrogate models for time series models.
  
  Enabled Partial Dependence/ICE for time series models.
  
  Enabled original feature importance for time series models.
  
  Added human-friendly descriptions of transformed features in MLI TS explainer UI.
  
  Added MLI expert setting mli_fast_approx to speed up predictions with fast approximation.
  
  Added Vectorizer + Linear Model (VLM) explainer for NLP experiments.
  
  Added ability to use Vectorizer + Linear Model (VLM) as a means to create tokens for surrogate models in addition to TF-IDF.
  
  Added partial dependence for NLP text tokens.
  
  Added multinomial support for MLI NLP explainers.
  
  Added text sample views for local NLP explanations in MLI. For more information, see NLP Plots.
  
  Added English stop words for MLI NLP tokenizer and tokenizers used by NLP explainers.
  
  Added ability to download TF-IDF matrices in MLI.
- Documentation:
  Added info on Genetic Algorithm in Driverless AI.
  
  Added info on Variable Importance in Driverless AI.
  
  Added info on Deployment Scenarios Graphs.
  
  Added info on support for Driverless AI LTS releases. For more information, see About Version Support.
  
  Added a list of simple copy/paste configurations.
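
The MacroF1 entry above says that macro scorers average the per-class scores. The sketch below illustrates that idea on toy data using only the Python standard library. It is not the Driverless AI scorer implementation, and the function names and data are assumptions made for illustration. Accuracy is shown for contrast because, for single-label multiclass data, it coincides with micro-averaged F1.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed one-vs-rest for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # actual class gets a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of rows predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Imbalanced toy data: 8 rows of class "a", 1 row each of "b" and "c".
y_true = ["a"] * 8 + ["b", "c"]
y_pred = ["a"] * 10  # a model that always predicts the majority class

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

On this data the majority-class model looks strong by accuracy but weak by macro F1, because the two minority classes each contribute an F1 of zero to the average.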
Improvements:
- Improved default leaderboard of experiments. Covers a broader range of useful experiments.
- Added another automatic leaderboard for time-series experiments to create a separate model for each prediction period (in addition to the diverse default leaderboard).
- Use PyTorch Lightning framework for BERT Models and Transformers. Leads to faster training and better memory handling.
- Improved parallelization of BERTTransformer on multi-GPU machines.
- Reduced memory usage for text transformers.
- Support arbitrary isolated Python 3.6, 3.7, 3.8 environments for custom recipes using wrap_create decorator (Git repo: Any Env).
- Preview shows any input features not covered by chosen transformer-model combination.
- Preview shows if MOJO is supported by chosen transformer-model combination.
- Improved automatic handling of zero-inflated distributions.
- Improved handling of time columns with %Y and %Y%m formats.
- Improved splitting for datasets containing images, now each split has a copy of all local images (instead of just references)
- Various improvements to the feature evolution algorithms.
- More explanation tooltips added across application.
- Improved runtime estimation for experiment preview.
- Improved heuristic for default experiment settings.
- Improved sanitization of column names.
- Optimized MLI partial dependence explainer in terms of speed.
- Improved categorical handling in MLI’s Decision Tree surrogate model.
- Various UI/UX/performance improvements to MLI.
- Improved server performance and responsiveness when many tasks (experiments, MLI, etc.) running.
- Gracefully handle failures of custom scorers without failing the experiment.
- Improved hardware utilization across the board.
- Improved clean-up of experiment temporary files after experiment finished, so server start-up can avoid long clean-up on slow disks.
- Make storage gRPC message limit configurable.
Bug Fixes:
- Fixed segfault during file import for datasets with many large similar strings.
- Fixed slow MOJO generation for wide datasets.
- Fixed MOJO for text-based transformers and models for UTF8 characters.
- The max_feature_interaction_depth expert setting is now applicable for all transformers.
- Fixed incorrect Shapley bias terms in Java MOJO runtime for XGBoost regression models with exponential link functions (Poisson/gamma/Tweedie/CoxPH).
- Fixed number of cores used for prediction by XGBoost and LightGBM to avoid excessive core usage during scoring.
- Allow One Hot Encoding to be used for any model.
- Fixed exclusive_mode moderate and max modes, for use on isolated systems for maximum use of cores.
- Fixed runtime data recipes to be properly used by transform dataset and MLI.
- Fixed use of model tuning (params_tune) and override parameters.
- Fixed automatic type casting of integer columns into string columns during scoring (avoid conversion to float first)
- Fixed Optuna genetic algorithm choice for larger variety of expert choices.
- Fixed feature selection by permutation importance selection of features for wide data with categoricals.

Version 1.9.3.1 (August 5, 2021)

Available here

New Features:
- Added support for storing sensitive or secure configuration information in a keystore. For more information, see Configuration Security.
Improvements:
- Improved Hive connector to no longer require a jaas.conf file when using Kerberos authentication with impersonation. If the jaas.conf file is provided, the Hive connector will use it, otherwise it will construct the configuration details based on information provided in hive_app_configs.
Bug Fixes:
- Fixed distribution shift detection for time series models.
- For zero-inflated distribution, fixed log printout of non-zero count in the target column.
- Fixed confusion matrices for very small observation weights (disabled rounding to nearest integer).
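
To illustrate why rounding observation weights to the nearest integer is a problem when the weights are very small, here is a minimal sketch. The helper name, the `round_weights` flag, and the data are hypothetical and do not reflect the Driverless AI code.

```python
from collections import defaultdict


def weighted_confusion(y_true, y_pred, weights, round_weights=False):
    """Accumulate observation weights into {(actual, predicted): total}."""
    matrix = defaultdict(float)
    for t, p, w in zip(y_true, y_pred, weights):
        # Rounding each weight first reproduces the failure mode:
        # any weight below 0.5 collapses to zero.
        matrix[(t, p)] += round(w) if round_weights else w
    return dict(matrix)


y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 1]
weights = [0.2, 0.3, 0.4, 0.1]  # every weight is well below 1

exact = weighted_confusion(y_true, y_pred, weights)
rounded = weighted_confusion(y_true, y_pred, weights, round_weights=True)

print("exact:  ", exact)
print("rounded:", rounded)
```

With rounding, every cell of the matrix becomes zero, so any rate derived from it is undefined. Keeping the fractional weights preserves the relative cell sizes.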
Documentation:
- Added info on computing SHAP values in the Java MOJO runtime.
- Revised list of supported image formats. For more information, see Supported File Types for Image processing.
- Improved organization of the security documentation.

Version 1.9.3 (June 3, 2021)

Available here

Improvements:
- Added ability to specify which project to use when using the Google BigQuery (GBQ) connector
- Improved MOJO batch scoring to avoid excessive memory usage for text features
- Updated equivalent public recipes repository to fix potential FBProphet package installation errors
- Improved column type detection for text columns
- Reduce memory usage
- Added support for Ubuntu 20.04 LTS
- Added support for IBM Power
Bug Fixes:
- Fixed creation of MOJO pipeline for features only (make_mojo_scoring_pipeline_for_features_only=true)
- Fixed a segfault issue that sometimes occurred when using MLOps storage
- Fixed the computation results in custom Autoviz boxplots
- Fixed a license not found issue when using a local rest server deployment
- Fixed a login issue when using Internet Explorer 11
- Fixed an issue where AWS Lambda resources were not cleaned when a Lambda deployment was deleted from the UI
- Fixed an issue when importing a folder with empty directories in Azure Blob Storage connector
- Fixed a CSV writing issue with JDBC and Hive connectors that sometimes occurred when importing large datasets with large text fields
- Fixed overly high CPU memory usage for BERTTransformer on datasets with large number of rows when running on GPUs

Version 1.9.2.2 (April 7, 2021)

Available here

Bug Fixes:
- Fix creation of MOJO pipeline for features only (make_mojo_scoring_pipeline_for_features_only=true)

Version 1.9.2.1 (April 2, 2021)

Available here

New Features:
- Added ‘center’ and ‘standardize’ target transformers
- Added XGBoost RF model
- GPU OOM protection for XGBoost by automatic feature selection
- Added ability to run partial dependence with out-of-range data
- Added ability to view histogram of feature bins in partial dependence plot
- Added ability to construct quantile bins for partial dependence
- Added ability to directly calculate partial dependence for any feature in the UI
- Added Shapley Summary Plot for Original Features
- Added MOJO support for Shapley values for XGBoost(GBM/RF/Dart), LightGBM(GBM/RF/Dart/ExtraTrees), DecisionTree, and GLM models, and ensembles
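
As a rough illustration of the quantile bins mentioned above for partial dependence, the sketch below compares quantile bin edges with equal-width bin edges on a skewed feature. It uses `statistics.quantiles` from the Python standard library (Python 3.8+). The function names and data are assumptions, and the binning Driverless AI actually applies may differ.

```python
import statistics


def quantile_edges(values, n_bins):
    """Interior bin edges that put roughly equal row counts in each bin."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def equal_width_edges(values, n_bins):
    """Interior bin edges that split the value range into equal widths."""
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * i / n_bins for i in range(1, n_bins)]


# A skewed feature: one outlier stretches the range.
feature = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print("quantile edges:   ", quantile_edges(feature, 4))
print("equal-width edges:", equal_width_edges(feature, 4))
```

With equal-width bins, eight of the nine values fall into the first bin. Quantile edges follow the data, so the partial dependence grid has resolution where most rows are.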
Improvements:
- Various speedups, especially for datasets with many columns, and for final model fitting
- Improved accuracy for RuleFit and TensorFlow models
- Show in preview which selected models or transformers do not support MOJO, and show if MOJO was disabled
- Use holdout predictions for calculations in Disparate Impact Analysis
- Improved speed of partial dependence calculation for wide datasets
- No longer require columns that are missing at prediction time (for time-series lag-based experiments) to be filled with missing values for Python and client-based scoring (MOJO still needs those columns filled with missing values)
- Improved GPU utilization for multi-GPU systems
- Reduced memory overhead
Bug Fixes:
- Fix MOJO failures for BERT models
- Various bug fixes

Version 1.9.2 (March 8, 2021)

New Features:
- Optuna for model hyperparameter tuning as choice for genetic algorithm in expert options
- Show size of largest transformers in logs
- Optional stacking meta learner for final ensemble (“ExtraTrees” LightGBM model instead of linear blender), with optional cross-validation
- Optimize fold splits and show Kolmogorov-Smirnov statistics for target variable across folds
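
For readers unfamiliar with the Kolmogorov-Smirnov statistic mentioned above, the following sketch computes the two-sample KS distance between the target values inside each fold and the target values in the remaining rows. The changelog does not state how Driverless AI computes or uses the statistic, so the function names and the fold-versus-rest comparison are assumptions for illustration only.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def fold_ks(target, fold_ids):
    """KS distance between each fold's target values and all other rows."""
    result = {}
    for fold in sorted(set(fold_ids)):
        inside = [y for y, f in zip(target, fold_ids) if f == fold]
        outside = [y for y, f in zip(target, fold_ids) if f != fold]
        result[fold] = ks_statistic(inside, outside)
    return result


target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Interleaved folds see similar target distributions.
balanced = fold_ks(target, [0, 1, 0, 1, 0, 1])
# Contiguous folds on sorted data see disjoint target ranges.
skewed = fold_ks(target, [0, 0, 0, 1, 1, 1])

print("balanced:", balanced)
print("skewed:  ", skewed)
```

A value near 0 means the fold's target distribution resembles the rest of the data, and a value of 1 means the two distributions do not overlap at all.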
Improvements:
- Optimize for wide data with more columns than rows
- Improve fold splits for regression problems
- Reduce choice of tuned target transformers for regression problems with higher interpretability settings
- Disable auto-tuning of target transformations for regression problems unless interpretability <= 5 and accuracy >= 5 (facilitates interpretation of Shapley values)
- Use GPU(s) more for XGBoost algorithm, improve memory requirements estimation
- Improved genetic algorithm for feature evolution
- Project page view was updated
- Expert settings items show additional description upon mouse hover
- Use MOJO for PDP / ICE calculations
- No longer re-create MOJO when starting MLI experiments
Bug Fixes:
- Disable GPU based NLP and Image recipes for Ampere based GPUs (due to software backward incompatibility of Ampere GPUs), automatically fall back to CPU for Image and BERT Transformers and TensorFlow model.
- Fix C++ MOJO segfault
- Remove temporary files left behind by AutoReport
- Various bug fixes

Version 1.9.1.3 (Feb 27, 2021)

Available here

New Features:
- Added support for Keycloak authentication on the Azure data connector
Bug Fixes:
- Fixed a race condition when starting the docker container from Steam

Version 1.9.1.1 (Feb 21, 2021)

Available here

New Features:
- Added support for the H2O.ai License Manager (beta)
- Added unseen values in MLI partial dependence plots
- Added ability to download the new Python client from a local path for air-gapped installs (also exposed the new Python client for download from the UI)
Improvements:
- UI/UX improvements to MLI:
  Removed abbreviations from MLI explainer tile names
  
  Improved local explanation and row searches in MLI
  
  Improved MLI explainers error handling
  
  Changed MLI explainers log levels from DEBUG to INFO
- Improved logging for BERT migration
- Various documentation updates
Bug Fixes:
- Fixed the “New with same params” option in MLI
- Fixed feature selection for PD/ICE MLI explainer to not include categorical features
- Fixed the MLI explainers log display to not be truncated
- Fixed the MLI on-demand engine invocation so that it can reuse parent’s explainer artifacts
- Various MLI UI fixes
- Various MLI explainers fixes
- Fixed the outliers display in Autoviz
- Fixed the None values interpretation in parquet files
- Various package vulnerabilities fixes (CVE)
- Fixed creation of too small time-series validation splits for newly introduced validation scheme (Time-series expert settings)

Version 1.9.1 (Jan 15, 2021)

New Features:
- Automatically use MOJO for predictions after experiment is completed (now uses MOJO scoring pipeline for Predictions, MLI, Autoreport, Diagnostics and Python scoring pipeline if available and applicable)
- Added Shapley values for original features to Python scoring pipeline and GUI/client scoring (under Model Actions)
- Ensemble blending now performed in link space by default (such that logistic or softmax of Shapley sum equals probabilities). For regression, identity_noclip target transformation achieves same.
- Built-in recipe for monotonic GBM on original numeric features
- Let user drop features with weak correlation with the target when monotonicity constraints are enabled (monotonicity_constraints_drop_low_correlation_features)
- Added ability to run and configure AutoDoc from the MLI recipe selection page with the option to include k-LIME and/or Decision Tree Surrogate explainers in the AutoDoc
- Show first tree for all LightGBM and XGBoost models in MOJO visualization (not just for DecisionTree)
- Show size of tree models in MOJO visualization
- Allow creation of MOJO for engineered features only, only does pipeline transform() without model predict() (experimental)
- AutoDoc can now be configured to include information about Shapley Values for Original Features, Monotonicity Constraints, and Imbalanced Models
- Implemented detection of string columns that contain a high percentage of numeric values, and added expert setting to enable auto-conversion
- Global task list displaying all running jobs (Resources -> System Info -> Workers Activity -> CPU/GPU Experiments)
- MLI Features:
  Support for Bring Your Own Recipe (BYOR) / Custom Recipes for MLI (Git repo: Responsible ML)
  
  Exposed sampling parameter for all explainers in MLI expert settings
  
  Added MOJO support for k-LIME (with download option). See Download k-LIME MOJO Reason Code Pipeline
  
  Added ability to download raw k-LIME data from MLI UI. See Download LIME Reason Codes
  
  Added ability to change threshold for Disparate Impact Analysis in DIA expert settings
  
  Added ability to run PDP on out of range data, which a user can specify in MLI recipe expert settings
  
  Added max runtime parameter to Kernel Shapley in MLI expert settings. To access, enable Original Kernel SHAP recipe and enable Kernel Explainer to Obtain Shapley Values for Original Features and toggle the max runtime from the MLI expert settings.
  
  Added ability to run PD/ICE for multinomial models in DAI
  
  Added ability to run MLI TS in typical MLI view (IID)
  
  Added ability to see rules in Decision Tree surrogate model
- Dask/RAPIDS multi-GPU/multi-node training (beta):
  Multi-GPU and/or Multi-node distributed training with XGBoost/Dask (RAPIDS on GPU) (beta)
  
  Multi-GPU and/or Multi-node distributed hyperparameter optimization with Optuna/Dask/XGBoost (RAPIDS on GPU) (beta)
  
  Multi-node distributed LightGBM/Dask on CPUs (beta)
  
  RAPIDS cuML K-Means Clustering, Truncated SVD, and DBSCAN transformers (beta)
- Time-Series:
  Improved validation scheme for short forecast horizons (Time-series expert settings)
  
  Greatly improved speed of creating back-testing holdout predictions
  
  New DateTimeDiffTransformer for automatic feature engineering based on temporal differences between date/time columns
  
  Improved dropout logic used for LagsTransformer
  
  LagsTransformer is now aware of features that are known ahead of time. This allows lags smaller than the horizon to be created for them.
  
  Added user-controllable pools of lag sizes for each of the following types of features: target, non-targets that are unknown ahead of time and non-targets that are known ahead of time
  
  Expert setting value “[0]” for lag sizes can now be used to disable lags for the corresponding group of features
  
  Added option for automatic selection of date/datetime transformations to avoid unseen values in the future (Time-series expert settings)
  
  Added option to use fixed-size train timespans during internal validation (Time-series expert settings)
  
  Added check for time invariance of lag features per sub-series to avoid redundancy
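
The link-space blending entry above can be illustrated for binary classification with a small worked example. Assuming two models whose Shapley contributions (including a bias term) are expressed in logit space, a weighted blend of the contributions sums to the blended logit, so applying the logistic function to the Shapley sum reproduces the blended probability. All numbers, names, and blend weights below are hypothetical, and this is a sketch of the property rather than the product's implementation.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def blend(values_1, values_2, weights):
    """Element-wise weighted combination of two contribution vectors."""
    return [weights[0] * a + weights[1] * b for a, b in zip(values_1, values_2)]


# Hypothetical per-model Shapley contributions in logit space.
# The last entry of each list is the bias term.
contribs_model_1 = [0.8, -0.3, 0.1]
contribs_model_2 = [1.5, 0.2, -0.4]
blend_weights = [0.6, 0.4]

blended_contribs = blend(contribs_model_1, contribs_model_2, blend_weights)

logit_1, logit_2 = sum(contribs_model_1), sum(contribs_model_2)

# Blending in link space: combine logits, then apply the logistic function.
link_space_prob = sigmoid(blend_weights[0] * logit_1 + blend_weights[1] * logit_2)

# Blending in probability space: combine the probabilities directly.
prob_space_prob = (
    blend_weights[0] * sigmoid(logit_1) + blend_weights[1] * sigmoid(logit_2)
)

print("logistic of Shapley sum:", sigmoid(sum(blended_contribs)))
print("link-space blend:       ", link_space_prob)
print("probability-space blend:", prob_space_prob)
```

The first two printed values agree, while the probability-space blend differs, which is why contributions blended that way would not add up to the reported probability.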
Improvements:
- Significant performance improvements, reduced latency for subprocess communication and faster experiments for small data
- Significant UI/UX improvements to MLI
- Improved test coverage for custom recipe acceptance tests
- Improved performance of TF-IDF based text transformers: lowered memory footprint, increased speed, and implemented user control for vocabulary size
- Improved performance and accuracy of RuleFit model
- Improved automatic time-series leaderboard (builds 10 experiments, and can be run iteratively to get interactions of optimal expert settings)
- Improved performance of MLI by using MOJO for PDP and ICE etc.
- Residuals in sensitivity analysis are now logloss residuals for binomial classification and square residuals for regression
- Improved MLI for NLP by adding the ability to backtrack all tokens to their respective column(s)
- AutoDoc can now be configured to include information about Shapley Values for Original Features, Monotonicity Constraints, and Imbalanced Models
- Allow creation of MOJO for engineered features only
- Disable zero-inflated models for regression when the target has only constant non-zero values
- Improved handling of sparse target class distributions for experiments stratified by fold_column
- Improved genetic algorithm tournament mode defaults
- Disable feature brain by default for new experiments
- Upgraded XGBoost to version 1.4.0
- Upgraded datatable
- Upgraded many Python packages
- Improved logging of model fitting and predictions
- UI/UX improvements to Dataset Details page, adding dataset actions, data recipe autosave and download
- Various Web GUI UI/UX improvements and fixes
Bug Fixes:
- Honor fast approximation settings (enabled by default) for LightGBM Shapley contributions - led to slow final model holdout predictions for time-series
- Fixed in-GUI/client scoring for experiments containing LightGBM models that were created in 1.7.1/1.8.0 (scoring artifacts not affected)
- Fixed MOJO for (non-default) regression objectives for XGBoost and LightGBM: Gamma, Tweedie, Poisson, CoxPH
- Various other migration fixes for models created in 1.7.1+
- Honor expert settings for DecisionTreeModel (such as max depth etc.)
- Feature brain related fixes: all imported models are freshly scored at start and more conservative selection for time-series experiments
- Various Autoviz fixes, yielding high correlation for categorical features and other small bugs
- Various bug fixes

Version 1.9.0.6 (Dec 22, 2020)

Available here

Bug Fixes:
- Fixed an issue that caused columns marked as being unavailable at prediction time to be dropped when leakage was detected
- Fixed row querying on demand for out-of-sample data in MLI
- Fixed failure during final test set scoring for time-series experiments with test set containing partially missing target values

Version 1.9.0.5 (Dec 09, 2020)

Available here

New features:
- Added k-LIME MOJO
- Added ability to copy/paste data from Shapley plots
- Added ability to select features for PD/ICE in MLI expert settings
- Added ability to select feature type in MLI expert settings, i.e., to specify which feature should be treated as categorical/numeric, etc.
- Sensitivity analysis now calculates logloss residuals for classification and squared residuals for regression
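
As a reference for the two residual types named above, here is a minimal per-row sketch using the standard definitions of logloss and squared error. The function names and the clipping constant `eps` are assumptions, and the exact computation inside sensitivity analysis may differ.

```python
import math


def logloss_residual(actual, predicted_prob, eps=1e-15):
    """Per-row logloss for a binary label (0 or 1) and a predicted probability."""
    # Clip to avoid log(0) for predictions of exactly 0 or 1.
    p = min(max(predicted_prob, eps), 1.0 - eps)
    return -(actual * math.log(p) + (1 - actual) * math.log(1.0 - p))


def squared_residual(actual, predicted):
    """Per-row squared error for a regression prediction."""
    return (actual - predicted) ** 2


# A confident correct prediction has a small residual,
# a confident wrong prediction has a large one.
print(logloss_residual(1, 0.9))
print(logloss_residual(1, 0.1))
print(squared_residual(3.0, 2.5))
```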
Improvements:
- Improvements in Shapley visualizations
- DAI PDP features now preserve order in reference to feature importance json file and not in alphabetical order
- Improvement in DAI brain re-scoring determination
Bug Fixes:
- Fix MOJO for ZeroInflated models when target transformer is not identity
- Various MLI fixes

Version 1.9.0.4 (Oct 13, 2020)

Available here

Bug Fixes:
- Speed up application startup by optimizing database integrity startup checks
- Fix file system artifact export

Version 1.9.0.3 (Sep 28, 2020)

Available here

New Features:
- Added holiday calendar for 24 more countries, allow user to select list of countries to create is-holiday features for time series experiments
- Support rhel8-like systems
- Introducing an option to log in using the JWT token injected by the reverse proxy
- Allow user to specify data delimiter/separator from configuration (datatable_separator see config.toml file)
Improvements:
- Added an option to skip https certificate verification on MinIO connector
- Locales and language pack improvements
- Improved logging for connectors
- Improved logging of sensitive data from OIDC
Bug Fixes:
- Various MLI fixes

Version 1.9.0.2 (Sep 8, 2020)

Available here

Improvements:
- Enable GPU support for PyTorch (BERT) models on IBM Power
- Allow specification of destination file path for downloads from Python client
- Enable large data upload for R client
Bug Fixes:
- Fix OpenID and TLS login redirection when deploying behind reverse proxy

Version 1.9.0.1 (Aug 10, 2020)

Available here

Bug Fixes:
- Fix migration for certain time-series experiments
- Fix missing files for automatic image model
- Fix MLI job status for PDP/ICE
- Fix handling of ID column for MLI kernel shapley
- Fix exception handling for startup failures
- Constrain Python environment for standalone scoring package

Version 1.9.0 (July 27, 2020)

Available here

New Features:
- Multinode training (alpha)
- Queuing of experiments to avoid system overload
- Automatic Leaderboard: Single-button creation of a project with a series of diverse experiments
- Multi-layer hierarchical feature engineering:
  Allow optional pre-processing layer for specific custom data cleanup/conversions
  
  Subsequent layers take each previous layer’s output as input (can be numeric or categorical/string)
- PyTorch deep learning backend in addition to TensorFlow
- Image classification and regression with pre-trained and fine-tuned state-of-the-art Deep Learning models:
  Image data ingest from binary archives
  
  Archives can contain (one) optional .csv file with mapping of image paths to target (regression/classification)
  
  Automatic training dataset creation and label creation (from directory structure) if no .csv provided
  
  Image Transformers (for converting image path columns
  
  “densenet121”, “efficientnetb0”, “efficientnetb2”, “inception_v3”, “mobilenetv2”, “resnet34”, “resnet50”, “seresnet50”, “seresnext50”, “xception”
  
  Optional fine-tuning
  
  Optional GPU acceleration (strongly recommended when enabling fine-tuning)
  
  Pretrained and fine-tuneable ImageVectorizer transformer with automatic dimensionality reduction
  
  Images can be provided either as zipped archives, or as paths to local or remote locations (URIs)
  
  Automatic image labeling when importing zipped archives of images (based on folder names and structure)
  
  Can handle multiple image columns with URIs in a tabular dataset
  
  Single experiment can combine image, NLP and tabular data
  
  MOJO support (also for CPU-only systems)
  
  Automatic Image model
  
  End-to-end model training, no tuning needed
  
  State-of-the-art results with grandmaster techniques
  
  Neural architecture search based on pretrained and fine-tuned TensorFlow models
  
  Multi-GPU training
  
  Visual insights in GUI (losses, sample images, augmentation, Grad-CAM visual explanations)
  
  MLI is not yet available for image experiments (work in progress)
- PyTorch BERT NLP pre-trained and fine-tuned state-of-the-art Deep Learning models:
  “bert-base-uncased”, “distilbert-base-uncased”, “xlnet-base-cased”, “xlm-mlm-enfr-1024”, “roberta-base”, “albert-base-v2”, “camembert-base”, “xlm-roberta-base”
  
  Optional GPU acceleration (strongly recommended)
  
  MOJO support (also for CPU-only systems)
  
  BERT transformers (for converting text columns into numeric features for other models like GBMs)
  
  BERT models (when the dataset has only one text column)
- AutoReport now includes the following:
  Information about the time series validation strategy
  
  Experiment lineage (model lineage plot)
  
  NLP/Image architecture details
- Zero-inflated regression models for insurance use cases (combination of classification + regression models)
- Time series centering and de-trending transformations:
  Inner ML model is trained on residuals after fitting and removing trend from target signal (per time-series group)
  
  Support for constant (centering), linear and logistic trends
  
  SEIRD model for epidemic modeling of (S)usceptible, (E)xposed, (I)nfected, (R)ecovered and (D)eceased, fully configurable lower/upper bounds for model parameters
- Graphical config.toml editor for expert settings
- Empirical prediction intervals for regression problems with user-defined confidence levels (based on holdout predictions)
- Insights tab with helpful visualizations (currently only for time-series and image problems)
- For binary classification problems with F05, F1, F2, MCC scorers, use the same metric for optimal threshold determination
- Custom data recipes can now be part of the experiment’s modeling pipeline, and will be part of the Python scoring package
- Custom visualizations in AutoViz following the Grammar of Graphics
- Pass data to (custom) scorers so that they can access other columns, not only actual and predicted values
- Added many new scorers for common regression and classification metrics out of the box
- Added holiday calendar for 24 more countries, allow user to select list of countries to create is-holiday features for.
- Added identity_no_clip target transformer for regression problems that never clips the predictions to observed ranges and allows extrapolation
- MLI:
  New GUI/UX for MLI
  
  Added Kernel Explainer for original feature Shapley importance
  
  Added ability to download Shapley values for original features from UI as CSV file
  
  Added intercept column to k-LIME output CSV file
  
  Added ability to run surrogate models on DAI model residuals to help debug model errors
  
  Added ability to export Decision Tree Surrogate model rules as text and Python code
  
  Added Decision Tree Surrogate model for multinomial experiments
  
  Added Leave One Covariate Out (LOCO) for multinomial experiments
  
  Added two traditional fair lending metrics for Disparate Impact Analysis (DIA): Standardized Mean Difference (SMD) and Marginal Error (ME)
  
  Added two interpretable model recipes to https://github.com/h2oai/driverlessai-recipes: GA2M and XNN (https://github.com/h2oai/driverlessai-recipes/tree/master/models/mli)
  
  Display prediction label for binary classification experiments in MLI summary page
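The entry on empirical prediction intervals above says only that they are based on holdout predictions and a user-defined confidence level. A minimal sketch of one common way to build such intervals, assuming quantiles of the holdout residuals are added to each new prediction. The names `quantile` and `empirical_interval` and the interpolation rule are illustrative assumptions, not the Driverless AI implementation:

```python
from typing import List, Tuple


def quantile(sorted_values: List[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def empirical_interval(
    holdout_actual: List[float],
    holdout_pred: List[float],
    new_pred: float,
    confidence: float = 0.9,
) -> Tuple[float, float]:
    """Shift a new prediction by the lower and upper holdout residual quantiles."""
    # Residuals are actual minus predicted, computed on holdout rows only.
    residuals = sorted(a - p for a, p in zip(holdout_actual, holdout_pred))
    alpha = (1.0 - confidence) / 2.0
    return (
        new_pred + quantile(residuals, alpha),
        new_pred + quantile(residuals, 1.0 - alpha),
    )
```

Because the residuals come from holdout rows, the interval width reflects out-of-sample error rather than training fit.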
Improvements:
- Improved parsability (machine readability) of log files
- Custom recipes are now only visible to the user that created them; previously created custom recipes remain globally visible
- Faster time-series experiments
- Improve preview to show more details about modeling part of final pipeline
- Improved notifications system
- Reduced size of MOJO
- Only allow imbalanced sampling techniques when data is larger than user controllable threshold
- Upgraded to latest H2O-3 backend for custom recipes
- Faster feature selection for large imbalanced datasets
Documentation updates:
- Added animated GIFs
- Added tabbed content
- Added more details for imbalanced sampling methods for binary classification
- New content (refer to above linked topics)
Bug fixes:
- Various bug fixes

Version 1.8.10 LTS (Feb 19, 2021)

New Features:
- Exposed new Python client for download in resources menu
- Added support for .avro file format
- Added option to generate multiple AutoDocs. This can be set using the option autodoc_template in config.toml and setting it to a list of AutoDoc file paths
MOJO updates:
- Upgraded MOJO runtime dependency to 2.5.10
- Added MOJO support to compute Shapley values for tree- and linear-based boosting models
Improvements:
- Added more verbosity to MLI logs
Bug Fixes:
- Fixed stall detected in LightGBM models on P2.8x Amazon EC2 instances

Version 1.8.9 LTS (Oct 19, 2020)

Available here

New Features:
- Add configurable CSRF (Cross-site request forgery) protection on API endpoints
- Add protection against concurrent sessions
Improvements:
- Hide webserver technology info from all API endpoints
- Improved BYOR security by introducing configurable static analysis of the code
- Improved session verification and authenticity
- Improved security for internal API handlers via encryption
Bug Fixes:
- Fix user session autologout after session expiration
- Fix cleanup of closed sessions
- Fix invalid redirection to static artifacts when using reverse proxy and URL prefix
- Fix import of files without extension

Version 1.8.8 LTS (Sep 30, 2020)

Available here

New Features:
- Give user control over number of saved variable importances (i.e., Python and R clients can get more than 14 values back) (max_varimp_to_save in config.toml file)
- Added holiday calendar for 24 more countries, allow user to select list of countries to create is-holiday features for time series experiments
- Enable GPU support for LightGBM models on IBM Power
- Support rhel8-like systems
- Introducing an option to log in using the JWT token injected by the reverse proxy
- Allow user to specify data delimiter/separator from configuration (datatable_separator see config.toml file)
- Add support for an encrypted keystore for sensitive config.toml values. Currently only available for LTS releases (1.8.8 and later)
- Save transformed column names for Shapley value computation in MOJO
Improvements:
- Add more consistency in handling files without an extension
- Improve web server request handling and disallow redirection outside of application
- Improve log file formatting to facilitate parsing
- Improve logging for connectors
- Improve air-gapped support for custom recipes
- Allow Snowflake Stage tables to be optional
Bug Fixes:
- Fix OpenID and TLS login redirection when deploying behind reverse proxy
- Fix Cgroup memory detection on IBM Power
- Various MLI fixes
- Various UI fixes
Documentation updates:
- Added configuration security document
- Added details about MOJO Scoring Command-Line Options
- Added details about exporting artifacts to Bitbucket
- Added details about GPU usage in Driverless AI
- Improved install docs (RHEL 8 support)
- Improved description of transformed feature names

Version 1.8.7.2 LTS (July 13, 2020)

Available here

Bug Fixes:
- Add and pass authentication_method parameter to use proper get_true_username and start_session
- SQL-like connector: strip unnecessary semi-colon from the end of query
Documentation updates:
- Document use of hive_app_jvm_args

Version 1.8.7.1 LTS (June 23, 2020)

Available here

New Features:
- Add ability to push artifacts to a Bitbucket server
- Add per-feature user control for monotonicity constraints for XGBoostGBM, LightGBM and DecisionTree models
Bug Fixes:
- Fix Hive kerberos impersonation
- Fix a DTap connector issue by using the proper login username for impersonation
- Fix monotonicity constraints for XGBoostGBM, LightGBM and DecisionTree models

Version 1.8.7 LTS (June 15, 2020)

Available here

New Features:
- Add intercept term to k-LIME csv
- Add control of default categorical & numeric feature rendering in DAI PD/ICE
- Add ability to restrict custom recipe upload to a specific git repository and branch
- Add translations for Korean and Chinese
- Add ability to use multiple authentication methods simultaneously
Improvements:
- Improve behavior of systemctl in the case Driverless AI fails to start
- Improve logging behavior for JDBC and Hive connectors
- Improve behavior of C++ scorer, fewer unnecessary files saved in tmp directory
- Improve Docker image behavior in Kubernetes
- Improve LDAP authentication to allow for anonymous binding
- Improve speed of feature selection for experiments on large, wide, imbalanced datasets
- Improve speed of data import on busy system
Bug fixes:
- Fix automatic Kaggle submission and score retrieval
- Fix intermittent Java exception seen by surrogate DRF model in MLI when several MLI jobs are run concurrently
- Fix issue with deleting Deployments if linked Experiment was deleted
- Fix issue causing Jupyter Notebooks to not work properly in Docker Image
- Fix custom recipe scorers not being displayed on Diagnostics page
- Fix issue with AWS Lambda Deployment not handling dropped columns properly
- Fix issue with not being able to limit number of GPUs for specific experiment
- Fix in-server scoring inaccuracies for certain models built in 1.7.1 and 1.8.0 (standalone scoring not affected)
- Fix rare datatable type casting exception
Documentation updates:
- The “Maximum Number of Rows to Perform Permutation-Based Feature Selection” expert setting now has a default value of 500,000
- Improved Hive and Snowflake connector documentation
- Updated the Main.java example in the Java Scoring Pipeline chapter
- Added documentation describing how to change the language in the UI before starting the application
- Added information about how custom recipes are described and documented in the Autoreport
- Updated the LDAP authentication documentation
- Improved the Linux DEB and RPM installation instructions
- Improved the AWS Community AMI installation instructions
- Improved documentation for the Reproducible button

Version 1.8.6 LTS (Apr 30, 2020)

Available here

New Features:
- Add expert setting to reduce size of MOJO scoring pipelines (and hence reduce latency and memory usage for inference)
- Enable Lambda deployment for IBM Power
- Add restart button for Deployments
- Add automatic Kaggle submission for supported datasets, show private/public scores (requires Kaggle API Username/Key)
- Show warning if single final model is worse on back-testing splits (for time series) or cross-validation folds (for IID) than the fold models (indicates issue with signal or fit)
- Update R client API to include autodoc, experiment preview, dataset download, autovis functions
- Add button in expert settings that toggles some effective settings to make a small MOJO production pipeline
- Add an option to upload artifacts to S3 or a Git repository
Improvements:
- Improve experiment restart/refit robustness if model type is changed
- Extra protection against dropping features
- Improve implementation of Hive connector
Bug fixes:
- Upgrade datatable to fix endless loop during stats calculation at file import
- Web server and UI now respect dynamic base URL suffix
- Fix incorrect min_rows in MLI when providing weight column with small values
- Fix segfault in MOJO for TensorFlow/PyTorch models
- Fix elapsed time for MLI
- Enable GPU by default for R client
- Fix Python scoring h2oai ModuleNotFound error
- Update no_drop_features toml and expert button to more generally avoid dropping features
- Fix datatable mmap strategy
Documentation updates:
- Add documentation for enabling the Hive data connector
- Add documentation for updating expired DAI licenses on AWS Lambda deployments using a script
- Documentation for uploading artifacts now includes support for S3 and Git in the artifacts store
- Improve documentation for one-hot encoding
- Improve documentation for systemd logs/journalctl
- Improve documentation for time series ‘unavailable columns at prediction time’
- Improve documentation for Azure blob storage
- Improve documentation for MOJO scoring pipeline
- Add information about reducing the size of a MOJO using a new expert setting

Version 1.8.5 LTS (Mar 09, 2020)

Available here

New Features:
- Handle large (up to 10k) multiclass problems, including GUI improvements in such cases
- Detect class imbalance for binary problems where target class is non-rare
- Add feature count to iteration panel
- Add experiment lineage pdf in experiment summary zip file
- Issue warnings if final pipeline scores are unstable across (cross-)validation folds
- Issue warning if Constant Model is improving quality of final pipeline (sign of bad signal)
- Report origin of leakage detection as from model fit (AUC/R2), Gini, or correlation
Improvements:
- Improve handling of ID columns
- Improve exception handling to improve stability of raising Python exceptions
- Improve exception handling when any individual transformer or model throws an exception or segfaults
- Improve robustness of restart and refit experiment to changes in experiment choices
- Improve handling of missing values when transforming dataset
- Improve robustness of custom recipe importing of modules
- Improve documentation for installation instructions
- Improve selection of initial lag sizes for time series
- Improve LightGBM stability for regression problems for certain mutation parameters
Documentation updates:
- Improved documentation for time-series experiments
- Added topics describing how to re-enable the Data Recipe URL and Data Recipe File connectors
- For users running older versions of the Standalone Python Scoring Pipeline, added information describing how to install upgraded versions of outdated dependencies
- Improved the description for the “Sampling Method for Imbalanced Binary Classification Problems” expert setting
- Added constraints related to the REST server deployments
- Noted required vs optional parameters in the HDFS connector topics
- Added an FAQ indicating that MOJOs are thread safe
- On Windows 10, only Docker installs are supported
- Added information about the Recommendations AutoViz graph
- Added information to the Before you Begin Installing topic that master.db files are not backward compatible with earlier Driverless AI versions
Bug fixes:
- Update LightGBM for bug fixes, including hangs and avoiding hard-coded library paths
- Stabilize use of psutil package
- Fix time-series experiments when test set has missing target values
- Fix Python scoring to not depend upon original data_directory
- Fix preview for custom time series validation splits and low accuracy
- Fix ignored minimum lag size setting for single time series
- Fix parsing of Excel files with datetime columns
- Fix column type detection for columns with mostly missing values
- Fix invalid display of 0.0000 score in iteration scores
- Various MLI fixes (don’t show invalid graphs, fix PDP sort order, overlapping labels)
- Various bug fixes

Version 1.8.4.1 LTS (Feb 4, 2020)

Available here

Add option for dynamic port allocation
Documentation for AWS community AMI
Various bug fixes (MLI UI)

Version 1.8.4 LTS (Jan 31, 2020)

Available here

New Features:
- Added ‘Scores’ tab in experiment page to show detailed tuning tables and scores for models and folds
- Added Constant Model (constant predictions) and use it as reference model by default
- Show score of global constant predictions in experiment summary as reference
- Added support for setting up mutual TLS for Driverless AI
- Added option to use client/personal certificate as an authentication method
Documentation Updates:
- Added sections for enabling mTLS and Client Certificate authentication
- Constant Model is now included in the list of Supported Algorithms
- Added a section describing the Model Scores page
- Improved the C++ Scoring Pipeline documentation describing the process for importing datatable
- Improved documentation for the Java Scoring Pipeline
Bug fixes:
- Fix refitting of final pipeline when new features are added
- Various bug fixes

Version 1.8.3 LTS (Jan 22, 2020)

Available here

Added option to upload experiment artifacts to a configured disk location
Various bug fixes (correct feature engineering from time column, migration for brain restart)

Version 1.8.2 LTS (Jan 17, 2020)

Available here

New Features:
- Decision Tree model
  - Automatically enabled for accuracy <= 7 and interpretability >= 7
  - Supports all problem types: regression/binary/multiclass
  - Using LightGBM GPU/CPU backend with MOJO
  - Visualization of tree splits and leaf node decisions as part of pipeline visualization
- Per-Column Imputation Scheme (experimental)
  - Select one of [const, mean, median, min, max, quantile] imputation scheme at start of experiment
  - Select method of calculation of imputation value: either on entire dataset or inside each pipeline’s training data split
  - Disabled by default and must be enabled at startup time to be effective
- Show MOJO size and scoring latency (for C++/R/Python runtime) in experiment summary
- Automatically prune low weight base models in final ensemble (based on interpretability setting) to reduce final model complexity
- Automatically convert non-raw github URLs for custom recipes to raw source code URLs
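The per-column imputation schemes listed above (const, mean, median, min, max, quantile) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names `imputation_value` and `impute`, the use of `None` for missing values, and the nearest-rank quantile rule are not taken from Driverless AI.

```python
import statistics
from typing import Dict, List, Optional

Column = List[Optional[float]]


def imputation_value(values: Column, scheme: str, const: float = 0.0,
                     q: float = 0.5) -> float:
    """Compute the fill value for one column from its non-missing entries."""
    if scheme == "const":
        return const
    present = sorted(v for v in values if v is not None)
    if not present:
        raise ValueError("column has no observed values")
    if scheme == "mean":
        return statistics.mean(present)
    if scheme == "median":
        return statistics.median(present)
    if scheme == "min":
        return present[0]
    if scheme == "max":
        return present[-1]
    if scheme == "quantile":
        # Nearest-rank quantile; an arbitrary choice for this sketch.
        return present[min(int(q * len(present)), len(present) - 1)]
    raise ValueError(f"unknown scheme: {scheme}")


def impute(table: Dict[str, Column], schemes: Dict[str, str]) -> Dict[str, Column]:
    """Fill missing entries column by column; columns without a scheme are left as-is."""
    result: Dict[str, Column] = {}
    for name, values in table.items():
        if name not in schemes:
            result[name] = list(values)
            continue
        fill = imputation_value(values, schemes[name])
        result[name] = [fill if v is None else v for v in values]
    return result
```

Passing only the rows of a training split to `imputation_value` corresponds to the second calculation method mentioned above; passing the full column corresponds to computing the value on the entire dataset.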
Improvements:
- Speed up feature evolution for time-series and low-accuracy experiments
- Improved accuracy of feature evolution algorithm
- Feature transformer interpretability, total count, and importance accounted for in genetic algorithm’s model and feature selection
- Binary confusion matrix in ROC curve of experiment page is made consistent with Diagnostics (flipped positions of TP/TN)
- Only include custom recipes in Python scoring pipeline if the experiment uses any custom recipes
- Additional documentation (New OpenID config options, JDBC data connector syntax)
- Improved AutoReport’s transformer descriptions
- Improved progress reporting during Autoreport creation
- Improved speed of automatic interaction search for imbalanced multiclass problems
- Improved accuracy of single final model for GLM and FTRL
- Allow config_overrides to be a list/vector of parameters for R client API
- Disable early stopping for Random Forest models by default, and expose new ‘rf_early_stopping’ mode (optional)
- Create identical example data (again, as in 1.8.0 and before) for all scoring pipelines
- Upgraded versions of datatable and Java
- Installed graphviz in Docker image, now get .png file of pipeline visualization in MOJO package and Autoreport. Note: For RPM/DEB/TAR SH installs, user can install graphviz to get this optional functionality
Documentation Updates:
- Added a simple example for modifying a dataset by recipe using live code
- Added a section describing how to impute datasets (experimental)
- Added Decision Trees to list of supported algorithms
- Fixed examples for enabling JDBC connectors
- Added information describing how to use a JDBC driver that is not tested in house
- Updated the Missing Values Handling topic to include sections for “Clustering in Transformers” and “Isolation Forest Anomaly Score Transformer”
- Improved the “Fold Column” description
Bug Fixes:
- Fix various reasons why final model score was too far off from best feature evolution score
- Delete temporary files created during test set scoring
- Fixed target transformer tuning (was potentially mixing up target transformers between feature evolution and final model)
- Fixed tensorflow_nlp_have_gpus_in_production=true mode
- Fixed partial dependence plots for missing datetime values and no longer show them for text columns
- Fixed time-series GUI for quarterly data
- Feature transformer exploration limited to no more than 1000 new features (Small data on 10/10/1 would try too many features)
- Fixed Kaggle pipeline building recipe to try more input features than 8
- Fixed cursor placement in live code editor for custom data recipe
- Show correct number of cross-validation splits in pipeline visualization when there are more than 10 splits
- Fixed parsing of datetime in MOJO for some datetime formats without ‘%d’ (day)
- Various bug fixes
Backward/Forward compatibility:
- Models built in 1.8.2 LTS will remain supported in upcoming versions 1.8.x LTS
- Models built in 1.7.1/1.8.0/1.8.1 are not deprecated and should continue to work (best effort is made to preserve MOJO and Autoreport creation, MLI, scoring, etc.)
- Models built in 1.7.0 or earlier will be deprecated

Version 1.8.1.1 (Dec 21, 2019)

Available here

Bugfix for time series experiments with quarterly data when launched from GUI

Version 1.8.1 (Dec 10, 2019)

Available here

New Features:
- Full set of scoring metrics and corresponding downloadable holdout predictions for experiments with single final models (time-series or i.i.d)
- MLI Updates:
  - What-If (sensitivity) analysis
  - Interpretation of experiments on text data (NLP)
- Custom Data Recipe BYOR:
  - BYOR (bring your own recipe) in Python: pandas, numpy, datatable, third-party libraries for fast prototyping of connectors and data preprocessing inside DAI
  - data connectors, cleaning, filtering, aggregation, augmentation, feature engineering, splits, etc.
  - can create one or multiple datasets from scratch or from existing datasets
  - interactive code editor with live preview
  - example code at https://github.com/h2oai/driverlessai-recipes/tree/rel-1.8.1/data
- Visualization of final scoring pipeline (Experimental)
  - In-GUI display of graph of feature engineering, modeling and ensembling steps of entire machine learning pipeline
  - Addition to Autodoc
- Time-Series:
  - Ability to specify which features will be unavailable at test time for time-series experiments
  - Custom user-provided train/validation splits (by start/end datetime for each split) for time-series experiments
  - Back-testing metrics for time-series experiments (regression and classification, with and without lags) based on rolling windows (configurable number of windows)
- MOJO:
  - Java MOJO for FTRL
  - PyTorch MOJO (C++/Py/R) for custom recipes based on BERT/DistilBERT NLP models (available upon request)
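The back-testing entry in the Time-Series list above mentions rolling windows with a configurable number of windows. A minimal sketch of how such splits could be laid out on time-ordered rows, assuming back-to-back test windows at the end of the data and an expanding training range. The function name `rolling_backtest_splits` and its parameters are hypothetical, and the actual split logic in Driverless AI may differ:

```python
from typing import List, Tuple


def rolling_backtest_splits(
    n_rows: int, n_windows: int, horizon: int
) -> List[Tuple[range, range]]:
    """Return (train_rows, test_rows) index ranges for time-ordered data.

    Each test window covers `horizon` consecutive rows. Windows are placed
    back to back at the end of the data, and each training range contains
    every row that precedes its test window.
    """
    if n_windows < 1 or horizon < 1:
        raise ValueError("n_windows and horizon must be positive")
    first_test_start = n_rows - n_windows * horizon
    if first_test_start < 1:
        raise ValueError("not enough rows for the requested windows")
    splits = []
    for w in range(n_windows):
        start = first_test_start + w * horizon
        splits.append((range(0, start), range(start, start + horizon)))
    return splits
```

A metric computed on each test window and then averaged gives one back-testing score per experiment; no test row is ever earlier than a training row in the same split.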
Improvements:
- Accuracy:
  - Automatic pairwise interaction search (+,-,*,/) for numeric features (“magic feature” finder)
  - Improved accuracy for time series experiments with low interpretability
  - Improved leakage detection logic
  - Improved genetic algorithm heuristics for feature evolution (more exploration)
- Time-Series Recipes:
  - Re-enable Test-time augmentation in Python scoring pipeline for time-series experiments
  - Reduce default number of time-series rolling holdout predictions to same number as validation splits (but configurable)
- Computation:
  - Faster feature evolution part for non-time-series experiments with single final model
  - Faster binary imbalanced models for very high class imbalance by limiting internal number of re-sampling bags
  - Faster feature selection
  - Enable GPU support for ImbalancedXGBoostGBMModel
  - Improved speed for importing multiple files at once
  - Faster automatic determination of time series properties
  - Enable use of XGBoost models on large datasets if low enough accuracy settings, expose dataset size limits in expert settings
  - Reduced memory usage for all experiments
  - Faster creation of holdout predictions for time-series experiments (Shapley values are now computed by MLI on demand by default)
- UX Improvements:
  - Added ability to rename datasets
  - Added search bar for expert settings
  - Show traces for long-running experiments
  - All experiments create a MOJO (if possible, set to ‘auto’)
  - All experiments create a pipeline visualization
  - By default, all experiments (iid and time series) have holdout predictions on training data and a full set of metrics for final model
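The automatic pairwise interaction search (+,-,*,/) listed under Accuracy above can be pictured as generating candidate features for every pair of numeric columns. The sketch below covers only that candidate-generation step under stated assumptions: the name `pairwise_interactions`, the feature naming scheme, and the NaN handling for division by zero are illustrative, and the selection of useful candidates is not shown.

```python
from itertools import combinations
from typing import Dict, List


def pairwise_interactions(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Build a+b, a-b, a*b and a/b candidate features for every pair of columns."""
    features: Dict[str, List[float]] = {}
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        features[f"{name_a}+{name_b}"] = [x + y for x, y in zip(a, b)]
        features[f"{name_a}-{name_b}"] = [x - y for x, y in zip(a, b)]
        features[f"{name_a}*{name_b}"] = [x * y for x, y in zip(a, b)]
        # Division by zero yields NaN here; an arbitrary choice for this sketch.
        features[f"{name_a}/{name_b}"] = [
            x / y if y != 0 else float("nan") for x, y in zip(a, b)
        ]
    return features
```

With n numeric columns this produces 4 * n * (n - 1) / 2 candidates, which is why a search step is needed to keep only the interactions that improve the model.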
Documentation Updates:
- Updated steps for enabling GPU persistence mode
- Added information about deprecated NVIDIA functions
- Improved documentation for enabling LDAP authentication
- Added information about changing the column type in datasets
- Updated list of experiment artifacts available in an experiment summary
- Added steps describing how to expose ports on Docker for the REST service deployment within the Driverless AI Docker container
- Added an example showing how to run an experiment with a custom transform recipe
- Improved the FAQ for setting up TLS/SSL
- Added FAQ describing issues that can occur when attempting Import Folder as File with a data connector on Windows
Bug Fixes:
- Allow brain restart/refit to accept unscored previous pipelines
- Fix actual vs predicted labeling for diagnostics of regression model
- Fix MOJO for TensorFlow for target transformers other than identity
- Fix column type detection for Excel files
- Allow experiments with default expert settings to have a MOJO
- Various bug fixes

Version 1.8.0 (Oct 3, 2019)

Available here

Improve speed and memory usage for feature engineering
Improve speed of leakage and shift detection, and improve accuracy
Improve speed of AutoVis under high system load
Improve speed for experiments with large user-given validation data
Improve accuracy of ensembles for regression problems
Improve creation of Autoreport (only one background job per experiment)
Improve sampling techniques for ImbalancedXGBoost and ImbalancedLightGBM models, and disable them by default since they can be slower
Add Python/R/C++ MOJO support for FTRL and RandomForest
Add native categorical handling for LightGBM in CPU mode
Add monotonicity constraints support for LightGBM
Add Isolation Forest Anomaly Score transformer (outlier detection)
Re-enable One-Hot-Encoding for GLM models
Add lexicographical label encoding (disabled by default)
Add ability to further train user-provided pretrained embeddings for TensorFlow NLP transformers, in addition to fine-tuning the rest of the neural network graph
Add timeout for BYOR acceptance tests
Add log and notifications for large shifts in final model variable importances compared to tuning model
Add more expert control over time series feature engineering
Add ability for recipes to be uploaded in bulk as entire (or part of) github repository or as links to Python files on page
Allow missing values in fold column
Add support for feature brain when starting “New Model With Same Parameters” of a model that was previously restarted
Add support for toggling whether additional features are to be included in pipeline during “Retrain Final Pipeline”
Limit experiment runtime to one day by default (approximately enforced, can be configured in Expert Settings -> Experiment or config.toml ‘max_runtime_minutes’)
Add support for importing pickled Pandas frames (.pkl)
MLI updates:
- Show holdout predictions and test set predictions (if applicable) in MLI TS for both metric and actual vs. predicted charts
- Add ability to download group metrics in MLI TS
- Add ability to zoom into charts in MLI TS
- Add ability to use column not used in DAI model as a k-LIME cluster column in MLI
- Add ability to view original and transformed DAI model-based feature importance in MLI
- Add ability to view Shapley importance for original features
- Add ability to view permutation importance for a DAI model when the config option autodoc_include_permutation_feature_importance is set to on
- Fixed bug in binary Disparate Impact Analysis, which caused incorrect calculations amongst several metrics (ones using false positives and true negatives in the numerator)
Disable NLP TensorFlow transformers by default (enable in NLP expert settings by switching to “on”)
Reorganize expert settings, add tab for feature engineering
Experiment now informs if aborted by user, system or server restart
Reduce load of all tasks launched by server, giving priority to experiments to use cores
Add experiment summary files to aborted experiment logs
Add warning when ensemble has models that reach the limit of max iterations despite early stopping, and add learning rate controls in the expert panel to address this
Improve progress reporting
Allow disabling of H2O recipe server for scoring if not using custom recipes (to avoid Java dependency)
Fix RMSPE scorer
Fix recipes error handling when uploading via URL
Fix Autoreport being spawned anytime GUI was on experiment page, overloading the system with forks from the server
Fix time-out for Autoreport PDP calculations, so that Autoreport creation completes more quickly
Fix certain config settings to be honored from GUI expert settings (woe_bin_list, ohe_bin_list, text_gene_max_ngram, text_gene_dim_reduction_choice, tensorflow_max_epochs_nlp, tensorflow_nlp_pretrained_embeddings_file_path, holiday_country); previously these were only honored when provided at startup time
Fix column type for additional columns during scored test set download
Fix GUI incorrectly converting time for forecast horizon in TS experiments
Fix calculation of correlation for string columns in AutoVis
Fix download for R MOJO runtime
Fix parameters for LightGBM RF mode
Fix dart parameters for LightGBM and XGBoost
Documentation updates:
- Included more information in the Before You Begin Installing or Upgrading topic to help installations and upgrades go more smoothly
- Added topic describing how to choose between the AWS Community and AWS Marketplace AMIs
- Added information describing how to retrieve the MOJO2 Javadoc
- Updated Python client examples to work with Driverless AI 1.7.x releases
- Updated documentation for new features, expert settings, MLI plots, etc.
Backward/Forward compatibility:
- Models built in 1.8.0 will remain supported in versions 1.8.x
- Models built in 1.7.1 are not deprecated and should continue to work (best effort is made to preserve MOJO and Autoreport creation, MLI, scoring, etc.)
- 1.8.0 upgraded to scipy version 1.3.1 to support newer custom recipes. This might deprecate custom recipes that depend on scipy version 1.2.2 (and experiments using them) and might require re-import of those custom recipes. Previously built Python scoring pipelines will continue to work.
- Models built in 1.7.0 or earlier will be deprecated
Various bug fixes

Version 1.7.1 (Aug 19, 2019)

Available here

Added two new models with internal sampling techniques for imbalanced binary classification problems: ImbalancedXGBoost and ImbalancedLightGBM
Added support for rolling-window based predictions for time-series experiments (2 options: test time augmentation or re-fit)
Added support for setting logical column types for a dataset (to override type detection during experiments)
Added ability to set experiment name at start of experiment
Added leakage detection for time-series problems
Added JDBC connector
MOJO updates:
- Added Python/R/C++ MOJO support for TensorFlow model
- Added Python/R/C++ MOJO support for TensorFlow NLP transformers: TextCNN, CharCNN, BiGRU, including any pretrained embeddings if provided
- Reduced memory usage for MOJO creation
- Increased speed of MOJO creation
- Configuration options for MOJO and Python scoring pipelines now have 3-way toggle: “on”/“off”/“auto”
MLI updates:
- Added disparate impact analysis (DIA) for MLI
- Allow MLI scoring pipeline to be built for datasets with column names that need to be sanitized
- Date-aware binning for partial dependence and ICE in MLI
Improved generalization performance for time-series modeling with regularization techniques for lag-based features
Improved “predicted vs actual” plots for regression problems (using adaptive point sizes)
Fix bug in datatable for manipulations of string columns larger than 2GB
Fixed download of predictions on user-provided validation data
Fix bug in time-series test-time augmentation (work-around was to include entire training data in test set)
Honor the expert settings flag to enable detailed traces (disable again by default)
Various bug fixes

Version 1.6.4 LTS (Aug 19, 2019)

Available here

ML Core updates:
- Speed up schema detection
- DAI now drops rows with missing values when diagnosing regression problems
- Speed up column type detection
- Fixed growth of individuals
- Fixed n_jobs for predict
- Target column is no longer included in predictors for skewed datasets
- Added an option to prevent users from downloading data files locally
- Improved UI split functionality
- A new “max_listing_items” config option to limit the number of items fetched in listing pages
Model Ops updates:
- MOJO runtime upgraded to version 2.1.3 which supports perpetual MOJO pipeline
- Upgraded deployment templates to version matching MOJO runtime version
MLI updates:
- Fix to MLI schema builder
- Fix parsing of categorical reason codes
- Added ability to handle integer time column
Various bug fixes

Version 1.7.0 (Jul 7, 2019)

Available here

Support for Bring Your Own Recipe (BYOR) for transformers, models (algorithms) and scorers
Added protobuf-based MOJO scoring runtime libraries for Python, R and Java (standalone, low-latency)
Added local REST server as one-click deployment option for MOJO scoring pipeline, in addition to AWS Lambda endpoint
Added R client package, in addition to Python client
Added Project workspace to group datasets and experiments and to visually compare experiments and create leaderboards
Added download of imported datasets as .csv
Recommendations for columnar transformations in AutoViz
Improved scalability and performance
Ability to provide max. runtime for experiments
Create MOJO scoring pipeline by default if the experiment configuration allows (for convenience, enables local/cloud deployment options without user input)
Support for user provided pre-trained embeddings for TensorFlow NLP models
Support for holdout splits lacking some target classes (can happen when a fold column is provided)
MLI updates:
- Added residual plot for regression problems (keeping all outliers intact)
- Added confusion matrix as default metric display for multinomial problems
- Added Partial Dependence (PD) and Individual Conditional Expectation (ICE) plots for Driverless AI models in MLI GUI
- Added ability to search by ID column in MLI GUI
- Added ability to run MLI PD/ICE on all features
- Added ability to handle multiple observations for a single time column in MLI TS by taking the mean of the target and prediction where applicable
- Added ability to handle integer time column in MLI TS
- MLI TS will use train holdout predictions if there is no test set provided
Faster import of files with “%Y%m%d” and “%Y%m%d%H%M” time format strings, and files with lots of text strings
Fix units for RMSPE scorer to be a percentage (multiply by 100)
Allow non-positive outcomes for MAPE and SMAPE scorers
Improved listing in GUI
Allow zooming in GUI
Upgrade to TensorFlow 1.13.1 and CUDA 10 (and CUDA is part of the distribution now, to simplify installation)
Add CPU-support for TensorFlow on PPC
Documentation updates:
- Added documentation for new features including
  - Projects
  - Custom Recipes
  - C++ MOJO Scoring Pipelines
  - R Client API
  - REST Server Deployment
- Added information about variable importance values on the experiments page
- Updated documentation for Expert Settings
- Updated “Tips n Tricks” with new Scoring Pipeline tips
Various bug fixes

Version 1.6.3 LTS (June 14, 2019)

Available here

Included an Audit log feature
Fixed support for decimal types for parquet files in MOJO
Autodoc can order PDP/ICE by feature importance
Session Management updates
Upgraded datatable
Improved reproducibility
Model diagnostics now uses a weight column
MLI can build surrogate models on all the original features or on all the transformed features that DAI uses
Internal server cache now respects usernames
Fixed an issue with time series settings
Fixed an out of memory error when loading a MOJO
Fixed Python scoring package for TensorFlow
Added OpenID configurations
Documentation updates:
- Updated the list of artifacts available in the Experiment Summary
- Clarified language in the documentation for unsupported (but available) features
- For the Terraform requirement in deployments, clarified that only Terraform versions in the 0.11.x release are supported, and specifically 0.11.10 or greater
- Fixed link to the Miniconda installation instructions
Various bug fixes

Version 1.6.2 LTS (May 10, 2019)

Available here

This version provides PPC64le artifacts
Improved stability of datatable
Improved path filtering in the file browser
Fixed units for RMSPE scorer to be a percentage (multiply by 100)
Fixed segmentation fault on Ubuntu 18 with installed font package
Fixed IBM Spectrum Conductor authentication
Fixed handling of EC2 machine credentials
Fixed Lag transformer configuration
Fixed KDB and Snowflake Error Reporting
Gradually reduce number of used workers for column statistics computation in case of failure.
Hide default Tornado header exposing used version of Tornado
Documentation updates:
- Added instructions for installing via AWS Marketplace
- Improved documentation for installing via Google Cloud
- Improved FAQ documentation
- Added Data Sampling documentation topic
Various bug fixes

Version 1.6.1.1 LTS (Apr 24, 2019)

Available here

Fix in AWS role handling.

Version 1.6.1 LTS (Apr 18, 2019)

Available here

Several fixes for MLI (partial dependence plots, Shapley values)
Improved documentation for model deployment, time-series scoring, AutoVis and FAQs

Version 1.6.0 LTS (Apr 5, 2019)

Private build only.

Fixed import of string columns larger than 2GB
Fixed AutoViz crashes on Windows
Fixed quantile binning in MLI
Plot global absolute mean Shapley values instead of global mean Shapley values in MLI
Improvements to PDP/ICE plots in MLI
Validated Terraform version in AWS Lambda deployment
Added support for NULL variable importance in AutoDoc
Made Variable Importance table size configurable in AutoDoc
Improved support for various combinations of data import options being enabled/disabled
CUDA is now part of distribution for easier installation
Security updates:
- Enforced SSL settings to be honored for all h2oai_client calls
- Added config option to prevent using LocalStorage in the browser to cache information
- Upgraded Tornado server version to 5.1.1
- Improved session expiration and autologout functionality
- Disabled access to Driverless AI data folder in file browser
- Provided an option to filter content that is shown in the file browser
- Use login name for HDFS impersonation instead of predefined name
- Disabled autocomplete in login form
Various bug fixes

Version 1.5.4 (Feb 24, 2019)

Available here

Speed up calculation of column statistics for date/datetime columns using certain formats (now uses ‘max_rows_col_stats’ parameter)
Added computation of standard deviation for variable importances in experiment summary files
Added computation of shift of variable importances between feature evolution and final pipeline
Fix link to MLI Time-Series experiment
Fix display bug for iteration scores for long experiments
Fix display bug for early finish of experiment for GLM models
Fix display bug for k-LIME when target is skewed
Fix display bug for forecast horizon in MLI for Time-Series
Fix MLI for Time-Series for single time group column
Fix in-server scoring of time-series experiments created in 1.5.0 and 1.5.1
Fix OpenBLAS dependency
Detect disabled GPU persistence mode in Docker
Reduce disk usage during TensorFlow NLP experiments
Reduce disk usage of aborted experiments
Refresh reported size of experiments during start of application
Disable TensorFlow NLP transformers by default to speed up experiments (can enable in expert settings)
Improved progress percentage shown during experiment
Improved documentation (upgrade on Windows, how to create the simplest model, DTap connectors, etc.)
Various bug fixes

Version 1.5.3 (Feb 8, 2019)

Available here

Added support for splitting datasets by time via time column containing date, datetime or integer values
Added option to disable file upload
Require authentication to download experiment artifacts
Automatically drop predictor columns from training frame if not found in validation or test frame and warn
Improved performance by using physical CPU cores only (configurable in config.toml)
Added option to not show inactive data connectors
Various bug fixes

Version 1.5.2 (Feb 2, 2019)

Available here

Added word-level bidirectional GRU TensorFlow models for NLP features
Added character-level CNN TensorFlow models for NLP features
Added support to import multiple individual datasets at once
Added support for holdout predictions for time-series experiments
Added support for regression and multinomial classification for FTRL (in addition to binomial classification)
Improved scoring for time-series when test data contains actual target values (missing target values will be predicted)
Reduced memory usage for LightGBM models
Improved performance for feature engineering
Improved speed for TensorFlow models
Improved MLI GUI for time-series problems
Fix final model fold splits when fold_column is provided
Various bug fixes

Version 1.5.1 (Jan 22, 2019)

Available here

Fix MOJO for GLM
Add back .csv file of experiment summary
Improve collection of pipeline timing artifacts
Clean up Docker tag

Version 1.5.0 (Jan 18, 2019)

Available here

Added model diagnostics (interactive model metrics on new test data incl. residual analysis for regression)
Added FTRL model (Follow The Regularized Leader)
Added Kolmogorov-Smirnov metric (degree of separation between positives and negatives)
Added ability to retrain (only) the final model on new data
Added one-hot encoding for low-cardinality categorical features, for GLM
Added choice between 32-bit (now default) and 64-bit precision
Added system information (CPU, GPU, disk, memory, experiments)
Added support for time-series data with many more time gaps, and with weekday-only data
Added one-click deployment to Amazon Lambda
Added ability to split datasets randomly, with option to stratify by target column or group by fold column
Added support for OpenID authentication
Added connector for BlueData
Improved responsiveness of the GUI under heavy load situations
Improved speed and reduced memory footprint of feature engineering
Improved performance for RuleFit models and enabled GPU and multinomial support
Improved auto-detection of temporal frequency for time-series problems
Improved accuracy of final single model if external validation provided
Improved final pipeline if external validation data is provided (add ensembling)
Improved k-LIME in MLI by using original features deemed important by DAI instead of all original features
Improved MLI by using 3-fold CV by default for all surrogate models
Improved GUI for MLI time series (integrated help, better integration)
Added ability to view MLI time series logs while MLI time series experiment is running
PDF version of the Automatic Report (AutoDoc) is now replaced by a Word version
Various bug fixes (GLM accuracy, UI slowness, MLI UI, AutoVis)
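
The Kolmogorov-Smirnov metric listed above is described as the degree of separation between positives and negatives. A minimal sketch of one common way to compute such a statistic follows: the largest gap between the empirical score distributions of the two classes. The function name and the sample data are illustrative assumptions, not the product's implementation.

```python
def ks_statistic(labels, scores):
    """Largest gap between the empirical CDFs of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    # Evaluate both CDFs at every observed score and keep the largest difference.
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical predictions: the classes overlap slightly.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]
ks_overlap = ks_statistic(labels, scores)

# Hypothetical predictions: the classes are perfectly separated.
ks_perfect = ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

A value near 1 indicates that the scores separate the two classes almost completely; a value near 0 indicates heavily overlapping score distributions.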

Version 1.4.2 (Dec 3, 2018)

Available here

Support for IBM Power architecture
Speed up training and reduce size of final pipeline
Reduced resource utilization during training of final pipeline
Display test set metrics (ROC, ROCPR, Gains, Lift) in GUI in addition to validation metrics (if test set provided)
Show location of best threshold for Accuracy, MCC and F1 in ROC curves
Add relative point sizing for scatter plots in AutoVis
Fix file upload and add model checkpointing in Python client API
Various bug fixes

Version 1.4.1 (Nov 11, 2018)

Available here

Improved integration of MLI for time-series
Reduced disk and memory usage during final ensemble
Allow scoring and transformations on previously imported datasets
Enable checkpoint restart for unfinished models
Add startup checks for OpenCL platforms for LightGBM on GPUs
Improved feature importances for ensembles
Faster dataset statistics for date/datetime columns
Faster MOJO batch scoring
Fix potential hangs
Fix ‘not in list’ error in MOJO
Fix NullPointerException in MLI
Fix outlier detection in AutoVis
Various bug fixes

Version 1.4.0 (Oct 27, 2018)

Available here

Enable LightGBM by default (now with MOJO)
LightGBM tuned for GBM decision trees, Random Forest (rf), and Dropouts meet Multiple Additive Regression Trees (dart)
Add ‘isHoliday’ feature for time columns
Add ‘time’ column type for date/datetime columns in data preview
Add support for binary datatable file ingest in .jay format
Improved final ensemble (each model has its own feature pipeline)
Automatic smart checkpointing (feature brain) from prior experiments
Add kdb+ connector
Feature selection of original columns for data with many columns to handle >>100 columns
Improved time-series recipe (multiple validation splits, better logic)
Improved performance of AutoVis
Improved date detection logic (now detects %Y%m%d and %Y-%m date formats)
Automatic fallback to CPU mode if GPU runs out of memory (for XGBoost, GLM and LightGBM)
No longer require header for validation and testing datasets if data types match
No longer include text columns for data shift detection
Add support for time-series models in MLI (including ability to select time-series groups)
Add ability to download MLI logs from MLI experiment page (includes both Python and Java logs)
Add ability to view MLI logs while MLI experiment is running (Python and Java logs)
Add ability to download LIME and Shapley reason codes from MLI page
Add ability to run MLI on transformed features
Display all variables for MLI variable importance for both DAI and surrogate models in MLI summary
Include variable definitions for DAI variable importance list in MLI summary
Fix Gains/Lift charts when observations weights are given
Various bug fixes
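
The date formats named in the date detection item above (%Y%m%d and %Y-%m) are standard strptime-style format strings. The sketch below shows how such strings parse sample values with Python's standard library. The detect_format helper and the sample values are hypothetical and only illustrate the formats; the change log does not describe the actual detection logic.

```python
from datetime import datetime

# The two formats named in the release note.
CANDIDATE_FORMATS = ["%Y%m%d", "%Y-%m"]


def detect_format(values):
    """Return the first candidate format that parses every value, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            for v in values:
                datetime.strptime(v, fmt)
        except ValueError:
            continue  # at least one value does not match this format
        return fmt
    return None


compact = detect_format(["20181027", "20180904"])
year_month = detect_format(["2018-10", "2018-09"])
not_a_date = detect_format(["hello", "world"])
```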

Version 1.3.1 (Sep 12, 2018)

Available here

Fix ‘Broken pipe’ failures for TensorFlow models
Fix time-series problems with categorical features and interpretability >= 8
Various bug fixes

Version 1.3.0 (Sep 4, 2018)

Available here

Added LightGBM models - now have [XGBoost, LightGBM, GLM, TensorFlow, RuleFit]
Added TensorFlow NLP recipe based on CNN deep learning models (sentiment analysis, document classification, etc.)
Added MOJO for GLM
Added detailed confusion matrix statistics
Added more expert settings
Improved data exploration (columnar statistics and row-based data preview)
Improved speed of feature evolution stage
Improved speed of GLM
Report single-pass score on external validation and test data (instead of bootstrap mean)
Reduced memory overhead for data processing
Reduced number of open files - fixes ‘Bad file descriptor’ error on Mac/Docker
Simplified Python client API
Query any data point in the MLI UI from the original dataset due to “on-demand” reason code generation
Enhanced k-means clustering in k-LIME by only using a subset of features. See The k-LIME Technique for more information.
Report k-means centers for k-LIME in MLI summary for better cluster interpretation
Improved MLI experiment listing details
Various bug fixes

Version 1.2.2 (July 5, 2018)

Available here

MOJO Java scoring pipeline for time-series problems
Multi-class confusion matrices
AUCMACRO Scorer: Multi-class AUC via macro-averaging (in addition to the default micro-averaging)
Expert settings (configuration override) for each experiment from GUI and client APIs.
Support for HTTPS
Improved downsampling logic for time-series problems (if enabled through accuracy knob settings)
LDAP readonly access to Active Directory
Snowflake data connector
Various bug fixes

Version 1.2.1 (June 26, 2018)

Added LIME-SUP (alpha) to MLI as alternative to k-LIME (local regions are defined by decision tree instead of k-means)
Added RuleFit model (alpha), now have [GBM, GLM, TensorFlow, RuleFit] - TensorFlow and RuleFit are disabled by default
Added Minio (private cloud storage) connector
Added support for importing folders from S3
Added ‘Upload File’ option to ‘Add Dataset’ (in addition to drag & drop)
Predictions for binary classification problems now have 2 columns (probabilities per class), for consistency with multi-class
Improved model parameter tuning
Improved feature engineering for time-series problems
Improved speed of MOJO generation and loading
Improved speed of time-series related automatic calculations in the GUI
Fixed potential rare hangs at end of experiment
No longer require internet to run MLI
Various bug fixes

Version 1.2.0 (June 11, 2018)

Time-Series recipe
Low-latency standalone MOJO Java scoring pipelines (now beta)
Enable Elastic Net Generalized Linear Modeling (GLM) with lambda search (and GPU support), for interpretability>=6 and accuracy<=5 by default (alpha)
Enable TensorFlow (TF) Deep Learning models (with GPU support) for interpretability=1 and/or multi-class models (alpha, enable via config.toml)
Support for pre-tuning of [GBM, GLM, TF] models for picking best feature evolution model parameters
Support for final ensemble consisting of mix of [GBM, GLM, TF] models
Automatic Report (AutoDoc) in PDF and Markdown format as part of summary zip file
Interactive tour (assistant) for first-time users
MLI now runs on experiments from previous releases
Surrogate models in MLI now use 3 folds by default
Improved small data recipe with up to 10 cross-validation folds
Improved accuracy for binary classification with imbalanced data
Additional time-series transformers for interactions and aggregations between lags and lagging of non-target columns
Faster creation of MOJOs
Progress report during data ingest
Normalize binarized multi-class confusion matrices by class count (global scaling factor)
Improved parsing of boolean environment variables for configuration
Various bug fixes

Version 1.1.6 (May 29, 2018)

Improved performance for large datasets
Improved speed and user interface for MLI
Improved accuracy for binary classification with imbalanced data
Improved generalization estimate for experiments with given validation data
Reduced size of experiment directories
Support for Parquet files
Support for bzip2 compressed files
Added Data preview in UI: ‘Describe’
No longer add ID column to holdout and test set predictions for simplicity
Various bug fixes

Version 1.1.4 (May 17, 2018)

Native builds (RPM/DEB) for 1.1.3

Version 1.1.3 (May 16, 2018)

Faster speed for systems with large CPU core counts
Faster and more robust handling of user-specified missing values for training and scoring
Same validation scheme for feature engineering and final ensemble for high enough accuracy
MOJO scoring pipeline for text transformers
Fixed single-row scoring in Python scoring pipeline (broken in 1.1.2)
Fixed default scorer when experiment is started too quickly
Improved responsiveness for time-series GUI
Improved responsiveness after experiment abort
Improved load balancing of memory usage for multi-GPU XGBoost
Improved UI for selection of columns to drop
Various bug fixes

Version 1.1.2 (May 8, 2018)

Support for automatic time-series recipe (alpha)
Now using Generalized Linear Model (GLM) instead of XGBoost (GBM) for interpretability 10
Added experiment preview with runtime and memory usage estimation
Added MER scorer (Median Error Rate, Median Abs. Percentage Error)
Added ability to use integer column as time column
Speed up type enforcement during scoring
Support for reading ARFF file format (alpha)
Quantile Binning for MLI
Various bug fixes

Version 1.1.1 (April 23, 2018)

Support string columns larger than 2GB

Version 1.1.0 (April 19, 2018)

AWS/Azure integration (hourly cloud usage)
Bug fixes for MOJO pipeline scoring (now beta)
Google Cloud storage and BigQuery (alpha)
Speed up categorical column stats computation during data import
Further improved memory management on GPUs
Improved accuracy for MAE scorer
Ability to build scoring pipelines on demand (if not enabled by default)
Additional target transformer for regression problems sqrt(sqrt(x))
Add GLM models as candidates for interpretability=10 (alpha, disabled by default)
Improved performance of native builds (RPM/DEB)
Improved estimation of error bars
Various bug fixes
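
The sqrt(sqrt(x)) target transformer listed above compresses large regression targets; predictions made on the transformed scale are mapped back by raising them to the fourth power. The sketch below assumes non-negative targets, and the function names are illustrative rather than taken from the product.

```python
import math


def forward(y):
    """Apply sqrt(sqrt(y)); assumes y >= 0."""
    return math.sqrt(math.sqrt(y))


def inverse(t):
    """Map a value on the transformed scale back to the original scale."""
    return t ** 4


targets = [0.0, 1.0, 16.0, 10000.0]
transformed = [forward(y) for y in targets]
restored = [inverse(t) for t in transformed]
```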

Version 1.0.30 (April 5, 2018)

Speed up MOJO pipeline creation and disable MOJO by default (still alpha)
Improved memory management on GPUs
Support for optional 32-bit floating-point precision for reduced memory footprint
Added logging of test set scoring and data transformations
Various bug fixes

Version 1.0.29 (April 4, 2018)

If MOJO fails to build, no MOJO will be available, but experiment can still succeed

Version 1.0.28 (April 3, 2018)

(Non-docker) RPM installers for RHEL7/CentOS7/SLES 12 with systemd support

Version 1.0.27 (March 31, 2018)

MOJO scoring pipeline for Java standalone cross-platform low-latency scoring (alpha)
Various bug fixes

Version 1.0.26 (March 28, 2018)

Improved performance and reduced memory usage for large datasets
Improved performance for F0.5, F2 and accuracy
Improved performance of MLI
Distribution shift detection now also between validation and test data
Batch scoring example using datatable
Various enhancements for AutoVis (outliers, parallel coordinates, log file)
Various bug fixes

Version 1.0.25 (March 22, 2018)

New scorers for binary/multinomial classification: F0.5, F2 and accuracy
Precision-recall curve for binary/multinomial classification models
Plot of actual vs predicted values for regression problems
Support for excluding feature transformations by operation type
Support for reading binary file formats: datatable and Feather
Improved multi-GPU memory load balancing
Improved display of initial tuning results
Reduced memory usage during creation of final model
Fixed several bugs in creation of final scoring pipeline
Various UI improvements (e.g., zooming on iteration scoreboard)
Various bug fixes
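
The F0.5 and F2 scorers listed above are instances of the general F-beta score, which weights recall beta times as heavily as precision. The sketch below uses the textbook binary definition with hypothetical confusion-matrix counts; how the product extends it to multinomial problems is not described here.

```python
def f_beta(tp, fp, fn, beta):
    """F-beta score from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: precision 0.8, recall 0.5.
tp, fp, fn = 8, 2, 8
f05 = f_beta(tp, fp, fn, beta=0.5)  # favors precision
f1 = f_beta(tp, fp, fn, beta=1.0)   # balanced
f2 = f_beta(tp, fp, fn, beta=2.0)   # favors recall
```

Because precision exceeds recall in this example, F0.5 is the highest of the three scores and F2 the lowest.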

Version 1.0.24 (March 8, 2018)

Fix test set scoring bug for data with an ID column (introduced in 1.0.23)
Allow renaming of MLI experiments
Ability to limit maximum number of cores used for datatable
Print validation scores and error bars across final ensemble model CV folds in logs
Various UI improvements
Various bug fixes

Version 1.0.23 (March 7, 2018)

Support for Gains and Lift curves for binomial and multinomial classification
Support for multi-GPU single-model training for large datasets
Improved recipes for large datasets (faster and less memory/disk usage)
Improved recipes for text features
Increased sensitivity of interpretability setting for feature engineering complexity
Disable automatic time column detection by default to avoid confusion
Automatic column type conversion for test and validation data, and during scoring
Improved speed of MLI
Improved feature importances for MLI on transformed features
Added ability to download each MLI plot as a PNG file
Added support for dropped columns and weight column to MLI stand-alone page
Fix serialization of bytes objects larger than 4 GiB
Fix failure to build scoring pipeline with ‘command not found’ error
Various UI improvements
Various bug fixes

Version 1.0.22 (Feb 23, 2018)

Fix CPU-only mode
Improved robustness of datatable CSV parser

Version 1.0.21 (Feb 21, 2018)

Fix MLI GUI scaling issue on Mac
Work-around segfault in truncated SVD scipy backend
Various bug fixes

Version 1.0.20 (Feb 17, 2018)

HDFS/S3/Excel data connectors
LDAP/PAM/Kerberos authentication
Automatic setting of default values for accuracy / time / interpretability
Interpretability: per-observation and per-feature (signed) contributions to predicted values in scoring pipeline
Interpretability setting now affects feature engineering complexity and final model complexity
Standalone MLI scoring pipeline for Python
Time setting of 1 now runs for only 1 iteration
Early stopping of experiments if convergence is detected
ROC curve display for binomial and multinomial classification, with confusion matrices and threshold/F1/MCC display
Training/Validation/Test data shift detectors
Added AUCPR scorer for multinomial classification
Improved handling of imbalanced binary classification problems
Configuration file for runtime limits such as cores/memory/harddrive (for admins)
Various GUI improvements (ability to rename experiments, re-run experiments, logs)
Various bug fixes

Version 1.0.19 (Jan 28, 2018)

Fix hang during final ensemble (accuracy >= 5) for larger datasets
Allow scoring of all models built in older versions (>= 1.0.13) in GUI
More detailed progress messages in the GUI during experiments
Fix scoring pipeline to only use relative paths
Error bars in model summary are now +/- 1*stddev (instead of 2*stddev)
Added RMSPE scorer (RMS Percentage Error)
Added SMAPE scorer (Symmetric Mean Abs. Percentage Error)
Added AUCPR scorer (Area under Precision-Recall Curve)
Gracefully handle inf/-inf in data
Various UI improvements
Various bug fixes
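
For reference, the RMSPE and SMAPE scorers listed above can be sketched with their common textbook definitions. Several SMAPE variants exist, so the denominator used here is an assumption, as are the function names and sample data. RMSPE is shown as a fraction; a later release reports it as a percentage (multiply by 100).

```python
import math


def rmspe(actual, predicted):
    """Root mean squared percentage error as a fraction; actual values must be nonzero."""
    terms = [((a - p) / a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(terms) / len(terms))


def smape(actual, predicted):
    """Symmetric MAPE in percent, using (|a| + |p|) / 2 as the denominator."""
    terms = [abs(a - p) / ((abs(a) + abs(p)) / 2) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)


actual = [100.0, 200.0]
predicted = [110.0, 180.0]
rmspe_value = rmspe(actual, predicted)
smape_value = smape(actual, predicted)
```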

Version 1.0.18 (Jan 24, 2018)

Fix migration from version 1.0.15 and earlier
Confirmation dialog for experiment abort and data/experiment deletion
Various UI improvements
Various AutoVis improvements
Various bug fixes

Version 1.0.17 (Jan 23, 2018)

Fix migration from version 1.0.15 and earlier (partial, for experiments only)
Added model summary download from GUI
Restructured and renamed logs archive, and added model summary to it
Fix regression in AutoVis in 1.0.16 that led to slowdown
Various bug fixes

Version 1.0.16 (Jan 22, 2018)

Added support for validation dataset (optional, instead of internal validation on training data)
Standard deviation estimates for model scores (+/- 1 std.dev.)
Computation of all applicable scores for final models (in logs only for now)
Standard deviation estimates for MLI reason codes (+/- 1 std.dev.) when running in stand-alone mode
Added ability to abort MLI job
Improved final ensemble performance
Improved outlier visualization
Updated H2O-3 to version 3.16.0.4
More readable experiment names
Various speedups
Various bug fixes

Version 1.0.15 (Jan 11, 2018)

Fix truncated per-experiment log file
Various bug fixes

Version 1.0.14 (Jan 11, 2018)

Improved performance

Version 1.0.13 (Jan 10, 2018)

Improved estimate of generalization performance for final ensemble by removing leakage from target encoding
Added API for re-fitting and applying feature engineering on new (potentially larger) data
Remove access to pre-transformed datasets to avoid unintended leakage issues downstream
Added mean absolute percentage error (MAPE) scorer
Enforce monotonicity constraints for binary classification and regression models if interpretability >= 6
Use squared Pearson correlation for R^2 metric (instead of coefficient of determination) to avoid negative values
Separated HTTP and TCP scoring pipeline examples
Reduced size of h2oai_client wheel
No longer require weight column for test data if it was provided for training data
Improved accuracy of final modeling pipeline
Include H2O-3 logs in downloadable logs.zip
Updated H2O-3 to version 3.16.0.2
Various bug fixes
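
The R^2 item above replaces the coefficient of determination with the squared Pearson correlation to avoid negative values. A minimal sketch of the difference, using textbook formulas and made-up data (the function names are illustrative):

```python
def r2_determination(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def r2_squared_pearson(actual, predicted):
    """Squared Pearson correlation: always between 0 and 1."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [4.0, 3.0, 2.0, 1.0]  # perfectly anti-correlated predictions
r2_det = r2_determination(actual, predicted)
r2_pearson = r2_squared_pearson(actual, predicted)
```

Note the trade-off: the squared correlation stays within [0, 1], but it also scores these anti-correlated predictions as a perfect fit, while the coefficient of determination penalizes them with a negative value.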

Version 1.0.11 (Dec 12, 2017)

Faster multi-GPU training, especially for small data
Increase default amount of exploration of genetic algorithm for systems with fewer than 4 GPUs
Improved accuracy of generalization performance estimate for models on small data (< 100k rows)
Faster abort of experiment
Improved final ensemble meta-learner
More robust date parsing
Various bug fixes

Version 1.0.10 (Dec 4, 2017)

Tool tips and link to documentation in parameter settings screen
Faster training for multi-class problems with > 5 classes
Experiment summary displayed in GUI after experiment finishes
Python Client Library downloadable from the GUI
Speedup for Maxwell-based GPUs
Support for multinomial AUC and Gini scorers
Add MCC and F1 scorers for binomial and multinomial problems
Faster abort of experiment
Various bug fixes
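
The MCC scorer listed above refers to the Matthews correlation coefficient. The sketch below shows the standard binary formula computed from confusion-matrix counts; the counts are hypothetical, returning 0.0 for an undefined denominator is a common convention assumed here, and the multinomial generalization is not covered.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


mcc_perfect = mcc(tp=5, tn=5, fp=0, fn=0)   # every prediction correct
mcc_inverted = mcc(tp=0, tn=0, fp=5, fn=5)  # every prediction wrong
mcc_typical = mcc(tp=8, tn=6, fp=2, fn=4)
```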

Version 1.0.9 (Nov 29, 2017)

Support for time column for causal train/validation splits in time-series datasets
Automatic detection of the time column from temporal correlations in data
MLI improvements, dedicated page, selection of datasets and models
Improved final ensemble meta-learner
Test set score now displayed in experiment listing
Original response is preserved in exported datasets
Various bug fixes

Version 1.0.8 (Nov 21, 2017)

Various bug fixes

Version 1.0.7 (Nov 17, 2017)

Sharing of GPUs between experiments - can run multiple experiments at the same time while sharing GPU resources
Persistence of experiments and data - can stop and restart the application without loss of data
Support for weight column for optional user-specified per-row observation weights
Support for fold column for user-specified grouping of rows in train/validation splits
Higher accuracy through model tuning
Faster training - overall improvements and optimization in model training speed
Separate log file for each experiment
Ability to delete experiments and datasets from the GUI
Improved accuracy for regression tasks with very large response values
Faster test set scoring - Significant improvements in test set scoring in the GUI
Various bug fixes

Version 1.0.5 (Oct 24, 2017)

Only display scorers that are allowed
Various bug fixes

Version 1.0.4 (Oct 19, 2017)

Improved automatic type detection logic
Improved final ensemble accuracy
Various bug fixes

Version 1.0.3 (Oct 9, 2017)

Various speedups
Results are now reproducible
Various bug fixes

Version 1.0.2 (Oct 5, 2017)

Improved final ensemble accuracy
Weight of Evidence features added
Various bug fixes

Version 1.0.1 (Oct 4, 2017)

Improved speed of final ensemble
Various bug fixes

Version 1.0.0 (Sep 24, 2017)

Initial stable release.