Below are some of the key features available in Driverless AI.
Flexibility of Data and Deployment
Driverless AI works with a variety of data sources, including Hadoop HDFS, Amazon S3, and more. It can be deployed in all clouds (Microsoft Azure, AWS, Google Cloud) and on premises on any system, but it is ideally suited for systems with GPUs, including IBM Power 9 with GPUs built in.
NVIDIA GPU Acceleration
Driverless AI is optimized to take advantage of GPU acceleration to achieve up to 40X speedups for automatic machine learning. It includes multi-GPU algorithms for XGBoost, GLM, K-Means, and more. GPUs allow for thousands of iterations of model features and optimizations.
Automatic Data Visualization (Autovis)
For datasets, Driverless AI automatically selects and generates the data plots that are most relevant from a statistical perspective, based on the most relevant data statistics. These visualizations help users get a quick understanding of their data before starting the model building process. They are also useful for understanding the composition of very large datasets and for seeing trends or even possible issues, such as large numbers of missing values or significant outliers that could impact modeling results. See Visualizing Datasets for more information.
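The kind of data issues mentioned above (missing values and outliers) can be illustrated with a small check. The following is only a minimal sketch in plain Python, not the method Autovis uses: the function name `column_summary`, the 1.5 × IQR outlier rule, and the sample data are all assumptions chosen for illustration.

```python
import math
import statistics


def column_summary(values, iqr_factor=1.5):
    """Summarize one numeric column: missing fraction and IQR-based outliers.

    Hypothetical helper for illustration; not part of Driverless AI.
    """
    # Treat None and NaN as missing values.
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    missing_fraction = 1 - len(present) / len(values) if values else 0.0

    outliers = []
    if len(present) >= 4:
        # Quartiles of the observed values; points far outside the
        # interquartile range are flagged as outliers.
        q1, _, q3 = statistics.quantiles(present, n=4)
        spread = q3 - q1
        low, high = q1 - iqr_factor * spread, q3 + iqr_factor * spread
        outliers = [v for v in present if v < low or v > high]

    return {"missing_fraction": missing_fraction, "outliers": outliers}


# Made-up example column with two missing entries and one implausible value.
ages = [34, 29, None, 41, 38, 36, None, 33, 250, 31]
summary = column_summary(ages)
print(summary)
```

In this made-up column, the two `None` entries count as missing and the value 250 is the only point outside the IQR fences.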
Automatic Feature Engineering
Feature engineering is the secret weapon that advanced data scientists use to extract the most accurate results from algorithms. H2O Driverless AI employs a library of algorithms and feature transformations to automatically engineer new, high-value features for a given dataset. (See Driverless AI Transformations for more information.) The interface includes an easy-to-read variable importance chart that shows the significance of original and newly engineered features.
Automatic Model Documentation
To explain models to business users and regulators, data scientists and data engineers must document the data, algorithms, and processes used to create machine learning models. Driverless AI provides an Autoreport (Autodoc) for each experiment, relieving users of the time-consuming task of documenting and summarizing the workflow they used to build machine learning models. The Autoreport includes details about the data used, the validation schema selected, model and feature tuning, and the final model created. With this capability, practitioners can focus more on drawing actionable insights from the models and save weeks or even months in the development, validation, and deployment process.
Driverless AI also provides a number of autodoc_ configuration options, giving users even more control over the output of the Autoreport. (Refer to the Sample Config.toml File topic for information about these configuration options.)
A sample experiment report is available to download and view in Word format.
Time Series Forecasting
Time series forecasting is one of the biggest challenges for data scientists. These models address key use cases, including demand forecasting, infrastructure monitoring, and predictive maintenance. Driverless AI delivers superior time series capabilities to optimize for almost any prediction time window. Driverless AI incorporates data from numerous predictors, handles structured character data and high-cardinality categorical variables, and handles gaps in time series data and other missing values. See Time Series in Driverless AI for more information.
NLP with TensorFlow
Text data can contain critical information to inform better predictions. Driverless AI automatically converts short text strings into features using powerful techniques like TF-IDF. With TensorFlow, Driverless AI can also process larger text blocks and build models using all available data to solve business problems like sentiment analysis, document classification, and content tagging. See NLP in Driverless AI for more information.
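To make the idea of turning short text strings into TF-IDF features concrete, here is a minimal sketch in plain Python. It assumes whitespace tokenization and the basic weighting tf × log(N / df); the function name `tfidf` and the sample reviews are hypothetical, and Driverless AI's own text transformers may tokenize and weight terms differently.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Illustrative only; not the Driverless AI implementation.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency scaled by how rare the term is across documents.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up short text strings.
reviews = [
    "great product fast shipping",
    "terrible product slow shipping",
    "great service",
]
features = tfidf(reviews)
print(features[0])
```

Terms that appear in only one review (such as "fast") receive a higher weight than terms shared across reviews (such as "great"), which is what makes the resulting numeric features informative for a model.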
Automatic Scoring Pipelines
For completed experiments, Driverless AI automatically generates both Python scoring pipelines and new ultra-low-latency automatic scoring pipelines. The new automatic scoring pipeline is a unique technology that packages all feature engineering and the winning machine learning model as highly optimized, low-latency, production-ready Java code that can be deployed anywhere. See Scoring Pipelines Overview for more information.
Machine Learning Interpretability (MLI)
Driverless AI provides robust interpretability of machine learning models to explain modeling results in a human-readable format. In the MLI view, Driverless AI employs a host of different techniques and methodologies for interpreting and explaining the results of its models. A number of charts are generated automatically (depending on experiment type), including K-LIME, Shapley, Variable Importance, Decision Tree Surrogate, Partial Dependence, Individual Conditional Expectation, Sensitivity Analysis, NLP Tokens, NLP LOCO, and more. Additionally, you can download a CSV of LIME and Shapley reason codes from this view. See MLI Overview for more information.
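Two of the techniques listed above, Individual Conditional Expectation (ICE) and Partial Dependence, are simple enough to sketch directly. The example below is a generic illustration in plain Python and is not how the MLI view computes its charts: `toy_model`, the feature names, and the grid values are all hypothetical stand-ins.

```python
def ice_curves(predict, rows, feature, grid):
    """For each row, collect predictions as `feature` is swept over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # override only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """Average the ICE curves pointwise to get the partial dependence curve."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(column) / len(column) for column in zip(*curves)]


def toy_model(row):
    """Hypothetical stand-in for a trained model's scoring function."""
    return 2.0 * row["income"] + (5.0 if row["owns_home"] else 0.0)


rows = [
    {"income": 1.0, "owns_home": True},
    {"income": 3.0, "owns_home": False},
]
grid = [0.0, 1.0, 2.0]

ice = ice_curves(toy_model, rows, "income", grid)
pd_curve = partial_dependence(toy_model, rows, "income", grid)
print(ice)
print(pd_curve)
```

Each ICE curve shows how one row's prediction responds to the feature, while the partial dependence curve is the average of those per-row curves.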
Automatic Reason Codes
In regulated industries, an explanation is often required for significant decisions relating to customers (for example, credit denial). Reason codes show the key positive and negative factors in a model's scoring decision in simple language. Reason codes are also useful in other industries, such as healthcare, because they can provide insights into model decisions that can drive additional testing or investigation.
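As an illustration of what "key positive and negative factors" can mean, the sketch below derives reason codes from a simple linear scoring model by measuring how far each feature moves the score from a baseline applicant. This is an assumption made for the example: the coefficients, baseline, and applicant values are invented, and Driverless AI's reason codes come from its own techniques (such as K-LIME and Shapley), not from this code.

```python
def reason_codes(coefficients, baseline, row, top_n=2):
    """Rank features by their signed contribution relative to a baseline.

    Hypothetical helper: contribution = coefficient * (value - baseline value).
    """
    contributions = {
        name: coef * (row[name] - baseline[name])
        for name, coef in coefficients.items()
    }
    # Sort ascending so the most negative contributions come first.
    ranked = sorted(contributions.items(), key=lambda item: item[1])
    negative = [(n, c) for n, c in ranked if c < 0][:top_n]
    positive = [(n, c) for n, c in reversed(ranked) if c > 0][:top_n]
    return {"positive": positive, "negative": negative}


# Invented linear credit-scoring model and applicant data.
coefficients = {
    "income": 0.5,
    "debt_ratio": -4.0,
    "late_payments": -1.5,
    "years_employed": 0.25,
}
baseline = {"income": 50, "debt_ratio": 0.25, "late_payments": 1, "years_employed": 5}
applicant = {"income": 46, "debt_ratio": 0.5, "late_payments": 3, "years_employed": 9}

codes = reason_codes(coefficients, baseline, applicant)
for name, contribution in codes["negative"]:
    print(f"{name} lowered the score by {abs(contribution):.2f}")
for name, contribution in codes["positive"]:
    print(f"{name} raised the score by {contribution:.2f}")
```

For this invented applicant, the number of late payments and a below-baseline income are the strongest negative factors, while years of employment is the only positive one.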
Custom Recipe Support
Driverless AI allows you to import custom recipes for MLI algorithms, feature engineering (transformers), scorers, and configuration. You can use your custom recipes in combination with or instead of all built-in recipes. This allows you to have greater influence over the Driverless AI Automatic ML pipeline and gives you control over the optimization choices that Driverless AI makes. See Appendix A: Custom Recipes for more information.