A Constant Model predicts the same constant value for any input data. The constant value is computed by optimizing the given scorer. For example, for MSE/RMSE, the constant is the (weighted) mean of the target column. For MAE, it is the (weighted) median. For other scorers like MAPE or custom scorers, the constant is found with an optimization process. For classification problems, the constant probabilities are the observed priors.
A constant model is meant as a baseline reference model. If it ends up being used in the final pipeline, a warning will be issued because that would indicate a problem in the dataset or target column (e.g., when trying to predict a random outcome).
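The constants described above can be sketched in a few lines of plain Python. This is only an illustration, not the Driverless AI implementation: the function names `weighted_mean`, `weighted_median`, and `class_priors` are hypothetical, and the median shown is the lower weighted median (any value between the two middle points minimizes MAE equally well when the weights tie).

```python
def weighted_mean(values, weights):
    """Constant that minimizes weighted squared error (MSE/RMSE)."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_median(values, weights):
    """Constant that minimizes weighted absolute error (MAE)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def class_priors(labels):
    """Observed class frequencies, used as constant probabilities."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return {label: count / len(labels) for label, count in counts.items()}


y = [1.0, 2.0, 3.0, 10.0]
w = [1.0, 1.0, 1.0, 1.0]
mse_constant = weighted_mean(y, w)    # pulled upward by the value 10.0
mae_constant = weighted_median(y, w)  # robust to the value 10.0
priors = class_priors(["a", "b", "a", "a"])
```

The gap between the two constants on the same target shows why the baseline depends on the scorer being optimized.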
A Decision Tree is a single (binary) tree model that splits the training data population into sub-groups (leaf nodes) with similar outcomes. No row or column sampling is performed, and the tree depth and method of growth (depth-wise or loss-guided) are controlled by hyper-parameters.
Follow the Regularized Leader (FTRL) is a DataTable implementation [1] of the FTRL-Proximal online learning algorithm proposed in [4]. This implementation uses a hashing trick and the Hogwild approach [3] for parallelization. FTRL supports binomial and multinomial classification for categorical targets, as well as regression for continuous targets.
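To make the idea concrete, here is a minimal single-threaded sketch of the per-coordinate FTRL-Proximal update for a binary target, combined with a hashing trick. It is not the DataTable API: the class name `FTRLProximal`, its parameters (`n_buckets`, `alpha`, `beta`, `l1`, `l2`), and the use of `zlib.crc32` as the hash function are all assumptions made for illustration, and Hogwild-style parallel updates are not shown.

```python
import math
import zlib


class FTRLProximal:
    """Single-threaded FTRL-Proximal for binary targets with hashed features."""

    def __init__(self, n_buckets=2 ** 16, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.n_buckets = n_buckets
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * n_buckets  # per-bucket adjusted gradient sums
        self.n = [0.0] * n_buckets  # per-bucket sums of squared gradients

    def _indices(self, row):
        # Hashing trick: each "column=value" token maps to a fixed bucket.
        return sorted({zlib.crc32(f"{k}={v}".encode()) % self.n_buckets
                       for k, v in row.items()})

    def _weight(self, i):
        # Closed-form weight; L1 keeps a weight at exactly zero until |z| > l1.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        scale = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / scale

    def predict(self, row):
        score = sum(self._weight(i) for i in self._indices(row))
        score = max(min(score, 35.0), -35.0)  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, row, y):
        # One online step: predict, then update only the active buckets.
        p = self.predict(row)
        g = p - y  # log-loss gradient for a feature with value 1
        for i in self._indices(row):
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
        return p


model = FTRLProximal()
stream = [({"color": "red"}, 1), ({"color": "blue"}, 0)] * 200
for row, label in stream:
    model.update(row, label)

p_red = model.predict({"color": "red"})
p_blue = model.predict({"color": "blue"})
```

Because each row touches only its own hashed buckets, updates from different rows rarely conflict, which is the property that lock-free (Hogwild-style) parallelization relies on.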
Generalized Linear Models (GLM) estimate regression models for outcomes that follow distributions in the exponential family. GLMs are an extension of traditional linear models. They have gained popularity in statistical data analysis due to:
the flexibility of the model structure unifying the typical regression methods (such as linear regression and logistic regression for binary classification)
the recent availability of model-fitting software
the ability to scale well with large datasets
Driverless AI uses the XGBoost GLM implementation (booster=gblinear) for modeling. This GLM is subject to early stopping.
Isolation Forest is useful for identifying anomalies or outliers in data. It isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of that feature. The number of splits needed to isolate a point (its path length) indicates how anomalous the point is: random partitioning produces noticeably shorter paths for anomalies. When a forest of random trees collectively produces shorter path lengths for particular samples, those samples are highly likely to be anomalies.
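The path-length idea can be sketched with the standard library alone. The code below is an illustrative simplification, not the Driverless AI model: the helper names `path_length` and `mean_path_length` are hypothetical, each "tree" follows only the branch that contains the query point, and no subsampling or score normalization is applied.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=12):
    """Number of random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    if lo == hi:  # this feature cannot separate the remaining rows
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the split as the point.
    same_side = [row for row in data
                 if (row[feature] < split) == (point[feature] < split)]
    return path_length(point, same_side, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=200, seed=0):
    """Average path length over many independent random partitionings."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees


# A tight 5x5 grid of normal points plus one far-away point.
data = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)] + [(10.0, 10.0)]
outlier_depth = mean_path_length((10.0, 10.0), data)
inlier_depth = mean_path_length((0.2, 0.2), data)
```

The far-away point is usually separated by the very first split, while a point inside the grid needs several splits, so its average path is longer.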
LightGBM is a gradient boosting framework developed by Microsoft that uses tree-based learning algorithms. It was specifically designed for lower memory usage, faster training speed, and higher efficiency. Similar to XGBoost, it is one of the best gradient boosting implementations available. It is also used for fitting Random Forest, DART (experimental), and Decision Tree models inside of Driverless AI.
PyTorch is an open source library used for deep learning tasks such as natural language processing and computer vision.
Driverless AI’s NLP BERT models are implemented using PyTorch. For details, see NLP in Driverless AI.
PyTorch GrowNet Model
Gradient Boosting Neural Networks, or GrowNet, applies gradient boosting to shallow neural networks. GrowNet is an ensemble of k neural network models. Each model is fed the original features and the predictions of the previous model. The predictions of all the models are summed to produce the final output. Every model can be as simple as having only one hidden layer. According to the GrowNet paper, GrowNet is easy to tune, requires less computational cost and training time than deep neural networks, and yet appears to outperform them in regression, classification, and ranking on multiple datasets.
Random Forest averages the predictions of multiple deep decision trees, each trained on a different part of the same training data.
The RuleFit [2] algorithm creates an optimal set of decision rules by first fitting a tree model, and then fitting a Lasso (L1-regularized) GLM to create a linear model consisting of the most important tree leaves (rules).
MOJOs are not currently available for RuleFit models.
TensorFlow is an open source software library for performing high performance numerical computation. Driverless AI includes TensorFlow NLP recipes based on CNN and BiGRU (RNN) deep learning models, and TensorFlow ImageNet models for image data.
A TensorFlow model is a fully connected neural network with a few hidden layers (that is, a multilayer perceptron). It has a few tuning parameters that can add wide-and-deep or attention components.
TensorFlow is considered a model like XGB, LGBM, or GLM. In many cases, it may not perform as well as the aforementioned models, but it can be useful for ensembles and multiclass as well as for small data recipes since there are many folds / repeats and models involved.
Only C++ MOJOs are currently available for TensorFlow models.
XGBoost is a supervised learning algorithm that implements a process called boosting to yield accurate models. Boosting refers to the ensemble learning technique of building many models sequentially, with each new model attempting to correct for the deficiencies in the previous model. In tree boosting, each new model that is added to the ensemble is a decision tree. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way. For many problems, XGBoost is one of the best gradient boosting machine (GBM) frameworks today.
Driverless AI supports XGBoost GBM and XGBoost DART models.
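The sequential error-correcting idea behind tree boosting can be illustrated with a small, self-contained sketch. It is not XGBoost itself: it uses plain squared-error residual fitting with one-split trees (stumps) on a single feature, and the names `fit_stump`, `boost`, and `mse`, as well as the `n_rounds` and `learning_rate` defaults, are hypothetical choices for this example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the current residuals."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        loss = (sum((r - left_mean) ** 2 for r in left)
                + sum((r - right_mean) ** 2 for r in right))
        if best is None or loss < best[0]:
            best = (loss, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the errors left so far."""
    base = sum(ys) / len(ys)  # start from a constant prediction
    trees = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [float(x) for x in range(10)]
ys = [x * x for x in xs]
mean_y = sum(ys) / len(ys)

mse_constant = mse(lambda x: mean_y, xs, ys)
mse_boosted = mse(boost(xs, ys), xs, ys)
```

Each round shrinks the remaining training error, so the boosted ensemble fits the curve far better than the constant it started from.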
Zero-inflated models fit data with an excess of zeros in the target variable, as is common, for example, in insurance claim use cases. In Driverless AI, this model trains a classifier that attempts to classify zero and non-zero values. It then trains a regression model that attempts to predict the non-zero values. The classifier predictions are multiplied by the regression predictions to determine the final output.
Driverless AI supports both LightGBM and XGBoost versions of zero-inflated models.
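The two-stage structure can be sketched as follows. This is a toy illustration under stated assumptions: instead of LightGBM or XGBoost, both stages are stand-in per-group estimators (the share of non-zero rows as the "classifier" and the mean non-zero value as the "regressor"), and the function name `fit_zero_inflated` and the sample data are hypothetical.

```python
from collections import defaultdict


def fit_zero_inflated(groups, targets):
    """Fit per-group stand-ins for the classifier and the regressor."""
    n_rows = defaultdict(int)
    n_nonzero = defaultdict(int)
    sum_nonzero = defaultdict(float)
    for group, y in zip(groups, targets):
        n_rows[group] += 1
        if y != 0:
            n_nonzero[group] += 1
            sum_nonzero[group] += y

    def predict(group):
        if n_rows.get(group, 0) == 0:
            raise KeyError(f"unseen group: {group!r}")
        # Stage 1 (classifier stand-in): probability of a non-zero target.
        p_nonzero = n_nonzero[group] / n_rows[group]
        # Stage 2 (regressor stand-in): expected value given it is non-zero.
        amount = sum_nonzero[group] / n_nonzero[group] if n_nonzero[group] else 0.0
        # Final output: product of the two stages.
        return p_nonzero * amount

    return predict


groups = ["young"] * 4 + ["old"] * 4
claims = [0.0, 0.0, 0.0, 400.0, 0.0, 100.0, 200.0, 300.0]
predict = fit_zero_inflated(groups, claims)

young_prediction = predict("young")  # 0.25 * 400.0
old_prediction = predict("old")      # 0.75 * 200.0
```

Separating the two stages lets the regression stage learn only from non-zero rows, so the many zeros do not drag its estimates toward zero.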
[1] DataTable for Python. https://github.com/h2oai/datatable
[2] J. Friedman, B. Popescu. “Predictive Learning via Rule Ensembles”. 2005. http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf
[3] Niu, Feng, et al. “Hogwild: A lock-free approach to parallelizing stochastic gradient descent.” Advances in neural information processing systems. 2011. https://people.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf
[4] McMahan, H. Brendan, et al. “Ad click prediction: a view from the trenches.” Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013. https://research.google.com/pubs/archive/41159.pdf