Supported Algorithms

XGBoost

XGBoost is a supervised learning algorithm that implements a process called boosting to yield accurate models. Boosting is an ensemble learning technique that builds many models sequentially, with each new model attempting to correct the deficiencies of the models before it. In tree boosting, each new model added to the ensemble is a decision tree. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately. For many problems, it is one of the leading gradient boosting machine (GBM) frameworks available.
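
The sequential error-correcting idea behind tree boosting can be sketched in a few lines of plain Python. This is a minimal illustration, not XGBoost's actual implementation: it assumes a single numeric feature, squared-error loss, and one-split trees (stumps), and it leaves out the regularization, second-order information, and parallelism that XGBoost adds. All function names here are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the one-split tree (stump) that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value cannot split
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Build the ensemble one stump at a time."""
    base = sum(ys) / len(ys)          # start from a constant prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Each new stump is trained on what the current ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (an assumption for illustration): y = x^2 on ten points.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
baseline = boost(xs, ys, n_rounds=0)   # constant model only
boosted = boost(xs, ys, n_rounds=50)
print(mean_squared_error(baseline, xs, ys), mean_squared_error(boosted, xs, ys))
```

The learning rate shrinks each stump's contribution, so no single tree dominates and later trees can keep refining the fit.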

LightGBM

LightGBM is a gradient boosting framework developed by Microsoft that uses tree-based learning algorithms. It was designed for lower memory usage, faster training, and higher efficiency. Like XGBoost, it is one of the leading gradient boosting implementations available. Driverless AI also uses it to fit Random Forest models.

Note: LightGBM with GPUs is currently not supported on Power.

GLM

Generalized Linear Models (GLM) estimate regression models for outcomes following exponential family distributions. GLMs extend traditional linear models and have gained popularity in statistical data analysis due to:

the flexibility of the model structure unifying the typical regression methods (such as linear regression and logistic regression for binary classification)
the recent availability of model-fitting software
the ability to scale well with large datasets
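
The unifying structure mentioned in the first point can be sketched as follows: linear and logistic regression share the same linear predictor and differ only in the inverse link function that maps it to the mean of the outcome. This is a minimal single-feature sketch fitted by plain gradient ascent, not the solver Driverless AI uses; the function names, step size, and toy data are assumptions made for illustration.

```python
import math


def identity(eta):
    """Inverse link for linear regression (Gaussian outcome)."""
    return eta


def logistic(eta):
    """Inverse link for logistic regression (binomial outcome)."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_glm(xs, ys, inverse_link, step=0.1, n_iter=5000):
    """Fit intercept b0 and slope b1 by gradient ascent on the log-likelihood.

    For these two canonical links the gradient has the same form,
    sum((y - mu) * x), which is why one routine covers both models.
    """
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        mus = [inverse_link(b0 + b1 * x) for x in xs]
        g0 = sum(y - m for y, m in zip(ys, mus)) / n
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus)) / n
        b0 += step * g0
        b1 += step * g1
    return b0, b1


# Linear regression: continuous target generated as y = 1 + 2x.
linear_fit = fit_glm([0, 1, 2, 3, 4], [1, 3, 5, 7, 9], identity)

# Logistic regression: binary target that tends to be 1 for larger x.
logistic_fit = fit_glm([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1], logistic)

print(linear_fit, logistic_fit)
```

Production GLM solvers typically use faster methods such as iteratively reweighted least squares; gradient ascent is used here only because it makes the shared structure easy to see.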

TensorFlow

TensorFlow is an open source software library for high-performance numerical computation. Driverless AI includes a TensorFlow NLP recipe based on CNN deep learning models.

Note: MOJOs are currently not available for TensorFlow models.

RuleFit

The RuleFit [1] algorithm creates an optimal set of decision rules by first fitting a tree model, and then fitting a Lasso (L1-regularized) GLM model to create a linear model consisting of the most important tree leaves (rules).
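
The second stage, in which an L1 penalty keeps only the most useful rules, can be sketched in plain Python. This is a simplified illustration rather than the Driverless AI implementation: the three threshold rules are written by hand and merely stand in for leaves that a fitted tree model would supply, and the Lasso is solved with a basic coordinate descent. All names and the toy data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero; anything within the penalty becomes exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso(columns, ys, penalty, n_sweeps=100):
    """Coordinate descent for (1/2n) * sum(residual^2) + penalty * sum(|w_j|)."""
    n = len(ys)
    y_mean = sum(ys) / n
    col_means = [sum(col) / n for col in columns]
    # Centering the columns lets the unpenalized intercept be recovered at the end.
    centered = [[v - m for v in col] for col, m in zip(columns, col_means)]
    residual = [y - y_mean for y in ys]  # residual while all weights are zero
    weights = [0.0] * len(columns)
    for _ in range(n_sweeps):
        for j, col in enumerate(centered):
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue  # a rule that never changes carries no information
            # Covariance of rule j with the residual that ignores rule j itself.
            rho = sum(v * r for v, r in zip(col, residual)) / n + weights[j] * norm
            new_weight = soft_threshold(rho, penalty) / norm
            delta = new_weight - weights[j]
            residual = [r - delta * v for r, v in zip(residual, col)]
            weights[j] = new_weight
    intercept = y_mean - sum(w * m for w, m in zip(weights, col_means))
    return intercept, weights


# Hand-written stand-ins for tree leaves (assumption: a real run derives these from a tree).
rules = [
    ("x > 1", lambda x: x > 1),
    ("x > 4", lambda x: x > 4),
    ("x > 7", lambda x: x > 7),
]

xs = list(range(10))
ys = [5.0 if x > 4 else 0.0 for x in xs]  # target driven by a single rule

# Each rule becomes a 0/1 feature column for the linear model.
columns = [[1.0 if rule(x) else 0.0 for x in xs] for _, rule in rules]
intercept, weights = fit_lasso(columns, ys, penalty=0.1)

for (name, _), weight in zip(rules, weights):
    print(name, weight)
```

Because the penalty sets the weights of uninformative rules exactly to zero, the surviving rules form a short, readable linear model, at the cost of shrinking the retained weights slightly.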

Note: MOJOs are currently not available for RuleFit models.

FTRL

The Follow the Regularized Leader (FTRL) model is a DataTable implementation [2] of the FTRL-Proximal online learning algorithm proposed in [3]. This implementation uses the hashing trick and the Hogwild approach [4] for parallelization. FTRL supports binomial and multinomial classification for categorical targets, as well as regression for continuous targets.
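
To make the combination of the hashing trick and the per-coordinate FTRL-Proximal update more concrete, here is a minimal single-threaded sketch for binary classification, following the update rule described in [3]. It is not the DataTable API: the class name, defaults, and the use of `zlib.crc32` as the hash function are assumptions for illustration, and the Hogwild-style parallel updates are left out.

```python
import math
import zlib


class FTRLProximal:
    """Single-threaded FTRL-Proximal for logistic loss on hashed binary features."""

    def __init__(self, n_buckets=2 ** 18, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.n_buckets = n_buckets
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * n_buckets  # adjusted gradient sums
        self.n = [0.0] * n_buckets  # squared gradient sums

    def _indices(self, features):
        # Hashing trick: each "name=value" string maps straight to a bucket,
        # so no dictionary of feature names has to be stored.
        return sorted({zlib.crc32(f.encode("utf-8")) % self.n_buckets
                       for f in features})

    def _weight(self, i):
        # Weights are derived lazily from z and n; the L1 term yields exact zeros.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        scale = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / scale

    @staticmethod
    def _sigmoid(score):
        score = max(min(score, 35.0), -35.0)  # guard against overflow
        return 1.0 / (1.0 + math.exp(-score))

    def predict(self, features):
        return self._sigmoid(sum(self._weight(i) for i in self._indices(features)))

    def update(self, features, label):
        indices = self._indices(features)
        weights = {i: self._weight(i) for i in indices}
        gradient = self._sigmoid(sum(weights.values())) - label  # feature value is 1
        for i in indices:
            sigma = (math.sqrt(self.n[i] + gradient ** 2)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += gradient - sigma * weights[i]
            self.n[i] += gradient ** 2


# Toy click log (an assumption for illustration), streamed one row at a time.
clicks = [
    (["device=mobile", "ad=shoes"], 1),
    (["device=desktop", "ad=shoes"], 0),
]
model = FTRLProximal(l1=0.01)
for _ in range(100):
    for features, label in clicks:
        model.update(features, label)

print(model.predict(["device=mobile", "ad=shoes"]))
print(model.predict(["device=desktop", "ad=shoes"]))
```

Each update touches only the buckets of the features present in the current row. That sparsity is what makes lock-free parallel updates in the style of [4] workable in practice.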

Note: MOJOs are currently not available for FTRL models.

References

[1] J. Friedman, B. Popescu. “Predictive Learning via Rule Ensembles”. 2005. http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf

[2] DataTable for Python, https://github.com/h2oai/datatable

[3] McMahan, H. Brendan, et al. “Ad click prediction: a view from the trenches.” Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013. https://research.google.com/pubs/archive/41159.pdf

[4] Niu, Feng, et al. “Hogwild: A lock-free approach to parallelizing stochastic gradient descent.” Advances in neural information processing systems. 2011. https://people.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf