h2o4gpu.util package

h2o4gpu.util.metrics module

copyright:2017-2018 H2O.ai, Inc.
license:Apache License Version 2.0 (see LICENSE for details)
h2o4gpu.util.metrics.acc_opt(actual, predicted, sample_weight=None)[source]

Computes the Accuracy after optimal predictions thresholding.

This function maximizes the Accuracy by means of optimal predictions thresholding.

Parameters:
  • actual – numpy array The ground truth value
  • predicted – numpy array The predicted value
  • sample_weight – numpy array or None sample weights
Returns:

double The optimal Accuracy

h2o4gpu.util.metrics.ae(actual, predicted)[source]

Computes the absolute error.

This function computes the absolute error between two numbers, or element-wise between a pair of lists or numpy arrays.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double or list of doubles The absolute error between actual and predicted
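As an illustration of the scalar versus element-wise behaviour described above, here is a minimal standard-library sketch. The name absolute_error is hypothetical and this is not the library's implementation.

```python
def absolute_error(actual, predicted):
    """Absolute error for two numbers, or element-wise for two sequences."""
    if isinstance(actual, (int, float)):
        return abs(actual - predicted)
    # Sequences: compare position by position.
    return [abs(a - p) for a, p in zip(actual, predicted)]


print(absolute_error(3, 5))                  # 2
print(absolute_error([1, 2, 3], [1, 4, 0]))  # [0, 2, 3]
```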

h2o4gpu.util.metrics.auc(actual, posterior)[source]

Computes the area under the receiver operating characteristic curve (AUC).

This function computes the AUC error metric for binary classification.

Parameters:
  • actual – list of binary numbers, numpy array The ground truth value
  • posterior – same type as actual Defines a ranking on the binary numbers, from most likely to be positive to least likely to be positive.
Returns:

double The AUC between actual and posterior
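For intuition, AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition; the name auc_pairwise is hypothetical, and the library may compute the same quantity differently (for example, from tied ranks).

```python
def auc_pairwise(actual, posterior):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for a, s in zip(actual, posterior) if a == 1]
    neg = [s for a, s in zip(actual, posterior) if a == 0]
    # A correctly ordered pair scores 1, a tie scores 0.5, otherwise 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


print(auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise form is quadratic in the number of examples, so it is suited to small inputs only.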

h2o4gpu.util.metrics.ce(actual, predicted)[source]

Computes the classification error.

This function computes the classification error between two lists.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double The classification error between actual and predicted
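A minimal sketch of classification error, assuming it is the fraction of positions where the predicted label differs from the actual label. The name classification_error is hypothetical.

```python
def classification_error(actual, predicted):
    """Fraction of positions where the two label sequences disagree."""
    mismatches = sum(1 for a, p in zip(actual, predicted) if a != p)
    return mismatches / len(actual)


print(classification_error([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```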

h2o4gpu.util.metrics.confusion_matrices(actual, predicted, sample_weight=None)[source]

Computes confusion matrices for ROC analysis.

This function computes confusion matrices for all possible prediction thresholds.

Parameters:
  • actual – numpy array The ground truth value
  • predicted – numpy array The predicted value
  • sample_weight – numpy array or None sample weights
Returns:

pandas DataFrame Confusion matrices for each unique predicted value as threshold
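The idea of one confusion matrix per unique predicted value can be sketched as follows. This is an illustration only: it returns a list of dicts rather than a pandas DataFrame, the name confusion_counts is hypothetical, and the rule "predict positive when the score is >= the threshold" is an assumption about the convention.

```python
def confusion_counts(actual, predicted, sample_weight=None):
    """Weighted tp/fp/tn/fn counts for every unique predicted value as threshold."""
    if sample_weight is None:
        sample_weight = [1.0] * len(actual)
    rows = []
    for threshold in sorted(set(predicted)):
        counts = {"threshold": threshold, "tp": 0.0, "fp": 0.0, "tn": 0.0, "fn": 0.0}
        for a, p, w in zip(actual, predicted, sample_weight):
            # Assumed convention: a score at or above the threshold is a positive call.
            if p >= threshold:
                counts["tp" if a == 1 else "fp"] += w
            else:
                counts["fn" if a == 1 else "tn"] += w
        rows.append(counts)
    return rows


for row in confusion_counts([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]):
    print(row)
```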

h2o4gpu.util.metrics.f05_opt(actual, predicted, sample_weight=None)[source]

Computes the F0.5-Score after optimal predictions thresholding.

This function maximizes the F0.5-Score by means of optimal predictions thresholding.

Parameters:
  • actual – numpy array The ground truth value
  • predicted – numpy array The predicted value
  • sample_weight – numpy array or None sample weights
Returns:

double The optimal F0.5-Score

h2o4gpu.util.metrics.f1_opt(actual, predicted, sample_weight=None)[source]

Computes the F1-Score after optimal predictions thresholding.

This function maximizes the F1-Score by means of optimal predictions thresholding.

Parameters:
  • actual – numpy array The ground truth value
  • predicted – numpy array The predicted value
  • sample_weight – numpy array or None sample weights
Returns:

double The optimal F1-Score

h2o4gpu.util.metrics.f2_opt(actual, predicted, sample_weight=None)[source]

Computes the F2-Score after optimal predictions thresholding.

This function maximizes the F2-Score by means of optimal predictions thresholding.

Parameters:
  • actual – numpy array The ground truth value
  • predicted – numpy array The predicted value
  • sample_weight – numpy array or None sample weights
Returns:

double The optimal F2-Score
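The three F-score functions above differ only in the weight beta given to recall (0.5, 1, and 2). A minimal sketch of threshold optimisation for a general beta follows; it ignores sample_weight, assumes a score at or above the threshold counts as a positive call, and uses the hypothetical name f_beta_opt.

```python
def f_beta_opt(actual, predicted, beta=1.0):
    """Best F-beta score over all thresholds taken from the predicted values."""
    b2 = beta ** 2
    best = 0.0
    for threshold in set(predicted):
        pairs = list(zip(actual, predicted))
        tp = sum(1 for a, p in pairs if p >= threshold and a == 1)
        fp = sum(1 for a, p in pairs if p >= threshold and a == 0)
        fn = sum(1 for a, p in pairs if p < threshold and a == 1)
        denom = (1 + b2) * tp + b2 * fn + fp
        if denom > 0:
            # F-beta = (1 + beta^2) * tp / ((1 + beta^2) * tp + beta^2 * fn + fp)
            best = max(best, (1 + b2) * tp / denom)
    return best


actual = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(f_beta_opt(actual, scores, beta=1.0))  # 0.8
```

A larger beta favours thresholds with high recall, while a smaller beta favours thresholds with high precision.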

h2o4gpu.util.metrics.ll(actual, predicted)[source]

Computes the log likelihood.

This function computes the log likelihood between two numbers, or element-wise between a pair of lists or numpy arrays.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double or list of doubles The log likelihood error between actual and predicted

h2o4gpu.util.metrics.log_loss(actual, predicted)[source]

Computes the log loss.

This function computes the log loss between two lists of numbers.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double The log loss between actual and predicted
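The relationship between ll and log_loss can be sketched as below, assuming the standard binary definition: the per-element value is the negative log likelihood, and the log loss is its mean. The function names are hypothetical, and the sketch requires predictions strictly between 0 and 1.

```python
import math


def elementwise_log_loss(actual, predicted):
    """Negative log likelihood of each binary label under its predicted probability."""
    return [
        -(a * math.log(p) + (1 - a) * math.log(1 - p))
        for a, p in zip(actual, predicted)
    ]


def mean_log_loss(actual, predicted):
    """Mean of the per-element negative log likelihoods."""
    losses = elementwise_log_loss(actual, predicted)
    return sum(losses) / len(losses)


# An uninformative prediction of 0.5 costs log(2) per example.
print(mean_log_loss([1, 0], [0.5, 0.5]))
```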

h2o4gpu.util.metrics.mae(actual, predicted)[source]

Computes the mean absolute error.

This function computes the mean absolute error between two lists of numbers.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double The mean absolute error between actual and predicted

h2o4gpu.util.metrics.mcc_opt(actual, predicted, sample_weight=None)[source]

Computes the MCC after optimal predictions thresholding.

This function maximizes the Matthews Correlation Coefficient (MCC) by means of optimal predictions thresholding.

Parameters:
  • actual – numpy array The ground truth value
  • predicted – numpy array The predicted value
  • sample_weight – numpy array or None sample weights
Returns:

double The optimal MCC
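A minimal sketch of the MCC formula and its maximisation over thresholds. It ignores sample_weight, assumes a score at or above the threshold counts as a positive call, and treats an undefined MCC (zero denominator) as 0; the names are hypothetical.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def mcc_opt_sketch(actual, predicted):
    """Best MCC over all thresholds taken from the predicted values."""
    best = -1.0
    for threshold in set(predicted):
        pairs = list(zip(actual, predicted))
        tp = sum(1 for a, p in pairs if p >= threshold and a == 1)
        fp = sum(1 for a, p in pairs if p >= threshold and a == 0)
        tn = sum(1 for a, p in pairs if p < threshold and a == 0)
        fn = sum(1 for a, p in pairs if p < threshold and a == 1)
        best = max(best, mcc(tp, fp, tn, fn))
    return best


print(mcc_opt_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```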

h2o4gpu.util.metrics.mse(actual, predicted)[source]

Computes the mean squared error.

This function computes the mean squared error between two lists of numbers.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double The mean squared error between actual and predicted

h2o4gpu.util.metrics.msle(actual, predicted)[source]

Computes the mean squared log error.

This function computes the mean squared log error between two lists of numbers.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double The mean squared log error between actual and predicted

h2o4gpu.util.metrics.rmse(actual, predicted)[source]

Computes the root mean squared error.

This function computes the root mean squared error between two lists of numbers.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double The root mean squared error between actual and predicted
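The mae, mse, and rmse metrics above are closely related: each averages a per-element error, and rmse is the square root of mse. A standard-library sketch with hypothetical function names:

```python
import math


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    # Taking the root restores the units of the original values.
    return math.sqrt(mean_squared_error(actual, predicted))


actual, predicted = [1, 2, 3], [1, 2, 6]
print(mean_absolute_error(actual, predicted))  # 1.0
print(mean_squared_error(actual, predicted))   # 3.0
```

Because the error is squared before averaging, mse and rmse penalise a single large miss more heavily than mae does.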

h2o4gpu.util.metrics.rmsle(actual, predicted)[source]

Computes the root mean squared log error.

This function computes the root mean squared log error between two lists of numbers.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double The root mean squared log error between actual and predicted
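A sketch of the log-error family, assuming the common convention of comparing log(x + 1) values so that zeros are allowed. The function names are hypothetical and the convention is an assumption, not something this page states.

```python
import math


def squared_log_error(actual, predicted):
    """Element-wise squared difference of log(1 + x) values."""
    return [(math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)]


def root_mean_squared_log_error(actual, predicted):
    errors = squared_log_error(actual, predicted)
    return math.sqrt(sum(errors) / len(errors))


# log1p(e - 1) is 1, so the two squared log errors are approximately 0 and 1.
print(root_mean_squared_log_error([0, math.e - 1], [0, 0]))
```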

h2o4gpu.util.metrics.se(actual, predicted)[source]

Computes the squared error.

This function computes the squared error between two numbers, or element-wise between a pair of lists or numpy arrays.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double or list of doubles The squared error between actual and predicted

h2o4gpu.util.metrics.sle(actual, predicted)[source]

Computes the squared log error.

This function computes the squared log error between two numbers, or element-wise between a pair of lists or numpy arrays.

Parameters:
  • actual – int, float, list of numbers, numpy array The ground truth value
  • predicted – same type as actual The predicted value
Returns:

double or list of doubles The squared log error between actual and predicted

h2o4gpu.util.metrics.tied_rank(x)[source]

Computes the tied rank of elements in x.

This function computes the tied rank of elements in x.

Parameters: x – list of numbers, numpy array
Returns: list of numbers The tied rank of each element in x
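A minimal sketch of tied ranking, assuming 1-based ranks in ascending order where equal values share the average of the ranks they span. The name tied_rank_sketch is hypothetical.

```python
def tied_rank_sketch(x):
    """1-based ascending ranks; tied values receive their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to x[order[i]].
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


print(tied_rank_sketch([10, 20, 10, 30]))  # [1.5, 3.0, 1.5, 4.0]
```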

h2o4gpu.util.gpu module

copyright:2017-2018 H2O.ai, Inc.
license:Apache License Version 2.0 (see LICENSE for details)
h2o4gpu.util.gpu.cuda_vis_check(total_gpus)[source]

Helper function to count GPUs by environment variable
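This page does not name the environment variable. The sketch below assumes it is CUDA_VISIBLE_DEVICES, the standard CUDA variable that restricts which devices a process can see, and shows one plausible way to count GPUs from it. The function name and its environ parameter are hypothetical.

```python
import os


def visible_gpu_count(total_gpus, environ=None):
    """Count GPUs visible under CUDA_VISIBLE_DEVICES, capped at total_gpus."""
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Variable unset: every GPU on the machine is visible.
        return total_gpus
    device_ids = [item for item in value.split(",") if item.strip()]
    return min(len(device_ids), total_gpus)


print(visible_gpu_count(4, {"CUDA_VISIBLE_DEVICES": "0,2"}))  # 2
```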

h2o4gpu.util.gpu.cudaresetdevice(gpu_id, n_gpus)[source]

Resets the cuda device so any next cuda call will reset the cuda context.

Parameters:
  • gpu_id – int device number of GPU (to start with if n_gpus > 1)
  • n_gpus – int, optional, default: 0. If < 0, apply to all available GPUs; if >= 0, apply to that number of GPUs.
h2o4gpu.util.gpu.cudaresetdevice_bare(n_gpus)[source]

Resets the cuda device so any next cuda call will reset the cuda context.

h2o4gpu.util.gpu.device_count(n_gpus=0)[source]

Tries to return the number of available GPUs on this machine.

Parameters: n_gpus – int, optional, default: 0. If < 0, return all available GPUs; if >= 0, return n_gpus or as many as possible.
Returns: Adjusted n_gpus and all available devices
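The n_gpus adjustment rule described above can be sketched as a pure function. The name adjust_n_gpus and the available_gpus argument are hypothetical; the real function detects the available devices itself.

```python
def adjust_n_gpus(n_gpus, available_gpus):
    """Apply the documented rule: negative means all, otherwise cap at what exists."""
    if n_gpus < 0:
        return available_gpus
    return min(n_gpus, available_gpus)


print(adjust_n_gpus(-1, 4))  # 4
print(adjust_n_gpus(8, 4))   # 4
print(adjust_n_gpus(2, 4))   # 2
```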
h2o4gpu.util.gpu.get_compute_capability(gpu_id)[source]

Get compute capability for all gpus

h2o4gpu.util.gpu.get_compute_capability_orig(gpu_id)[source]
Gets the major cuda version, minor cuda version,
and ratio of floating point single perf to double perf.
Parameters: gpu_id – int device number of GPU
h2o4gpu.util.gpu.get_compute_capability_subprocess(gpu_id)[source]
Gets the major cuda version, minor cuda version,
and ratio of floating point single perf to double perf.
Parameters: gpu_id – int device number of GPU
h2o4gpu.util.gpu.get_gpu_info(return_usage=False, trials=2, timeout=30, print_trials=False)[source]

Gets the GPU info.

This runs in a sub-process to avoid mixing parent-child CUDA contexts (see https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork). It makes up to “trials” attempts to get a result; if no attempt returns a result within “timeout” seconds, it returns as if no GPU were present.

Returns:Total number of GPUs and total available memory
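The retry-with-timeout pattern described above can be sketched with the standard subprocess module. This is an illustration of the pattern only: run_with_retries is a hypothetical helper, the child here is a plain Python interpreter rather than a GPU query, and the library may launch its sub-process differently.

```python
import subprocess
import sys


def run_with_retries(code, trials=2, timeout=30):
    """Run `code` in a child Python process; return its stdout, or None on failure."""
    for _ in range(trials):
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # The child was killed after `timeout` seconds; try again.
            continue
        if result.returncode == 0:
            return result.stdout.strip()
    # Every trial failed or timed out: the caller can fall back to "no GPU".
    return None
```

For example, run_with_retries("print(2)") returns the string "2", while a child that never finishes within timeout yields None after all trials are used up.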
h2o4gpu.util.gpu.get_gpu_info_c(return_memory=False, return_name=False, return_usage=False, return_free_memory=False, return_capability=False, return_memory_by_pid=False, return_usage_by_pid=False, return_all=False, verbose=0)[source]

Gets the GPU info from C call

Returns:
Total number of GPUs and total available memory
(and optionally GPU usage)
h2o4gpu.util.gpu.get_gpu_info_subprocess(return_usage=False)[source]

Gets the GPU info in a subprocess

Returns:
Total number of GPUs and total available memory
(and optionally GPU usage)

h2o4gpu.util.import_data module

copyright:2017-2018 H2O.ai, Inc.
license:Apache License Version 2.0 (see LICENSE for details)
h2o4gpu.util.import_data.import_data(data_path, use_pandas=False, intercept=True, valid_fraction=0.2, classification=True)[source]

Import Data for H2O GPU Edition

This function will read in data and prepare it for H2O4GPU’s GLM solver.

Note: the data is assumed to be all numeric, i.e., categoricals are already one-hot encoded, etc.

Parameters:
  • data_path – str A path to a dataset (The dataset needs to be all numeric)
  • use_pandas – bool Indicate if Pandas should be used to parse
  • intercept – bool Indicate if intercept term is needed
  • valid_fraction – float Fraction of the dataset reserved for a validation set
  • classification – bool Classification problem?
Returns:
If valid_fraction > 0 it will return the following:
  • train_x – numpy array of train input variables
  • train_y – numpy array of y variable
  • valid_x – numpy array of valid input variables
  • valid_y – numpy array of valid y variable
  • family – string, “logistic” if classification is set to True, otherwise “elasticnet”
If valid_fraction == 0 it will return the following:
  • train_x – numpy array of train input variables
  • train_y – numpy array of y variable
  • family – string, “logistic” if classification is set to True, otherwise “elasticnet”
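The shape of the return value can be illustrated with a small standard-library sketch. Everything about the data layout here is an assumption made for illustration: that the last column is the target, that the intercept is a trailing column of ones, and that the validation rows are taken from the end without shuffling. The name split_dataset is hypothetical, and the sketch uses plain lists instead of numpy arrays.

```python
def split_dataset(rows, valid_fraction=0.2, intercept=True, classification=True):
    """Split numeric rows into train/validation parts (illustrative only)."""
    # Assumption: the last column holds the target y; the rest are inputs.
    x = [list(row[:-1]) for row in rows]
    y = [row[-1] for row in rows]
    if intercept:
        # Assumption: the intercept is a constant column of ones appended to x.
        x = [features + [1.0] for features in x]
    family = "logistic" if classification else "elasticnet"
    n_valid = int(len(rows) * valid_fraction)
    if n_valid == 0:
        return x, y, family
    n_train = len(rows) - n_valid
    return x[:n_train], y[:n_train], x[n_train:], y[n_train:], family


rows = [[1.0, 2.0, 0], [3.0, 4.0, 1], [5.0, 6.0, 0], [7.0, 8.0, 1]]
train_x, train_y, valid_x, valid_y, family = split_dataset(rows, valid_fraction=0.5)
print(train_x, valid_y, family)
```

The number of returned values depends on valid_fraction, so callers need to unpack five values when a validation set is requested and three otherwise.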