lambda_search

  • Available in: GLM, GAM

  • Hyperparameter: no

Description

Setting lambda_search to TRUE enables an efficient and automatic search for the optimal value of the lambda regularization parameter. When enabled, GLM/GAM first fits a model with maximum regularization (the highest lambda value) and then decreases lambda at each step until it reaches the minimum lambda or until the model starts to overfit. The resulting model is built with the best lambda value found during the search.

Note that the algorithm automatically calculates the minimum lambda value unless lambda_min_ratio is specified; in that case, the minimum lambda is computed from that ratio (lambda_min_ratio times the largest lambda in the search). If you enter one or more values for lambda, the lambda search is performed over only those provided lambdas.
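
A minimal sketch of those two options (Python), assuming an existing H2O session and a training frame with predictors and response already defined as in the full example below:

from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# search down to 1e-4 of the largest (maximum-regularization) lambda
# instead of the automatically computed minimum
glm_min_ratio = H2OGeneralizedLinearEstimator(lambda_search = True,
                                              lambda_min_ratio = 1e-4)

# restrict the search to an explicit list of lambda values
# (the Python client spells this parameter `lambda_`)
glm_fixed_lambdas = H2OGeneralizedLinearEstimator(lambda_search = True,
                                                  lambda_ = [1, 0.1, 0.01, 0.001])

# either estimator is then trained as usual, e.g.:
# glm_min_ratio.train(x = predictors, y = response, training_frame = train)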

When looking for a sparse solution (alpha > 0), lambda search can also be used to efficiently handle very wide datasets because it can filter out inactive predictors (noise) and build models for only a small subset of predictors. A typical use case is to run a lambda search on a dataset with many predictors while limiting the number of active predictors to a relatively small value via max_active_predictors.
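
A minimal sketch of that use case (Python), assuming a hypothetical wide training frame train_wide with predictor list wide_predictors:

from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# sparse search over a wide dataset: the L1 component (alpha > 0) zeroes out
# inactive predictors, and max_active_predictors acts as a stopping criterion
# once the model would use more than the allowed number of non-zero coefficients
glm_wide = H2OGeneralizedLinearEstimator(alpha = 1.0,
                                         lambda_search = True,
                                         max_active_predictors = 100)
# glm_wide.train(x = wide_predictors, y = response, training_frame = train_wide)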

Related Parameters

  • alpha

  • lambda

  • lambda_min_ratio

  • max_active_predictors

  • nlambdas

Example

R

library(h2o)
h2o.init()

# import the boston dataset:
# this dataset looks at features of the boston suburbs and predicts median housing prices
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Housing
boston <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv")

# set the predictor names and the response column name
predictors <- colnames(boston)[1:13]
# set the response column to "medv", the median value of owner-occupied homes in $1000's
response <- "medv"

# convert the chas column to a factor (chas = Charles River dummy variable (= 1 if tract bounds river; 0 otherwise))
boston["chas"] <- as.factor(boston["chas"])

# split into train and validation sets
boston_splits <- h2o.splitFrame(data = boston, ratios = 0.8)
train <- boston_splits[[1]]
valid <- boston_splits[[2]]

# try using the `lambda_search` parameter (boolean):
# train your model
boston_glm <- h2o.glm(x = predictors, y = response, training_frame = train,
                      validation_frame = valid,
                      lambda_search = TRUE)

# print the mse for the validation data
print(h2o.mse(boston_glm, valid = TRUE))

Python

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the boston dataset:
# this dataset looks at features of the boston suburbs and predicts median housing prices
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Housing
boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv")

# set the predictor names and the response column name
predictors = boston.columns[:-1]
# set the response column to "medv", the median value of owner-occupied homes in $1000's
response = "medv"

# convert the chas column to a factor (chas = Charles River dummy variable (= 1 if tract bounds river; 0 otherwise))
boston['chas'] = boston['chas'].asfactor()

# split into train and validation sets
train, valid = boston.split_frame(ratios = [.8])

# try using the `lambda_search` parameter (boolean):
# initialize the estimator then train the model
boston_glm = H2OGeneralizedLinearEstimator(lambda_search = True)
boston_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# print the mse for the validation data
print(boston_glm.mse(valid=True))
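
As an optional follow-up to the Python example, the coefficients of the final model (fit at the lambda selected by the search) can be printed with coef(), which returns a dictionary mapping column names to coefficient values:

# inspect the coefficients of the model fit at the selected lambda
print(boston_glm.coef())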