public class ModelSelection extends hex.ModelBuilder<ModelSelectionModel,ModelSelectionModel.ModelSelectionParameters,ModelSelectionModel.ModelSelectionModelOutput>
Modifier and Type | Class and Description |
---|---|
class |
ModelSelection.ModelSelectionDriver |
static class |
ModelSelection.SweepModel
Contains information about a predictor subset: the predictor indices of the subset (with the newest predictor as
the last element of the array), the CPM associated with the predictor subset minus the latest element, and the error
variance of the CPM.
|
Modifier and Type | Field and Description |
---|---|
java.lang.String[][] |
_bestModelPredictors |
double[] |
_bestR2Values |
int |
_glmNFolds |
int |
_numPredictors |
java.lang.String[] |
_predictorNames |
java.lang.String[][] |
_predictorsAdd |
java.lang.String[][] |
_predictorsRemoved |
Fields inherited from class hex.ModelBuilder: _coordinator, _desc, _eventPublisher, _fold, _input_parms, _job, _messages, _nclass, _offset, _orig_projection_array, _origDomains, _origNames, _origTrain, _parms, _priorClassDist, _removedCols, _response, _result, _startUpOnceModelBuilder, _train, _treatment, _valid, _vresponse, _weights
Constructor and Description |
---|
ModelSelection(boolean startup_once) |
ModelSelection(ModelSelectionModel.ModelSelectionParameters parms) |
ModelSelection(ModelSelectionModel.ModelSelectionParameters parms,
water.Key<ModelSelectionModel> key) |
Modifier and Type | Method and Description |
---|---|
static GLMModel |
buildExtractBestR2Model(water.fvec.Frame[] trainingFrames,
ModelSelectionModel.ModelSelectionParameters parms,
int glmNFolds,
java.lang.String foldColumn,
hex.Model.Parameters.FoldAssignmentScheme foldAssignment)
Given an array of training Frames, builds a model for each training frame and returns the GLMModel with the best
R2 value.
|
hex.ModelCategory[] |
can_build() |
protected void |
checkMemoryFootPrint(int p) |
ModelSelection.SweepModel |
forwardStep(java.util.List<java.lang.Integer> currSubsetIndices,
java.util.List<java.lang.Integer> validSubsets,
java.util.Set<java.util.BitSet> usedCombo,
java.util.BitSet predIndices,
int[][] predInd2CPMInd,
ModelSelection.SweepModel bestModel,
boolean hasIntercept)
Given the current predictor subset in currSubsetIndices, this method adds one more predictor to the subset,
choosing the one that increases R2 the most.
|
static GLMModel |
forwardStep(java.util.List<java.lang.Integer> currSubsetIndices,
java.util.List<java.lang.String> coefNames,
int predPos,
java.util.List<java.lang.Integer> validSubsets,
ModelSelectionModel.ModelSelectionParameters parms,
java.lang.String foldColumn,
int glmNFolds,
hex.Model.Parameters.FoldAssignmentScheme foldAssignment) |
static GLMModel |
forwardStep(java.util.List<java.lang.Integer> currSubsetIndices,
java.util.List<java.lang.String> coefNames,
int predPos,
java.util.List<java.lang.Integer> validSubsets,
ModelSelectionModel.ModelSelectionParameters parms,
java.lang.String foldColumn,
int glmNFolds,
hex.Model.Parameters.FoldAssignmentScheme foldAssignment,
java.util.Set<java.util.BitSet> usedCombo)
Given a predictor subset with indices stored in currSubsetIndices, adds one more predictor from coefNames
that is not already in currSubsetIndices to the subset to form a new training frame.
|
ModelSelection.SweepModel |
forwardStepR(java.util.List<java.lang.Integer> currSubsetIndices,
java.util.List<java.lang.Integer> validSubsets,
java.util.Set<java.util.BitSet> usedCombo,
java.util.BitSet predIndices,
int[][] predInd2CPMInd,
ModelSelection.SweepModel bestModel,
double bestErrVar,
int predPos)
Given currSubsetIndices and predPos, this function looks for a new predictor that decreases the
error variance compared to bestErrVar.
|
boolean |
haveMojo() |
boolean |
havePojo() |
void |
init(boolean expensive) |
boolean |
isSupervised() |
protected int |
nModelsInParallel(int folds) |
ModelSelection.SweepModel |
replacement(java.util.List<java.lang.Integer> currSubsetIndices,
java.util.List<java.lang.Integer> validSubset,
java.util.Set<java.util.BitSet> usedCombos,
java.util.BitSet predIndices,
ModelSelection.SweepModel bestModel,
int[][] predictorIndex2CPMIndices)
Consider the predictors in the subset as pred0, pred1, pred2 (using subset size 3 as an example):
a.
|
static GLMModel |
replacement(java.util.List<java.lang.Integer> currSubsetIndices,
java.util.List<java.lang.String> coefNames,
double bestR2,
ModelSelectionModel.ModelSelectionParameters parms,
int glmNFolds,
java.lang.String foldColumn,
java.util.List<java.lang.Integer> validSubset,
hex.Model.Parameters.FoldAssignmentScheme foldAssignment,
java.util.Set<java.util.BitSet> usedCombos)
Consider the predictors in the subset as pred0, pred1, pred2 (using subset size 3 as an example):
a.
|
protected ModelSelection.ModelSelectionDriver |
trainModelImpl() |
Methods inherited from class hex.ModelBuilder: algoName, algos, builderVisibility, canLearnFromNAs, checkCustomMetricForEarlyStopping, checkDistributions, checkEarlyStoppingReproducibility, checkMemoryFootPrint_impl, checkMemoryFootPrint, checkResponseVariable, clearInitState, clearValidationErrors, computeCrossValidation, computePriorClassDistribution, cv_buildModels, cv_canBuildMainModelInParallel, cv_computeAndSetOptimalParameters, cv_initStoppingParameters, cv_mainModelScores, cv_makeAggregateModelMetrics, cv_scoreCVModels, cv_updateOptimalParameters, defaultKey, desiredChunks, dest, error_count, error, get, getMessagesByFieldAndSeverity, getName, getSysProperty, getToEigenVec, hasFoldCol, hasOffsetCol, hasTreatmentCol, hasWeightCol, hide, ignoreBadColumns, ignoreConstColumns, ignoreInvalidColumns, ignoreStringColumns, ignoreUuidColumns, info, init_adaptFrameToTrain, init_getNClass, initWorkspace, isClassifier, isResponseOptional, isStopped, javaName, logMe, make, make, make, makeCVMetrics, makeCVModelBuilder, makeParameters, makePojoWriter, message, nclasses, nFoldCV, nFoldWork, nModelsInParallel, nModelsInParallel, numSpecialCols, paramName, raiseReproducibilityWarning, rebalance, remainingTimeSecs, response, schemaDirectory, separateFeatureVecs, setMaxRuntimeSecsForMainModel, setTrain, setValid, shouldReorder, smallDataSize, stop_requested, timeout, train, trainModel, trainModel, trainModelNested, trainModelNested, trainModelOnH2ONode, valid, validateBinaryResponse, validateStoppingMetric, validationErrors, validationWarnings, vresponse, warn
public java.lang.String[][] _bestModelPredictors
public double[] _bestR2Values
public java.lang.String[][] _predictorsAdd
public java.lang.String[][] _predictorsRemoved
public int _numPredictors
public java.lang.String[] _predictorNames
public int _glmNFolds
public ModelSelection(boolean startup_once)
public ModelSelection(ModelSelectionModel.ModelSelectionParameters parms)
public ModelSelection(ModelSelectionModel.ModelSelectionParameters parms, water.Key<ModelSelectionModel> key)
protected int nModelsInParallel(int folds)
nModelsInParallel
in class hex.ModelBuilder<ModelSelectionModel,ModelSelectionModel.ModelSelectionParameters,ModelSelectionModel.ModelSelectionModelOutput>
protected ModelSelection.ModelSelectionDriver trainModelImpl()
trainModelImpl
in class hex.ModelBuilder<ModelSelectionModel,ModelSelectionModel.ModelSelectionParameters,ModelSelectionModel.ModelSelectionModelOutput>
public hex.ModelCategory[] can_build()
can_build
in class hex.ModelBuilder<ModelSelectionModel,ModelSelectionModel.ModelSelectionParameters,ModelSelectionModel.ModelSelectionModelOutput>
public boolean isSupervised()
isSupervised
in class hex.ModelBuilder<ModelSelectionModel,ModelSelectionModel.ModelSelectionParameters,ModelSelectionModel.ModelSelectionModelOutput>
public boolean haveMojo()
haveMojo
in class hex.ModelBuilder<ModelSelectionModel,ModelSelectionModel.ModelSelectionParameters,ModelSelectionModel.ModelSelectionModelOutput>
public boolean havePojo()
havePojo
in class hex.ModelBuilder<ModelSelectionModel,ModelSelectionModel.ModelSelectionParameters,ModelSelectionModel.ModelSelectionModelOutput>
public void init(boolean expensive)
init
in class hex.ModelBuilder<ModelSelectionModel,ModelSelectionModel.ModelSelectionParameters,ModelSelectionModel.ModelSelectionModelOutput>
protected void checkMemoryFootPrint(int p)
public ModelSelection.SweepModel forwardStep(java.util.List<java.lang.Integer> currSubsetIndices, java.util.List<java.lang.Integer> validSubsets, java.util.Set<java.util.BitSet> usedCombo, java.util.BitSet predIndices, int[][] predInd2CPMInd, ModelSelection.SweepModel bestModel, boolean hasIntercept)
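The method summary describes forwardStep as adding to the current subset the one predictor that increases R2 the most. The sketch below illustrates that greedy step in plain Java and is not the H2O implementation. The class `ForwardStepSketch`, the `score` callback standing in for fitting a model and reading its R2, and the reading of `usedCombo` as a record of combinations already evaluated are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only; names and the scoring callback are hypothetical.
class ForwardStepSketch {
    /**
     * Tries each candidate predictor not yet in currSubset, skips combinations
     * already recorded in usedCombo, and appends the best-scoring candidate.
     * Returns the index that was added, or -1 if nothing could be added.
     */
    static int forwardStep(List<Integer> currSubset,
                           List<Integer> candidates,
                           Set<BitSet> usedCombo,
                           ToDoubleFunction<List<Integer>> score) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int cand : candidates) {
            if (currSubset.contains(cand)) continue;   // already in the subset
            List<Integer> trial = new ArrayList<>(currSubset);
            trial.add(cand);
            BitSet key = new BitSet();                 // order-independent key for the combination
            for (int i : trial) key.set(i);
            if (!usedCombo.add(key)) continue;         // combination was evaluated before
            double s = score.applyAsDouble(trial);     // assumed stand-in for "fit model, read R2"
            if (s > bestScore) {
                bestScore = s;
                best = cand;
            }
        }
        if (best >= 0) currSubset.add(best);           // newest predictor goes last
        return best;
    }
}
```

Using a BitSet as the key makes the subsets {0, 1} and {1, 0} compare equal, so the same combination is scored at most once however it was reached.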
public ModelSelection.SweepModel replacement(java.util.List<java.lang.Integer> currSubsetIndices, java.util.List<java.lang.Integer> validSubset, java.util.Set<java.util.BitSet> usedCombos, java.util.BitSet predIndices, ModelSelection.SweepModel bestModel, int[][] predictorIndex2CPMIndices)
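The summary text for replacement is truncated on this page, so the exact steps H2O follows are not shown here. As a rough orientation only, the sketch below shows a generic sequential-replacement pass: each position of the subset is tried against every predictor outside the subset, and a swap is kept only when it improves the score. The class `ReplacementSketch`, the `score` callback, and the repeat-until-no-improvement loop are assumptions for illustration, not the library's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only; a generic replacement pass with hypothetical names.
class ReplacementSketch {
    /**
     * Repeatedly tries to swap each predictor in subset for one outside it,
     * keeping a swap only when the score strictly improves on bestScore.
     * Modifies subset in place and returns the final best score.
     */
    static double replacement(List<Integer> subset,
                              List<Integer> allPredictors,
                              double bestScore,
                              ToDoubleFunction<List<Integer>> score) {
        boolean improved = true;
        while (improved) {                              // stop once a full pass changes nothing
            improved = false;
            for (int pos = 0; pos < subset.size(); pos++) {
                int bestSwap = subset.get(pos);
                for (int cand : allPredictors) {
                    if (subset.contains(cand)) continue; // only predictors outside the subset
                    List<Integer> trial = new ArrayList<>(subset);
                    trial.set(pos, cand);
                    double s = score.applyAsDouble(trial);
                    if (s > bestScore) {
                        bestScore = s;
                        bestSwap = cand;
                    }
                }
                if (bestSwap != subset.get(pos)) {
                    subset.set(pos, bestSwap);           // commit the best swap for this position
                    improved = true;
                }
            }
        }
        return bestScore;
    }
}
```

Because a swap is committed only on a strict improvement, the score rises with every change and the loop terminates for any deterministic scoring function.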
public ModelSelection.SweepModel forwardStepR(java.util.List<java.lang.Integer> currSubsetIndices, java.util.List<java.lang.Integer> validSubsets, java.util.Set<java.util.BitSet> usedCombo, java.util.BitSet predIndices, int[][] predInd2CPMInd, ModelSelection.SweepModel bestModel, double bestErrVar, int predPos)
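The SweepModel summary mentions a CPM and its error variance, and forwardStepR compares candidates by error variance. Assuming CPM stands for a cross-product matrix laid out as [X'X, X'y; y'X, y'y], the textbook sweep operator shows how such a matrix yields a residual sum of squares without refitting a model. The sketch below is a minimal version of that operator; the class `SweepSketch` is hypothetical and the code is not taken from H2O.

```java
// Illustrative sketch only; a textbook sweep operator, not the H2O implementation.
class SweepSketch {
    /**
     * Sweeps the square matrix a in place on pivot k.
     * For a cross-product matrix [X'X, X'y; y'X, y'y], sweeping on every
     * predictor row leaves the coefficients in the last column and the
     * residual sum of squares in the bottom-right entry.
     */
    static void sweep(double[][] a, int k) {
        int n = a.length;
        double d = a[k][k];
        if (d == 0.0) throw new IllegalArgumentException("zero pivot at " + k);
        for (int j = 0; j < n; j++) a[k][j] /= d;       // scale the pivot row
        for (int i = 0; i < n; i++) {
            if (i == k) continue;
            double f = a[i][k];
            for (int j = 0; j < n; j++) a[i][j] -= f * a[k][j]; // eliminate column k from row i
            a[i][k] = -f / d;
        }
        a[k][k] = 1.0 / d;
    }
}
```

For example, with x = (1, 2, 3) and y = (2, 4, 7) and no intercept, the matrix is {{14, 31}, {31, 69}}. Sweeping on row 0 leaves the slope 31/14 in the top-right entry and the residual sum of squares 5/14 in the bottom-right entry.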
public static GLMModel buildExtractBestR2Model(water.fvec.Frame[] trainingFrames, ModelSelectionModel.ModelSelectionParameters parms, int glmNFolds, java.lang.String foldColumn, hex.Model.Parameters.FoldAssignmentScheme foldAssignment)
trainingFrames -
public static GLMModel forwardStep(java.util.List<java.lang.Integer> currSubsetIndices, java.util.List<java.lang.String> coefNames, int predPos, java.util.List<java.lang.Integer> validSubsets, ModelSelectionModel.ModelSelectionParameters parms, java.lang.String foldColumn, int glmNFolds, hex.Model.Parameters.FoldAssignmentScheme foldAssignment, java.util.Set<java.util.BitSet> usedCombo)
currSubsetIndices - stored predictors that are chosen in the subset
coefNames - predictor names of the full training frame
predPos - index/location of the predictor to be added into currSubsetIndices
public static GLMModel forwardStep(java.util.List<java.lang.Integer> currSubsetIndices, java.util.List<java.lang.String> coefNames, int predPos, java.util.List<java.lang.Integer> validSubsets, ModelSelectionModel.ModelSelectionParameters parms, java.lang.String foldColumn, int glmNFolds, hex.Model.Parameters.FoldAssignmentScheme foldAssignment)
public static GLMModel replacement(java.util.List<java.lang.Integer> currSubsetIndices, java.util.List<java.lang.String> coefNames, double bestR2, ModelSelectionModel.ModelSelectionParameters parms, int glmNFolds, java.lang.String foldColumn, java.util.List<java.lang.Integer> validSubset, hex.Model.Parameters.FoldAssignmentScheme foldAssignment, java.util.Set<java.util.BitSet> usedCombos)