public class MRUtils
extends java.lang.Object
| Modifier and Type | Class and Description |
|---|---|
| static class | MRUtils.ClassDist: Compute the class distribution from a class label vector (not counting missing values). |
| static class | MRUtils.ClassDistQuasibinomial: Compute the class distribution for a quasibinomial distribution from a class label vector (not counting missing values). |
| static class | MRUtils.Dist |

MRUtils.ClassDist usage:
Usage 1: Label vector is categorical
------------------------------------
Vec label = ...;
assert(label.isCategorical());
double[] dist = new ClassDist(label).doAll(label).dist();
Usage 2: Label vector is numerical
----------------------------------
Vec label = ...;
int num_classes = ...;
assert(label.isInt());
double[] dist = new ClassDist(num_classes).doAll(label).dist();
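
To make "class distribution, not counting missing values" concrete, here is a minimal plain-Java sketch. It is not the MRUtils.ClassDist implementation and does not use Vec or doAll: labels are assumed to be class indices stored in a double[], with NaN standing in for a missing value, and the names ClassDistSketch, classCounts and normalize are hypothetical. Whether dist() returns raw counts or fractions is not stated on this page, so the sketch returns counts and offers normalization separately.

```java
import java.util.Arrays;

final class ClassDistSketch {
    /** Counts rows per class, skipping missing labels (assumed here to be NaN). */
    public static double[] classCounts(double[] labels, int numClasses) {
        double[] counts = new double[numClasses];
        for (double label : labels) {
            if (Double.isNaN(label)) continue; // missing value: not counted
            counts[(int) label]++;
        }
        return counts;
    }

    /** Turns counts into fractions that sum to 1 (all zeros if there are no rows). */
    public static double[] normalize(double[] counts) {
        double total = 0;
        for (double c : counts) total += c;
        double[] rel = new double[counts.length];
        if (total == 0) return rel;
        for (int i = 0; i < counts.length; i++) rel[i] = counts[i] / total;
        return rel;
    }

    public static void main(String[] args) {
        double[] labels = {0, 1, 1, Double.NaN, 2, 1};
        double[] counts = classCounts(labels, 3);
        System.out.println(Arrays.toString(counts)); // prints [1.0, 3.0, 1.0]
    }
}
```

The missing label contributes to neither the counts nor the normalization total, which is the behaviour the class description implies.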
| Constructor and Description |
|---|
| MRUtils() |
| Modifier and Type | Method and Description |
|---|---|
| static Frame | sampleFrame(Frame fr, long rows, long seed): Sample rows from a frame. |
| static Frame | sampleFrame(Frame fr, long rows, java.lang.String weightColumn, long seed): Sample rows from a frame with a weight column. |
| static Frame | sampleFrameSmall(Frame fr, int rows, long seed): Sample a small number of rows from a frame. |
| static Frame | sampleFrameSmall(Frame fr, int rows, java.util.Random rand): Sample a small number of rows from a frame. |
| static Frame | sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long seed, boolean debug, java.lang.String[] quasibinomialDomain): Stratified sampling. |
| static Frame | sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long maxrows, long seed, boolean allowOversampling, boolean verbose): Stratified sampling for classifiers. FIXME: For weights, this is not accurate, as the sampling is done with uniform weights. |
| static Frame | sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long maxrows, long seed, boolean allowOversampling, boolean verbose, java.lang.String[] quasibinomialDomain): Stratified sampling for classifiers. FIXME: For weights, this is not accurate, as the sampling is done with uniform weights. |
| static Frame | shuffleFramePerChunk(Frame fr, long seed): Row-wise shuffle of a frame (only shuffles rows inside of each chunk). |
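
The note that shuffleFramePerChunk "only shuffles rows inside of each chunk" can be illustrated with a small plain-Java sketch. This is an assumption-laden illustration, not the H2O code: rows are modelled as an int[] of row ids, a chunk is modelled as a fixed-size block of consecutive rows, and PerChunkShuffleSketch, shufflePerChunk and chunkSize are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

final class PerChunkShuffleSketch {
    /**
     * Shuffles rows in place, one block of chunkSize consecutive rows at a time.
     * A row never leaves the block it started in.
     */
    public static void shufflePerChunk(int[] rows, int chunkSize, long seed) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        Random rng = new Random(seed);
        for (int start = 0; start < rows.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.length);
            // Fisher-Yates shuffle restricted to the index range [start, end)
            for (int i = end - 1; i > start; i--) {
                int j = start + rng.nextInt(i - start + 1);
                int tmp = rows[i];
                rows[i] = rows[j];
                rows[j] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        int[] rows = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        shufflePerChunk(rows, 4, 42L);
        // Rows 0-3 stay in positions 0-3, rows 4-7 in positions 4-7, rows 8-9 in positions 8-9.
        System.out.println(Arrays.toString(rows));
    }
}
```

The practical consequence is that a per-chunk shuffle does not remove ordering across chunks: if the input is sorted by some column, each block is still drawn from the same slice of that ordering.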
public static Frame sampleFrame(Frame fr, long rows, long seed)
fr - Input frame
rows - Approximate number of rows to sample (across all chunks)
seed - Seed for RNG

public static Frame shuffleFramePerChunk(Frame fr, long seed)
fr - Input frame

public static Frame sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long maxrows, long seed, boolean allowOversampling, boolean verbose)
fr - Input frame
label - Label vector (must be categorical)
weights - Weights vector, can be null
sampling_ratios - Optional: array containing the requested sampling ratios per class (in order of domains), will be overwritten if it contains all 0s
maxrows - Maximum number of rows in the returned frame
seed - RNG seed for sampling
allowOversampling - Allow oversampling of minority classes
verbose - Whether to print verbose info

public static Frame sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long maxrows, long seed, boolean allowOversampling, boolean verbose, java.lang.String[] quasibinomialDomain)
fr - Input frame
label - Label vector (must be categorical)
weights - Weights vector, can be null
sampling_ratios - Optional: array containing the requested sampling ratios per class (in order of domains), will be overwritten if it contains all 0s
maxrows - Maximum number of rows in the returned frame
seed - RNG seed for sampling
allowOversampling - Allow oversampling of minority classes
verbose - Whether to print verbose info
quasibinomialDomain - quasibinomial domain

public static Frame sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long seed, boolean debug, java.lang.String[] quasibinomialDomain)
fr - Input frame
label - Label vector (from the input frame)
weights - Weight vector (from the input frame), can be null
sampling_ratios - Given sampling ratios for each class, in order of domains
seed - RNG seed
debug - Whether to print debug info
quasibinomialDomain - quasibinomial domain

public static Frame sampleFrame(Frame fr, long rows, java.lang.String weightColumn, long seed)
fr - Input frame
rows - Approximate number of rows to sample (across all chunks)
weightColumn - Weight column name
seed - Seed for RNG

public static Frame sampleFrameSmall(Frame fr, int rows, long seed)
fr - Input frame
rows - Exact number of rows to sample
seed - Seed for RNG

public static Frame sampleFrameSmall(Frame fr, int rows, java.util.Random rand)
fr - Input frame
rows - Exact number of rows to sample
rand - Random Generator
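
The parameter descriptions distinguish an approximate row count (sampleFrame) from an exact one (sampleFrameSmall). One common way to get each behaviour is sketched below in plain Java; this is an illustration under stated assumptions, not a description of how MRUtils implements either method. Rows are modelled as integer indices, and RowSamplingSketch, sampleApproximate and sampleExact are hypothetical names. The approximate variant keeps each row independently with probability rows/totalRows; the exact variant uses reservoir sampling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RowSamplingSketch {
    /** Keeps each row independently, so the result size is only close to the requested rows. */
    public static List<Integer> sampleApproximate(int totalRows, int rows, long seed) {
        Random rng = new Random(seed);
        double keepProbability = Math.min(1.0, (double) rows / totalRows);
        List<Integer> kept = new ArrayList<>();
        for (int row = 0; row < totalRows; row++) {
            if (rng.nextDouble() < keepProbability) kept.add(row);
        }
        return kept;
    }

    /** Reservoir sampling: returns exactly min(rows, totalRows) distinct row indices. */
    public static int[] sampleExact(int totalRows, int rows, Random rng) {
        int k = Math.min(rows, totalRows);
        int[] reservoir = new int[k];
        for (int row = 0; row < totalRows; row++) {
            if (row < k) {
                reservoir[row] = row; // fill the reservoir with the first k rows
            } else {
                int j = rng.nextInt(row + 1); // uniform in [0, row]
                if (j < k) reservoir[j] = row; // replace a slot with probability k / (row + 1)
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        System.out.println(sampleApproximate(1000, 100, 42L).size()); // close to 100, not guaranteed
        System.out.println(sampleExact(1000, 100, new Random(42L)).length); // prints 100
    }
}
```

Independent per-row decisions need no coordination between partitions of the data, which is a plausible reason for accepting an approximate count on large frames; an exact count requires either a single pass with shared state, as here, or a second adjustment step.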