public class RegexTokenizer extends MRTask<RegexTokenizer>
Example usage:

    final RegexTokenizer tokenizer = new RegexTokenizer.Builder()
        .setRegex("[,;]")
        .setMinLength(2)
        .setToLowercase(true)
        .create();
    final Frame tokens = tokenizer.transform(inputFrame);

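
The builder options in the example above (`setRegex`, `setMinLength`, `setToLowercase`) suggest a split, filter, and normalize pipeline. The following is a minimal plain-Java sketch of that idea, not the library's implementation. It assumes the regex acts as a delimiter, that tokens shorter than the minimum length are dropped, and that lowercasing is applied per token. The class `TokenizeSketch` and the helper `tokenize` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class TokenizeSketch {
    // Hypothetical helper (not part of RegexTokenizer): splits one string on a
    // delimiter regex, drops tokens shorter than minLength, and optionally
    // lowercases the tokens that remain. These semantics are assumptions.
    static List<String> tokenize(String text, String regex, int minLength, boolean toLowercase) {
        Pattern delimiter = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        for (String token : delimiter.split(text)) {
            if (token.length() < minLength) {
                continue; // too short, skip
            }
            tokens.add(toLowercase ? token.toLowerCase(Locale.ROOT) : token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same settings as the builder example: regex "[,;]", min length 2, lowercase on.
        System.out.println(tokenize("Hello,World;a;OK", "[,;]", 2, true));
        // prints [hello, world, ok]
    }
}
```

The single-character token `a` is removed by the length filter; whitespace is not trimmed in this sketch, so a delimiter regex such as `[,;]\\s*` would be needed to absorb spaces after separators.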
| Modifier and Type | Class and Description |
|---|---|
| static class | `RegexTokenizer.Builder` |

Nested classes/interfaces inherited from MRTask: `MRTask.PostMapAction<T extends MRTask.PostMapAction<T>>`

Nested classes/interfaces inherited from DTask: `DTask.DKeyTask<T extends DTask.DKeyTask,V extends Keyed>`, `DTask.RemoveCall`

| Constructor and Description |
|---|
| `RegexTokenizer(java.lang.String regex)` |

| Modifier and Type | Method and Description |
|---|---|
| void | `map(Chunk[] cs, NewChunk nc)`: The handy method to generate a new vector based on existing vectors. |
| Frame | `transform(Frame input)`: Tokenizes a given Frame. |

Methods inherited from MRTask: appendables, asyncExecOnAllNodes, block, closeLocal, compute2, dfork, dfork, dfork, dfork, dfork, dinvoke, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAllNodes, getResult, getResult, isReleasable, map, map, map, map, map, map, map, map, map, map, map, modifiesVolatileVecs, onCompletion, onExceptionalCompletion, outputFrame, outputFrame, outputFrame, postGlobal, profile, profString, reduce, self, setupLocal, withPostMapAction

Methods inherited from DTask: copyOver, getDException, hasException, logVerbose, onAck, onAckAck, setException

Methods inherited from H2OCountedCompleter: asBytes, clone, compute, compute1, currThrPriority, frozenType, icer, priority, read, readJSON, reloadFromBytes, write, writeJSON

Methods inherited from CountedCompleter: `__tryComplete`, addToPendingCount, compareAndSetPendingCount, complete, exec, getCompleter, getPendingCount, getRawResult, setCompleter, setPendingCount, setRawResult, tryComplete

Methods inherited from ForkJoinTask: adapt, adapt, adapt, cancel, compareAndSetForkJoinTaskTag, completeExceptionally, fork, get, get, get, getException, getForkJoinTaskTag, getPool, getQueuedTaskCount, getSurplusQueuedTaskCount, helpQuiesce, inForkJoinPool, invoke, invokeAll, invokeAll, invokeAll, isCancelled, isCompletedAbnormally, isCompletedNormally, isDone, join, peekNextLocalTask, pollNextLocalTask, pollTask, quietlyComplete, quietlyInvoke, quietlyJoin, reinitialize, setForkJoinTaskTag, tryUnfork

public void map(Chunk[] cs, NewChunk nc)
The handy method to generate a new vector based on existing vectors.
Description copied from class: MRTask
Overrides: map in class MRTask<RegexTokenizer>
Parameters:
cs - input vectors
nc - output vector

public Frame transform(Frame input)
Tokenizes a given Frame.
Parameters:
input - Input Frame, expected to contain only String columns. Each row of the Frame represents a logical sentence, which can span one or more cells of the row.
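
To illustrate the row-as-sentence convention described for `input`, here is a small plain-Java sketch that treats each `String[]` as one row and gathers the tokens of all its cells, in order, into one sentence. It does not use the Frame API and is not the library's implementation. The names `RowSentenceSketch` and `tokenizeRows` are hypothetical, and skipping `null` cells and empty tokens is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class RowSentenceSketch {
    // Hypothetical helper: each String[] stands in for one row of String cells.
    // All cells of a row contribute tokens to the same logical sentence.
    static List<List<String>> tokenizeRows(List<String[]> rows, String regex) {
        List<List<String>> sentences = new ArrayList<>();
        for (String[] row : rows) {
            List<String> sentence = new ArrayList<>();
            for (String cell : row) {
                if (cell == null) {
                    continue; // assumption: a missing cell contributes nothing
                }
                for (String token : cell.split(regex)) {
                    if (!token.isEmpty()) { // assumption: empty tokens are dropped
                        sentence.add(token);
                    }
                }
            }
            sentences.add(sentence);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"a,b", "c"});  // one sentence spread over two cells
        rows.add(new String[] {"d;e", null}); // second cell is missing
        System.out.println(tokenizeRows(rows, "[,;]"));
        // prints [[a, b, c], [d, e]]
    }
}
```

The sketch returns one token list per row to keep sentence boundaries visible; how `transform` lays out tokens and sentence boundaries in its output Frame is not specified in this section.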