Package org.apache.lucene.analysis.standard
Fast, general-purpose grammar-based tokenizer
StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29 (UAX#29). Unlike UAX29URLEmailTokenizer from the analysis module, URLs and email addresses are not tokenized as single tokens, but are instead split into tokens according to the UAX#29 word break rules. StandardAnalyzer includes StandardTokenizer, LowerCaseFilter and StopFilter.
Classes:

- StandardAnalyzer: Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.
- StandardTokenizer: A grammar-based tokenizer constructed with JFlex. This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
- StandardTokenizerFactory: Factory for StandardTokenizer.
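To make the word-break behavior concrete, here is a deliberately simplified, stdlib-only approximation (my own sketch, not Lucene code and not the JFlex grammar): it keeps runs of letters and digits together, allows a single period between them in the spirit of UAX#29's MidNumLet rules, and therefore splits an email address at the '@', as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleWordBreak {
    // Rough stand-in for UAX#29 word break: letter/digit runs, optionally
    // joined by a '.' between them. The real StandardTokenizer implements
    // many more rules via a generated JFlex grammar.
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+(?:\\.[\\p{L}\\p{N}]+)*");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT)); // mimic LowerCaseFilter
        }
        return tokens;
    }

    public static void main(String[] args) {
        // '@' is not part of any word, so the address splits in two,
        // while "example.com" stays together.
        System.out.println(tokenize("Mail John.Doe@example.com today"));
        // → [mail, john.doe, example.com, today]
    }
}
```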