Class CustomAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.custom.CustomAnalyzer
- All Implemented Interfaces:
Closeable,AutoCloseable
A general-purpose Analyzer that can be created with a builder-style API. Under the hood it uses
the factory classes
TokenizerFactory, TokenFilterFactory, and CharFilterFactory.
You can create an instance of this Analyzer with the builder by passing it the SPI names of the
components (as defined by the ServiceLoader interface):
Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
    .withTokenizer(StandardTokenizerFactory.NAME)
    .addTokenFilter(LowerCaseFilterFactory.NAME)
    .addTokenFilter(StopFilterFactory.NAME, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
    .build();
The parameters passed to components are also used by Apache Solr and are documented on their
corresponding factory classes. Refer to documentation of subclasses of TokenizerFactory,
TokenFilterFactory, and CharFilterFactory.
The following is equivalent to the example above, using the SPI names as string literals:
Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
    .build();
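An analyzer built this way is consumed like any other Analyzer. The following is a minimal sketch, assuming Lucene 9 or later with the analysis-common module on the classpath. The class CustomAnalyzerTokens and its methods are hypothetical helpers, not part of Lucene, and the stop filter is left out so that no config directory is needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class for illustration; not part of Lucene.
final class CustomAnalyzerTokens {

    // Builds an analyzer from SPI names. The builder methods declare
    // IOException, which is rethrown unchecked to keep callers simple.
    static Analyzer buildAnalyzer() {
        try {
            return CustomAnalyzer.builder()
                .withTokenizer("standard")
                .addTokenFilter("lowercase")
                .build();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the analyzer over the text and collects the emitted terms,
    // following the reset / incrementToken / end / close workflow.
    static List<String> tokens(Analyzer analyzer, String text) {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = buildAnalyzer()) {
            System.out.println(tokens(analyzer, "Hello Lucene World"));
        }
    }
}
```

The field name "body" is arbitrary here; CustomAnalyzer applies the same chain to every field.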
The names available for each component type can be looked up through TokenizerFactory.availableTokenizers(), TokenFilterFactory.availableTokenFilters(), and
CharFilterFactory.availableCharFilters().
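As a rough sketch, the lookup methods can be used to print the names that are registered on the current classpath. The imports assume Lucene 9 or later, where the factory base classes live in org.apache.lucene.analysis (earlier releases keep them in a different package). AvailableComponents is a hypothetical class name.

```java
import java.util.Set;
import java.util.TreeSet;
import org.apache.lucene.analysis.CharFilterFactory;
import org.apache.lucene.analysis.TokenFilterFactory;
import org.apache.lucene.analysis.TokenizerFactory;

// Hypothetical helper class for illustration; not part of Lucene.
final class AvailableComponents {

    // Each method copies the registered SPI names into a sorted set
    // so that the printed output is stable.
    static Set<String> tokenizerNames() {
        return new TreeSet<>(TokenizerFactory.availableTokenizers());
    }

    static Set<String> tokenFilterNames() {
        return new TreeSet<>(TokenFilterFactory.availableTokenFilters());
    }

    static Set<String> charFilterNames() {
        return new TreeSet<>(CharFilterFactory.availableCharFilters());
    }

    public static void main(String[] args) {
        System.out.println("Tokenizers:    " + tokenizerNames());
        System.out.println("Token filters: " + tokenFilterNames());
        System.out.println("Char filters:  " + charFilterNames());
    }
}
```

The result depends on which analysis modules are on the classpath, since each module registers its own factories.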
You can create conditional branches in the analyzer by using CustomAnalyzer.Builder.when(String, String...) and CustomAnalyzer.Builder.whenTerm(Predicate):
Analyzer ana = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .whenTerm(t -> t.length() > 10)
        .addTokenFilter("reversestring")
    .endwhen()
    .build();
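To make the effect of the conditional branch visible, the sketch below runs the same chain over a short input and collects the terms. It assumes Lucene 9 or later with the analysis-common module available. ConditionalBranchExample is a hypothetical class name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class for illustration; not part of Lucene.
final class ConditionalBranchExample {

    // Lowercases every term, then reverses only terms longer than 10 characters.
    static Analyzer buildAnalyzer() {
        try {
            return CustomAnalyzer.builder()
                .withTokenizer("standard")
                .addTokenFilter("lowercase")
                .whenTerm(t -> t.length() > 10)
                    .addTokenFilter("reversestring")
                .endwhen()
                .build();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Collects the terms the analyzer emits for the given text.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        try (Analyzer analyzer = buildAnalyzer();
             TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // "short" has 5 characters and passes through unchanged;
        // "internationalization" has 20 and is reversed.
        System.out.println(tokens("Short Internationalization"));
    }
}
```

Filters added between whenTerm and endwhen apply only to matching terms; filters added after endwhen apply to every term again.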
- Since:
- 5.0.0
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final class | CustomAnalyzer.Builder | Builder for CustomAnalyzer.
static class | CustomAnalyzer.ConditionBuilder | Factory class for a ConditionalTokenFilter.
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Field Summary
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Method Summary
Modifier and Type | Method | Description
static CustomAnalyzer.Builder | builder() | Returns a builder for custom analyzers that loads all resources from Lucene's classloader.
static CustomAnalyzer.Builder | builder(Path configDir) | Returns a builder for custom analyzers that loads all resources from the given file system base directory.
static CustomAnalyzer.Builder | builder(ResourceLoader loader) | Returns a builder for custom analyzers that loads all resources using the given ResourceLoader.
protected Analyzer.TokenStreamComponents | createComponents(String fieldName) |
List<CharFilterFactory> | getCharFilterFactories() | Returns the list of char filters that are used in this analyzer.
int | getOffsetGap(String fieldName) |
int | getPositionIncrementGap(String fieldName) |
List<TokenFilterFactory> | getTokenFilterFactories() | Returns the list of token filters that are used in this analyzer.
TokenizerFactory | getTokenizerFactory() | Returns the tokenizer that is used in this analyzer.
protected Reader | initReader(String fieldName, Reader reader) |
protected Reader | initReaderForNormalization(String fieldName, Reader reader) |
protected TokenStream | normalize(String fieldName, TokenStream in) |
String | toString() |
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getReuseStrategy, normalize, tokenStream, tokenStream
Method Details
builder
public static CustomAnalyzer.Builder builder()
Returns a builder for custom analyzers that loads all resources from Lucene's classloader. All path names given must be absolute with package prefixes.
builder
public static CustomAnalyzer.Builder builder(Path configDir)
Returns a builder for custom analyzers that loads all resources from the given file system base directory. Place, e.g., stop word files there. Files that are not in the given directory are loaded from Lucene's classloader.
builder
public static CustomAnalyzer.Builder builder(ResourceLoader loader)
Returns a builder for custom analyzers that loads all resources using the given ResourceLoader.
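The variant that takes a base directory can be sketched as follows. The example writes a small stop word file into a temporary directory and passes that directory to the builder, so that "stopwords.txt" is resolved relative to it. It assumes Lucene 9 or later with the analysis-common module available. ConfigDirExample, the temporary directory, and the file contents are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class for illustration; not part of Lucene.
final class ConfigDirExample {

    static List<String> analyzeWithStopFile(String text) {
        List<String> out = new ArrayList<>();
        try {
            // Illustrative config directory holding a one-word-per-line stop list.
            // The temporary directory is not cleaned up in this sketch.
            Path configDir = Files.createTempDirectory("analyzer-config");
            Files.write(configDir.resolve("stopwords.txt"),
                List.of("the", "and"), StandardCharsets.UTF_8);

            try (Analyzer analyzer = CustomAnalyzer.builder(configDir)
                     .withTokenizer("standard")
                     .addTokenFilter("lowercase")
                     .addTokenFilter("stop", "ignoreCase", "false",
                         "words", "stopwords.txt", "format", "wordset")
                     .build();
                 TokenStream ts = analyzer.tokenStream("body", text)) {
                CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
                ts.reset();
                while (ts.incrementToken()) {
                    out.add(term.toString());
                }
                ts.end();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Terms are lowercased first, so "The" also matches the stop word "the".
        System.out.println(analyzeWithStopFile("The quick and the dead"));
    }
}
```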
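initReader
protected Reader initReader(String fieldName, Reader reader)
- Overrides:
initReader in class Analyzer
initReaderForNormalization
protected Reader initReaderForNormalization(String fieldName, Reader reader)
- Overrides:
initReaderForNormalization in class Analyzer
createComponents
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
- Specified by:
createComponents in class Analyzer
normalize
protected TokenStream normalize(String fieldName, TokenStream in)
getPositionIncrementGap
public int getPositionIncrementGap(String fieldName)
- Overrides:
getPositionIncrementGap in class Analyzer
getOffsetGap
public int getOffsetGap(String fieldName)
- Overrides:
getOffsetGap in class Analyzer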
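getCharFilterFactories
public List<CharFilterFactory> getCharFilterFactories()
Returns the list of char filters that are used in this analyzer.
getTokenizerFactory
public TokenizerFactory getTokenizerFactory()
Returns the tokenizer that is used in this analyzer.
getTokenFilterFactories
public List<TokenFilterFactory> getTokenFilterFactories()
Returns the list of token filters that are used in this analyzer.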
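The three accessors can be used to inspect how an analyzer was configured. The following is a sketch under the assumption of Lucene 9 or later with the analysis-common module available. InspectAnalyzer is a hypothetical class name, and the "htmlStrip" char filter is chosen only so that each accessor returns something.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.CharFilterFactory;
import org.apache.lucene.analysis.TokenFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

// Hypothetical helper class for illustration; not part of Lucene.
final class InspectAnalyzer {

    // One char filter, one tokenizer, one token filter.
    static CustomAnalyzer build() {
        try {
            return CustomAnalyzer.builder()
                .addCharFilter("htmlStrip")
                .withTokenizer("standard")
                .addTokenFilter("lowercase")
                .build();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (CustomAnalyzer analyzer = build()) {
            // Print the factory class behind each configured component.
            for (CharFilterFactory f : analyzer.getCharFilterFactories()) {
                System.out.println("char filter:  " + f.getClass().getSimpleName());
            }
            System.out.println("tokenizer:    "
                + analyzer.getTokenizerFactory().getClass().getSimpleName());
            for (TokenFilterFactory f : analyzer.getTokenFilterFactories()) {
                System.out.println("token filter: " + f.getClass().getSimpleName());
            }
        }
    }
}
```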
toString
public String toString()