Class FuzzyQuery
Classic Levenshtein matching can be selected by passing false to the transpositions parameter.
By default, this query uses MultiTermQuery.TopTermsBlendedFreqScoringRewrite, so terms are
collected and scored according to their edit distance. Only the top terms are used to build
the BooleanQuery. Changing the rewrite mode for fuzzy queries is not recommended.
This query matches terms at a distance of at most 2 edits. Higher distances (especially with transpositions enabled) are generally not useful and would match a significant portion of the term dictionary. If you really need them, consider an n-gram indexing technique instead (such as the SpellChecker in the suggest module).
NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled distance between two terms is computed. For a term to match, the edit distance between the two terms must be less than the length of the shorter term (either the input term or the candidate term). For example, FuzzyQuery on term "abcd" with maxEdits=2 will not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not match an indexed term "abc".
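The rule in the note above can be checked with a small standalone sketch. It does not use Lucene: the class name, the plain dynamic-programming distance, and the couldMatch helper are all hypothetical and only restate the rule as written (distance within maxEdits and strictly less than the shorter term's length). Measuring length in UTF-16 chars is a simplifying assumption.

```java
class ShortTermRuleSketch {

    // Classic Levenshtein distance (insert, delete, substitute) using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: applies only the rule stated in the note,
    // not Lucene's automaton-based matching.
    static boolean couldMatch(String queryTerm, String indexedTerm, int maxEdits) {
        int distance = levenshtein(queryTerm, indexedTerm);
        int shorter = Math.min(queryTerm.length(), indexedTerm.length());
        return distance <= maxEdits && distance < shorter;
    }

    public static void main(String[] args) {
        System.out.println(couldMatch("abcd", "ab", 2));   // distance 2 is not < 2
        System.out.println(couldMatch("a", "abc", 2));     // distance 2 is not < 1
        System.out.println(couldMatch("abcd", "abcf", 2)); // distance 1 is < 4
    }
}
```

Both examples from the note are rejected because the distance (2) is not smaller than the shorter term's length, even though it is within maxEdits.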
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.MultiTermQuery
MultiTermQuery.RewriteMethod, MultiTermQuery.TopTermsBlendedFreqScoringRewrite, MultiTermQuery.TopTermsBoostOnlyBooleanQueryRewrite, MultiTermQuery.TopTermsScoringBooleanQueryRewrite
Field Summary
Fields
Modifier and Type     Field
static final int      defaultMaxEdits
static final int      defaultMaxExpansions
static final int      defaultPrefixLength
static final boolean  defaultTranspositions
Fields inherited from class org.apache.lucene.search.MultiTermQuery
CONSTANT_SCORE_BLENDED_REWRITE, CONSTANT_SCORE_BOOLEAN_REWRITE, CONSTANT_SCORE_REWRITE, DOC_VALUES_REWRITE, field, rewriteMethod, SCORING_BOOLEAN_REWRITE
Constructor Summary
Constructors
FuzzyQuery(Term term)
FuzzyQuery(Term term, int maxEdits)
FuzzyQuery(Term term, int maxEdits, int prefixLength)
FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions)
    Calls FuzzyQuery(Term, int, int, int, boolean, org.apache.lucene.search.MultiTermQuery.RewriteMethod) with defaultRewriteMethod(maxExpansions) as the rewrite method.
FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions, MultiTermQuery.RewriteMethod rewriteMethod)
    Create a new FuzzyQuery that will match terms with an edit distance of at most maxEdits to term.
Method Summary
static MultiTermQuery.RewriteMethod defaultRewriteMethod(int maxExpansions)
    Creates a default top-terms blended frequency scoring rewrite with the given max expansions.
boolean equals(Object obj)
    Override and implement query instance equivalence properly in a subclass.
static int floatToEdits(float minimumSimilarity, int termLen)
    Helper function to convert from "minimumSimilarity" fractions to raw edit distances.
getAutomata()
    Returns the compiled automata used to match terms.
static CompiledAutomaton getFuzzyAutomaton(String term, int maxEdits, int prefixLength, boolean transpositions)
    Returns the CompiledAutomaton internally used by FuzzyQuery to match terms.
int getMaxEdits()
int getPrefixLength()
    Returns the non-fuzzy prefix length.
Term getTerm()
    Returns the pattern term.
protected TermsEnum getTermsEnum(Terms terms, AttributeSource atts)
    Construct the enumeration to be used, expanding the pattern term.
boolean getTranspositions()
    Returns true if transpositions should be treated as a primitive edit operation.
int hashCode()
    Override and implement query hash code properly in a subclass.
String toString(String field)
    Prints a query to a string, with field assumed to be the default field and omitted.
void visit(QueryVisitor visitor)
    Recurse through the query tree, visiting any child queries.
Methods inherited from class org.apache.lucene.search.MultiTermQuery
getField, getRewriteMethod, getTermsCount, getTermsEnum, rewrite
Methods inherited from class org.apache.lucene.search.Query
classHash, createWeight, sameClassAs, toString
Field Details
defaultMaxEdits
public static final int defaultMaxEdits
defaultPrefixLength
public static final int defaultPrefixLength
defaultMaxExpansions
public static final int defaultMaxExpansions
defaultTranspositions
public static final boolean defaultTranspositions
Constructor Details
FuzzyQuery
public FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions, MultiTermQuery.RewriteMethod rewriteMethod)
Create a new FuzzyQuery that will match terms with an edit distance of at most maxEdits to term. If a prefixLength > 0 is specified, a common prefix of that length is also required.
Parameters:
term - the term to search for
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
prefixLength - length of common (non-fuzzy) prefix
maxExpansions - the maximum number of terms to match. If this number is greater than IndexSearcher.getMaxClauseCount() when the query is rewritten, then the maxClauseCount will be used instead.
transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
rewriteMethod - the rewrite method to use to build the final query
FuzzyQuery
public FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions)
Calls FuzzyQuery(Term, int, int, int, boolean, org.apache.lucene.search.MultiTermQuery.RewriteMethod) with defaultRewriteMethod(maxExpansions) as the rewrite method.
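FuzzyQuery
public FuzzyQuery(Term term, int maxEdits, int prefixLength)
FuzzyQuery
public FuzzyQuery(Term term, int maxEdits)
FuzzyQuery
public FuzzyQuery(Term term)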
Method Details
defaultRewriteMethod
public static MultiTermQuery.RewriteMethod defaultRewriteMethod(int maxExpansions)
Creates a default top-terms blended frequency scoring rewrite with the given max expansions.
getMaxEdits
public int getMaxEdits()
Returns:
the maximum number of edits allowed for this query to match.
getPrefixLength
public int getPrefixLength()
Returns the non-fuzzy prefix length. This is the number of characters at the start of a term that must be identical (not fuzzy) to the query term if the query is to match that term.
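As a rough illustration of the prefix requirement, the following standalone sketch checks whether a candidate shares the required leading characters with the query term. It does not call Lucene; the class and the sharesPrefix helper are hypothetical, and both comparing by Unicode code point and clamping prefixLength to the query term's length are assumptions made for the example.

```java
class PrefixLengthSketch {

    // Hypothetical helper: true if the first prefixLength code points of
    // candidate are identical to those of queryTerm.
    static boolean sharesPrefix(String queryTerm, String candidate, int prefixLength) {
        int[] q = queryTerm.codePoints().toArray();
        int[] c = candidate.codePoints().toArray();
        // Assumption: a prefix longer than the query term is clamped to its length.
        int required = Math.min(prefixLength, q.length);
        if (c.length < required) {
            return false;
        }
        for (int i = 0; i < required; i++) {
            if (q[i] != c[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sharesPrefix("lucene", "lucine", 3)); // first three characters agree
        System.out.println(sharesPrefix("lucene", "lacene", 2)); // second character differs
    }
}
```

With a prefix length of 0 every candidate passes this check, so only the edit distance limits the match.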
getTranspositions
public boolean getTranspositions()
Returns true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
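To make the difference concrete, the sketch below computes an edit distance with and without treating an adjacent swap as a single edit. It is a standalone illustration, not Lucene code: the class and method names are hypothetical, and the transposition handling uses the textbook optimal-string-alignment recurrence, which is assumed here to be a reasonable stand-in for the behavior described above.

```java
class TranspositionSketch {

    // Edit distance over chars. When transpositions is true, swapping two
    // adjacent characters costs one edit; when false, this is classic Levenshtein.
    static int editDistance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent swap: the last two characters of each prefix are reversed.
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("lucene", "lucnee", true));
        System.out.println(editDistance("lucene", "lucnee", false));
    }
}
```

In this sketch a swapped pair such as "en" versus "ne" costs one edit with transpositions enabled and two edits under classic Levenshtein, which is why the flag changes which terms fall within maxEdits.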
getAutomata
Returns the compiled automata used to match terms.
getFuzzyAutomaton
public static CompiledAutomaton getFuzzyAutomaton(String term, int maxEdits, int prefixLength, boolean transpositions)
Returns the CompiledAutomaton internally used by FuzzyQuery to match terms. This is a very low-level method and may no longer exist if the implementation of fuzzy matching changes in the future.
Parameters:
term - the term to search for
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
prefixLength - length of common (non-fuzzy) prefix
transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
Returns:
A CompiledAutomaton that matches terms that satisfy the input parameters.
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
visit
public void visit(QueryVisitor visitor)
Description copied from class: Query
Recurse through the query tree, visiting any child queries.
getTermsEnum
protected TermsEnum getTermsEnum(Terms terms, AttributeSource atts) throws IOException
Description copied from class: MultiTermQuery
Construct the enumeration to be used, expanding the pattern term. This method should only be called if the field exists (i.e., implementations can assume the field does exist). This method should not return null (it should instead return TermsEnum.EMPTY if no terms match). The TermsEnum must already be positioned to the first matching term. The given AttributeSource is passed by the MultiTermQuery.RewriteMethod to share information between segments; for example, TopTermsRewrite uses it to share maximum competitive boosts.
Specified by:
getTermsEnum in class MultiTermQuery
Throws:
IOException
getTerm
public Term getTerm()
Returns the pattern term.
toString
public String toString(String field)
Description copied from class: Query
Prints a query to a string, with field assumed to be the default field and omitted.
hashCode
public int hashCode()
Description copied from class: Query
Override and implement query hash code properly in a subclass. This is required so that QueryCache works properly.
Overrides:
hashCode in class MultiTermQuery
equals
public boolean equals(Object obj)
Description copied from class: Query
Override and implement query instance equivalence properly in a subclass. This is required so that QueryCache works properly.
Typically a query will be equal to another only if it's an instance of the same class and its document-filtering properties are identical to those of the other instance. Utility methods are provided for certain repetitive code.
Overrides:
equals in class MultiTermQuery
floatToEdits
public static int floatToEdits(float minimumSimilarity, int termLen)
Helper function to convert from "minimumSimilarity" fractions to raw edit distances.
Parameters:
minimumSimilarity - scaled similarity
termLen - length (in Unicode code points) of the term.
Returns:
equivalent number of maxEdits
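One plausible shape for such a conversion is sketched below. This is an assumption made for illustration, not a copy of the library's implementation: the class, the toEdits method, and the MAX_SUPPORTED_EDITS constant are hypothetical. The sketch treats values of 1 or more as raw edit counts, treats 0 as "exact match", scales fractions as (1 - minimumSimilarity) * termLen, and caps the result at 2, the maximum distance mentioned in the class description.

```java
class FloatToEditsSketch {

    // Hypothetical cap, mirroring the "at most 2 edits" limit described for this query.
    static final int MAX_SUPPORTED_EDITS = 2;

    // Hypothetical conversion from a similarity value to a raw edit count.
    static int toEdits(float minimumSimilarity, int termLen) {
        if (minimumSimilarity >= 1f) {
            // Assumption: values of 1 or more are already raw edit counts.
            return (int) Math.min(minimumSimilarity, MAX_SUPPORTED_EDITS);
        } else if (minimumSimilarity == 0.0f) {
            // Assumption: 0 means an exact match, not an unlimited number of edits.
            return 0;
        } else {
            // Fraction of the term that may differ, truncated and capped.
            return Math.min((int) ((1D - minimumSimilarity) * termLen), MAX_SUPPORTED_EDITS);
        }
    }

    public static void main(String[] args) {
        System.out.println(toEdits(0.75f, 4)); // a quarter of a 4-character term
        System.out.println(toEdits(0.5f, 10)); // half of a 10-character term, capped
    }
}
```

Under these assumptions, longer terms tolerate more edits for the same similarity value until the cap is reached.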