Knowledge:Lucene configuration
Latest revision as of 13:51, 17 October 2012
Lucene case sensitive & insensitive search
Lucene search is case-sensitive, but input is usually lowercased when it passes through QueryParser, so it feels case-insensitive (this is the case with the findBySimpleQuery() method).
To make search truly case-sensitive, don't lowercase your input before indexing, and don't lowercase your queries. For this, pick an Analyzer that does not lowercase, such as KeywordAnalyzer.
Are Wildcard, Prefix, and Fuzzy queries case sensitive?
Unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer, which is the component that performs operations such as stemming and lowercasing. The reason for skipping the Analyzer is that if you were searching for "dogs*" you would not want "dogs" first stemmed to "dog", since the query would then become "dog*", which is not the intended query. In practice these queries still behave case-insensitively, because QueryParser lowercases their terms by default. This behavior can be changed using the setLowercaseExpandedTerms(boolean) method.
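The "dogs*" argument can be checked with a small plain-Java sketch. It does not use Lucene at all: `naiveStem` is a hypothetical one-line stemmer, the term list is invented for illustration, and the indexed terms are left unstemmed to keep the example short. It only shows how stemming the prefix before matching would widen the query.

```java
import java.util.List;
import java.util.stream.Collectors;

class PrefixStemmingDemo {
    // Hypothetical, deliberately naive stemmer: strips a single trailing "s".
    static String naiveStem(String term) {
        return term.length() > 1 && term.endsWith("s")
                ? term.substring(0, term.length() - 1)
                : term;
    }

    // Returns the terms that start with the given prefix, in their original order.
    static List<String> prefixMatches(List<String> terms, String prefix) {
        return terms.stream()
                .filter(t -> t.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up terms standing in for an index's term dictionary.
        List<String> terms = List.of("dog", "dogs", "dogma", "dogsled");

        // Prefix used as typed: "dogs*"
        System.out.println(prefixMatches(terms, "dogs"));
        // Prefix stemmed first: "dogs" becomes "dog", so the query is "dog*"
        System.out.println(prefixMatches(terms, naiveStem("dogs")));
    }
}
```

With the prefix used as typed, only "dogs" and "dogsled" match. After stemming, "dog" and "dogma" match as well, which is the unintended broadening described above.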
Configuration test
Field: to
Index: Index.UN_TOKENIZED
Content: "OKM Paco Avila"
Search: "OKM" -> no results
Search: "OKM*" -> OK
Search: "okm" -> no results
Search: "okm*" -> no results

Field: to
Index: Index.TOKENIZED
Content: "OKM Paco Avila"
Search: "OKM" -> no results
Search: "OKM*" -> no results
Search: "okm" -> OK
Search: "okm*" -> OK
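One reading consistent with the results above can be modelled in plain Java, without Lucene. The sketch rests on three assumptions that the page does not state: an untokenized field stores the whole value as a single unchanged term, a tokenized field stores whitespace-separated lowercased tokens, and the query text is compared exactly as typed. `ToyField` and its methods are hypothetical names, not part of any Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ToyField {
    private final List<String> terms = new ArrayList<>();

    // tokenized == false: the whole value is kept as one term, unchanged.
    // tokenized == true: split on whitespace and lowercase each token
    // (assumed analyzer behavior, not confirmed by the page).
    ToyField(String content, boolean tokenized) {
        if (tokenized) {
            for (String token : content.trim().split("\\s+")) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        } else {
            terms.add(content);
        }
    }

    // The query is compared as typed, with no lowercasing.
    // A trailing '*' turns the comparison into a prefix match.
    boolean matches(String query) {
        boolean prefix = query.endsWith("*");
        String q = prefix ? query.substring(0, query.length() - 1) : query;
        for (String term : terms) {
            if (prefix ? term.startsWith(q) : term.equals(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String content = "OKM Paco Avila";
        String[] queries = {"OKM", "OKM*", "okm", "okm*"};
        for (boolean tokenized : new boolean[] {false, true}) {
            ToyField field = new ToyField(content, tokenized);
            String label = tokenized ? "TOKENIZED" : "UN_TOKENIZED";
            for (String q : queries) {
                System.out.println(label + " \"" + q + "\" -> "
                        + (field.matches(q) ? "OK" : "no match"));
            }
        }
    }
}
```

Under these assumptions the model finds only "OKM*" for the untokenized field, and only "okm" and "okm*" for the tokenized field, which agrees with the recorded test. It is a way to reason about the table, not a description of how the test was actually configured.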