Difference between revisions of "Knowledge:Lucene configuration"

From OpenKM Documentation

Latest revision as of 13:51, 17 October 2012

Lucene case sensitive & insensitive search

Lucene search is case sensitive, but input is usually lowercased as it passes through QueryParser, so searches appear to be case insensitive (this is the case for the findBySimpleQuery() method).

To get truly case-sensitive search, don't lowercase your input before indexing, and don't lowercase your queries. For this, pick an Analyzer that does not lowercase, such as KeywordAnalyzer.

Are Wildcard, Prefix, and Fuzzy queries case sensitive?

Unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer, which is the component that performs operations such as stemming and lowercasing. The reason for skipping the Analyzer is that if you were searching for "dogs*" you would not want "dogs" first stemmed to "dog", since the query would then become "dog*", which is not the intended query. These queries are case insensitive anyway, because QueryParser lowercases them by default. This behavior can be changed using the setLowercaseExpandedTerms(boolean) method.
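The reasoning above can be illustrated with a small plain-Java model. This is only a sketch and does not use the Lucene API: the class PrefixQuerySketch, the helper naiveStem (a toy stemmer that strips one trailing "s"), and the helpers preparePrefix and matchPrefix are hypothetical names introduced for illustration. The boolean parameter merely mimics the effect described for setLowercaseExpandedTerms(boolean).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the behavior described above; NOT Lucene code.
class PrefixQuerySketch {

    // Toy stand-in for a stemming Analyzer (assumption): strips one trailing "s".
    static String naiveStem(String term) {
        return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
    }

    // Models how a prefix term such as "Dogs*" is prepared:
    // it is never stemmed, and it is lowercased only when the flag is on.
    static String preparePrefix(String rawQuery, boolean lowercaseExpandedTerms) {
        String prefix = rawQuery.endsWith("*")
                ? rawQuery.substring(0, rawQuery.length() - 1)
                : rawQuery;
        return lowercaseExpandedTerms ? prefix.toLowerCase(Locale.ROOT) : prefix;
    }

    // Returns every indexed term that starts with the given prefix.
    static List<String> matchPrefix(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("dog", "dogma", "dogs", "dogsled");

        // Intended query: lowercased but not stemmed, so only "dogs..." terms match.
        String prefix = preparePrefix("Dogs*", true);
        System.out.println(matchPrefix(terms, prefix));

        // If the prefix were stemmed first, "dog*" would also match "dog" and "dogma".
        System.out.println(matchPrefix(terms, naiveStem(prefix)));

        // With lowercasing switched off, the prefix keeps its case and matches nothing here.
        System.out.println(matchPrefix(terms, preparePrefix("Dogs*", false)));
    }
}
```

In this model the stemmed prefix matches every term in the list, which is the over-matching that skipping the Analyzer is meant to avoid.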

Configuration test

Field: to
Index: Index.UN_TOKENIZED
Content: "OKM Paco Avila"
Search: "OKM" -> no results
Search: "OKM*" -> OK
Search: "okm" -> no results
Search: "okm*" -> no results

Field: to
Index: Index.TOKENIZED
Content: "OKM Paco Avila"
Search: "OKM" -> no results
Search: "OKM*" -> no results
Search: "okm" -> OK
Search: "okm*" -> OK
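
One reading that is consistent with the results above can be sketched in plain Java. This is a model, not Lucene code, and it rests on two assumptions that the page does not state: that the tokenized field is lowercased at index time, and that the query text in this test reaches the index exactly as typed (neither analyzed nor lowercased). The class IndexModeSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the two index modes in the test above; NOT Lucene code.
class IndexModeSketch {

    // Models Index.UN_TOKENIZED: the whole field value is kept as a single term, unchanged.
    static List<String> indexUntokenized(String content) {
        return Collections.singletonList(content);
    }

    // Models Index.TOKENIZED with a lowercasing analyzer (assumption):
    // split on whitespace and lowercase each token.
    static List<String> indexTokenized(String content) {
        List<String> terms = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Assumption: the query is compared as typed. A trailing '*' means prefix match,
    // anything else means exact match against a whole term.
    static boolean search(List<String> terms, String query) {
        boolean isPrefix = query.endsWith("*");
        String text = isPrefix ? query.substring(0, query.length() - 1) : query;
        for (String term : terms) {
            if (isPrefix ? term.startsWith(text) : term.equals(text)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String content = "OKM Paco Avila";
        for (String query : new String[] {"OKM", "OKM*", "okm", "okm*"}) {
            System.out.println(query
                    + " untokenized=" + search(indexUntokenized(content), query)
                    + " tokenized=" + search(indexTokenized(content), query));
        }
    }
}
```

Under these assumptions, the untokenized field holds the single term "OKM Paco Avila", so only the prefix "OKM*" matches it, while the tokenized field holds "okm", "paco" and "avila", so only the lowercase queries match.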