Query language and collation

When the lexical search type is specified for term matching, the collation to use must also be specified, together with a default collation for the language in which the terms to be matched are represented.

Examples based on MySQL collation behavior:

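"AAO" matches "ÅÄÖ" in utf8_general_ci and utf8_unicode_ci but not in utf8_swedish_ci, where Å, Ä and Ö are letters in their own right. It does not match in utf8_german2_ci either, because that collation compares Ä as AE and Ö as OE.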

"Aåa" matches "aåa" in utf8_general_ci and utf8_swedish_ci but not in utf8_bin (i.e. case-insensitive vs. case-sensitive comparison; some searches need case sensitivity).

Similar behavior can be implemented with, e.g., java.text.Collator in Java, or in MongoDB by passing a collation option to collection.find() or by calling cursor.collation() on the returned cursor.
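As a minimal sketch of the java.text.Collator approach, the following compares the example strings above under different locales and strengths. The class name CollationDemo and the helper matches() are hypothetical names introduced only for illustration. The mapping from MySQL collations to Collator settings is an approximation, not an exact equivalence: PRIMARY strength ignores case and accents (roughly a "_ci" collation), while TERTIARY also distinguishes case (closer to, but not the same as, utf8_bin).

```java
import java.text.Collator;
import java.util.Locale;

public class CollationDemo {

    // Hypothetical helper: returns true if a and b compare as equal
    // under the collation rules of the given locale at the given strength.
    static boolean matches(String a, String b, Locale locale, int strength) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        Locale english = Locale.ENGLISH;
        Locale swedish = Locale.forLanguageTag("sv");

        // Unicode escapes keep the source file independent of its encoding.
        String plain = "AAO";
        String accented = "\u00C5\u00C4\u00D6"; // the letters A-ring, A-diaeresis, O-diaeresis
        String upper = "A\u00E5a";              // "A", a-ring, "a"
        String lower = "a\u00E5a";              // "a", a-ring, "a"

        // Accent handling depends on the language: English treats the accents
        // as secondary differences, Swedish treats these as distinct letters.
        System.out.println("en PRIMARY, AAO vs accented:  "
                + matches(plain, accented, english, Collator.PRIMARY));
        System.out.println("sv PRIMARY, AAO vs accented:  "
                + matches(plain, accented, swedish, Collator.PRIMARY));

        // Case handling depends on the strength chosen for the comparison.
        System.out.println("sv PRIMARY, case variants:    "
                + matches(upper, lower, swedish, Collator.PRIMARY));
        System.out.println("sv TERTIARY, case variants:   "
                + matches(upper, lower, swedish, Collator.TERTIARY));
    }
}
```

The locale selects the language-specific rules (the counterpart of choosing utf8_swedish_ci over utf8_general_ci), and the strength selects how many levels of difference count (the counterpart of choosing a case-insensitive or case-sensitive collation). A search API could expose both as parameters, with the language's collation as the default.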
