This page is published as Draft for Trial Use. The recommendations on this page will be reviewed and may be updated following feedback from implementation experiences.
To promote consistency between implementations of ECL, the following collation principles are recommended:
- Search and match - The default behaviour of a system implementing ECL queries with term filters is to use locale-specific asymmetric searching at the secondary comparison strength level, as specified in Unicode Technical Standard #10 - Unicode Collation Algorithm. This means that, by default, the search is case insensitive, with some language-specific character normalization behaviour.
- Asymmetric: In an asymmetric search, an unmarked character in the query (i.e. a 'base letter') matches any character in the target with the same base letter, whether that character is marked or unmarked. However, a marked character in the query only matches a character in the target that is marked in the same way.
- Secondary strength: A search at secondary strength considers only level 1 (primary) differences (e.g. "d" vs "e") and level 2 (secondary) differences (e.g. "e" vs "é" in English). Level 3 (tertiary) differences (e.g. "e" vs "E") are ignored, which makes the search case insensitive. For example, in English, "e" in the query matches both "e" and "E" in the target, and "E" in the query likewise matches both "e" and "E" in the target.
- Language customizations - Locale-based customizations (tailorings) of the standard are specified in the Unicode Common Locale Data Repository (CLDR). For each supported language, the Unicode CLDR specifies which characters are considered 'marked' variants of a base letter, which are treated as identical base letters, and which character sequences form contractions. The description terms in the substrate should be indexed separately for each supported language.
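The asymmetric and secondary-strength rules above can be pictured as a comparison of collation elements. The following Python sketch is a simplified model, not an implementation of UTS #10: each character is mapped to a (primary, secondary) pair from a small, hypothetical hand-written table (`ROOT_TABLE`), where the primary is the base letter and the secondary names the mark. Lower-casing both strings stands in for ignoring level 3 (case) weights, and characters missing from the table are treated as unmarked base letters of their own. A real system would use a collator built on the full UCA and CLDR data, such as ICU's.

```python
from __future__ import annotations

import unicodedata

# Hypothetical subset of collation data, NOT real CLDR/DUCET weights.
# Each character maps to (primary, secondary): the primary is the base letter,
# the secondary names the mark. Unlisted characters are unmarked base letters.
ROOT_TABLE = {
    "é": ("e", "acute"),
    "è": ("e", "grave"),
    "ö": ("o", "diaeresis"),
    "ø": ("o", "stroke"),
    "å": ("a", "ring"),
}


def elements(text: str) -> list[tuple[str, str | None]]:
    """Map text to (primary, secondary) pairs.

    NFC normalization makes precomposed and decomposed input equivalent;
    lower-casing stands in for ignoring level 3 (case) differences.
    """
    folded = unicodedata.normalize("NFC", text).lower()
    return [ROOT_TABLE.get(ch, (ch, None)) for ch in folded]


def asymmetric_match(query: str, target: str) -> bool:
    """Asymmetric, secondary-strength comparison of two whole strings."""
    q, t = elements(query), elements(target)
    if len(q) != len(t):
        return False
    for (q_primary, q_secondary), (t_primary, t_secondary) in zip(q, t):
        if q_primary != t_primary:
            return False  # level 1 difference: never a match
        if q_secondary is not None and q_secondary != t_secondary:
            return False  # a marked query character needs the same mark
    return True
```

Under this model, `asymmetric_match("resume", "RÉSUMÉ")` is true while `asymmetric_match("résumé", "resume")` is false, in line with the résumé rows of the first table. Locale tailorings would be modelled by swapping in a different table.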
For example, the following search behaviour is expected in the locales specified below.
- In English, Swedish and Danish, the following search behaviour is expected:
Note: None of these three locales customizes the characters used in these searches, so the CLDR root collation order applies.
Search Term | Target Matches | Target does NOT Match |
---|---|---|
resume | resume, Resume, RESUME, résumé, rèsumè, Résumé, RÉSUMÉ, … | - |
Resume | resume, Resume, RESUME, résumé, rèsumè, Résumé, RÉSUMÉ, … | - |
résumé | résumé, Résumé, RÉSUMÉ, … | resume, Resume, RESUME, ... |
Résumé | résumé, Résumé, RÉSUMÉ, … | resume, Resume, RESUME, ... |
- In English, the following search behaviour is expected (based on the CLDR 'en' locale, which uses the CLDR root collation order):
Search Term | Target Matches | Target does NOT Match |
---|---|---|
sjogren | sjogren, Sjogren, SJOGREN, sjögren, Sjögren, SJÖGREN, sjøgren, Sjøgren, SJØGREN, ... | - |
sjögren | sjögren, Sjögren, SJÖGREN, ... | sjogren, Sjogren, SJOGREN, sjøgren, Sjøgren, SJØGREN, ... |
Angstrom | angstrom, Angstrom, ANGSTROM, ångström, Ångström, ÅNGSTRÖM, ångstrøm, Ångstrøm, ÅNGSTRØM, ... | ångstrœm, Ångstrœm, ÅNGSTRŒM, ... |
Ångström | ångström, Ångström, ÅNGSTRÖM, ... | angstrom, Angstrom, ANGSTROM, ångstrøm, Ångstrøm, ÅNGSTRØM, ... |
Ångstrøm | ångstrøm, Ångstrøm, ÅNGSTRØM, ... | angstrom, Angstrom, ANGSTROM, ångström, Ångström, ÅNGSTRÖM, ... |
aangstrøm | aangstrøm, Aangstrøm, AANGSTRØM, ... | angstrom, Angstrom, ANGSTROM, ångström, Ångström, ÅNGSTRÖM, ångstrøm, Ångstrøm, ÅNGSTRØM, ångstrœm, Ångstrœm, ÅNGSTRŒM, ... |
- In Swedish, the following search behaviour is expected (based on the customizations in the CLDR 'sv' locale):
Search Term | Target Matches | Target does NOT Match |
---|---|---|
sjogren | sjogren, Sjogren, SJOGREN, ... | sjögren, Sjögren, SJÖGREN, sjøgren, Sjøgren, SJØGREN, ... |
sjögren | sjögren, Sjögren, SJÖGREN, sjøgren, Sjøgren, SJØGREN, ... | sjogren, Sjogren, SJOGREN, ... |
Angstrom | angstrom, Angstrom, ANGSTROM, ... | ångström, Ångström, ÅNGSTRÖM, ångstrøm, Ångstrøm, ÅNGSTRØM, ångstrœm, Ångstrœm, ÅNGSTRŒM, aangström, Aangström, AANGSTRÖM, ... |
Ångström | ångström, Ångström, ÅNGSTRÖM, ångstrøm, Ångstrøm, ÅNGSTRØM, ångstrœm, Ångstrœm, ÅNGSTRŒM, ... | angstrom, Angstrom, ANGSTROM, aangström, Aangström, AANGSTRÖM, ... |
Ångstrøm | ångstrøm, Ångstrøm, ÅNGSTRØM, ... | angstrom, Angstrom, ANGSTROM, ångström, Ångström, ÅNGSTRÖM, ångstrœm, Ångstrœm, ÅNGSTRŒM, ... |
aangstrøm | aangstrøm, Aangstrøm, AANGSTRØM, ... | angstrom, Angstrom, ANGSTROM, ångström, Ångström, ÅNGSTRÖM, ångstrøm, Ångstrøm, ÅNGSTRØM, ångstrœm, Ångstrœm, ÅNGSTRŒM, ... |
- And in Danish, the following search behaviour is expected (based on the customizations in the CLDR 'da' locale):
Search Term | Target Matches | Target does NOT Match |
---|---|---|
sjogren | sjogren, Sjogren, SJOGREN, ... | sjögren, Sjögren, SJÖGREN, sjøgren, Sjøgren, SJØGREN, ... |
sjögren | sjögren, Sjögren, SJÖGREN, ... | sjogren, Sjogren, SJOGREN, sjøgren, Sjøgren, SJØGREN, ... |
Angstrom | angstrom, Angstrom, ANGSTROM, ... | ångström, Ångström, ÅNGSTRÖM, ångstrøm, Ångstrøm, ÅNGSTRØM, ångstrœm, Ångstrœm, ÅNGSTRŒM, aangstrøm, Aangstrøm, AANGSTRØM, ... |
Ångström | ångström, Ångström, ÅNGSTRÖM, aangström, Aangström, AANGSTRÖM, ... | angstrom, Angstrom, ANGSTROM, ångstrøm, Ångstrøm, ÅNGSTRØM, ångstrœm, Ångstrœm, ÅNGSTRŒM, ... |
Ångstrøm | ångstrøm, Ångstrøm, ÅNGSTRØM, ångström, Ångström, ÅNGSTRÖM, aangstrøm, Aangstrøm, AANGSTRØM, aangström, Aangström, AANGSTRÖM, ... | angstrom, Angstrom, ANGSTROM, ångstrœm, Ångstrœm, ÅNGSTRŒM, ... |
aangstrøm | ångstrøm, Ångstrøm, ÅNGSTRØM, ångström, Ångström, ÅNGSTRÖM, aangstrøm, Aangstrøm, AANGSTRØM, aangström, Aangström, AANGSTRÖM, ... | angstrom, Angstrom, ANGSTROM, ångstrœm, Ångstrœm, ÅNGSTRŒM, ... |
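The Swedish and Danish tables differ from the English one because each locale tailors the root order: in 'sv', "ö" is its own base letter with "ø" and "œ" as marked variants, while in 'da', "ø" is the base letter, "ö" is a marked variant, and "aa" is a contraction equivalent to "å" at secondary strength. The Python sketch below models this with per-locale tables and longest-match tokenization for contractions. The `TAILORINGS` tables are hypothetical, reconstructed only so that they reproduce the example tables above; they are not the actual CLDR 'sv' or 'da' rules. As before, lower-casing stands in for ignoring level 3 differences.

```python
from __future__ import annotations

import unicodedata

# Hypothetical toy tailorings derived from the example tables, NOT real CLDR
# data. Values are (primary, secondary); secondary None means unmarked.
COMMON = {"é": ("e", "acute"), "è": ("e", "grave")}
TAILORINGS = {
    "sv": {**COMMON,
           "å": ("å", None),          # separate base letter
           "ö": ("ö", None),          # separate base letter
           "ø": ("ö", "stroke"),      # marked variant of ö
           "œ": ("ö", "ligature")},   # marked variant of ö
    "da": {**COMMON,
           "å": ("å", None),          # separate base letter
           "aa": ("å", None),         # contraction, equal to å at secondary strength
           "ø": ("ø", None),          # separate base letter
           "ö": ("ø", "diaeresis")},  # marked variant of ø
}


def elements(text: str, locale: str) -> list[tuple[str, str | None]]:
    """Tokenize with longest-match contractions, then look up each token."""
    table = TAILORINGS[locale]
    longest = max(len(key) for key in table)
    folded = unicodedata.normalize("NFC", text).lower()  # case is ignored
    result, i = [], 0
    while i < len(folded):
        # Try the longest candidate first; single characters always succeed.
        for size in range(min(longest, len(folded) - i), 0, -1):
            chunk = folded[i:i + size]
            if size == 1 or chunk in table:
                result.append(table.get(chunk, (chunk, None)))
                i += size
                break
    return result


def asymmetric_match(query: str, target: str, locale: str) -> bool:
    """Asymmetric, secondary-strength comparison using a locale's table."""
    q, t = elements(query, locale), elements(target, locale)
    return len(q) == len(t) and all(
        q_primary == t_primary and (q_secondary is None or q_secondary == t_secondary)
        for (q_primary, q_secondary), (t_primary, t_secondary) in zip(q, t)
    )
```

With these tables, `asymmetric_match("Ångström", "Aangström", "da")` is true because "aa" is tokenized as a single element equivalent to "å", whereas the same call with `"sv"` is false because the two strings no longer have the same number of elements.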