Currently, when you search the International Spanish Edition in the SNOMED International Browser without using the diacritic (accent), it retrieves only terms without accents. When you search with an accent, it retrieves BOTH terms with and without accents.

In Snowstorm we have configuration to change how accented characters are handled. This is configured per language because each language seems to have different requirements. We have found that in some languages certain accented/diacritic characters are considered completely different characters and are included separately in the alphabet of that language, whereas other accented characters do not appear separately in the alphabet and are just considered modified versions of the original character.

This separation is important when it comes to search. For example, in Sweden the characters 'å', 'ä' and 'ö' are considered unique characters which should never be simplified to 'a' or 'o' for search. This simplification is called character folding. In the latest Snowstorm release, by default, we do not fold/simplify the characters: áéíóúüñ.
This means that these exact characters have to be used in the search term in order to match descriptions containing those characters.
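
As a rough illustration of character folding with per-language exceptions, here is a minimal Python sketch. It is not Snowstorm's implementation; the `fold` function and its `keep` parameter are hypothetical names chosen for this example, and it assumes folding can be approximated by stripping Unicode combining marks.

```python
import unicodedata


def fold(text: str, keep: str = "") -> str:
    """Fold accented characters to their base form, except those listed in `keep`.

    `keep` is a hypothetical stand-in for a per-language "do not fold" setting.
    """
    # Work on precomposed characters so membership tests against `keep` are reliable.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in keep:
            out.append(ch)  # treated as a distinct letter, left untouched
            continue
        # NFD splits 'é' into 'e' plus a combining accent; drop the combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


# Not folding áéíóúüñ: the accented form survives, so an unaccented query will not match it.
print(fold("fémur", keep="áéíóúüñ"))
# Folding everything: accented and unaccented spellings collapse to the same form.
print(fold("fémur"))
# Swedish-style exception: å, ä and ö are kept as distinct letters.
print(fold("hjärta", keep="åäö"))
```

With a non-empty `keep` set, a search term only matches when the user types those exact characters; with an empty set, accented and unaccented spellings become interchangeable.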


Questions for the SNOMED CT Spanish User Support Group: 


1 - Is it okay to have the search conducted with or without accents and able to retrieve terms with or without accents across Spanish dialects? Or would this need to be specific for each Spanish Extension shown on the SNOMED Browser?


2 - What characters should be and/or should NOT be folded/simplified? 


3 - If you believe this needs a meeting/discussion please let me know. 


Please leave comments below by Friday, April 23, 2021 so we can move ahead on this fix (if agreed by all). Thank you in advance for your help!





Corresponds to ticket: BROWSE-415

11 Comments

  1. Though I'm quite strict with orthography, I think a search engine must be easy to use and should give all the intended results... and I say intended because not everyone knows how to properly use accents in Spanish.


    1. I think the search should ignore accents both in the search box and results.

    As an example: if you search for fémur or femur (incorrect)... it will return "fémur". And if you search for fractura or fráctura (this is incorrect)... it will return "fractura".

    You could also make a trigger/option available so the user could switch "accent exactitude" on or off.


    2. I don't have an answer for this, as I think the search engine should be flexible on accents.

    3. I don't think a meeting is required for this.


    Regards,

  2. Hi all,

    1. We think it would be OK to have just one configuration for all Spanish extensions; as far as we know there's no more than one accent character for Spanish (´). It is absolutely necessary to allow searching without accents too, because even though omitting an accent is a spelling error, in practice it's really common to browse terms without accents. We think correct spelling should be maintained in the retrieved results though.
    2. In Spanish we do not have folded characters; our only accent is considered a completely different character. The one exception is a symbol called diéresis (ü), which goes over the letter "u" and may be considered folded, but it's used in very few words (e.g. vergüenza).
    3. We don't think a meeting is required for now, but thanks.

    Uruguay NRCteam

  3. Hi all,
    In Argentina, we use Snowstorm in our implementation and we have configured the service to allow searching with and without accents in the query. Our users asked for the same feature when using the browser.
    1. We agree with the partners and believe that the search should ignore the accents and return the result anyway.
    2. We don't have much to contribute to this point other than what Betania points out.
    3. We do not consider a meeting necessary for this.

    Argentina NRCteam

  4. Hello everyone. I concur with all comments. I agree that the browser should accept searches with or without accents but should retrieve the accented term. I agree with Betania about diéresis. We must also remember the letter ñ, which must be considered in searches and retrieval.

  5. No need for a meeting.

  6. Thank you for your quick responses! I really appreciate that we have a pretty global response as well - thank you! 

    Okay outcomes - 

    1 - On the SNOMED Browser, for Spanish versions, we should allow searches to use or not use the diacritic (accent) AND return terms with and without accents.

    2 - the following will be folded (or allowed) for Spanish versions:  á é ü ñ


    Kai Kewley do you need any additional information from the group for the update/fix?

  7. Suzy Roy I am hearing that there are no special cases where the simpler form of a character should not match a diacritic character. To confirm my understanding I've included a table of some of the search use cases mentioned here. The table illustrates that, regardless of the characters used in the SNOMED CT description term, if the user inputs a simple search term with no accents the terms should be matched.

    SNOMED CT Term        Search Term           Match
    fráctura              fráctura              ✓
    fráctura              fractura              ✓
    vergüenza             vergüenza             ✓
    vergüenza             verguenza             ✓
    luxación de muñeca    luxación de muñeca    ✓
    luxación de muñeca    luxacion de muneca    ✓

    Could I get a confirmation of that please?

    This means that all characters can be folded into their simpler form for the search index. (No characters are left unfolded.)
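
    The use cases in the table can be checked with a short Python sketch. This is only an illustration under the assumption that both the indexed term and the search term are folded before comparison; the `fold` and `matches` helpers are hypothetical and do not reflect how Snowstorm or its search index is actually implemented.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip all combining marks, so 'á' becomes 'a', 'ñ' becomes 'n', 'ü' becomes 'u'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches(term: str, query: str) -> bool:
    """Compare folded, lower-cased forms; a simple stand-in for an index lookup."""
    return fold(query).lower() in fold(term).lower()


# (SNOMED CT term, search term) pairs taken from the table.
CASES = [
    ("fráctura", "fráctura"),
    ("fráctura", "fractura"),
    ("vergüenza", "vergüenza"),
    ("vergüenza", "verguenza"),
    ("luxación de muñeca", "luxación de muñeca"),
    ("luxación de muñeca", "luxacion de muneca"),
]

results = [matches(term, query) for term, query in CASES]
print(results)
```

    Because both sides are folded, the stored description keeps its correct spelling while any mix of accented and unaccented input can still find it.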

  8. Hola! 

    Update - this search/retrieval functionality has been implemented! 

  9. Great news! Thanks.