Internationalized indexing

  • different languages have different rules for:

    • which index terms should go into the same group

      • e.g. in Czech, words starting with „c“ and „č“ go into separate groups, but words starting with „a“ and „á“ go into the same group

    • how the terms are alphabetically sorted

      • e.g. Czech has a special two-character letter „ch“, which is ordered between „h“ and „i“. Traditional Spanish and Hungarian have similar multi-character letters.

  • some languages combine both of these behaviors

    • e.g. in German, the letter „ö“ is placed into the same group as „o“, but it is sorted as if it were the two-letter sequence „oe“

  • things are already getting complicated, and we have not even started talking about CJKV languages
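
The Czech and German rules above can be illustrated with a small Python sketch. It is only an approximation built on the standard library, not a real collation implementation: the alphabet table, the expansion table, and all function names (`czech_group`, `czech_sort_key`, `german_group`, `german_sort_key`) are hypothetical names chosen for this example, and the tables are a simplified assumption about each language's rules.

```python
import unicodedata

# Assumption: simplified list of Czech primary letters, in alphabetical order.
# "ch" is a single letter placed between "h" and "i".
CZECH_ALPHABET = [
    "a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
    "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v", "w", "x",
    "y", "z", "ž",
]
CZECH_RANK = {letter: i for i, letter in enumerate(CZECH_ALPHABET)}

# Assumption: German expansions used only for sorting, not for grouping.
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def strip_accent(ch):
    """Return the base character of ch, dropping any combining marks."""
    return unicodedata.normalize("NFD", ch)[0]


def czech_letters(term):
    """Split a term into Czech letters: "ch" is one letter, "č" stays
    distinct from "c", but "á" is folded into "a"."""
    word = term.lower()
    letters = []
    i = 0
    while i < len(word):
        if word[i:i + 2] == "ch":
            letters.append("ch")
            i += 2
        else:
            ch = word[i]
            letters.append(ch if ch in CZECH_RANK else strip_accent(ch))
            i += 1
    return letters


def czech_group(term):
    """Index group heading for a term, e.g. "Č", "Ch" or "A"."""
    return czech_letters(term)[0].capitalize()


def czech_sort_key(term):
    """Sort by alphabet position of each letter; unknown characters go last.
    The lowercased term is a tie-breaker for words differing only in accents."""
    unknown = len(CZECH_ALPHABET)
    ranks = [(CZECH_RANK.get(l, unknown), l) for l in czech_letters(term)]
    return (ranks, term.lower())


def german_group(term):
    """Group "ö" together with "o" by dropping the umlaut."""
    return strip_accent(term[0].lower()).upper()


def german_sort_key(term):
    """Sort "ö" as if it were written "oe"."""
    return "".join(GERMAN_EXPANSIONS.get(c, c) for c in term.lower())


czech_terms = ["chata", "cibule", "had", "igelit", "čaj", "cesta"]
print(sorted(czech_terms, key=czech_sort_key))
print([czech_group(t) for t in ["čaj", "árie", "auto", "chata"]])

german_terms = ["Ofen", "Öl", "Oder", "Ohr"]
print(sorted(german_terms, key=german_sort_key))
print(german_group("Öl"))
```

Note that grouping and sorting use different keys on purpose: in the German case the group comes from the accent-stripped first letter, while the sort position comes from the expanded spelling. Production code would normally rely on a collation library backed by locale data rather than hand-written tables like these.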