The Unicode standard allows certain visually identical characters to be represented by different code point sequences. For example, the character ä may be represented as the single precomposed code point "Latin Small Letter A with Diaeresis" (U+00E4) or as "Latin Small Letter A" (U+0061) followed by "Combining Diaeresis" (U+0308). The semantic meaning and visual representation are exactly the same, but the code points differ, so a plain code-point-by-code-point comparison treats the two strings as unequal. How do you deal with that?
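
To make the problem concrete, here is a minimal Python sketch using the standard library's `unicodedata.normalize`. It shows the two spellings of ä comparing as unequal, then comparing as equal once both are converted to the same Unicode normalization form. The helper name `same_text` and the choice of NFC as the default form are assumptions made for illustration, not a recommendation for every use case.

```python
import unicodedata

# Two spellings of "ä" that render identically.
precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # LATIN SMALL LETTER A + COMBINING DIAERESIS


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both
    to the same Unicode normalization form (NFC, NFD, NFKC, or NFKD)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    # A plain comparison looks at code points, so the spellings differ.
    print(precomposed == decomposed)            # False
    print(len(precomposed), len(decomposed))    # 1 2

    # After normalization the two spellings compare as equal.
    print(same_text(precomposed, decomposed))   # True
```

NFC composes sequences into precomposed code points where one exists, while NFD decomposes them; either works for comparison as long as both sides use the same form.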