Default Comparison

From Codepoint

The MOO server uses a default comparison algorithm that folds case, although it can also make case-sensitive comparisons on request. It also uses several matching, hashing and related functions that are consistent with this case folding.

For Unicode MOO we need a similar set of functions that don't arbitrarily cut off their DWIM-iness (do-what-I-mean behavior) above the ASCII range.
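To make the ASCII cutoff concrete, here is a minimal Python sketch of an ASCII-only case-insensitive comparison together with a hash that is consistent with it. It does not reproduce the MOO server's C code: `ascii_fold`, `ascii_equal` and `ascii_hash` are hypothetical names, and FNV-1a is an assumed stand-in for whatever hash the server actually uses.

```python
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def ascii_fold(s):
    # Fold only A-Z to a-z; every other character, accented or not, is left alone.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def ascii_equal(a, b):
    # Case-insensitive comparison, but only for ASCII letters.
    return ascii_fold(a) == ascii_fold(b)


def ascii_hash(s):
    # FNV-1a over the folded string, so ascii_equal(a, b) implies equal hashes.
    h = FNV_OFFSET
    for c in ascii_fold(s):
        h = ((h ^ ord(c)) * FNV_PRIME) & MASK64
    return h
```

Here `ascii_equal("Hello", "hELLO")` holds, but `ascii_equal("CAFÉ", "café")` does not, because É lies outside A-Z.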

These should certainly respect canonical equivalence, and perhaps compatibility equivalence. They should also be case-insensitive, at least in a generic, locale-independent way. Maybe they should be diacritic-insensitive too, but I'm not sure about this: Swedish treats å, ä and ö as distinct letters rather than as accented a and o, so the Swedes would not be happy.

(The Turks won't be happy in either case, since generic case folding maps I to i rather than to dotless ı ...)
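One way to get canonical equivalence plus generic case-insensitivity is the Unicode Standard's canonical caseless matching, NFD(toCasefold(NFD(X))), with compatibility caseless matching, NFKD(toCasefold(NFKD(toCasefold(NFD(X))))), as an opt-in looser variant. The sketch below assumes Python's `unicodedata` module and `str.casefold` (default, non-Turkic full case folding). `fold_key` and `default_equal` are hypothetical names, and the `strip_diacritics` flag, which simply drops every character with a nonzero combining class, is only one possible reading of "diacritic-insensitive".

```python
import unicodedata


def fold_key(s, compatibility=False, strip_diacritics=False):
    """Return a key such that equal keys mean the strings should compare equal."""
    if compatibility:
        # Compatibility caseless matching:
        # NFKD(toCasefold(NFKD(toCasefold(NFD(s)))))
        key = unicodedata.normalize("NFD", s).casefold()
        key = unicodedata.normalize("NFKD", key).casefold()
        key = unicodedata.normalize("NFKD", key)
    else:
        # Canonical caseless matching: NFD(toCasefold(NFD(s)))
        key = unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    if strip_diacritics:
        # Hypothetical option: drop every combining mark (nonzero combining class).
        key = "".join(c for c in key if unicodedata.combining(c) == 0)
    return key


def default_equal(a, b, **options):
    """Case- and normalization-insensitive comparison built on fold_key."""
    return fold_key(a, **options) == fold_key(b, **options)
```

With `strip_diacritics=True`, "Åsa" matches "asa", which is exactly the behavior Swedish users would object to. Because `str.casefold` uses the default folding, "I" folds to "i" and never matches dotless "ı".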

Hashing functions can respect canonical equivalence without actually normalizing the string. They only need to compute the hash of the decomposed form directly for decomposable characters, and to combine the hashes of reorderable combining marks with an order-insensitive function.
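A minimal Python sketch of that idea follows. The mixing steps (an FNV-1a style step for base characters, and a splitmix64 finalizer summed over each run of combining marks) are assumptions chosen for illustration, and `canonical_hash`, `_mix` and `_step` are hypothetical names. For brevity it decomposes one character at a time with `unicodedata.normalize("NFD", ch)`; a server would more likely precompute each character's contribution in a table, but either way the whole string is never normalized or reordered.

```python
import unicodedata

MASK = (1 << 64) - 1
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3


def _mix(x):
    # splitmix64 finalizer: scramble a code point into 64 well-spread bits.
    x = (x + 0x9E3779B97F4A7C15) & MASK
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    return x ^ (x >> 31)


def _step(h, value):
    # Order-sensitive FNV-1a style step over one 64-bit value.
    return ((h ^ value) * FNV_PRIME) & MASK


def canonical_hash(s):
    h = FNV_OFFSET
    marks = 0          # commutative sum for the current run of combining marks
    have_marks = False
    for ch in s:
        # Decompose this character only; its pieces are hashed as they appear.
        for cp in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(cp) == 0:
                # A starter ends the current run of marks: flush it, then hash in order.
                if have_marks:
                    h = _step(h, marks)
                    marks, have_marks = 0, False
                h = _step(h, ord(cp))
            else:
                # Marks are summed, so their order within the run does not matter.
                marks = (marks + _mix(ord(cp))) & MASK
                have_marks = True
    if have_marks:
        h = _step(h, marks)
    return h
```

Because each run of marks is folded into an unordered sum, `canonical_hash("e\u0301")` equals `canonical_hash("\u00e9")`, and `"a\u0301\u0323"` hashes the same as `"a\u0323\u0301"`. The same property makes `"a\u0301\u0300"` and `"a\u0300\u0301"` collide even though they are not canonically equivalent, since both marks have combining class 230.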

Of course, this causes collisions for sequences of combining marks that look reorderable but cannot in fact be reordered, because they share the same combining class. Is this case important enough to worry about? Certainly not for ordinary use, but it might open up DoS vulnerabilities for hash tables that use these functions. (On the other hand, such vulnerabilities already exist in the current codebase, since the hash functions are not cryptographically secure; further, str_hash is used for fast comparisons and not for any kind of bucket lookup, so the lookups involved are linear anyway.)