
2. The I18N solution and ISO 14651

The I18N (internationalization) approach handles international sorting rules through its locale descriptions. The current implementation allows letters to be defined (character classification) via the LC_CTYPE category, and a collation order to be defined by declarations in the LC_COLLATE category. A good description of these topics can be found in [4]. The forthcoming ISO 14651 describes the principles of the LC_COLLATE approach and discusses various aspects of this solution.

This solution can be briefly described as follows:

We are given a set of strings that is to be ordered according to a sequence of rule sets. The strings are first sorted according to the first rule set. For example, the first rule set may ignore both the case of letters and their accents (two words that differ only in these respects are called quasi-homographs). The result of this first step is a partial order on the keywords. There may remain sets of keywords whose relative order is not yet known. This happens because these (initially different) keywords were mapped into the same equivalence class. For example, when case and accents are ignored, the words foo, FÓO and foô are all considered equal.
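The first pass can be illustrated with a small Python sketch. It is not taken from the standard: the function name first_level_key and the sample words are assumptions made for illustration, and the "rule set" is simply "strip accents and ignore case". The sketch sorts by this key and then collects the keywords that the key cannot tell apart.

```python
import unicodedata
from itertools import groupby

def first_level_key(word):
    """Hypothetical first rule set: drop accents and ignore case."""
    decomposed = unicodedata.normalize("NFD", word)
    # Combining marks (the accents) are removed; the rest is lowercased.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.lower()

words = ["foô", "bar", "FÓO", "Bar", "foo"]

# Sorting by the first-level key yields only a partial order ...
ordered = sorted(words, key=first_level_key)

# ... because words with the same key fall into one equivalence class.
classes = [list(group) for _, group in groupby(ordered, key=first_level_key)]
```

Here classes holds one list per equivalence class. The order of the members inside a class merely reflects the input order, since the first rule set says nothing about it.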

To each of these equivalence classes another rule set is applied in order to refine the partial order. The second rule set further defines the relative position of the keywords within an equivalence class, for instance by taking the accents into consideration. After the second rule set has been applied there may still be equivalence classes left, for example keywords that differ only in the case of their letters (called homographs). Such a class might consist of the keywords foo, FOO and Foo.

In the next run, another rule set eliminates the remaining ambiguity, so that a total order is obtained.
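The successive runs can be modelled as a single multi-level sort key, where each level only matters when all earlier levels compare equal. The following Python sketch assumes three levels (letters, accents, case) and assumes that lowercase sorts before uppercase; neither the names nor these choices come from ISO 14651 itself.

```python
import unicodedata

def split_char(ch):
    """Return (base letter in lowercase, accent marks, is-uppercase flag)."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    return base.lower(), marks, base.isupper()

def collation_key(word):
    # NFC first, so that each accented letter is a single character.
    parts = [split_char(ch) for ch in unicodedata.normalize("NFC", word)]
    level1 = tuple(p[0] for p in parts)  # first rule set: letters only
    level2 = tuple(p[1] for p in parts)  # second rule set: accents
    level3 = tuple(p[2] for p in parts)  # third rule set: case (assumed lower first)
    return (level1, level2, level3)

words = ["Foo", "foô", "FOO", "FÓO", "foo"]
result = sorted(words, key=collation_key)
print(result)  # ['foo', 'Foo', 'FOO', 'foô', 'FÓO']
```

All five words agree on the first level. The second level separates the unaccented words from foô and FÓO, and the third level finally orders foo, Foo and FOO.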

The ISO standard proposes at least four different runs to be sure to obtain a total order on Latin-based alphabets. This includes backward comparison, which is needed for implementing the French sorting rules.
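The effect of backward comparison can be seen on the classic French example cote, côte, coté, côté. The Python sketch below is only an illustration under simplifying assumptions (two levels, hypothetical function names): the accent level is compared either from the start of the word or from its end.

```python
import unicodedata

def letters_and_accents(word):
    """Split a word into a tuple of base letters and a tuple of accent marks."""
    letters, accents = [], []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        letters.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)).lower())
        accents.append("".join(c for c in decomposed
                               if unicodedata.combining(c)))
    return tuple(letters), tuple(accents)

def forward_key(word):
    letters, accents = letters_and_accents(word)
    return (letters, accents)        # accents compared from the first letter

def backward_key(word):
    letters, accents = letters_and_accents(word)
    return (letters, accents[::-1])  # accents compared from the last letter

words = ["côté", "cote", "coté", "côte"]
forward = sorted(words, key=forward_key)
backward = sorted(words, key=backward_key)
print(forward)   # ['cote', 'coté', 'côte', 'côté']
print(backward)  # ['cote', 'côte', 'coté', 'côté']
```

Only the backward variant places côte before coté, which is the order expected by French dictionaries.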

