
Roger wrote --
> I have followed your discussion and I'd like to bring another topic
> into discussion once more.
>
> As learned from several discussions and readings the sorting problem
> is still not very well solved. Specifying sorting rules can be a
> tedious task which is error-prone in many cases. Based on the
> evaluation of the ISO standard (see our Homepage) I have developed the
> following concept, which I have partially implemented this weekend.
>
1. This concept of attaching properties to letters (or rather, defining
letters to be such objects) seems to me a good solution to the
following common (in many senses) requirement of sorting real
languages:

   a multi-level sort in which the levels are defined by such things as
   case, accents and other diacritics.

One small point about the name used: `partial-order' implies a
different structure. These orders are total orders on different sets,
so I would call them sub-orders.

One practical point about the example: I think that the :accent
property and its "sub-ordering" need to include the value `none'.

One question: you put this into your define-total-order example:

   (:accent backwards)

but you did not specify what values are allowed instead of backwards
(or what it means, but I think I can work that out:-).
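
To make the multi-level idea concrete, here is a minimal Python sketch. It is not your implementation: the names `letters`, `sort_key`, `ACCENT_ORDER` and `CASE_ORDER` are invented for illustration, the order of the accents is arbitrary, and it assumes that `backwards` means "compare the accent level from the end of the word", as in French dictionary ordering. Note that the accent sub-order includes the value `none`.

```python
import unicodedata

# Hypothetical sub-orders; each is a total order on its own small set.
ACCENT_NAMES = {"\u0301": "acute", "\u0300": "grave",
                "\u0302": "circumflex", "\u0308": "diaeresis"}
ACCENT_ORDER = ["none", "acute", "grave", "circumflex", "diaeresis"]
CASE_ORDER = ["lower", "upper"]


def letters(word):
    """Turn a string into (base, accent, case) triples, one per letter."""
    result = []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        base = decomposed[0]
        accent = "none"
        for mark in decomposed[1:]:
            accent = ACCENT_NAMES.get(mark, accent)
        case = "upper" if base.isupper() else "lower"
        result.append((base.lower(), accent, case))
    return result


def sort_key(word, accent_direction="forwards"):
    """Three-level key: base letters, then accents, then case."""
    triples = letters(word)
    base_level = [base for base, _, _ in triples]
    accent_level = [ACCENT_ORDER.index(accent) for _, accent, _ in triples]
    if accent_direction == "backwards":
        # Assumption: 'backwards' compares accents starting from the last letter.
        accent_level.reverse()
    case_level = [CASE_ORDER.index(case) for _, _, case in triples]
    return (base_level, accent_level, case_level)


words = ["côte", "coté", "cote", "côté"]
forwards = sorted(words, key=sort_key)
backwards = sorted(words, key=lambda w: sort_key(w, "backwards"))
```

With these definitions `forwards` is cote, coté, côte, côté and `backwards` is cote, côte, coté, côté. The levels only differ in which property they read and in which direction, which is what makes the property approach attractive here.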
2. It is not clear to me that this approach will directly support
other common requirements, such as the sub-orders required in sorting
German: once u-umlaut and ue have been merged at the top level, an
order is still needed for two words that are identical except that one
has u-umlaut where the other has ue.

This could be done with yet another property of the letter class,
called, say,
   :umlaut-oder-e
having values:
   umlaut e none irrelevant
(the last being used for all letters that never take umlauts), but such
an approach begins to get messy.
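
A rough Python sketch of the German case, under stated assumptions: the merge table `UMLAUT_MERGE` and the function `german_key` are hypothetical, the four property values are collapsed to a 0/1 flag, and the choice that plain ue sorts before u-umlaut in a tie is arbitrary.

```python
import unicodedata

# Hypothetical top-level merge rule: an umlaut is treated as vowel + e.
UMLAUT_MERGE = {"ä": "ae", "ö": "oe", "ü": "ue"}


def german_key(word):
    """Two-level key: merged spelling first, umlaut-or-e tie-break second."""
    top = []
    tie = []
    for ch in unicodedata.normalize("NFC", word).lower():
        if ch in UMLAUT_MERGE:
            merged = UMLAUT_MERGE[ch]
            top.append(merged)
            # Flag every merged position so both levels stay aligned.
            tie.extend([1] * len(merged))
        else:
            top.append(ch)
            tie.append(0)  # covers 'e', 'none' and 'irrelevant'
    return ("".join(top), tie)


names = sorted(["Müller", "Muller", "Mueller"], key=german_key)
```

Here `names` comes out as Mueller, Müller, Muller: the first two are equal at the top level and are separated only by the second level. Even this small sketch needs special handling to keep the two levels aligned, which is the messiness I mean.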
3. You probably expect me to come up with a general solution... well, I
guess my counter to that is some questions:

   Is there a special collection of merge-rules that come from
   real-world multi-level sorting rules?

   Do these lead to a reasonable collection of letter-properties that
   need to be added to support the specification of these rules?
4.
> Still missing is an appropriate mapping that transforms a string (a
> sequence of chars) into a sequence of letters (which have become real
> objects now).
>
> This could look like:
>
> (define-mapping "umlaut-u" ("\~"u" "ü"))
> (define-mapping "umlaut-A" ("\~"A" "Ä"))
>
> [I hope you can see the ISO-Latin chars as well]
Well, I can see \374 in my emacs, will that do?:-)
But I do not understand the syntax you are using here.
5.
> What I was just discussing with Gabor is the problem of markup (once
> more). Often indexes contain commands such as "\index" (see for
> example the LaTeX Companion) for which different index entries must be
> specified for the command "\index" and the word "index" sorted as
>
This is a very ad hoc solution to what is probably an example of a
more general class of sorting rules, in which words (i.e. the things to
be indexed) have "types".

This one could be done by a merge-rule that "ignores the \" and a
sub-order that reinserts it (at the beginning or the end).

Or it could be done by a letter property ":backslash-before"
with values: yes no

I hope this helps. There are certainly still a lot of things to
discover and to discuss here; I suspect that the ISO document does not
cover all the sorting requirements of complex documents.
chris