Re: Discussion about international sorting order



Roger wrote --

> I have followed your discussion and I'd like to bring another topic
> into discussion once more.
> 
> As learned from several discussions and readings the sorting problem
> is still not very well solved. Specifying sorting rules can be a
> tedious task which is error-prone in many cases. Based on the
> evaluation of the ISO standard (see our Homepage) I have developed the
> following concept, which I have partially implemented this weekend.
> 

1.  This concept of attaching properties to letters (or rather defining
letters to be such objects) seems to me a good solution to the
following common (in many senses) requirement of sorting real
languages:

  a multi-level sort in which levels are defined by such things as:

    case; accents; other diacritics

One small point about the name used: `partial-order' implies a
different structure.  These orders are total orders on different sets,
so I would call them sub-orders.

One practical point about the example:  I think that the :accent
property and its "sub-ordering" need to include the value `none'.

One question:  you put this into your define-total-order example:

   (:accent backwards)

but you did not specify what values are allowed instead of backwards
(or what it means, but I think I can work that out:-).
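To make the multi-level idea concrete, here is a minimal sketch in
Python (not xindy syntax). Everything in it is an assumption made for
illustration: the names letter_properties, ACCENT_ORDER, CASE_ORDER and
sort_key are hypothetical, the order of the accents is arbitrary, and I
read `backwards' as "compare this level from the end of the word", as
in French dictionary rules. Note the explicit value `none' in the
accent sub-order.

```python
import unicodedata

# Hypothetical sub-orders: each is a total order on its own small set.
# The accent order includes 'none' for letters that carry no accent.
ACCENT_ORDER = {"none": 0, "\u0301": 1, "\u0300": 2, "\u0302": 3, "\u0308": 4}
CASE_ORDER = {"lower": 0, "upper": 1}


def letter_properties(ch):
    """Turn one character into a (base, accent, case) triple."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0].lower()
    accent = decomposed[1:] or "none"   # combining marks, or 'none'
    case = "upper" if ch.isupper() else "lower"
    return base, accent, case


def sort_key(word, accent_backwards=False):
    """Key with three levels: base letters, then accents, then case."""
    props = [letter_properties(ch) for ch in word]
    bases = tuple(p[0] for p in props)
    # Unknown accents sort after all the known ones.
    accents = tuple(ACCENT_ORDER.get(p[1], len(ACCENT_ORDER)) for p in props)
    if accent_backwards:
        accents = accents[::-1]         # compare accents right to left
    cases = tuple(CASE_ORDER[p[2]] for p in props)
    return (bases, accents, cases)


words = ["c\u00f4te", "cot\u00e9", "cote", "c\u00f4t\u00e9"]
forwards = sorted(words, key=sort_key)
backwards = sorted(words, key=lambda w: sort_key(w, accent_backwards=True))
```

With these assumptions the forwards list puts "coté" before "côte",
while the backwards list puts "côte" before "coté"; all four words are
equal at the top level and differ only in the accent sub-order.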

2.  It is not clear to me that this approach will directly support
other common requirements, such as the sub-orders required in sorting
German, so that if  u-umlaut  and  ue  have been merged at the top-level,
the order is defined for two words that are identical except that one
has  u-umlaut  whereas the other has  ue .

This could be done with yet another property of the letter class, called??

  :umlaut-oder-e

having values:

  umlaut  e  none  irrelevant

(the last being used for all letters that never take umlauts) but such
an approach begins to get messy.
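A rough Python sketch of what such a property would have to do (this is
only my illustration, not anything in xindy): german_key and
UMLAUT_OR_E are hypothetical names, the choice that `e' sorts before
`umlaut' is an arbitrary assumption, and the test for a following "e"
is deliberately naive, so it would also merge the "ue" in a word like
"Quelle".

```python
# Hypothetical sub-order for the :umlaut-oder-e property.
UMLAUT_OR_E = {"e": 0, "umlaut": 1, "none": 2, "irrelevant": 3}
UMLAUT_BASE = {"\u00e4": "a", "\u00f6": "o", "\u00fc": "u"}


def german_key(word):
    """Top level merges a-, o-, u-umlaut with ae, oe, ue; the second
    level records which spelling was used at each position."""
    lower = word.lower()
    primary = []
    spelling = []
    i = 0
    while i < len(lower):
        ch = lower[i]
        if ch in UMLAUT_BASE:
            primary.append(UMLAUT_BASE[ch] + "e")
            spelling.append(UMLAUT_OR_E["umlaut"])
            i += 1
        elif ch in "aou" and lower[i + 1:i + 2] == "e":
            primary.append(ch + "e")            # naive: any vowel + e
            spelling.append(UMLAUT_OR_E["e"])
            i += 2
        elif ch in "aou":
            primary.append(ch)
            spelling.append(UMLAUT_OR_E["none"])
            i += 1
        else:
            primary.append(ch)
            spelling.append(UMLAUT_OR_E["irrelevant"])
            i += 1
    return ("".join(primary), tuple(spelling))


names = ["M\u00fcller", "Muller", "Mueller"]
ordered = sorted(names, key=german_key)
```

Here "Mueller" and "Müller" are identical at the top level and are
separated only by the second component of the key, which is exactly
where the messiness shows: the property has four values, but only two
of them ever decide anything.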


3.  You probably expect me to come up with a general solution...well I
guess the counter to that is some questions:

  is there a special collection of merge-rules that come from
  real-world multi-level sorting rules?

  do these lead to a reasonable collection of letter-properties that
  need to be added to support specification of these rules?


4.
> Still missing is an appropriate mapping that transforms a string (a
> sequence of chars) into a sequence of letters (which have become real
> objects now).
> 
> This could look like:
> 
> 	(define-mapping "umlaut-u" ("\~"u" "ü"))
> 	(define-mapping "umlaut-A" ("\~"A" "Ä"))
> 
> [I hope you can see the ISO-Latin chars as well]

Well, I can see \374 in my emacs, will that do?:-)

But I do not understand the syntax you are using here.
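If the intent is that several input spellings all map to one letter
object, then my guess at the behaviour would be something like the
following Python sketch. This is an assumption on my part, not a
reading of your syntax: MAPPINGS and to_letters are hypothetical names,
and I am guessing that the first string in each pair is the TeX input
form and the second the ISO-Latin character.

```python
# Hypothetical table: input spelling -> name of the letter object.
MAPPINGS = {
    '\\"u': "umlaut-u",    # assumed TeX input form
    "\u00fc": "umlaut-u",  # ISO-Latin u-umlaut
    '\\"A': "umlaut-A",
    "\u00c4": "umlaut-A",  # ISO-Latin A-umlaut
}


def to_letters(text, mappings=MAPPINGS):
    """Turn a string into a list of letters, longest spelling first."""
    patterns = sorted(mappings, key=len, reverse=True)
    letters = []
    i = 0
    while i < len(text):
        for pattern in patterns:
            if text.startswith(pattern, i):
                letters.append(mappings[pattern])
                i += len(pattern)
                break
        else:
            # No mapping applies: the character stands for itself.
            letters.append(text[i])
            i += 1
    return letters


from_tex = to_letters('M\\"uller')
from_latin1 = to_letters("M\u00fcller")
```

Under that guess both spellings of the name yield the same sequence of
letters, which is presumably the point of the mapping.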


5.
> What I was just discussing with Gabor is the problem of markup (once
> more). Often indexes contain commands such as "\index" (see for
> example the LaTeX Companion) for which different index entries must be
> specified for the command "\index" and the word "index" sorted as
>

This is a very ad hoc solution to what is probably an example of a
more general class of sorting rules, in which words (i.e. the things to
be indexed) have "types".

This one could be done by a merge-rule that "ignores the \" and a
sub-order that reinserts it (at the beginning or the end).

Or it could be done by a letter property ":backslash-before"  
with values:  yes  no
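The first of these two options might look like this in a Python sketch
(again only an illustration: markup_key is a hypothetical name, and
putting the plain word before the command is an arbitrary choice):

```python
def markup_key(entry):
    """Merge-rule: ignore one leading backslash at the top level.
    Sub-order: among otherwise equal entries, the plain word comes
    before the command (backslash-before: no < yes)."""
    has_backslash = entry.startswith("\\")
    word = entry[1:] if has_backslash else entry
    return (word.lower(), 1 if has_backslash else 0)


entries = ["\\index", "indent", "index", "\\input"]
ordered = sorted(entries, key=markup_key)
```

With this key "\index" lands directly after "index" instead of being
collected with all the other commands under the backslash.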


I hope this helps.  There are certainly still a lot of things to
discover and to discuss here; I suspect that the ISO document does not
cover all the sorting requirements of complex documents.


chris