xindy

A Flexible Indexing System

\indexindy and sorting rules




	Hello out there,

I have followed your discussion, and I'd like to bring up another topic
for discussion once more.

As several discussions and readings have shown, the sorting problem
is still not solved very well. Specifying sorting rules can be a
tedious task that is error-prone in many cases. Based on an
evaluation of the ISO standard (see our homepage), I have developed the
following concept, which I partially implemented this weekend.

1. Letters are entities owning properties that can be used for sorting
   purposes. A letter can be defined with the following declaration:

	(define-letter "umlaut-u with circumflex"
		(:case lower)
		(:accent circ)
		(:letter "u"))

   This defines the letter "umlaut-u with circumflex" with the
   properties listed above. Another example is:

	(define-letter "umlaut-U with trema"
		(:case upper)
		(:accent trema)
		(:letter "u"))
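To make the idea of letters as objects concrete, here is a minimal sketch. Python is used only for illustration, and Letter, define_letter and LETTERS are hypothetical names, not part of xindy. Each letter simply records its :case, :accent and :letter properties:

```python
from dataclasses import dataclass

# Hypothetical model of the proposed define-letter declaration:
# a letter is an object that carries named sort properties.
@dataclass(frozen=True)
class Letter:
    name: str    # descriptive name, e.g. "umlaut-u with circumflex"
    letter: str  # base letter (the :letter property)
    case: str    # "upper" or "lower" (the :case property)
    accent: str  # accent class, e.g. "trema" or "circ" (the :accent property)

def define_letter(name, *, case, accent, letter):
    """Mirror of (define-letter NAME (:case ..) (:accent ..) (:letter ..))."""
    return Letter(name=name, letter=letter, case=case, accent=accent)

# Registry of defined letters, keyed by their descriptive name.
LETTERS = {
    l.name: l
    for l in (
        define_letter("umlaut-u with circumflex", case="lower", accent="circ", letter="u"),
        define_letter("umlaut-U with trema", case="upper", accent="trema", letter="u"),
    )
}

print(LETTERS["umlaut-U with trema"].case)  # upper
```

Both example letters share the base letter "u"; they differ only in the properties that later sorting levels look at.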

2. Sorting is based on a sequence of partial orders that together
   should result in a total order. Partial orders can be defined with
   declarations such as:

	(define-partial-order :letter
		("a" "b" "c" ... "u" "v" ...))

	(define-partial-order :case
		(upper lower))

	(define-partial-order :accent
		(trema acute circ tilde hat))

   The names of the partial orders directly refer to the property
   names above.
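A partial order can be modelled as a ranked list in which a value's position is its weight. In this sketch the :letter list, which is abbreviated with "..." above, is assumed to be the plain alphabet a to z purely for illustration; PARTIAL_ORDERS and weight are hypothetical names. The dictionary keys match the property names of the letters:

```python
import string

# Each partial order is a list of property values; a value's position
# in the list is its weight (lower weight sorts first).
# Assumption: the plain a..z alphabet stands in for the :letter order.
PARTIAL_ORDERS = {
    "letter": list(string.ascii_lowercase),
    "case": ["upper", "lower"],
    "accent": ["trema", "acute", "circ", "tilde", "hat"],
}

def weight(order_name, value):
    """Weight of VALUE within the named partial order."""
    return PARTIAL_ORDERS[order_name].index(value)

print(weight("case", "upper"), weight("case", "lower"))      # 0 1
print(weight("accent", "trema") < weight("accent", "circ"))  # True
```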

3. A total order can be specified with the declaration:

	(define-total-order
		(:letter)
		(:accent backwards)
		(:case))

   This sorts a word (a sequence of letters) first according to the
   weights given by the partial order :letter, then according to the
   weights from :accent, compared from the end of the word backwards
   (this is the French sorting order), and finally according to :case.
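One way such a total order could be evaluated is sketched below: for each level it computes the sequence of weights of a word's letters, reverses that sequence for a backwards level, and compares the levels in order. The "none" accent value for unaccented letters, the a to z alphabet and the small accent table are assumptions for this example, and all names are hypothetical; characters outside these tables are not handled.

```python
import string

# Partial orders (weight = list position). "none" for unaccented
# letters and the a..z alphabet are assumptions of this sketch.
ORDERS = {
    "letter": list(string.ascii_lowercase),
    "accent": ["none", "trema", "acute", "circ", "tilde", "hat"],
    "case": ["upper", "lower"],
}

# Total order as proposed: (:letter) (:accent backwards) (:case)
TOTAL_ORDER = [("letter", False), ("accent", True), ("case", False)]

# Hypothetical letter table for the accented characters used below.
ACCENTED = {"ô": ("o", "circ"), "é": ("e", "acute")}

def to_letters(word):
    """Turn a string into letter objects (dicts of properties)."""
    letters = []
    for ch in word:
        base, accent = ACCENTED.get(ch.lower(), (ch.lower(), "none"))
        case = "upper" if ch.isupper() else "lower"
        letters.append({"letter": base, "accent": accent, "case": case})
    return letters

def sort_key(word):
    """One tuple of weights per level; backwards levels are reversed."""
    letters = to_letters(word)
    key = []
    for prop, backwards in TOTAL_ORDER:
        weights = [ORDERS[prop].index(l[prop]) for l in letters]
        if backwards:
            weights.reverse()
        key.append(tuple(weights))
    return tuple(key)

words = ["côté", "coté", "côte", "cote"]
print(sorted(words, key=sort_key))  # ['cote', 'côte', 'coté', 'côté']
```

With a forward accent level the result would instead be cote, coté, côte, côté; the backwards level is what produces the French order.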

Once we have a sorting model based on this scheme, the ordering part
of the problem is solved.

What is still missing is an appropriate mapping that transforms a
string (a sequence of chars) into a sequence of letters (which have
now become real objects).

This could look like:

	(define-mapping "umlaut-u" ("\~"u" "ü"))
	(define-mapping "umlaut-A" ("\~"A" "Ä"))

[I hope you can see the ISO-Latin chars as well]
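A mapping of this kind could be applied with a greedy longest-match scan over the input string. In this sketch the TeX spelling \"u is assumed for the first string of each mapping, characters without a mapping become letters of their own, and define_mapping, MAPPINGS and string_to_letters are hypothetical names:

```python
# Several spellings map to one letter name.
MAPPINGS = {}

def define_mapping(letter_name, spellings):
    for s in spellings:
        MAPPINGS[s] = letter_name

# Assumption: \"u and \"A are the intended TeX spellings.
define_mapping("umlaut-u", ['\\"u', "ü"])
define_mapping("umlaut-A", ['\\"A', "Ä"])

def string_to_letters(text):
    """Greedy longest-match scan; unmapped chars become their own letter."""
    longest = max(map(len, MAPPINGS))
    letters, i = [], 0
    while i < len(text):
        # Try the longest possible spelling first, then shorter ones.
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in MAPPINGS:
                letters.append(MAPPINGS[chunk])
                i += n
                break
        else:
            letters.append(text[i])
            i += 1
    return letters

print(string_to_letters('M\\"uller'))  # ['M', 'umlaut-u', 'l', 'l', 'e', 'r']
print(string_to_letters("Müller"))     # same result
```

Longest match matters once spellings share prefixes; here it simply ensures the three-character TeX spelling is consumed as one letter.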

What I was just discussing with Gabor is (once more) the problem of
markup. Indexes often contain command names such as "\index" (see, for
example, the LaTeX Companion), for which the command "\index" and the
word "index" must yield different index entries, sorted as
 a) <i markup=cmd><n markup=cmd><d markup=cmd><e markup=cmd><x markup=cmd>

versus

 b) <i><n><d><e><x>

Here the <...> notation indicates a letter-object with additional
properties.

A partial order

	(define-partial-order :markup
		(cmd other))

can then be used to resolve the remaining ambiguities. The question
remains how to define the mappings

    "\index" -> a)

    "index"  ->  b)
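Assuming the mapping has already produced the letter sequences a) and b), the following sketch shows how the :markup partial order separates them: the letters tie, and the final :markup level puts the cmd variant first. Plain character comparison stands in for the full letter order here, and the names are hypothetical.

```python
# Partial order (cmd other): cmd gets the lower weight.
MARKUP_ORDER = ["cmd", "other"]

def letters(word, markup="other"):
    """Every letter of WORD gets the same :markup property."""
    return [(ch, markup) for ch in word]

def sort_key(letter_seq):
    # First level: the plain letters; last level: the :markup weights.
    return (
        tuple(ch for ch, _ in letter_seq),
        tuple(MARKUP_ORDER.index(m) for _, m in letter_seq),
    )

a = letters("index", markup="cmd")  # a) the command \index
b = letters("index")                # b) the plain word index
print(a == b)                                  # False: distinct entries
print(sorted([b, a], key=sort_key) == [a, b])  # True: cmd sorts first
```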

Two schemes seem to be possible:

1. Mappings are based on string or regexp transformations (such as the
   current sort-rules), extended with mapping rules.

   Informally we could say that "\index" must be written as
   "\cmd{index}" and there is a mapping rule that says

	(define-mapping "\cmd{(.*)}" "\1"
		:with-property (:markup cmd))

   indicating that the replacement text "\1" will be further mapped
   onto letters that have the additional property (:markup cmd).

   This needs a flexible and dynamically configurable parser (not too
   hard to implement).
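Scheme 1 could be prototyped with an ordinary regexp library. The sketch below uses Python's re module in place of the parser and hypothetical names (RULES, map_keyword). It applies a rule whose pattern matches the whole keyword, expands the replacement text and attaches the rule's property to every resulting letter. For brevity each character becomes one letter, where a real implementation would run the replacement text through the string-to-letter mapping; the default :markup value other is an assumption.

```python
import re

# Sketch of: (define-mapping "\cmd{(.*)}" "\1" :with-property (:markup cmd))
# Each rule: compiled pattern, replacement template, extra properties.
RULES = [
    (re.compile(r"\\cmd\{(.*)\}"), r"\1", {"markup": "cmd"}),
]

def map_keyword(keyword):
    """Apply the first rule matching the whole keyword, then turn the
    replacement text into letters carrying the rule's properties."""
    for pattern, template, props in RULES:
        m = pattern.fullmatch(keyword)
        if m:
            text = m.expand(template)
            return [{"letter": ch, **props} for ch in text]
    # No rule matched: plain letters with the default markup.
    return [{"letter": ch, "markup": "other"} for ch in keyword]

print(map_keyword(r"\cmd{index}")[0])  # {'letter': 'i', 'markup': 'cmd'}
print(map_keyword("index")[0])         # {'letter': 'i', 'markup': 'other'}
```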


2. We tackle the problem the other way around. This concerns the
   discussion about the \indexindy command. Something like

	\indexindy[markup=texttt,...]{foo}

   instead of

	\indexindy[...]{\texttt{foo}}

   could solve the problem.

   Markup is then not embedded in the plain keyword, so a scanner is no
   longer necessary. The markup can be applied in the markup backend
   with something like

	(markup-keyword :markup "texttt" :open "\texttt{" :close "}")

   This would effectively yield the same results. Its drawback is that
   no more than one markup can be associated with a keyword, though
   keywords needing more than one seem to be rare.
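On the backend side, scheme 2 reduces to looking up open and close strings by markup name. A minimal sketch, with markup_keyword, KEYWORD_MARKUP and render as hypothetical names:

```python
# Sketch of (markup-keyword :markup "texttt" :open "\texttt{" :close "}")
KEYWORD_MARKUP = {}

def markup_keyword(markup, open_, close):
    KEYWORD_MARKUP[markup] = (open_, close)

markup_keyword("texttt", "\\texttt{", "}")

def render(keyword, markup=None):
    """Wrap the plain keyword in the open/close strings of its markup;
    keywords without markup are printed unchanged."""
    if markup is None:
        return keyword
    open_, close = KEYWORD_MARKUP[markup]
    return open_ + keyword + close

print(render("foo", "texttt"))  # \texttt{foo}
print(render("foo"))            # foo
```

Because each keyword carries a single markup value, the sketch shares the limitation of this scheme: nested markup would need a list of markups per keyword.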


Any comments on this topic are really welcome. Please let us know
which solution you prefer.

If there are open questions, ask me. Maybe I'm so deep into this
stuff that my explanations are not understandable :)

Thanks for your patience.


Bye

--
======================================================================
Roger Kehr			   kehr@iti.informatik.th-darmstadt.de
Computer Science Department          Technical University of Darmstadt