Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Fri, 12 Mar 1999 10:26:13 -0600 (CST)
According to Hans-Peter Nilsson:
> I plan to add a new attribute: extra_word_characters.
> It is the opposite (or something) to valid_punctuation, it marks a
> (possibly) non-alphanumeric as a valid word-character.
It's like valid_punctuation, in that it's taken as part of the word, but
unlike valid_punctuation in that it's not stripped out before the word
is put in the database, if I understand you correctly.
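
To make the distinction concrete, here is a minimal sketch of how I read it.
This is not the actual HTML.cc code: split_words and its parameters are
hypothetical names made up for illustration, and it assumes the "C" locale's
isalnum.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, NOT ht://Dig's real word parser.
// - alphanumerics are always word characters
// - characters in valid_punctuation do not break a word,
//   but are stripped before the word is stored
// - characters in extra_word_characters are part of the word and are kept
std::vector<std::string> split_words(const std::string &text,
                                     const std::string &valid_punctuation,
                                     const std::string &extra_word_characters)
{
    std::vector<std::string> words;
    std::string current;

    for (char ch : text) {
        // isalnum needs an unsigned char value to be well defined
        bool alnum = std::isalnum(static_cast<unsigned char>(ch)) != 0;
        bool extra = extra_word_characters.find(ch) != std::string::npos;
        bool punct = valid_punctuation.find(ch) != std::string::npos;

        if (alnum || extra) {
            current += ch;          // kept in the stored word
        } else if (punct) {
            continue;               // word continues, character is dropped
        } else if (!current.empty()) {
            words.push_back(current);   // any other character ends the word
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}
```

With valid_punctuation set to "'" and extra_word_characters set to "_", the
text "foo_bar it's" would come out as the words foo_bar and its. With
extra_word_characters left empty, the same text gives foo, bar and its.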
> This way (and no other I know of), I can make "_" characters part of
> words, and searchable as such.
>
> A (hopefully) positive side-effect is that people having problems making
> their systems understand their locale (i.e. it is broken in that it
> handles everything as the "C" locale) can state characters here that the
> locale would normally handle.
>
> Examples:
> extra_word_characters: _
> extra_word_characters: "åäöÅÄÖ"
>
> (If you didn't get the last one, don't worry.)
> Specifying characters handled by the locale as isalpha would be a no-op.
>
> Comments welcome.
Sounds like a good idea to me. I'm planning a round of changes to
HTML.cc next week, especially dealing with space handling, but also with
word handling, so it would be a good idea if we try to avoid stepping
on each other's toes. If you get your changes in by Monday or Tuesday,
then I can follow with mine. I want to get my changes into 3.1.2,
which will eventually get merged into 3.2. My concern is that if I change
the same part of the code in 3.1.2 that you change in 3.2, the CVS merge
may not put it all together right.
-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930