htfuzzy

ht://Dig Copyright © 1995-2004 The ht://Dig Group
Please see the file COPYING for license information.


Synopsis

htfuzzy [-c configfile] [-v] algorithm ...

Description

Htfuzzy creates indexes for different "fuzzy" search algorithms. These indexes can then be used by the htsearch program.

Options

-c configfile
Use the specified configuration file instead of the default.
-v
Verbose mode. Used once, it provides progress feedback; used more than once, it produces enough output to overflow even the biggest buffers. :-)
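As an illustration of how the synopsis and options combine, here is a minimal Python sketch that assembles an htfuzzy argument list. The helper name build_htfuzzy_command and the configuration path in the usage note are hypothetical; only the -c and -v options and the trailing algorithm names come from this page.

```python
def build_htfuzzy_command(algorithms, config=None, verbose=0):
    """Assemble an argument list following: htfuzzy [-c configfile] [-v] algorithm ..."""
    if not algorithms:
        raise ValueError("at least one algorithm name is required")
    command = ["htfuzzy"]
    if config is not None:
        # -c selects a configuration file other than the default.
        command += ["-c", config]
    # -v may be repeated; each repetition asks for more output.
    command += ["-v"] * verbose
    command += list(algorithms)
    return command
```

For example, build_htfuzzy_command(["soundex", "metaphone"], config="/tmp/htdig.conf", verbose=1) yields a list that could be handed to subprocess.run on a machine where htfuzzy is installed.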

Algorithms

Indexes for the following search algorithms can currently be created:
soundex
Creates a slightly modified soundex key database. A soundex key encodes letters as digits, with similar-sounding letters (for example c, k, and q) given the same digit. Vowels are not coded. The key differs from the standard soundex algorithm in two ways:
  • Keys are 6 digits.
  • The first letter is also encoded.
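The description above can be sketched in a few lines of Python. This is an illustration written from the description on this page, not the ht://Dig source: the letter groups are the ones used by standard soundex, the name soundex_key is hypothetical, and, as a simplification, any uncoded letter (including h and w) separates repeated digits.

```python
# Standard soundex letter groups; vowels and h, w, y carry no digit.
SOUNDEX_GROUPS = {
    "bfpv": "1",
    "cgjkqsxz": "2",
    "dt": "3",
    "l": "4",
    "mn": "5",
    "r": "6",
}
CODES = {ch: digit for letters, digit in SOUNDEX_GROUPS.items() for ch in letters}


def soundex_key(word, length=6):
    """Return an all-digit key; the first letter is encoded like any other."""
    digits = []
    last = ""
    for ch in word.lower():
        code = CODES.get(ch, "")
        # Collapse runs of letters that map to the same digit.
        if code and code != last:
            digits.append(code)
        last = code
    # Truncate or zero-pad to the fixed key length.
    return "".join(digits)[:length].ljust(length, "0")


print(soundex_key("cat"))  # 230000
print(soundex_key("kat"))  # 230000
```

Because the first letter is encoded rather than kept verbatim, "cat" and "kat" share a key here, which a standard soundex key (C300 versus K300) would not allow.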
metaphone
Creates a metaphone key database. This algorithm is more specific to English, but will get fewer "weird" matches than the soundex algorithm.
accents
Creates an accents key database. This algorithm will map all accented letters to their unaccented counterparts, so that a search for the unaccented word will yield all variations of this word with accents.
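To make the mapping concrete, the following Python sketch strips accents with Unicode decomposition and groups accented variants under their unaccented key. Unicode normalization is a swapped-in technique chosen for brevity and is an assumption here; this page does not say how htfuzzy performs the mapping. The function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def strip_accents(word):
    """Return word with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_accents_index(words):
    """Map each unaccented key to the accented variants seen in words."""
    index = defaultdict(set)
    for word in words:
        key = strip_accents(word)
        if key != word:
            # Only words that actually carry accents need an entry.
            index[key].add(word)
    return dict(index)
```

With an index like this, a search for "cafe" can be expanded to include "café" as well.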
endings
Creates two databases which can be used to match common word endings. Creating these databases requires a list of affix rules and a dictionary that uses those affix rules. Both files use the formats of the ispell program. The distribution includes the affix rules for English and a fairly small English dictionary. Other languages can be supported by obtaining the appropriate affix rules and dictionaries. These are available for many languages; check the ispell distribution for more details.
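The idea of expanding a dictionary through affix rules can be sketched as follows. The rule table is a hypothetical, hand-simplified stand-in for a single suffix flag, expressed as Python tuples rather than real ispell affix syntax, and the two dictionaries only mimic the role of the two databases; none of this is taken from the ht://Dig source.

```python
import re
from collections import defaultdict

# Hypothetical simplified rules for one suffix flag "D":
# (condition on the end of the root, text to strip, text to append).
SUFFIX_RULES = {
    "D": [
        (r"e$", "", "d"),              # create -> created
        (r"[^aeiou]y$", "y", "ied"),   # imply -> implied
        (r"[^ey]$", "", "ed"),         # cross -> crossed
        (r"[aeiou]y$", "", "ed"),      # convey -> conveyed
    ],
}


def expand(root, flags):
    """Apply every rule of every flag whose condition matches the root."""
    words = []
    for flag in flags:
        for condition, strip, add in SUFFIX_RULES.get(flag, []):
            if re.search(condition, root):
                stem = root[: len(root) - len(strip)] if strip else root
                words.append(stem + add)
    return words


def build_endings(dictionary_lines):
    """Build root-to-words and word-to-root maps from 'root/FLAGS' lines."""
    root2word = defaultdict(list)
    word2root = {}
    for line in dictionary_lines:
        entry, _, flags = line.strip().partition("/")
        root = entry.lower()
        for word in expand(root, flags):
            root2word[root].append(word)
            word2root[word] = root
    return dict(root2word), word2root
```

Keeping both directions lets a search term such as "implied" be reduced to its root and then expanded to the other forms derived from that root.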
synonyms
Creates a database of synonyms for words. It reads a text database of synonyms and creates a database that htsearch can then use. Each line of the text database is a list of words; the first word is given the remaining words on that line as synonyms.
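The line format described above can be read with a short Python sketch. It assumes words are separated by whitespace, which this page does not state explicitly, and the name parse_synonyms is hypothetical; the resulting dictionary only stands in for the database that htsearch would consult.

```python
def parse_synonyms(lines):
    """Map the first word of each line to the other words on that line."""
    table = {}
    for line in lines:
        words = line.split()
        if len(words) < 2:
            # Blank lines and lone words define no synonyms.
            continue
        table.setdefault(words[0], []).extend(words[1:])
    return table
```

For instance, the line "car automobile auto" produces an entry that maps "car" to "automobile" and "auto".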

Files

CONFIG_DIR/htdig.conf
The default configuration file.
DATABASE_DIR/db.accents.db
(Output) Maps between characters with and without accents for accents fuzzy rule
DATABASE_DIR/db.metaphone.db
(Output) Database of similar-sounding words for metaphone fuzzy rule
DATABASE_DIR/db.soundex.db
(Output) Database of similar-sounding words for soundex fuzzy rule
COMMON_DIR/english.0, COMMON_DIR/english.aff
(Input) List of words and affix rules used to generate endings
COMMON_DIR/root2word.db, COMMON_DIR/word2root.db
(Output) Databases used for endings fuzzy rule
COMMON_DIR/synonyms
(Input) List of groups of words considered synonymous
COMMON_DIR/synonyms.db
(Output) Database used for synonyms fuzzy rule

See Also

htdig, htmerge, htsearch, Configuration file format, and ispell.

Last modified: $Date: 2004/06/14 08:49:46 $