Running ht://Dig

ht://Dig Copyright © 1995-2002 The ht://Dig Group
Please see the file COPYING for license information.

This document will attempt to show the steps needed to use the ht://Dig system, after obtaining, installing and configuring it.
The main sections are:

Building the databases
Testing and troubleshooting
Maintaining the system

Building the databases

After setting up all the configuration files, you can build the required databases simply by running rundig. This script will run htdig first to build the initial database, then it runs htmerge to create a document index and word database from the files that were created by htdig. It then runs htnotify, and finally runs htfuzzy if necessary, to build the endings and synonyms databases if they're missing or outdated. The rundig script can be customized for your specific needs, or you can develop your own script that runs any of these programs. Read the reference sections for each of these programs to get a better understanding of what each one does.

The htfuzzy program deserves a bit more explaining. It is used to build databases that are used by some of the fuzzy match algorithms selected by htsearch's search_algorithm attribute. The endings and synonyms algorithms use static dictionaries, so their databases only need to be rebuilt by htfuzzy when the dictionary files are changed, or when ht://Dig is initially installed. The rundig script handles the building of these two databases as needed for the default setup. A few of the other fuzzy match algorithms use databases that are derived from the word database built by htdig/htmerge, so if you use these algorithms you should rebuild their databases with htfuzzy every time you update your index. This isn't done in rundig, but the comments in the script show where you can add your htfuzzy commands as needed. Some fuzzy match algorithms don't need their own database, as they just operate on the word database, so they don't need any special setup.

Testing and troubleshooting

Once the databases are built, you should test out htsearch. It's recommended that you first try a few queries running htsearch on the command line, as it helps to separate problems that are specific to ht://Dig from web server or CGI problems. Once you have that working, try running htsearch from your web browser, using the search form you configured.

If you run into problems at any point in the building and testing of your databases, there are many things you can do. All ht://Dig programs feature a -v option to get some debugging output. The more of these options you put on the command line, the more output you'll usually get. To get help with common problems, or with interpreting some of the debugging output, please look to the ht://Dig FAQ (frequently asked questions) as your first line of support. Most of the problems that ht://Dig users have are explained there, and the on-line FAQ on the website is updated frequently as new problems arise. The FAQ will also tell you where you can turn if your question isn't answered there. Remember that questions may not be phrased exactly as you'd state them, so look carefully for anything that seems similar to the problem you're trying to solve.

Maintaining the system

Once everything is running, you have to deal with the question of how you can keep everything running and up to date. The databases don't automatically update themselves, of course, so you'll need to figure out how to schedule automatic updates of the database. Most users use the crontab facility on their systems to schedule daily or weekly updates of their database. This can be as simple as running "rundig" or "rundig -a" from your crontab, or from a file in /etc/cron.daily if your system uses this, to rebuild from scratch every night. For a small site, this may take only a few minutes to run. Other sites will run more elaborate update scripts, to update their existing databases nightly, and schedule complete rebuilds less frequently, such as monthly.

You need to pay close attention to how long updates take to run. There are no database lockouts in ht://Dig, so you don't want to schedule update or reindexing runs so frequently that they run into each other.

Last modified: $Date: 2002/02/01 05:35:22 $