Contributed by Steve Eidemiller on 10/10/2003
(steve.eidemiller@childrenshc.org)

This package includes exe and dll binaries from the following sources:

    htdig    http://www.htdig.org
    Cygwin   http://www.cygwin.com
    catdoc   http://packages.debian.org/stable/text/catdoc.html
    xpdf     http://www.foolabs.com/xpdf/download.html

This package contains Windows binaries built from the htdig 3.1.6 distro
using Cygwin 1.3.12-1 and gcc 2.95.3-5 (cygwin special). These binaries
have been tested on Windows 2000 Professional SP4, Windows 2000 Server SP4,
and Windows XP Professional SP1.

I mostly followed Jim Kerslake's "Idiot's Guide to installing ht://dig on
Win32", with a few modifications:

    prefix=c:/htdig
    CGIBIN_DIR=c:/Inetpub/wwwroot/cgi-bin
    IMAGE_DIR=c:/Inetpub/wwwroot/htdig/images
    IMAGE_URL_PREFIX=/htdig/images
    SEARCH_DIR=c:/Inetpub/wwwroot/htdig

I also applied this date tag patch:

=======================================
FROM: Gilles Detillieux
DATE: 02/07/2002 13:40:30
SUBJECT: [htdig] PATCH - fix meta date tag parsing in 3.1.6

This patch fixes a problem introduced in 3.1.6's handling of use_doc_date,
which wasn't in the 3.1.5 patches for this feature. The new date parsing
code in 3.1.6 didn't allow a '-' character after the year in the content
attribute of meta date tags, but only allowed white space, which is
obviously not in accordance with the ISO 8601 date format standard.

Apply this patch in your main htdig-3.1.6 source directory using the
command:

    patch -p0 < this-message-file

--- htdig/Retriever.cc.orig     Thu Jan 31 17:47:17 2002
+++ htdig/Retriever.cc  Thu Feb  7 14:47:27 2002
@@ -1139,7 +1139,7 @@ parsedcdate(char *date)
         year += 1900;
     else if (year >= 19100)     // seen some programs do it, why not check?
         year -= (19100-2000);
-    while (isspace(*s))
+    while (*s == '-' || isspace(*s))
         s++;
 
     // get month...

--
Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
=======================================

And I made the following changes to the retry code in htlib\Connection.cc:

=======================================
To increase the number of retries for any given page, and to extend the
wait between retries, the following four lines were changed in
"htlib\Connection.cc". Changed lines are marked with "SteveE" and show the
old value. Make sure BOTH sets of values are changed!! Extra lines from the
.cc code are shown so you can get your bearings.

Connection::Connection()
{
    sock = -1;
    connected = 0;
    peer = 0;
    server_name = 0;
    all_connections.Add(this);
    timeout_value = 0;
    retry_value = 6;     //Old value = 1 -- SteveE 09/25/2003
    wait_time = 10;      // wait 10 seconds after a failed connection //Old value = 5 -- SteveE 09/25/2003
}

Connection::Connection(int socket)
{
    sock = socket;
    connected = 0;
    GETPEERNAME_LENGTH_T length = sizeof(server);
    if (getpeername(socket, (struct sockaddr *)&server, &length) < 0)
    {
        perror("getpeername");
    }
    peer = 0;
    server_name = 0;
    all_connections.Add(this);
    timeout_value = 0;
    retry_value = 6;     //Old value = 1 -- SteveE 09/25/2003
    wait_time = 10;      //Old value = 5 -- SteveE 09/25/2003
}
=======================================

I also modified conv_doc.pl and added two .bat files that launch catdoc.exe
and the pdf converters. Please refer to c:\htdig\contrib\htdig.conf from
this distribution to see my settings for external_parsers.


TO INSTALL THIS DISTRIBUTION:
=============================

1. Unzip the entire contents to C:\htdig (exe files should end up in
   C:\htdig\bin).

2. Copy the files in C:\htdig\Inetpub\wwwroot\cgi-bin to your virtual
   host's cgi-bin folder and set execute permissions appropriately.

3. Copy the files in C:\htdig\Inetpub\wwwroot\htdig to /htdig on your
   virtual host.

4. Edit C:\htdig\conf\htdig.conf appropriately. I have included my .conf
   file at C:\htdig\contrib\htdig.conf for reference (edited to remove
   confidential information). Notable things to edit are start_url,
   limit_urls_to, exclude_urls, and maintainer.

5. Run C:\htdig\bin\htdig.bat to create your databases in C:\htdig\db.
   I use htdig.bat instead of rundig because it generates nice htdig.log
   and htdig_error.log files. My conf file is set up to use the external
   parsers (included) and to write conversion errors to conv_errors.log
   as needed. You can also copy C:\htdig\contrib\BrokenLinks.asp to an IIS
   folder and browse to it to see a broken link report built from the
   htdig.log file generated by my htdig.bat.

6. Edit the htdig.exe and htmerge.exe parameters in htdig.bat to fit your
   needs.

7. You may wish to set up Task Scheduler (or equivalent) to run htdig.bat
   on a routine basis.

Many thanx to all the wonderful contributors on the htdig and related
projects !!
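
APPENDIX: WHAT THE DATE TAG PATCH CHANGES
=========================================

For anyone curious about the effect of the date tag patch quoted above,
here is a minimal standalone C++ sketch. It is NOT the ht://Dig code:
parse_year_month is a hypothetical helper written only for this note, and
it uses strtol rather than whatever parsedcdate does internally. The only
part taken from the patch is the loop condition that skips '-' as well as
white space after the year.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Hypothetical helper (not part of ht://Dig): reads the year and month from
// the start of a date string such as "2002-02-07" or "2002 02 07".
// Returns true and fills in year/month on success, false otherwise.
bool parse_year_month(const char *s, int &year, int &month)
{
    assert(s != nullptr);       // precondition: caller passes a real string

    char *end = nullptr;

    // Read the year digits.
    long y = std::strtol(s, &end, 10);
    if (end == s)
        return false;           // no digits found
    s = end;

    // Same condition as the patched loop: accept '-' as well as white
    // space between the year and the month.
    while (*s == '-' || std::isspace(static_cast<unsigned char>(*s)))
        s++;

    // Read the month digits and reject values outside 1..12.
    long m = std::strtol(s, &end, 10);
    if (end == s || m < 1 || m > 12)
        return false;

    year = static_cast<int>(y);
    month = static_cast<int>(m);
    return true;
}
```

With the separator loop written this way, both the ISO 8601 style
"2002-02-07" and the space-separated "2002 02 07" yield the same year and
month. If the loop skipped only white space, the '-' would be left in front
of the month digits and this sketch would reject the ISO form.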