Subject: [htdig3-dev] [ANNOUNCE] ht://Dig 3.2.0b1
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Feb 04 2000 - 22:05:21 PST
I'm very glad to announce the release of version 3.2.0b1. As the
version number denotes, this is a beta release. We're looking for
feedback on the 3.2 codebase, including documentation, performance,
features, suggestions, and of course bugs.
The documentation for the 3.2.0bX series can be found at
<http://dev.htdig.org/htdig-3.2/>
The release notes for 3.2.0b1 are at
<http://dev.htdig.org/htdig-3.2/RELEASE.html>
To download the source, see <http://www.htdig.org/files/htdig-3.2.0b1.tar.gz>
Feedback on the release should be primarily directed to htdig3-dev@htdig.org
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
Release notes for htdig-3.2.0b1 (4 Feb 2000)
This marks the first beta version of the 3.2.0 codebase, over a year
in the works. Since it has not received as much testing as the 3.1.x
series, it is *not* recommended for production environments. A full
description of how to upgrade is provided at
<http://dev.htdig.org/htdig-3.2/upgrade.html>
NOTE: Read this document before upgrading. You have been warned.
* Fixed a bug in htdig where hopcounts could be calculated
incorrectly between multiple servers.
* Fixed a bug that could cause problems with 8-bit characters on
some systems.
* Fixed handling of unreachable servers. First, the new
max_retries attribute allows htdig to attempt multiple
connections. Second, if the server is not available, htdig will
stop trying to connect.
* Fixed handling of SGML entities: htdig will still decode them to
store as single characters in the database, but htsearch now
encodes them back for compliant results.
* Rewrote the database formats, allowing room for more sophisticated
searches and compression of the word database using the new
attribute wordlist_compress. These changes include the removal
of the word_list file (db.wordlist) and the addition of the new
doc_excerpt database.
* Cleaned up many parts of the code, including the URL and HTML
parsers. Additionally, on platforms that support it, much of the
code will be built as shared libraries, which should help memory
utilization, especially under high load.
* Removed the modification_time_is_now attribute, which is now on by
default. This means the time at indexing is taken as the date of
the document if the server does not return a date.
* Added the new attribute use_doc_date to use the date specified
in a META date tag.
* Merged all heading_factor attributes into one new attribute,
heading_factor.
* As a result of the new database format, all _factor attributes
(like title_factor and keywords_factor) are now dynamic--you
do not have to rebuild your database to change the scaling.
* Changed attributes bad_querystr, exclude_urls,
limit_urls_to, limit_normalized, and http_proxy_exclude to
allow full regular expressions when the expression is surrounded
by [ and ].
* Changed htsearch fields restrict and exclude to allow regular
expressions when the expression is surrounded by [ and ].
* Added phrase searching support to htsearch--queries enclosed in
quotes will be checked to ensure the words occur in that exact
order in the documents.
* Added the build_select_lists attribute to allow the config
file to specify <select> form elements in htsearch output as a
template variable, much like $(SORT) and $(METHOD).
* Added a regex fuzzy method. This will allow searches to include
regex that match words. The fuzzy method will return up to
regex_max_words matches.
* Added a speling [sic] fuzzy method. This attempts several simple
spelling mistakes (like transposed letters and extra letters) to
find matches. This adds the new attribute
minimum_speling_length to restrict whether small words should
be checked. Transposing letters in smaller words can give
unrelated correctly-spelled words.
* Added support for external transport methods, using the
external_protocols attribute, an analogue of the
external_parsers system.
* Added support for HTTP/1.1, including persistent connections. This
can be configured using the new attributes
persistent_connections, head_before_get, and
max_connection_requests.
* Added support for file:// URLs and support for using the
mime_types file to decide whether local files are parsable.
* Added two new formats for variables in htsearch templates,
$%(var), which escapes the variable for a URL, and $&(var), which
HTML-escapes the variable as necessary.
* Added support for reading the list of URLs to index with htdig
from standard input by supplying the command-line option -.
* Added a flag -m to htdig to index only the URLs listed in the
file named as its argument.
* There are many more changes, especially to the internal code
structure, so a huge thank you goes out to everyone who helped
make this release!
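
To illustrate the bracketed-regex convention mentioned above for
attributes such as exclude_urls and limit_urls_to, here is a minimal
Python sketch. It is not the htdig implementation. The function name
url_matches is hypothetical, the treatment of unbracketed patterns as
plain substrings is an assumption, and Python's regex dialect may differ
from the one htdig uses.

```python
import re


def url_matches(url, patterns):
    """Return True if any pattern matches the URL.

    Assumption: a pattern wrapped in [ and ] is a regular expression;
    any other pattern is treated as a plain substring.
    """
    for pattern in patterns:
        if len(pattern) > 2 and pattern.startswith("[") and pattern.endswith("]"):
            # Strip the surrounding brackets and search with the inner regex.
            if re.search(pattern[1:-1], url):
                return True
        elif pattern in url:
            return True
    return False


if __name__ == "__main__":
    patterns = ["/cgi-bin/", r"[\.(pdf|PDF)$]"]
    for candidate in ("http://example.org/cgi-bin/a",
                      "http://example.org/a.PDF",
                      "http://example.org/a.html"):
        print(candidate, url_matches(candidate, patterns))
```

The brackets let a single attribute mix simple substrings with full
expressions without adding a second attribute.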
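
The phrase-searching item above says that quoted queries are checked so
that the words occur in exactly that order. A rough Python sketch of
such a check follows. The function contains_phrase and the
representation of a document as a list of words are assumptions made
for illustration, not a description of how htsearch stores or scans
word positions.

```python
def contains_phrase(document_words, phrase_words):
    """Return True if phrase_words occurs as a consecutive run in document_words."""
    n = len(phrase_words)
    if n == 0:
        return False
    # Slide a window of the phrase length over the document.
    for start in range(len(document_words) - n + 1):
        if document_words[start:start + n] == phrase_words:
            return True
    return False


if __name__ == "__main__":
    doc = "the quick brown fox jumps".split()
    print(contains_phrase(doc, ["quick", "brown"]))  # words adjacent, in order
    print(contains_phrase(doc, ["brown", "quick"]))  # same words, wrong order
```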
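
The speling fuzzy method above tries simple spelling mistakes such as
transposed and extra letters. The Python sketch below shows one way
such candidates could be generated and checked against an index. It is
an approximation, not the htdig code. The names speling_variants and
speling_matches are hypothetical, and the minimum_length parameter and
its default of 5 only stand in for the minimum_speling_length
attribute.

```python
def speling_variants(word, minimum_length=5):
    """Return candidate corrections for word, or an empty set if it is too short."""
    if len(word) < minimum_length:
        # Short words are skipped: swapping letters in them tends to
        # produce unrelated, correctly spelled words.
        return set()
    variants = set()
    # Transposed letters: swap each adjacent pair.
    for i in range(len(word) - 1):
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    # Extra letters: drop each letter in turn.
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])
    variants.discard(word)
    return variants


def speling_matches(query, indexed_words, minimum_length=5):
    """Return the indexed words reachable from query by one simple mistake."""
    return sorted(speling_variants(query, minimum_length) & set(indexed_words))


if __name__ == "__main__":
    index = {"search", "engine", "the"}
    print(speling_matches("serach", index))
    print(speling_matches("enginne", index))
    print(speling_matches("teh", index))
```

In the usage above, "teh" yields no matches because it is shorter than
the assumed minimum length, which mirrors the reason the release notes
give for restricting small words.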
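
The two new template variable formats above, $%(var) and $&(var), can
be pictured with the following Python sketch. It only mimics the
described behaviour with the standard library. The expand function is
hypothetical, the variable name WORDS is used purely as an example, and
the exact escaping rules htsearch applies may differ.

```python
import html
import re
from urllib.parse import quote

# Matches $(NAME), $%(NAME) and $&(NAME).
_VARIABLE = re.compile(r"\$([%&]?)\((\w+)\)")


def expand(template, variables):
    """Substitute template variables, escaping according to the prefix."""
    def substitute(match):
        mode, name = match.group(1), match.group(2)
        value = variables.get(name, "")
        if mode == "%":
            return quote(value, safe="")   # escape for use inside a URL
        if mode == "&":
            return html.escape(value)      # escape for use inside HTML text
        return value                       # plain $(NAME): no escaping
    return _VARIABLE.sub(substitute, template)


if __name__ == "__main__":
    template = '<a href="?words=$%(WORDS)">$&(WORDS)</a>'
    print(expand(template, {"WORDS": "a&b c"}))
```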