From: Neal Richter
To: htdig-dev@lists.sourceforge.net
Message-ID:
Subject: [htdig-dev] latest libhtdig-3.2.0.so snapshot with PHP Wrappers
Date: Thu, 21 Mar 2002 14:16:49 -0700 (MST)

Hey all,

There's a new libhtdig snapshot:

  http://www.htdig.org/files/contrib/other/libhtdig-03212002.tgz

There's a new directory 'libhtdigphp' which will build a separate PHP
wrapper for libhtdig.

Status:

  The indexing (htdig), merging (htmerge), and searching (htsearch)
  API calls are functional. The htfuzzy API is not yet working.

  First version of a functional PHP wrapper library/module for
  libhtdig.

TODO:

  Make APIs for the other utility binaries.

  Implement 'indicators'. This indicator is returned by an
  xxxx_open() call, used by any follow-up calls, and freed by calling
  xxxx_close(). This is similar to functionality in SQL libraries.
  This kind of feature is a first step in making libhtdig usable in a
  server or apache-module type of setting.

Thanks!

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site

--------------------------------------------------------------------------

From: Neal Richter
To: htdig-dev@lists.sourceforge.net
Message-ID:
Subject: [htdig-dev] latest libhtdig-3.2.0.so snapshot
Date: Fri, 8 Mar 2002 12:45:34 -0700 (MST)

Hey all,

Soon there will be a new snapshot of the libhtdig project in the
http://www.htdig.org/files/contrib/other/ directory.

Status:

  The indexing (htdig), merging (htmerge), and searching (htsearch)
  API calls are functional. The htfuzzy API is not yet working.

TODO:

  Make APIs for the other utility binaries.

  Implement 'indicators'. This indicator is returned by an
  xxxx_open() call, used by any follow-up calls, and freed by calling
  xxxx_close(). This is similar to functionality in SQL libraries.
  This kind of feature is a first step in making libhtdig usable in a
  server or apache-module type of setting.

I'm posting a follow-up e-mail shortly about licensing questions.

Thanks.
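
[Illustration] The 'indicators' item in the TODO describes a handle
that an open call returns, that follow-up calls take, and that a close
call frees. The following is only a minimal sketch of that pattern, not
libhtdig code: every name in it (IndexSession, IndexHandle,
example_index_open, example_index_document, example_index_close) is
hypothetical, and the "indexing" is reduced to counting documents.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical names throughout: none of this is part of libhtdig.
// Per-session state lives in a struct instead of in global variables,
// so several sessions can be open at once (e.g. inside a server).
struct IndexSession {
    std::string config_file;
    int documents_indexed;
};

typedef int IndexHandle;  // opaque value handed back to the caller

static std::map<IndexHandle, IndexSession> g_sessions;
static IndexHandle g_next_handle = 1;

// Plays the role of an xxxx_open() call: allocate state, return a handle.
IndexHandle example_index_open(const std::string &config_file) {
    IndexHandle handle = g_next_handle++;
    assert(g_sessions.count(handle) == 0);  // handles are never reused here
    IndexSession session;
    session.config_file = config_file;
    session.documents_indexed = 0;
    g_sessions[handle] = session;
    return handle;
}

// A follow-up call: looks the session up by handle.
// Returns false if the handle is unknown or already closed.
bool example_index_document(IndexHandle handle, const std::string &url) {
    std::map<IndexHandle, IndexSession>::iterator it = g_sessions.find(handle);
    if (it == g_sessions.end()) return false;
    std::printf("[session %d] indexing %s\n", handle, url.c_str());
    it->second.documents_indexed++;
    return true;
}

// Plays the role of xxxx_close(): frees the session state.
// Returns the number of documents indexed, or -1 for an unknown handle.
int example_index_close(IndexHandle handle) {
    std::map<IndexHandle, IndexSession>::iterator it = g_sessions.find(handle);
    if (it == g_sessions.end()) return -1;
    int count = it->second.documents_indexed;
    g_sessions.erase(it);
    return count;
}
```

Because each call names its session explicitly, two sessions opened
with different config files do not share state, which is the property
a server or apache-module setting would need.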

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site

--------------------------------------------------------------------------

From: Neal Richter
Subject: [htdig-dev] libhtdig.3.2.0.so
Date: Mon, 28 Jan 2002 18:37:38 -0700 (MST)

Hey,

I sent a file called libhtdig-3.2.0.b4.tgz to Gilles and asked him to
stick it in the 'contrib' directory.

It's part of a larger project to restructure the code for these goals:

  1. All htdig code will be contained in the library.
  2. This library will be able to index data & respond to querying
     APIs from PHP or any other cgi-bin or program.

Unzip it in your base htdig directory. It contains these files:

  libhtdig/prepare.sh
  libhtdig/Makefile
  libhtdig/libhtdig_htdig.cc
  libhtdig/libhtdig_htmerge.cc
  libhtdig/libhtdig_api.h

After doing a configure & make on the latest snapshot, you can run
prepare.sh and make within the new 'libhtdig' directory.

The prepare.sh script copies the htdig/htdig code files to this
directory. libhtdig_htdig.cc is a callable replacement for the htdig
executable. libhtdig_htmerge.cc is a callable replacement for the
htmerge executable.

The makefile compiles these files using htdig's standard method, then
links these files and all files necessary for building the 6 htdig .so
files into ONE libhtdig.3.2.0.so.

Here's an example of its use:

  htdig_parameters_struct htdig_params;

  htdig_params.debug = 1;
  htdig_params.initial = TRUE;
  htdig_params.create_text_database = FALSE;
  htdig_params.report_statistics = FALSE;
  htdig_params.alt_work_area = FALSE;

  sprintf(htdig_params.configFile, "/etc/htdig/htdig.conf");
  strcpy(htdig_params.credentials, "");
  strcpy(htdig_params.max_hops, "");     //9 digit limit
  strcpy(htdig_params.minimalFile, "");
  strcpy(htdig_params.URL, "");          //stdin HTTP addrs

  htdig_index_open(&htdig_params);
  htdig_index_urls();
  htdig_index_close();

Here's the TODO:

1. Generalize the 'Retriever' class in htdig-exe

   It would be nicer to have a base class for the Retriever class; the
   current class would inherit from the new base class. This would
   enable developers to create their own retrievers and parsers and be
   able to mix and match them. Currently the parser classes receive a
   Retriever object as a parameter and issue callback-style calls to
   the Retriever object. You could derive new classes from the current
   Retriever object, but you would carry around all kinds of junk that
   may be unneeded if you are indexing documents from other sources.

   see:

     htdig_index_open(&htdig_params);
     htdig_index_document(.....);
     htdig_index_close();
     htdig_merge(&htmerge_params);  //merges new index with existing index.

2. Include some code from htsearch & create PHP wrapper functions for
   the searching code.

   The current htsearch-php3.0.1.1 module written by Torsten Neuer
   calls the htsearch cgi-bin and repackages the output. PHP is a very
   flexible web language with strong string-manipulation ability (much
   like perl). It would be more elegant to have a set of PHP wrappers
   written in C that provide an interface back and forth to the core
   searching code. While I'm not suggesting that this replace the
   htsearch cgi-bin, a lot of the query parsing code could be replaced
   with fewer lines of PHP. A developer could even write a different
   query language for the core searching code (think 'SearchSQL' or
   something like it). This will be especially powerful once the
   'indexable fields' feature is incorporated.

At the end of this process htdig can be integrated with other software
in a variety of ways, in some cases taking the place of a SQL database
for storing, querying & optionally displaying documents. We will
hopefully be using this as a document archiving tool that will
eliminate lots of old/infrequently used documents from a SQL database.

I'm going to try to keep this project as a patch from the current
snapshot. Unzip it, run a script file (copies files, diffs code), make
and you are done.

Feedback is welcome!

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
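
[Illustration] TODO item 1 in the last message asks for a slim base
class so that parsers can issue their callbacks to any retriever, not
only the web-crawling one. The following is a minimal sketch of that
idea under loud assumptions: RetrieverBase, PlainTextParser and
InMemoryRetriever are hypothetical names, and the callback signatures
are simplified for illustration rather than copied from htdig's
Retriever class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface; htdig's actual Retriever is not shaped like this.
// It holds only the callbacks a parser needs, and no crawling state.
class RetrieverBase {
public:
    virtual ~RetrieverBase() {}
    virtual void got_word(const std::string &word, int location) = 0;
    virtual void got_href(const std::string &url) = 0;
};

// A parser that depends only on the base interface, so it can be
// mixed and matched with any retriever implementation.
class PlainTextParser {
public:
    void parse(const std::string &text, RetrieverBase &retriever) const {
        std::string word;
        int location = 0;
        for (std::string::size_type i = 0; i <= text.size(); ++i) {
            const bool at_end = (i == text.size());
            const bool is_space =
                !at_end && (text[i] == ' ' || text[i] == '\n' || text[i] == '\t');
            if (!at_end && !is_space) {
                word += text[i];  // still inside a token
                continue;
            }
            if (word.empty()) continue;
            // Token finished: report it through a callback.
            if (word.compare(0, 7, "http://") == 0) {
                retriever.got_href(word);
            } else {
                retriever.got_word(word, location++);
            }
            word.clear();
        }
    }
};

// A retriever for documents supplied by the caller (for example, rows
// pulled from a database) that carries no URL queue or HTTP machinery.
class InMemoryRetriever : public RetrieverBase {
public:
    std::vector<std::string> words;
    std::vector<std::string> links;
    void got_word(const std::string &word, int /*location*/) { words.push_back(word); }
    void got_href(const std::string &url) { links.push_back(url); }
};
```

With this split, the existing web retriever would become one subclass
among several, and a caller indexing non-web documents could pass its
own subclass to the same parser.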