Swish-e

Open source project that will help you index files and web pages

Swish-e Ranking & Summary


  • License: GPL
  • Price: FREE
  • Publisher Name: The Swish-e Project
  • Publisher web site: http://swish-e.org/
  • Operating Systems: Mac OS X
  • File Size: 1.4 MB

Swish-e Description

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. It is ideally suited for collections of a million documents or fewer. Using the GNOME libxml2 parser and a collection of filters, Swish-e can index plain text, Microsoft Word/PowerPoint/Excel, e-mail, PDF, HTML, XML, and just about any file that can be converted to XML or HTML text. Swish-e is also often used to supplement databases like the MySQL DBMS for very fast full-text searching.

Here are some key features of "Swish-e":

  • Quickly index a large number of documents in different formats, including text, HTML, and XML.
  • Use "filters" to index other types of files, such as PDF, gzip, or PostScript.
  • Includes a web spider for indexing remote documents over HTTP. Follows Robots Exclusion Rules (including META tags).
  • Can use an external program to supply documents to Swish-e, such as an advanced spider for your web server or a program to read and format records from a relational database.
  • Document "properties" (some subset of the source document, usually defined as META or XML elements) may be stored in the index and returned with search results.
  • Document summaries can be returned with each search.
  • Word stemming, soundex, metaphone, and double-metaphone indexing for "fuzzy" searching.
  • Phrase searching and wildcard searching.
  • Limit searches to HTML links.
  • Use powerful regular expressions to select documents for indexing or exclusion.
  • Easily limit searches to parts or all of your web site.
  • Results can be sorted by relevance or by any number of properties in ascending or descending order.
  • Limit searches to parts of documents, such as certain HTML tags (META, TITLE, comments, etc.) or XML elements.
  • Can report structural errors in your XML and HTML documents.
  • Index file is portable between platforms.
  • A Swish-e library is provided to allow embedding Swish-e into your applications for very fast searching. A Perl module is available that provides a standard API for accessing Swish-e.
  • Includes an example search script with context summaries and search term and phrase highlighting. Can be used with popular Perl templating systems.
  • Swish-e is fast.
  • It's Open Source and FREE! You can customize Swish-e and contribute your fancy new features to the project.
  • Supported by online user and developer groups.

What's New in This Release:

  • Fixed 'deflate' handling in spider.pl.
  • Re-indexing is required.
  • Fixed a stemmer bug introduced in 2.4.4.
  • Filters are now run via fork/exec.
  • Fixed signed/unsigned warnings from gcc 4.x.
  • Makefile.mingw is included in the distribution.
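Soundex is one of the "fuzzy" indexing modes in the feature list above. To show what that means, here is a minimal Python sketch of the classic American Soundex algorithm. It is not Swish-e's own code, and Swish-e's variant may differ in detail. Words that sound alike reduce to the same four-character key, so an index built on these keys lets a search for one spelling match another.

```python
def soundex(word: str) -> str:
    """Return the classic American Soundex key (letter + 3 digits) for a word."""
    # Consonant groups that share a digit; vowels, H, W, Y get no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    key = letters[0]                   # the first letter is kept as-is
    prev = codes.get(letters[0], "")   # its digit still suppresses an identical next digit
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:    # skip adjacent duplicates
            key += digit
            if len(key) == 4:
                break
        # H and W do not separate letters with the same code; vowels do.
        if ch not in "HW":
            prev = digit
    return (key + "000")[:4]           # pad short keys with zeros
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so a soundex index treats the two spellings as the same term.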


Swish-e Related Software