Terrier

Terrier - Highly flexible, efficient, and robust search engine, readily deployable on large-scale collections of documents

Terrier Ranking & Summary

  • License: Freeware
  • Price: FREE
  • Publisher Name: University of Glasgow
  • Publisher web site: http://ir.dcs.gla.ac.uk/terrier/index.html
  • Operating Systems: Mac OS X
  • File Size: 5.8 MB

Terrier Description

Terrier is a highly flexible, efficient, effective, and robust search engine, readily deployable on large-scale collections of documents. It implements state-of-the-art indexing and retrieval functionalities and provides an ideal platform for the rapid development of large-scale retrieval applications.

The open source version of Terrier provides a flexible, comprehensive, transparent, and robust platform for research and experimentation in text retrieval. The research put into Terrier constantly expands towards new branches of the wider information retrieval field, making Terrier an ideal, strong, modular, and state-of-the-art platform for developing, assessing, and evaluating new concepts and ideas.

Terrier is written in Java and has been used for Web and Enterprise search, Desktop, Intranet and Vertical search engines, as well as for developing and evaluating novel large-scale text information retrieval techniques and applications. Terrier is being developed in the Department of Computing Science at the University of Glasgow.

Here are some key features of "Terrier":

General:
  • Indexing support for common desktop file formats and for commonly used TREC research collections (e.g. TREC CDs 1-5, WT2G, WT10G, GOV, GOV2, Blogs06).
  • Many document weighting models, including many parameter-free Divergence from Randomness weighting models, Okapi BM25 and language modelling.
  • Conventional query language supported, including phrases and terms occurring in tags.
  • Full-text indexing of large-scale document collections, in a centralised architecture, to at least 25 million documents.
  • Modular and open indexing and querying APIs, to allow easy extension for your own applications and research.
  • Active Information Retrieval research fed into the Open Source platform.
  • Open Source (Mozilla Public License).
  • Written in cross-platform Java: works on Windows, Mac OS X, Linux and Unix.
  • Large user-base over 3 years of public release.

Indexing:
  • Out-of-the-box indexing of tagged document collections, such as the TREC test collections.
  • Out-of-the-box indexing for documents of various formats, such as HTML, PDF, or Microsoft Word, Excel and PowerPoint files.
  • Indexing of field information, such as TITLE, H1 and other HTML tag information.
  • Indexing of position information at the word level or the block level (e.g. a window of terms within a distance).
  • Support for various encodings of documents (UTF), to facilitate multi-lingual retrieval.
  • Highly compressed index disk data structures.
  • Highly compressed direct file for efficient query expansion.
  • Alternative, faster single-pass indexing.
  • Various stemming techniques supported, including the Snowball stemmer for European languages.

Retrieval:
  • Provides standard querying facilities, as well as Query Expansion (pseudo-relevance feedback).
  • Can be applied in interactive applications, such as the included Desktop Search, or in a batch setting for research and experimentation.
  • Provides many standard document weighting models, including up to 126 Divergence From Randomness (DFR) document ranking models, and other models such as Okapi BM25, language modelling and TF-IDF. The new DFRee DFR weighting model is also included, which provides robust performance on a range of test collections without the need for any parameter tuning or training.
  • Advanced query language that supports boolean operators, +/- operators, phrase and proximity search, and fields.
  • Provides a number of parameter-free DFR term weighting models for automatic query expansion, in addition to Rocchio's query expansion.
  • Flexible processing of terms through a pipeline of components, such as stop-word removers and stemmers.

Experimentation:
  • Handles all currently available TREC test collections - see TREC Experimentation Examples for examples and known settings.
  • Easily scriptable to evaluate many parameter settings, or many weighting models, in batch form.
  • In-built evaluation tools for use with TREC ad-hoc and known-item search retrieval results, to produce various Precision and Recall measures.

NOTE: Terrier is released under the Mozilla Public License.
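
For readers unfamiliar with the Okapi BM25 weighting model named among the retrieval features, the following is a minimal, self-contained Java sketch of how such a model scores a document against a query. It does not use Terrier's API and is not Terrier's implementation: the class name Bm25Sketch, the parameter values k1 = 1.2 and b = 0.75, and the smoothed IDF variant are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: a textbook-style BM25 scorer, NOT Terrier's own code or API.
public class Bm25Sketch {
    // Commonly quoted default parameters (assumed here, not taken from Terrier).
    static final double K1 = 1.2;
    static final double B = 0.75;

    // Smoothed inverse document frequency; the "1 +" keeps the value positive.
    static double idf(int numDocs, int docFreq) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score contribution of one query term for one document.
    static double termScore(int tf, int docLen, double avgDocLen, int numDocs, int docFreq) {
        if (tf == 0) {
            return 0.0;
        }
        // Length normalisation: longer-than-average documents are penalised.
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(numDocs, docFreq) * (tf * (K1 + 1.0)) / (tf + norm);
    }

    // Sum of term scores for a tokenised query against one tokenised document.
    static double score(List<String> query, List<String> doc, List<List<String>> collection) {
        double avgDocLen = collection.stream().mapToInt(List::size).average().orElse(1.0);
        double total = 0.0;
        for (String term : query) {
            int tf = Collections.frequency(doc, term);
            int docFreq = 0;
            for (List<String> d : collection) {
                if (d.contains(term)) {
                    docFreq++;
                }
            }
            total += termScore(tf, doc.size(), avgDocLen, collection.size(), docFreq);
        }
        return total;
    }

    public static void main(String[] args) {
        // A tiny hypothetical collection of already-tokenised documents.
        List<List<String>> collection = Arrays.asList(
            Arrays.asList("terrier", "search", "engine"),
            Arrays.asList("search", "engine", "java"),
            Arrays.asList("cooking", "recipes"));
        List<String> query = Arrays.asList("terrier", "search");
        for (List<String> doc : collection) {
            System.out.println(doc + " -> " + score(query, doc, collection));
        }
    }
}
```

In this sketch a document that contains none of the query terms scores zero, and repeated occurrences of a term raise the score with diminishing returns; the parameter-free DFR models mentioned above are designed to avoid tuning values such as k1 and b.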

