Terrier

A probabilistic Java toolkit for building search engines.

Terrier Ranking & Summary


  • License: MPL
  • Price: FREE
  • Publisher Name: University of Glasgow
  • Publisher web site: http://ir.dcs.gla.ac.uk/terrier/


Terrier Description

Terrier is a probabilistic Java toolkit for building search engines. It is software for the rapid development of Web, intranet, and desktop search engines. More generally, it is a modular platform for building large-scale information retrieval applications, providing indexing and probabilistic retrieval functionalities.

Terrier has various cutting-edge features, including parameter-free probabilistic retrieval approaches (such as Divergence from Randomness models), automatic query expansion/re-formulation methodologies, and efficient data compression techniques. It comes with a proof-of-concept desktop search application and full TREC capabilities, including the ability to index, query and evaluate the standard TREC collections, such as AP, WSJ, WT10G, .GOV and .GOV2.

Terrier is written in Java and has been successfully used for ad hoc retrieval, Web search and cross-language retrieval, in a centralised or distributed setting. It is also currently used to run various applications.

Here are some key features of "Terrier":

  • Open Source (Mozilla Public Licence).
  • Written in cross-platform Java.
  • Highly compressed disk data structures.
  • Handling of large-scale document collections.
  • Direct file for efficient query expansion.
  • Modular and open indexing and querying APIs.
  • Testbed for indexing and retrieval from standard TREC test collections.
  • Interactive querying application.
  • Desktop search application for searching various types of documents.
  • Input/output of gamma, unary and binary encoded integers for compressing streams or random access files.
  • Standard evaluation of TREC ad hoc and known-item search retrieval results.
  • Indexing of tagged document collections, as well as documents of various formats, such as HTML, PDF, or Microsoft Word, Excel and PowerPoint files.
  • Indexing of field information.
  • Indexing of position information at the word or block level.
  • Support for classic retrieval models, such as tf-idf, BM25 and the Ponte-Croft language model, and for Rocchio's query expansion.
  • A number of Divergence From Randomness (DFR) document ranking models.
  • A number of parameter-free DFR term weighting models for automatic query expansion.
  • Advanced query language that supports AND/NOT operators, phrase and proximity search.
  • Flexible processing of terms through a pipeline of components, such as stop-word removers and stemmers.

What's New in This Release:

  • This is a substantial update, which includes new support for Hadoop, primarily a Hadoop MapReduce indexing system, allowing large collections of documents to be indexed in a highly distributed fashion.
  • Also included are various minor improvements, including improved support for the IIT CDIP1 (TREC Legal track) collection, and various bug fixes.
  • This is intended to be the final release in the 2.x series.
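
The feature list above mentions input/output of gamma and unary encoded integers. As a rough illustration of what those codes look like, the Java sketch below encodes positive integers with unary and Elias gamma codes and decodes them again. It is not Terrier's own API: the class and method names are hypothetical, bits are kept in a String for readability rather than packed into bytes, and the unary convention used (n - 1 zeros followed by a one) is an assumption, since other conventions exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Terrier.
class GammaCodingSketch {

    /** Unary code for n >= 1 (assumed convention): (n - 1) zero bits, then a one bit. */
    static String encodeUnary(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        StringBuilder bits = new StringBuilder();
        for (int i = 1; i < n; i++) bits.append('0');
        return bits.append('1').toString();
    }

    /** Elias gamma code for x >= 1: unary(floor(log2 x) + 1), then the low bits of x. */
    static String encodeGamma(int x) {
        if (x < 1) throw new IllegalArgumentException("x must be >= 1");
        int n = 31 - Integer.numberOfLeadingZeros(x); // floor(log2(x))
        StringBuilder bits = new StringBuilder(encodeUnary(n + 1));
        for (int i = n - 1; i >= 0; i--) {
            bits.append(((x >>> i) & 1) == 1 ? '1' : '0');
        }
        return bits.toString();
    }

    /** Decodes a concatenation of gamma codes; assumes the input is well formed. */
    static List<Integer> decodeGamma(String bits) {
        List<Integer> out = new ArrayList<>();
        int pos = 0;
        while (pos < bits.length()) {
            int n = 0;
            while (bits.charAt(pos) == '0') { n++; pos++; } // count the unary prefix
            pos++; // skip the one bit that ends the prefix
            int x = 1; // that one bit is also the leading bit of x
            for (int i = 0; i < n; i++) {
                x = (x << 1) | (bits.charAt(pos++) - '0');
            }
            out.add(x);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] gaps = {1, 2, 5, 9}; // e.g. gaps between sorted document ids
        StringBuilder stream = new StringBuilder();
        for (int gap : gaps) stream.append(encodeGamma(gap));
        System.out.println(stream);
        System.out.println(decodeGamma(stream.toString()));
    }
}
```

Small values get short codes under this scheme (1 takes a single bit, 9 takes seven), which is why codes of this kind are commonly applied to the gaps between sorted identifiers rather than to the identifiers themselves.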
