The Lemur Project

Language modeling and information retrieval application

The Lemur Project Ranking & Summary

  • License: Freeware
  • Price: Free
  • Publisher Name: The Lemur Team
  • Publisher web site: http://www.lemurproject.org/
  • Operating Systems: Mac OS X
  • File Size: 63.6 MB

The Lemur Project Description

The Lemur Toolkit is a free and open source application designed to facilitate research in language modeling and information retrieval. The Lemur Toolkit includes technologies such as ad hoc and distributed retrieval, cross-language IR, summarization, filtering, and classification.

Here are some key features of "The Lemur Project":

  • Sophisticated structured query languages (using InQuery and Indri)
  • Support for XML and structured document retrieval
  • Used commonly with a wide range of research test collections (e.g., TREC CDs 1-5, wt10g, RCV1, gov, gov2)
  • Index your web pages with an "out-of-the-box" site search capability
  • Interactive interfaces for Windows, Linux, and Web
  • Distributed information retrieval and document clustering applications
  • Cross-platform, fast and modular code written in C++
  • C++, Java and C# APIs
  • Free and open-source software
  • In use for over 6 years by a large and growing user community

Indexing:

  • Multiple indexing methods for small, medium and large-scale (terabyte) collections
  • Built-in support for English, Chinese and Arabic text
  • Porter and Krovetz word stemming
  • Incremental indexing
  • Out-of-the-box indexing support for TREC Text, TREC Web, plain text, HTML, XML, PDF, MBox, Microsoft Word, and Microsoft PowerPoint
  • Indexes inline and offset text annotations (e.g., part-of-speech and named entities)
  • Indexes document attributes

Retrieval:

  • Supports major language modeling approaches such as Indri and KL-divergence, as well as vector space, tf.idf, Okapi and InQuery
  • Relevance- and pseudo-relevance feedback
  • Wildcard term expansion (using Indri)
  • Passage and XML element retrieval
  • Cross-lingual retrieval
  • Smoothing via Dirichlet priors and Markov chains
  • Supports arbitrary document priors (e.g., Page Rank, URL depth)

What's New in This Release:

  • 2799440 TermInfo returned by IndriTermInfoList has no positions
  • 2794361 harvestlinks fails to create harvest directories
  • 2788507 KrovetzStemmerTransformation can overflow a buffer
  • 2788504 AnchorTextAnnotator can overflow a buffer
  • 2787935 pagerank dumps core if links path is bad
  • 2784994 Wrong article
  • 2783665 TextTokenizer prematurely terminates quoted tag attributes
  • 2782954 indri::parse::HTMLParser::handleTag can overflow a buffer
  • 2772914 irevalGUI.jar gives crazy results
  • 2772846 bin/ireval.jar is an invalid .JAR file (Lemur v48)
  • 2770916 documentLength buffer corruption with multiple threads
  • 2747981 WARCDocumentIterator misses documents in warc file
  • 2747707 TextTokenizer does not recognize some quoted tag attributes

