The Lemur Toolkit

Free language modeler for Mac OS X

The Lemur Toolkit Ranking & Summary

  • License: Freeware
  • Price: FREE
  • Publisher Name: The Lemur Team
  • Publisher web site: http://www.lemurproject.org/
  • Operating Systems: Mac OS X
  • File Size: 63.2 MB

The Lemur Toolkit Description

The Lemur Toolkit is designed to facilitate research in language modeling and information retrieval, including such technologies as ad hoc and distributed retrieval, summarization, cross-language IR, filtering, and classification.

What's New in This Release:

  • Version 4.9 corrects various issues in the 4.8 distribution package and provides a new FileClassEnvironment for WARC file input, various indexing speed optimizations for Indri, and more.
  • Applications compiled with the Lemur Toolkit require the following libraries: z, iberty, pthread, and m on Linux, and additionally socket and nsl on Solaris. Applications built in Visual Studio require the additional library wsock32.lib.
  • The Java jar files were built with Java 5 (JDK 1.5.0). The Java UIs require Java 5.
  • Tested compilers: GCC 3.2 (Solaris), 3.2.2 (Linux), 3.4 (Linux), 3.4.3 (Linux x86_64), 4.0.2 (Linux), and 4.3.1 (OS X); VC++ .NET 7.1 (Windows XP); and Visual Studio 2005 (Windows XP).

Enhancements:

  • The LayoutManager constraints have been modified to improve the resize behavior of the components.
  • The Query Log Toolbar and server support the automatic uploading of log files on a scheduled basis. The toolbar user can set this preference to completely automatic, automatic with confirmation required before upload, or manual upload only.
  • A new FileClassEnvironment, warc, has been added to Indri. This environment enables indexing of the ClueWeb09 corpus, http://boston.lti.cs.cmu.edu/Data/clueweb09/
  • Indri indexing speed optimizations, providing a 5-15% speedup for GOV2-sized collections (25 million documents):
    1) Reduce the number of memory allocations/deallocations in IndexWriter::_lookupTermID.
    2) Use the full tables option, -Cf, of flex.
    3) Don't do ASCII case normalization in UTF8CaseNormalizationTransformation, as it is redundant.
    4) Use hash_set from the STL instead of string_set for stopwords.
    5) If the deleted count is 0 in DeletedDocumentList, don't acquire the read lock before returning false.
    6) Use trim, rather than merge, in the RepositoryMaintenanceThread, to reduce the number of times temporary indexes are copied. Stop collecting trim candidates when an index twice the size of the preceding index is encountered.
    7) Take the size of the DiskIndexes into account when estimating memory usage.
    8) Limit the total amount of memory used to cache document lengths to 20MB (5,000,000 documents).
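
Items 4 and 5 in the list above can be illustrated with a small C++ sketch. This is not Indri's source code: the class names StopwordSet and DeletedDocs and all of their methods are hypothetical, std::unordered_set stands in for the hash_set mentioned in the release notes, and a plain std::mutex stands in for the read lock. It only shows the general idea of hashing stopwords and of skipping lock acquisition when nothing has been deleted.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_set>

// Hypothetical stopword container (item 4): membership tests against a
// hash set take constant time on average.
class StopwordSet {
public:
    void add(const std::string& word) { words_.insert(word); }

    bool contains(const std::string& term) const {
        return words_.count(term) != 0;
    }

private:
    std::unordered_set<std::string> words_;
};

// Hypothetical deleted-document list (item 5): isDeleted() returns early,
// without taking the lock, while the deleted count is still zero.
class DeletedDocs {
public:
    void markDeleted(int documentID) {
        std::lock_guard<std::mutex> guard(mutex_);
        // Only bump the counter when the ID was not already present.
        if (deleted_.insert(documentID).second) {
            ++count_;
        }
    }

    bool isDeleted(int documentID) const {
        // Fast path: nothing has been deleted, so no lock is needed.
        if (count_.load() == 0) {
            return false;
        }
        // Slow path: consult the set under the lock.
        std::lock_guard<std::mutex> guard(mutex_);
        return deleted_.count(documentID) != 0;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_set<int> deleted_;
    std::atomic<std::size_t> count_{0};
};
```

In this sketch, a caller would fill a StopwordSet once before indexing and call contains() per token, and would call DeletedDocs::isDeleted() per document lookup. For a collection with no deletions, every isDeleted() call then costs a single atomic load.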


The Lemur Toolkit Related Software