ClearTK

A toolkit for developing statistical natural language processing components in Java

ClearTK Ranking & Summary


  • License: BSD
  • Price: FREE
  • Publisher Name: ClearTK Team
  • Publisher web site: http://code.google.com/p/cleartk/
  • Operating Systems: Mac OS X
  • File Size: 435 KB



ClearTK Description

ClearTK is a toolkit for developing statistical natural language processing components in Java. It is based on the Apache UIMA framework for text analysis and was developed at the Center for Computational Language and Education Research (CLEAR) at the University of Colorado at Boulder. In a nutshell, ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java, and it ships as two libraries, ClearTK-framework and ClearTK-toolkit, which are briefly summarized below.

ClearTK Framework:

The ClearTK framework provides infrastructure for developing UIMA analysis engines that use statistical learning as a foundation for decision making and annotation creation. It provides the following:
· A rich feature extraction library.
· A common interface and wrappers for popular machine learning libraries based on models such as maximum entropy, support vector machines and conditional random fields. It currently supports LIBSVM, OpenNLP MaxEnt, Mallet classifiers, Mallet Conditional Random Fields and SVMlight. This allows a best-of-breed approach: one machine learning library can be swapped out for another without changing the code that implements the core logic of the analysis engine.
· A type-system-agnostic approach. The ClearTK framework does not depend on or provide any specific type system. The code provided by the framework is intended as a basis for creating new analysis engines in your environment, so that you can create components specific to your needs and your type system.
· The framework can be downloaded from the downloads page, checked out from the Subversion repository as an Eclipse project, or added as a Maven dependency if you use Maven to build your project.

ClearTK Toolkit:

The ClearTK toolkit provides UIMA components and/or infrastructure for addressing specific tasks. It provides the following:
· Collection readers for commonly used corpora (e.g. CoNLL, ACE, Penn Treebank, GENIA, TimeML).
· Infrastructure for creating NLP components for specific tasks such as part-of-speech tagging, BIO-style chunking, named entity recognition, syntactic parsing, semantic role labeling, temporal resolution, etc.
· Wrappers for common NLP components such as the Snowball stemmer and OpenNLP components.
· A type system on which many of the components (and unit tests) depend. However, we have worked hard to make much of the code in the toolkit type-system agnostic by parameterizing components by types or by making components extensible via generic typing.
· The toolkit is currently available only as an Eclipse project, which can be checked out from the Subversion repository.

Requirements:
· Java
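The framework's claim that one machine learning library can be swapped for another without changing an analysis engine's core logic amounts to programming against a common classifier interface. The following is a minimal, hypothetical sketch of that idea in plain Java, assuming no UIMA or ClearTK classes: `Classifier`, `LinearBackend`, `ThresholdBackend` and `CapitalizedWordAnnotator` are invented names, not ClearTK's actual API, and the two backends are toy stand-ins for real wrappers around libraries such as LIBSVM or Mallet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL sketch of a backend-neutral design; this is NOT ClearTK's actual API.
// The annotator's core logic depends only on this interface.
interface Classifier<OUTCOME> {
    OUTCOME classify(Map<String, Double> features);
}

// Hypothetical backend #1: a linear scorer with fixed weights
// (toy stand-in for a wrapper around an SVM-style library).
final class LinearBackend implements Classifier<Boolean> {
    private final Map<String, Double> weights;

    LinearBackend(Map<String, Double> weights) {
        this.weights = weights;
    }

    @Override
    public Boolean classify(Map<String, Double> features) {
        double score = 0.0;
        for (Map.Entry<String, Double> e : features.entrySet()) {
            score += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return score > 0.0;
    }
}

// Hypothetical backend #2: a single-feature threshold rule
// (toy stand-in for a different learning library).
final class ThresholdBackend implements Classifier<Boolean> {
    private final String feature;
    private final double threshold;

    ThresholdBackend(String feature, double threshold) {
        this.feature = feature;
        this.threshold = threshold;
    }

    @Override
    public Boolean classify(Map<String, Double> features) {
        return features.getOrDefault(feature, 0.0) >= threshold;
    }
}

// Core logic: feature extraction plus a call to whichever backend was injected.
// Nothing in this class changes when the backend is swapped.
final class CapitalizedWordAnnotator {
    private final Classifier<Boolean> classifier;

    CapitalizedWordAnnotator(Classifier<Boolean> classifier) {
        this.classifier = classifier;
    }

    static Map<String, Double> extractFeatures(String token) {
        Map<String, Double> features = new HashMap<>();
        boolean initialUpper = !token.isEmpty() && Character.isUpperCase(token.charAt(0));
        features.put("initialUpper", initialUpper ? 1.0 : 0.0);
        features.put("length", (double) token.length());
        return features;
    }

    // Returns the tokens that the injected backend labels as positive.
    List<String> annotate(List<String> tokens) {
        List<String> hits = new ArrayList<>();
        for (String token : tokens) {
            if (classifier.classify(extractFeatures(token))) {
                hits.add(token);
            }
        }
        return hits;
    }
}
```

Replacing `new LinearBackend(Map.of("initialUpper", 1.0))` with `new ThresholdBackend("initialUpper", 1.0)` changes only the constructor argument; `extractFeatures` and `annotate` stay untouched, which is the property the description attributes to ClearTK's learning-library wrappers.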

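The toolkit description mentions making components type-system agnostic "by making components extensible via generic typing". As a hypothetical sketch of what that can look like in plain Java (not ClearTK's implementation; `TokenNormalizer`, `SimpleToken` and `SpanToken` are invented for illustration), a component can take its token type as a type parameter together with an accessor function, so it never depends on one particular token class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// HYPOTHETICAL sketch, NOT ClearTK code: a component that hard-codes no token type.
// The caller supplies the token type TOKEN and a function that reads its text.
final class TokenNormalizer<TOKEN> {
    private final Function<TOKEN, String> textOf;

    TokenNormalizer(Function<TOKEN, String> textOf) {
        this.textOf = textOf;
    }

    // Lowercases each token's text; the logic never touches TOKEN's fields directly.
    List<String> normalize(List<TOKEN> tokens) {
        List<String> out = new ArrayList<>();
        for (TOKEN token : tokens) {
            out.add(textOf.apply(token).toLowerCase(Locale.ROOT));
        }
        return out;
    }
}

// Hypothetical "type system" #1: a token that simply stores its string.
final class SimpleToken {
    final String text;

    SimpleToken(String text) {
        this.text = text;
    }
}

// Hypothetical "type system" #2: a token stored as begin/end offsets into the document.
final class SpanToken {
    private final String document;
    private final int begin;
    private final int end;

    SpanToken(String document, int begin, int end) {
        this.document = document;
        this.begin = begin;
        this.end = end;
    }

    String coveredText() {
        return document.substring(begin, end);
    }
}
```

For example, `new TokenNormalizer<SpanToken>(SpanToken::coveredText)` and `new TokenNormalizer<SimpleToken>(t -> t.text)` reuse the same component for two unrelated token classes. Passing an accessor in, rather than requiring every token class to implement a shared interface, keeps the component usable with token types you cannot modify.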
