CLaRK System

An XML-based software system for corpus development, implemented in Java

CLaRK System Ranking & Summary


  • License: Freeware
  • Publisher Name: CLaRK Team
  • Operating Systems: Windows All
  • File Size: 3.8 MB



CLaRK System Description

The main aim behind the design of the system is to minimize human intervention during the creation of language resources. It incorporates four technologies: XML, Unicode, cascaded regular grammars, and constraints over XML documents.

For document management, storage and querying, we chose XML technology because it is popular and easy to understand. The core of CLaRK is a Unicode XML editor, which is the main interface to the system. Besides XML itself, we implemented XPath for navigating documents and XSLT for transforming XML documents.

For multilingual processing tasks, CLaRK encodes all information inside the system in Unicode. The system also provides a mechanism for building a hierarchy of tokenisers. Tokenisers can be attached to elements in the DTDs, so different parts of a document can be processed by different tokenisers.

CLaRK's basic mechanism for the linguistic processing of text corpora is the cascaded regular grammar processor. The main challenge for such grammars is how to apply them to linguistic information encoded in XML. The system addresses this by using XPath expressions to construct the input word for a grammar and by encoding the categories of the recognised words in XML.
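The description does not give CLaRK's own grammar syntax or API. What follows is therefore only a minimal Java sketch of the general idea, built on the standard javax.xml.xpath and java.util.regex packages rather than on anything from CLaRK. It makes several assumptions. Each token is a `w` element whose `cat` attribute holds a one-letter category symbol. An XPath expression selects the elements that form the input word. A regular expression stands in for one grammar rule. Every match is wrapped in a new element whose own `cat` symbol is visible to the next level of the cascade. The class name, the element names `s`, `w`, `np` and `clause`, and the `cat` attribute are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not CLaRK's actual API or grammar format.
public class CascadedGrammarSketch {

    // Select, in document order, the elements that form the grammar's input word.
    static List<Element> select(Document doc, String xpath) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add((Element) nodes.item(i));
        }
        return result;
    }

    // One level of the cascade. Each selected element contributes one symbol
    // (its assumed one-letter "cat" attribute), so regex offsets equal token
    // indices. Every non-empty match is wrapped in a new element that carries
    // its own category symbol. Assumes the selected elements are siblings.
    static int applyRule(Document doc, String xpath, String rule,
                         String wrapperName, String wrapperCat) throws Exception {
        List<Element> tokens = select(doc, xpath);
        StringBuilder word = new StringBuilder();
        for (Element t : tokens) {
            String cat = t.getAttribute("cat");
            if (cat.length() != 1) {
                throw new IllegalArgumentException("expected a one-letter category: " + cat);
            }
            word.append(cat);
        }
        Matcher m = Pattern.compile(rule).matcher(word);
        int matches = 0;
        while (m.find()) {
            if (m.start() == m.end()) continue; // skip empty matches
            Element wrapper = doc.createElement(wrapperName);
            wrapper.setAttribute("cat", wrapperCat);
            Element first = tokens.get(m.start());
            first.getParentNode().insertBefore(wrapper, first);
            for (int i = m.start(); i < m.end(); i++) {
                wrapper.appendChild(tokens.get(i)); // moves the token under the wrapper
            }
            matches++;
        }
        return matches;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static Document demo() throws Exception {
        Document doc = parse("<s><w cat=\"D\">the</w><w cat=\"A\">old</w><w cat=\"N\">man</w>"
                + "<w cat=\"V\">saw</w><w cat=\"D\">a</w><w cat=\"N\">dog</w></s>");
        // Level 1: optional determiner, any adjectives, then a noun -> np (symbol P).
        applyRule(doc, "/s/w", "D?A*N", "np", "P");
        // Level 2 reads the output of level 1: np, verb, np -> clause (symbol C).
        applyRule(doc, "/s/*", "PVP", "clause", "C");
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = demo();
        System.out.println(select(doc, "//np").size());
        System.out.println(select(doc, "//clause").size());
    }
}
```

Running `main` prints 2 and then 1. The first level wraps "the old man" and "a dog" as `np` elements. The second level's XPath then selects the sequence `np, w, np` and wraps it in a `clause`. Single-character categories keep regex offsets aligned with token positions. Multi-character tags would need an extra mapping step.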

