VoDoo/Stream

Allows users to define transducers dedicated to document analysis

VoDoo/Stream Ranking & Summary

  • License: GPL
  • Price: FREE
  • Publisher Name: Didier Plaindoux
  • Publisher web site: http://d.plaindoux.free.fr/vodoo-stream/overview.html
  • Operating Systems: Mac OS X
  • File Size: 448 KB


VoDoo/Stream Description

Allows users to define transducers dedicated to document analysis. Such transducers describe how fragments are matched and transformed. A document can be an XML fragment, free text or something else, depending on the extensions in use.

The VoDoo/Stream project is based on three concepts:
· The first, inspired by event-based programming styles such as SAX or the generic lexer in Objective Caml, provides a stream-based data denotation.
· The second provides expressive, classical automata to match and recognize patterns when analyzing streams.
· The last is a high-level structuring of automata that provides an expressive mechanism for data transformation.

Finally, an XSLT-like language is defined to express data transformations.

Stream representation

A stream is a simple formalism based on opening and closing a level, labels and text. This simple grammar gives a stream denotation for trees such as XML (for XML, the stream is produced by a dedicated SAX handler). The currently supported formats are XML and free text; further formalisms can be added through the stream extension facility. A stream interpretation is also provided for the Document Object Model (DOM), so a stream can carry pure text, an ad hoc stream or DOM-based data.

By comparison, StAX is a low-level XML matching integration based on a token-stream representation of XML fragments. Using the stream representation with a classical switch/case conditional structure is similar to the StAX approach, but such an integration is too low-level: it does not provide an expressive layer for XML management and is in fact at the same level as SAX.

Automata for Stream recognition

Automata provide a high-level mechanism for pattern recognition and variable binding. Their construction produces a DAG with specific attributes that denote variables. Such an automaton can either find a pattern within a given stream or match the stream as a whole.
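The find/match distinction and variable binding described above can be illustrated with a minimal Java sketch. It assumes a stream made of OPEN/CLOSE/TEXT items and swaps in a plain backtracking matcher (sequence, choice, repetition, "any label or text", text binding) instead of compiling the pattern into a DAG-based automaton as VoDoo/Stream does. All names (`PatternSketch`, `item`, `bindText`, `seq`, `choice`, `star`, `matches`, `find`) are hypothetical and are not the project's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: names are illustrative, not the VoDoo/Stream API.
// A backtracking matcher stands in for the project's DAG-based automata.
final class PatternSketch {

    // Stream items: open a level (label), close it (label), or text.
    enum Kind { OPEN, CLOSE, TEXT }

    record Item(Kind kind, String value) {}

    // Continuation invoked with the position reached and the bindings so far;
    // returning false makes the matcher backtrack and try another path.
    interface Cont { boolean next(int pos, Map<String, String> env); }

    interface Pattern { boolean match(List<Item> in, int pos, Map<String, String> env, Cont k); }

    // One item of the given kind; a null value accepts any label or any text.
    static Pattern item(Kind kind, String value) {
        return (in, pos, env, k) -> pos < in.size()
                && in.get(pos).kind() == kind
                && (value == null || value.equals(in.get(pos).value()))
                && k.next(pos + 1, env);
    }

    // Any text item, bound to the variable 'name'.
    static Pattern bindText(String name) {
        return (in, pos, env, k) -> {
            if (pos >= in.size() || in.get(pos).kind() != Kind.TEXT) return false;
            Map<String, String> extended = new HashMap<>(env);
            extended.put(name, in.get(pos).value());
            return k.next(pos + 1, extended);
        };
    }

    // Sequence: each pattern continues where the previous one stopped.
    static Pattern seq(Pattern... ps) {
        return (in, pos, env, k) -> seqFrom(ps, 0, in, pos, env, k);
    }

    private static boolean seqFrom(Pattern[] ps, int i, List<Item> in, int pos,
                                   Map<String, String> env, Cont k) {
        if (i == ps.length) return k.next(pos, env);
        return ps[i].match(in, pos, env, (p, e) -> seqFrom(ps, i + 1, in, p, e, k));
    }

    // Choice: alternatives are tried in order until one leads to a full success.
    static Pattern choice(Pattern... ps) {
        return (in, pos, env, k) -> {
            for (Pattern p : ps) {
                if (p.match(in, pos, env, k)) return true;
            }
            return false;
        };
    }

    // Repetition (zero or more, greedy); each iteration must consume at least one item.
    static Pattern star(Pattern p) {
        return new Pattern() {
            @Override
            public boolean match(List<Item> in, int pos, Map<String, String> env, Cont k) {
                return p.match(in, pos, env, (next, e) -> next > pos && match(in, next, e, k))
                        || k.next(pos, env);
            }
        };
    }

    // "match": the pattern must cover the whole stream.
    static Optional<Map<String, String>> matches(Pattern p, List<Item> in) {
        List<Map<String, String>> found = new ArrayList<>();
        p.match(in, 0, Map.of(), (pos, env) -> pos == in.size() && found.add(env));
        return found.stream().findFirst();
    }

    // "find": the first match starting at any position (it may end before the stream does).
    static Optional<Map<String, String>> find(Pattern p, List<Item> in) {
        for (int start = 0; start <= in.size(); start++) {
            List<Map<String, String>> found = new ArrayList<>();
            if (p.match(in, start, Map.of(), (pos, env) -> found.add(env))) {
                return Optional.of(found.get(0));
            }
        }
        return Optional.empty();
    }
}
```

For the stream of `<doc><title>Hi</title><p>a</p><p>b</p></doc>`, a pattern such as `seq(item(OPEN, "doc"), item(OPEN, "title"), bindText("t"), item(CLOSE, "title"), star(seq(item(OPEN, "p"), item(TEXT, null), item(CLOSE, "p"))), item(CLOSE, "doc"))` matches the whole stream with `t` bound to `Hi`, while `find` with `seq(item(OPEN, "p"), bindText("x"))` stops at the first paragraph and binds `x` to `a`. Backtracking can take exponential time on some patterns, which is one motivation for compiling patterns into an automaton instead.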
An automaton is built from a given stream written in an extended formalism that includes patterns such as repetition, any label or text, and choice. This stream is analyzed to produce a directed acyclic graph, which is then used to generate the automaton (the classical approach).

Transducer for Stream transformation

Transducers are ordered sets of rules. A rule has a selection part and a body. A selection can deal with paths (tree visitor) and with the current entity. The first kind of entity is the tree node, which can be selected by filtering its name or attributes. The second kind is the string, which can be filtered using usual pattern matching. A body is a piece of Java code that can either continue parsing or stop (recursive descent).

Transducer Stream Processor language: XSP

Finally, a transducer language called XSP, expressed in XML, is defined. This language has a bootstrap definition in XML (for the moment, only for XML and text transformation). The XSP definition has been extended to support rules whose code is written in any language that provides a BSF handler (JRuby, JavaScript, Jython, BeanShell, etc.).

Here are some key features of "VoDoo/Stream":
· Namespace support has been designed, so matching can be done using the element name and/or the corresponding namespace.
· The transformation process has been revised without breaking compatibility with transducers written for previous versions. This change increases expressivity and the stream management possibilities.
· As a result, analyses can be dispatched as in any LL parser, catching elements with content filters and sibling content.
· Locations have been added to make it easy to track errors when parsing an XML file or any kind of document. Each document now has a location that is maintained during transducing operations and can be used to link locations.
· An XSP extension for XML synthesis and manipulation provides an XML-to-XML transformation paradigm.
· JEM has been rewritten using the latest parsing improvements, with an extension for embedded XML terms.
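The transducer model described in "Transducer for Stream transformation" (an ordered set of rules, each with a selection part and a Java body that may or may not continue the recursive descent) can be sketched as follows. This is an illustration under assumptions, not the VoDoo/Stream or XSP API: documents are modelled as a tiny tree of hypothetical `Element` and `Text` records, selections are predicates over the ancestor path and the current node (element name, or a regular expression on a string), and bodies are Java lambdas writing to a `StringBuilder`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiPredicate;
import java.util.regex.Pattern;

// Hypothetical sketch: names are illustrative, not the VoDoo/Stream or XSP API.
final class TransducerSketch {

    // Minimal document entities: a tree node (element) or a string (text).
    interface Node {}
    record Element(String name, List<Node> children) implements Node {}
    record Text(String value) implements Node {}

    // A body receives the selected node; it may call descend(...) to continue
    // the recursive descent, or not, in which case the subtree is skipped.
    interface Body { void apply(Node node, TransducerSketch t, StringBuilder out); }

    // A rule: a selection over (ancestor names, current node) plus a body.
    record Rule(BiPredicate<List<String>, Node> selection, Body body) {}

    private final List<Rule> rules = new ArrayList<>();  // ordered: first match wins
    private final List<String> path = new ArrayList<>(); // names of enclosing elements

    TransducerSketch rule(BiPredicate<List<String>, Node> selection, Body body) {
        rules.add(new Rule(selection, body));
        return this;
    }

    // Runs the first rule whose selection accepts the node; unmatched nodes produce nothing.
    void transduce(Node node, StringBuilder out) {
        List<String> ancestors = List.copyOf(path);
        for (Rule r : rules) {
            if (r.selection().test(ancestors, node)) {
                r.body().apply(node, this, out);
                return;
            }
        }
    }

    // Continues the descent into an element's children, keeping the path up to date.
    void descend(Node node, StringBuilder out) {
        if (!(node instanceof Element e)) return;
        path.add(e.name());
        try {
            for (Node child : e.children()) transduce(child, out);
        } finally {
            path.remove(path.size() - 1);
        }
    }

    String run(Node root) {
        StringBuilder out = new StringBuilder();
        transduce(root, out);
        return out.toString();
    }

    // Selection of a tree node by element name.
    static BiPredicate<List<String>, Node> element(String name) {
        return (ancestors, n) -> n instanceof Element e && e.name().equals(name);
    }

    // Selection of a string by regular expression, optionally only under a given parent.
    static BiPredicate<List<String>, Node> text(String regex, String parent) {
        Pattern re = Pattern.compile(regex);
        return (ancestors, n) -> n instanceof Text t
                && re.matcher(t.value()).matches()
                && (parent == null || (!ancestors.isEmpty()
                        && ancestors.get(ancestors.size() - 1).equals(parent)));
    }

    // Example: rule order matters (title text before any text), and the "note"
    // body does not descend, so its subtree is dropped.
    static String demo() {
        TransducerSketch t = new TransducerSketch()
                .rule(element("doc"), (n, tr, out) -> tr.descend(n, out))
                .rule(element("title"), (n, tr, out) -> {
                    out.append("# ");
                    tr.descend(n, out);
                    out.append('\n');
                })
                .rule(element("note"), (n, tr, out) -> { })
                .rule(element("p"), (n, tr, out) -> {
                    tr.descend(n, out);
                    out.append('\n');
                })
                .rule(text(".*", "title"),
                      (n, tr, out) -> out.append(((Text) n).value().toUpperCase(Locale.ROOT)))
                .rule(text(".*", null), (n, tr, out) -> out.append(((Text) n).value()));
        Node doc = new Element("doc", List.of(
                new Element("title", List.of(new Text("Hi"))),
                new Element("note", List.of(new Text("internal"))),
                new Element("p", List.of(new Text("a"))),
                new Element("p", List.of(new Text("b")))));
        return t.run(doc); // "# HI\na\nb\n"
    }
}
```

In `demo()`, the first matching rule wins, so the title-specific text rule has to precede the catch-all text rule; the `note` rule drops its whole subtree simply by not calling `descend`, which mirrors the "continue parsing or not" choice left to a rule body.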

