Txr

A Pattern Matching Utility for Convenient Text Extraction

Txr Ranking & Summary


  • License: BSD License
  • Price: FREE
  • Publisher Name: Kaz Kylheku
  • Publisher web site: http://common-lisp.net/project/meta-cvs/



Txr Description

Txr is an interpreter for the txr query language. A txr query matches text and extracts pieces by binding them to variables that are embedded in the query. Txr can output the raw bindings gathered from the data, or substitute them into a template-driven report.

Great, but we already have sed, awk, perl ...

Though these tools support pattern matching in the form of regular expressions, they do not implement a whole-input pattern matching paradigm like txr.

All but the simplest text extraction tasks are difficult with sed, which is basically a regexp filtering program. When the data format spans multiple lines which correlate together, sed starts to show its weakness. Awk and perl are programming languages. They can be used to perform complex text extraction, but it's expressed as an algorithm.

A pattern is some form which resembles that which it matches. A perl or awk program isn't a pattern; it bears no resemblance to the data which is being processed; it describes the detailed steps of the process more than the data. For many such processes, a clearer, more succinct Txr query can be written to do the same thing. An analogy may be drawn to other pattern languages such as grammars. A BNF grammar describes a language in a way that, say, the C++ source code of a recursive descent parser does not.

To develop a txr query, the user typically starts with sample data. The raw data itself is already likely a txr query which matches itself, after care is taken to escape some characters which have a special meaning to txr. All that is left is to identify the parts that need to be variables, and to summarize the variations so that the query generalizes to all instances of the data.

In short, a truly practical extraction and report language has arrived, and its name is Txr.

Talk is cheap; how about an example?

Fine. Instead of "Hello, world", how about something more advanced?
One tool that I dislike in Unix and Linux is the ps utility for listing processes. I've been using Unix since 1989 and Linux since 1993, and I'm not dumb; yet, whenever I need ps to do something slightly out of the ordinary, I have to resort to the man page, and then I still can't get it to do what I want half the time.

With Txr, we can easily make a quick and dirty ps utility (which relies on the /proc filesystem on Linux). Here is what the query looks like. This might be saved in a file called ps.txr:

    @(next)
    $/proc
    @(collect)
    @{process /+/}
    @ (next)
    /proc/@process/status
    Name:@ @name
    State:@ @state (@state_desc)
    @(skip)
    Tgid:@ @tgid
    Pid:@ @proc_id
    PPid:@ @parent_id
    @(bind pid proc_id)
    @(bind ppid parent_id)
    @(skip)
    Uid:@ @uid@ @/.*/
    Gid:@ @gid@ @/.*/
    @ (next)
    $/proc/@process/task
    @ (collect)
    @thr
    @ (end)
    @ (bind thread thr)
    @ (some)
    @ (next)
    /etc/passwd
    @ (skip)
    @user:@pw:@uid:@/.*/
    @ (or)
    @ (bind user uid)
    @ (end)
    @(end)
    @(output)
    USER       PID  PPID S NAME             THREADS
    @ (repeat)
    @{user 8} @{proc_id -5} @{parent_id -5} @state @{name 16} @(rep)@thr, @(first)@(last)@thr@(single)~@(end)
    @ (end)
    @(end)

Now, we can run the query like this:

    shell$ txr ps.txr

We get output which looks like this:

    USER       PID  PPID S NAME             THREADS
    root         1     0 S init             ~
    root         2     1 S ksoftirqd/0      ~
    root         3     1 S events/0         ~
    root         4     3 S khelper          ~
    root         5     3 S kacpid           ~
    root        16     3 S kblockd/0        ~
    root        29     3 S aio/0            ~
    root        17     1 S khubd            ~
    root      2954  2953 S bash             ~
    root     16134  1887 S sshd             ~
    kaz      16136 16134 S sshd             ~
    kaz      16137 16136 S bash             ~
    kaz       3628  2175 S slrn             ~
    root      3721  1963 S crond            ~
    root      3722  3721 S run-parts        ~
    root      3723  3722 S 00-logwatch      ~
    root      3724  3722 S awk              ~
    root      3940  3723 S mail             ~
    root      4049  3723 S zz-disk_space    ~
    root      4051  4049 S df               ~
    root      4052  4049 S grep             ~
    kaz       4266     1 S ssh-agent        ~
    kaz       4331 16137 S vim              ~
    kaz       4426 31908 R txr              ~

The Txr query works by processing the numeric entries under the /proc directory, reading the /proc/<pid>/status file of each process, and the list of threads under /proc/<pid>/task. The user IDs are resolved by matching through the /etc/passwd file; when no entry matches, the numeric user ID is used as the user name.
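
For comparison, the same steps can be written out as an algorithm. The following is only a rough Python sketch, not part of Txr and not a translation the author provides. The function names (parse_status, load_users, list_processes, format_row) are made up for illustration, and the rule of printing "~" for a single-threaded process is an assumption read off the sample output above. Note how the code spells out the steps of the process rather than resembling the data, which is the contrast the description draws.

```python
import os
import re


def parse_status(text):
    """Turn the 'Key:<whitespace>value' lines of a status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def load_users(passwd_path):
    """Map uid strings to user names using a passwd-style file."""
    users = {}
    with open(passwd_path) as f:
        for line in f:
            parts = line.rstrip("\n").split(":")
            if len(parts) >= 3:
                # Keep the first name seen for each uid.
                users.setdefault(parts[2], parts[0])
    return users


def list_processes(proc_root="/proc", passwd_path="/etc/passwd"):
    """Collect one record per numeric entry under proc_root."""
    users = load_users(passwd_path)
    pids = [e for e in os.listdir(proc_root) if re.fullmatch(r"[0-9]+", e)]
    rows = []
    for pid in sorted(pids, key=int):
        base = os.path.join(proc_root, pid)
        try:
            with open(os.path.join(base, "status")) as f:
                status = parse_status(f.read())
            threads = sorted(os.listdir(os.path.join(base, "task")))
        except OSError:
            continue  # the process went away (or is unreadable); skip it
        uid = (status.get("Uid", "").split() or ["?"])[0]
        rows.append({
            "user": users.get(uid, uid),  # fall back to the numeric uid
            "pid": status.get("Pid", pid),
            "ppid": status.get("PPid", "?"),
            "state": (status.get("State", "?").split() or ["?"])[0],
            "name": status.get("Name", "?"),
            "threads": threads,
        })
    return rows


def format_row(row):
    """Lay out one record with the column widths used in the report above."""
    # Assumption: a single-threaded process is shown as "~".
    thr = "~" if len(row["threads"]) == 1 else ", ".join(row["threads"])
    return (f"{row['user']:<8} {row['pid']:>5} {row['ppid']:>5} "
            f"{row['state']} {row['name']:<16} {thr}")


if __name__ == "__main__":
    if os.path.isdir("/proc") and os.path.isfile("/etc/passwd"):
        print(f"{'USER':<8} {'PID':>5} {'PPID':>5} S {'NAME':<16} THREADS")
        for r in list_processes():
            print(format_row(r))
```

Both paths are parameters so that the sketch can be pointed at a copy of a /proc tree for experimentation; on a live Linux system the defaults apply.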
What's New in This Release:

  • There is a new freeform directive for unstructured matching across multiple lines.
  • Variables can be bound to regexes and used for matching.

