Squawk

SQL query tool and library for static files

Squawk Ranking & Summary


  • License:
  • BSD License
  • Price:
  • FREE
  • Publisher Name:
  • Samuel Stauffer
  • Publisher web site:
  • http://danga.com/gearman/

Squawk Description

Squawk is a library and command-line tool for running SQL queries against structured or semi-structured static files (e.g. Apache logs, CSV files, tcpdump output).

Goal

The purpose of Squawk is to make querying for data in log files or other structured files easier. Everything that Squawk does can be done by combining various Unix tools, but Squawk makes it even easier to express more complex relationships. It is in no way a database, nor is it meant to be used as one. It's merely a reporting tool.

Squawk can be used from the command line for ad-hoc queries, and it can also be used as a library as part of a more in-depth reporting tool.

Status

Still in major development. API is guaranteed to change.

Supported SQL Features

 * Aggregates: count, min, max, avg, sum
 * GROUP BY
 * ORDER BY (single column)
 * LIMIT
 * OFFSET
 * WHERE
 * Column aliases
 * Subqueries in FROM

Departures from Standard SQL

 * Table list in FROM uses a space rather than a comma as a separator. This makes it easier to specify files on the command line (e.g. FROM access.log*).

Parsers

 * Common access file formats (nginx, apache)
 * CSV

Output Formats

 * Basic tabular for console (like most database command line tools)
 * JSON
 * CSV

Examples

SQL query on the command line:

    $ squawk "SELECT COUNT(1) AS n, status FROM access.log GROUP BY status ORDER BY n DESC"
         n | status
    ----------------------------------------
    381353 | 200
    180668 | 302
     17976 | 404
     12952 | 301
     10836 | 304
       735 | 403
       420 | 206
       376 | 416
       123 | 400
        46 | 500
         5 | 502
         3 | 408
         3 | 405
         1 | 504

SQL based query through API:

    query = Query(
        "SELECT COUNT(1) AS n, remote_addr"
        " FROM file"
        " WHERE status = 200"
        " AND remote_addr != '-'"
        " GROUP BY remote_addr"
        " ORDER BY n DESC"
        " LIMIT 10")
    source = AccessLogParser("access.log")
    output_console(query(source))

    # or

    query = Query(
        "SELECT COUNT(1) AS n, remote_addr"
        " FROM file"
        " WHERE status = 200"
        " AND remote_addr != '-'"
        " GROUP BY remote_addr"
        " ORDER BY n DESC"
        " LIMIT 10")
    source = AccessLogParser("access.log")
    for row in query(source):
        print(row)

Code generated query:

    source = AccessLogParser("access.log")
    # Keep only rows whose status is 200 (the WHERE clause of the SQL above)
    filtered = Filter(source, lambda row: row["status"] == 200)
    group_by = GroupBy(filtered, group_by="remote_addr", select=)
    order_by = OrderBy(group_by, 'count(1)', True)
    limit = Limit(order_by, 10)
    for row in limit:
        print(row)

Requirements:

 * Python
 * pyparsing
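The "code generated query" example chains stages that each consume rows and yield rows. A minimal sketch of that idea in plain Python follows, assuming rows are dictionaries. The function names (filter_rows, group_count, order_by, limit, top_addresses) and the sample rows are hypothetical stand-ins for illustration, not Squawk's own classes or data.

```python
from collections import Counter
from itertools import islice

# Hypothetical stand-ins, not Squawk's API: each stage takes an
# iterable of dict rows and returns a new iterable of dict rows.

def filter_rows(rows, predicate):
    """WHERE: keep rows for which predicate(row) is true."""
    return (row for row in rows if predicate(row))

def group_count(rows, column):
    """GROUP BY column, with COUNT(1) exposed as 'n'."""
    counts = Counter(row[column] for row in rows)
    return ({column: value, "n": n} for value, n in counts.items())

def order_by(rows, column, descending=False):
    """ORDER BY a single column (needs all rows, so it sorts eagerly)."""
    return iter(sorted(rows, key=lambda row: row[column], reverse=descending))

def limit(rows, count):
    """LIMIT: stop after 'count' rows."""
    return islice(rows, count)

def top_addresses(rows, count):
    """Roughly: SELECT COUNT(1) AS n, remote_addr ... ORDER BY n DESC LIMIT count."""
    ok = filter_rows(
        rows, lambda r: r["status"] == 200 and r["remote_addr"] != "-")
    grouped = group_count(ok, "remote_addr")
    return list(limit(order_by(grouped, "n", descending=True), count))

# Made-up rows standing in for parsed access-log lines.
sample = [
    {"remote_addr": "10.0.0.1", "status": 200},
    {"remote_addr": "10.0.0.2", "status": 200},
    {"remote_addr": "10.0.0.1", "status": 404},
    {"remote_addr": "10.0.0.1", "status": 200},
    {"remote_addr": "-", "status": 200},
]

result = top_addresses(sample, 10)
for row in result:
    print(row)
```

Because each stage only depends on the iterable it is given, stages can be reordered or swapped without touching the others, which is the property the SQL front end can build on when it turns a query string into such a chain.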

