gridd

Grammar-based Reconstruction of Information-Dense Data tables

gridd Ranking & Summary

  • License: MIT/X Consortium Lic...
  • Price: FREE
  • Publisher Name: Marco D. Adelfio
  • Publisher web site: https://github.com/madelfio/

gridd Description

gridd is a Python library for extracting schema information from data tables.

Sample usage

Use gridd to extract data from a table in XLS or HTML format and output it (as CSV by default).

    > gridd extract file.xls
    Category,Country,Residents,Applications
    North America,United States,30700700,224912
    North America,Canada,33739900,5067
    North America,Mexico,112033369,230801
    Asia,Japan,127557958,295315
    Asia,China,1331380000,229096
    Asia,South Korea,48747000,127316

You can choose your output format (JSON provides more schema info):

    > gridd extract -o json file.xls

Or ask for more verbose output:

    > gridd extract -v file.xls

Several extraction methods are built-in. By default, the parser method is used, but the bayes and webtables methods are available. Support for additional methods is planned.

    > gridd extract -m webtables file.xls

Use predefined external sets of values to improve extraction accuracy.

    > gridd extract --use-sets file.xls

Train the gridd classifier using custom annotations.

    > gridd train -a annotations.txt file1.xls file2.xls file3.xls
    ...
    Successfully trained using 3 files.
    Model parameters stored in training.json

Run a web interface that shows both the raw data table and the extracted data table.

    > gridd web file.xls
     * Running on http://0.0.0.0:5000/
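The CSV that `gridd extract` prints in the sample usage above can be consumed with Python's standard `csv` module. The sketch below is only an illustration of working with that output: it does not call gridd itself, the sample text is copied from the listing above, and the function name `totals_by_category` is a hypothetical helper, not part of gridd.

```python
import csv
import io
from collections import defaultdict

# CSV text copied from the sample `gridd extract file.xls` output.
SAMPLE_CSV = """\
Category,Country,Residents,Applications
North America,United States,30700700,224912
North America,Canada,33739900,5067
North America,Mexico,112033369,230801
Asia,Japan,127557958,295315
Asia,China,1331380000,229096
Asia,South Korea,48747000,127316
"""


def totals_by_category(csv_text):
    """Sum the Residents and Applications columns for each Category.

    Hypothetical helper for illustration; not part of gridd.
    """
    totals = defaultdict(lambda: {"Residents": 0, "Applications": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in ("Residents", "Applications"):
            totals[row["Category"]][column] += int(row[column])
    return dict(totals)


if __name__ == "__main__":
    for category, sums in totals_by_category(SAMPLE_CSV).items():
        print(category, sums["Residents"], sums["Applications"])
```

In practice the CSV text would come from a file or pipe holding the command's output rather than from an inline string.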

