gridd: Grammar-based Reconstruction of Information-Dense Data tables
gridd Ranking & Summary
- License: MIT/X Consortium Lic...
- Price: FREE
- Publisher Name: Marco D. Adelfio
- Publisher web site: https://github.com/madelfio/
gridd Description
gridd is a Python library for extracting schema information from data tables.

Sample usage

Use gridd to extract data from a table in XLS or HTML format and output it (as CSV by default).

> gridd extract file.xls
Category,Country,Residents,Applications
North America,United States,30700700,224912
North America,Canada,33739900,5067
North America,Mexico,112033369,230801
Asia,Japan,127557958,295315
Asia,China,1331380000,229096
Asia,South Korea,48747000,127316

You can choose your output format (JSON provides more schema info):

> gridd extract -o json file.xls

Or ask for more verbose output:

> gridd extract -v file.xls

Several extraction methods are built in. By default, the parser method is used, but the bayes and webtables methods are also available. Support for additional methods is planned.

> gridd extract -m webtables file.xls

Use predefined external sets of values to improve extraction accuracy.

> gridd extract --use-sets file.xls

Train the gridd classifier using custom annotations.

> gridd train -a annotations.txt file1.xls file2.xls file3.xls
...
Successfully trained using 3 files.
Model parameters stored in training.json

Run a web interface that shows both the raw data table and the extracted data table.

> gridd web file.xls
 * Running on http://0.0.0.0:5000/
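Once gridd has produced CSV, the result can be consumed with ordinary tooling. The following is a minimal sketch that uses only Python's standard library and the sample CSV output shown in the description; it does not call gridd itself. The helper names `load_rows` and `applications_by_category` are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import defaultdict

# CSV text copied from the sample output in the description. In practice this
# would be whatever `gridd extract file.xls` writes to standard output.
SAMPLE_CSV = """\
Category,Country,Residents,Applications
North America,United States,30700700,224912
North America,Canada,33739900,5067
North America,Mexico,112033369,230801
Asia,Japan,127557958,295315
Asia,China,1331380000,229096
Asia,South Korea,48747000,127316
"""


def load_rows(csv_text):
    """Parse CSV text into dicts, converting the numeric columns to int."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        row["Residents"] = int(row["Residents"])
        row["Applications"] = int(row["Applications"])
        rows.append(row)
    return rows


def applications_by_category(rows):
    """Sum the Applications column for each Category value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Category"]] += row["Applications"]
    return dict(totals)


rows = load_rows(SAMPLE_CSV)
totals = applications_by_category(rows)
for category, total in totals.items():
    print(f"{category}: {total}")
```

Because the extracted table repeats the Category value on every row, grouping needs no knowledge of how the original spreadsheet was laid out.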