text-sentence

A text tokenizer and sentence splitter tool

text-sentence Ranking & Summary


  • License: BSD License
  • Price: FREE
  • Publisher Name: Robert Lujo
  • Publisher web site: http://bitbucket.org/trebor74hr/



text-sentence Description

text-sentence is a text tokenizer and sentence splitter library.

The input to the main function is text, plus a list of known names and abbreviations. The result is a list of tokens. Each token has a type and other attributes, e.g.:

 * is word,
 * is number,
 * is roman number,
 * is sentence end,
 * is abbreviation,
 * is name,
 * is end of chapter,
 * etc.

Determining the end of a sentence needs special logic and care, which is the main reason the package is named "text-sentence".

FEATURES

The system is based on unicode strings.

Check Getting started.

INSTALLATION

Installation instructions - if you have the pip package installed (http://pypi.python.org/pypi/pip):

    pip install text-sentence

If not, then do it the old-fashioned way:

 * download zip from http://pypi.python.org/pypi/text-sentence/
 * unzip
 * open shell
 * go to distribution directory
 * python setup.py install

The development version can be seen at http://bitbucket.org/trebor74hr/text-sentence, or cloned with Mercurial:

    hg clone https://bitbucket.org/trebor74hr/text-sentence

GETTING STARTED

TODO: Usage example - start python shell:

    >>> from text_sentence import ...

Further

Since there is currently no good documentation, the best source of further information is the tests inside the module and the tests in test_sentence. More information in Running tests. You can always read the source.

DOCUMENTATION

Currently there is no documentation. In progress ...

SUPPORT

Since this project is limited by my free time, support is limited.

REPORT BUG OR REQUEST FEATURE

If you encounter a bug, the best option is to report it on the bitbucket web page http://bitbucket.org/trebor74hr/text-sentence.

The best way to contact me is by mail (find in LICENCE).

TODO list is in readme.txt (dev version).

CONTRIBUTION

Since this project is not currently in the stable API phase, contributions should wait for a while.

RUNNING TESTS

All tests are doctests (not unittests). There are two types of tests in the package:

 1. doctests in module i.e. in __init__.py
 2. doctests in test_sentence.txt

Running the module directly will run 1. and 2.

To run tests:

 * go to the text_sentence directory
 * run tests by running the module, e.g.:

    > python __init__.py
    __main__: running doctests
    test_sentence.txt: running doctests

 * or with:

    > python -m"text_sentence"

Requirements:

 · Python

What's New in This Release:

 · is_contraction token attribute - e.g. isn't or o?'

