NoAho: Non-Overlapping Aho-Corasick Trie
NoAho Ranking & Summary
- License: MIT/X Consortium Lic...
- Price: FREE
- Publisher Name: Jeff Donner
- Publisher web site: https://github.com/JDonner/
NoAho Description
NoAho provides fast, non-overlapping simultaneous multiple keyword search.

Features:
- 'short' and 'long' (longest matching key) searches, both one-off and iteration over all non-overlapping keyword matches in some text.
- Works with both unicode and str in Python 2, and unicode in Python 3 (it's all UCS4 under the hood).
- Allows you to associate an arbitrary Python object payload with each keyword, and supports dict operations len(), [], and 'in' for the keywords (though no del or traversal).
- Does the 'compilation' (generation of Aho-Corasick failure links) of the trie on demand; you can mix adding keywords and searching text freely.
- Can be used commercially; it's under the minimal MIT license.

Anti-Features:
- Will not find overlapped keywords (e.g. given keywords "abcde" and "defgh", will not find "defgh" in "abcdefgh"; would find both in "abcdedefgh"), unless you move along the string manually, one character at a time, which would defeat the purpose. The package 'Acora' is an alternative package for this use.
- Lacking overlap, find_short is kind of useless.
- Lacks key iteration and deletion from the mapping (dict) protocol.
- Memory leaking untested (should be ok but ...).
- No /testcase/ for unicode in Python 2 (did manual test however).
- Unicode chars represented as UCS4, and each character has its own hashtable, so it's relatively memory-heavy.
- Requires a C++ compiler.

Bug reports and patches welcome of course!
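
To make the non-overlapping, longest-match behaviour concrete, here is a minimal pure-Python sketch. It is not NoAho's implementation or its API: it uses a plain nested-dict trie and rescans from each position, with no Aho-Corasick failure links, and the names build_trie, find_long and findall_long are chosen for this illustration only. It reproduces the "abcde" / "defgh" example from the description, including a payload per keyword.

```python
def build_trie(keywords):
    """Build a nested-dict trie from {keyword: payload}.

    The key None inside a node marks the end of a keyword and holds its payload.
    """
    root = {}
    for word, payload in keywords.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = payload
    return root


def find_long(trie, text, start=0):
    """Return (begin, end, payload) for the leftmost, longest keyword
    starting at or after `start`, or None if nothing matches."""
    for begin in range(start, len(text)):
        node = trie
        best = None
        for pos in range(begin, len(text)):
            node = node.get(text[pos])
            if node is None:
                break
            if None in node:
                # Remember the longest keyword seen so far from `begin`.
                best = (begin, pos + 1, node[None])
        if best is not None:
            return best
    return None


def findall_long(trie, text):
    """Yield non-overlapping longest matches, resuming after each match."""
    pos = 0
    while True:
        match = find_long(trie, text, pos)
        if match is None:
            return
        yield match
        pos = match[1]  # skip past the match, so overlaps are never reported


trie = build_trie({"abcde": "first", "defgh": "second"})
print(list(findall_long(trie, "abcdefgh")))    # [(0, 5, 'first')]
print(list(findall_long(trie, "abcdedefgh")))  # [(0, 5, 'first'), (5, 10, 'second')]
```

In the first text, "defgh" begins inside the already-consumed "abcde", so it is skipped; in the second, it starts right after the first match ends and is found. This naive scan can revisit characters, which is the cost that Aho-Corasick failure links are meant to avoid.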