NoAho

Non-Overlapping Aho-Corasick Trie

NoAho Ranking & Summary


  • License: MIT/X Consortium Lic...
  • Price: FREE
  • Publisher Name: Jeff Donner
  • Publisher web site: https://github.com/JDonner/


NoAho Description

NoAho provides fast, non-overlapping, simultaneous multiple-keyword search.

Features:
- 'short' and 'long' (longest matching key) searches, both one-off and as iteration over all non-overlapping keyword matches in some text.
- Works with both unicode and str in Python 2, and with unicode in Python 3 (it's all UCS4 under the hood).
- Allows you to associate an arbitrary Python object payload with each keyword, and supports the dict operations len(), [] and 'in' for the keywords (though not del or traversal).
- Does the 'compilation' of the trie (generation of the Aho-Corasick failure links) on demand, so you can mix adding keywords and searching text freely.
- Can be used commercially; it is released under the minimal MIT license.

Anti-Features:
- Will not find overlapping keywords (e.g. given the keywords "abcde" and "defgh", it will not find "defgh" in "abcdefgh", but would find both in "abcdedefgh"), unless you move along the string manually, one character at a time, which would defeat the purpose. The package Acora is an alternative for this use.
- Since overlapping matches are not reported, find_short is of limited use.
- Lacks key iteration and deletion from the mapping (dict) protocol.
- Memory leaks are untested (it should be OK, but ...).
- No test case for unicode in Python 2 (it was tested manually, however).
- Unicode characters are represented as UCS4, and each character in the trie has its own hash table, so it is relatively memory-heavy.
- Requires a C++ compiler.

Bug reports and patches welcome, of course!
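The non-overlapping 'short'/'long' behavior, the keyword payloads and the on-demand compilation described above can be illustrated with a small pure-Python sketch. This is not NoAho's API or its implementation (NoAho is a C++ extension). The class `SketchTrie`, its `add`/`findall` methods and the `longest` flag are hypothetical names chosen for this example. For clarity, the sketch collects every Aho-Corasick match first. It then greedily keeps the leftmost match (longest or shortest at that start) and skips any match that overlaps an earlier pick. This two-pass approach is assumed for readability; a streaming scanner would avoid building the full match list.

```python
from collections import deque
from typing import Any, Dict, Iterator, List, Optional, Tuple


class SketchTrie:
    """Hypothetical Aho-Corasick trie with non-overlapping search (illustration only)."""

    def __init__(self) -> None:
        self._goto: List[Dict[str, int]] = [{}]     # per-node child table
        self._terminal: List[Optional[str]] = [None]  # key ending exactly at node
        self._fail: List[int] = [0]                 # failure links
        self._out: List[List[str]] = [[]]           # keys recognized at node
        self._payload: Dict[str, Any] = {}
        self._compiled = True

    def add(self, key: str, payload: Any = None) -> None:
        if not key:
            raise ValueError("empty keys are not supported")
        node = 0
        for ch in key:
            nxt = self._goto[node].get(ch)
            if nxt is None:
                nxt = len(self._goto)
                self._goto.append({})
                self._terminal.append(None)
                self._goto[node][ch] = nxt
            node = nxt
        self._terminal[node] = key
        self._payload[key] = payload
        self._compiled = False  # failure links are rebuilt before the next search

    def _compile(self) -> None:
        # Breadth-first pass: each node's failure link points at the longest
        # proper suffix of its path that is also a path in the trie.
        self._out = [[k] if k is not None else [] for k in self._terminal]
        self._fail = [0] * len(self._goto)
        queue = deque(self._goto[0].values())  # depth-1 nodes fail to the root
        while queue:
            node = queue.popleft()
            for ch, child in self._goto[node].items():
                f = self._fail[node]
                while f and ch not in self._goto[f]:
                    f = self._fail[f]
                self._fail[child] = self._goto[f].get(ch, 0)
                self._out[child] = self._out[child] + self._out[self._fail[child]]
                queue.append(child)
        self._compiled = True

    def _all_matches(self, text: str) -> Iterator[Tuple[int, int, str]]:
        if not self._compiled:
            self._compile()
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self._goto[node]:
                node = self._fail[node]
            node = self._goto[node].get(ch, 0)
            for key in self._out[node]:
                yield i + 1 - len(key), i + 1, key

    def findall(self, text: str, longest: bool = True) -> List[Tuple[int, int, Any]]:
        # Order by start, then by length (descending for 'long', ascending for 'short').
        sign = -1 if longest else 1
        matches = sorted(self._all_matches(text), key=lambda m: (m[0], sign * (m[1] - m[0])))
        result, last_end = [], 0
        for start, end, key in matches:
            if start >= last_end:  # drop anything overlapping the previous pick
                result.append((start, end, self._payload[key]))
                last_end = end
        return result

    # Minimal dict-like access to the keywords: len(), 'in' and [].
    def __len__(self) -> int:
        return len(self._payload)

    def __contains__(self, key: str) -> bool:
        return key in self._payload

    def __getitem__(self, key: str) -> Any:
        return self._payload[key]


if __name__ == "__main__":
    trie = SketchTrie()
    trie.add("abcde", "first")
    trie.add("defgh", "second")
    print(trie.findall("abcdefgh"))    # [(0, 5, 'first')]
    print(trie.findall("abcdedefgh"))  # [(0, 5, 'first'), (5, 10, 'second')]
```

Running it reproduces the anti-feature example: in "abcdefgh" the match "abcde" consumes the "de" that "defgh" would need, so only "abcde" is reported. Adding a key after a search simply marks the failure links stale, which mirrors the "mix adding keywords and searching text freely" behavior.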

