Text::TermExtract

Extract terms from text

Text::TermExtract Ranking & Summary


  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: Michael Schilli
  • Publisher web site: http://search.cpan.org/~mschilli/

Text::TermExtract Description

Text::TermExtract is a Perl module to extract terms from text.

SYNOPSIS

    use Text::TermExtract;

    my $text = q{
      Hey, hey, how's it going? Wanna go to Wendy's tonight?
      Wendy's has great sandwiches.
    };

    my $ext = Text::TermExtract->new();

    for my $word ( $ext->terms_extract( $text, { max => 3 } ) ) {
        print "$word\n";
    }

      # "sandwiches"
      # "tonight"
      # "wendy"

Text::TermExtract takes a simple approach to extracting the most interesting terms from documents of arbitrary length.

There are more scientific methods of term extraction, like Yahoo's online term extraction API (which you can't run locally) and the Lingua::YaTeA module on CPAN (which is so poorly documented that I couldn't figure out how to use it).

So I wrote Text::TermExtract, which first tries to guess the language a text is written in, removes the language-specific stopwords, weighs the remaining words with a hand-crafted formula and returns a list of (hopefully) interesting words.

This is a very crude approach to term extraction. If you have a better method and want to include it in Text::TermExtract, drop me an email; I'm interested.

Requirements:
· Perl
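The three steps described above (guess the language, drop its stopwords, weigh what remains) can be illustrated with a small, self-contained Perl sketch. It does not call Text::TermExtract and is not the module's hand-crafted formula: the two tiny stopword lists, the "most stopword hits wins" language guess, the frequency-times-length weight and the helper names `guess_language` and `extract_terms` are all hypothetical, chosen only to make the pipeline concrete.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, deliberately tiny stopword lists (illustration only;
# a real implementation would use full per-language lists).
my %STOPWORDS = (
    en => [qw(a an and are go going has have hey how is it of the to wanna)],
    de => [qw(aber das der die ein eine es hat ist nicht und was wie zu)],
);

# Step 1 (assumed heuristic): pick the language whose stopword list
# matches the most tokens.
sub guess_language {
    my ($tokens) = @_;
    my ( $best, $best_hits ) = ( 'en', -1 );
    for my $lang ( sort keys %STOPWORDS ) {
        my %stop = map { $_ => 1 } @{ $STOPWORDS{$lang} };
        my $hits = grep { $stop{$_} } @$tokens;    # scalar grep = count
        ( $best, $best_hits ) = ( $lang, $hits ) if $hits > $best_hits;
    }
    return $best;
}

sub extract_terms {
    my ( $text, $max ) = @_;

    # Lowercase, split into ASCII words (allowing one inner apostrophe),
    # then drop a trailing possessive 's ("wendy's" -> "wendy").
    my @tokens = map { my $w = $_; $w =~ s/'s\z//; $w }
                 ( lc($text) =~ /([a-z]+(?:'[a-z]+)?)/g );

    # Step 2: remove the stopwords of the guessed language.
    my $lang = guess_language( \@tokens );
    my %stop = map { $_ => 1 } @{ $STOPWORDS{$lang} };
    my %count;
    $count{$_}++ for grep { !$stop{$_} && length($_) > 1 } @tokens;

    # Step 3: hypothetical weight = frequency * word length,
    # ties broken alphabetically so the ranking is deterministic.
    my %weight = map { $_ => $count{$_} * length($_) } keys %count;
    my @ranked = sort { $weight{$b} <=> $weight{$a} || $a cmp $b } keys %weight;

    splice( @ranked, $max ) if @ranked > $max;    # keep the top $max terms
    return @ranked;
}

my $sample = q{
  Hey, hey, how's it going? Wanna go to Wendy's tonight?
  Wendy's has great sandwiches.
};
print "$_\n" for extract_terms( $sample, 3 );    # sandwiches, wendy, tonight
```

On the synopsis text this toy formula happens to pick the same three words as the module, in a different order; that is a coincidence of the sample, not a sign that the weighting matches Text::TermExtract's.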