SpamProbe

SpamProbe - Fast, intelligent, automatic spam detector

SpamProbe Ranking & Summary

  • License: Freeware
  • Price: FREE
  • Publisher Name: Brian Burton
  • Operating Systems: Mac OS X
  • File Size: 256 KB


SpamProbe Description

SpamProbe is a fast, intelligent, automatic spam detector that uses Paul Graham-style Bayesian analysis of word counts in spam and non-spam emails. Filtering adapts to personal tastes automatically, and no manual rule creation is required. SpamProbe is intended for use with procmail and maildrop.

SpamProbe operates on an entirely different basis from rule-based filters. Instead of pattern matching against a set of human-generated rules, it relies on a Bayesian analysis of the frequency of words used in the spam and non-spam emails received by an individual person. The process is completely automatic and tailors itself to the kinds of email each person receives.

SpamProbe is known to compile and run on a wide range of *nix systems, including Mac OS X, Darwin, Linux (Red Hat and Debian), FreeBSD, Solaris, and AIX. It can also be compiled to run on Windows under the Cygwin environment.

NOTE: SpamProbe is released under the Qt Public License (QPL).

Here are some key features of "SpamProbe":
· Spam detection using Bayesian analysis of the terms contained in each email. Words used often in spam but rarely in good email tend to indicate that a message is spam. Detection is generally over 90% effective once a few hundred spams have been classified; the author's personal database is over 99% effective.
· Automatically learns from incoming emails as they are classified, incorporating the user's feedback to tailor classification to each user's personal tastes.
· Works with procmail, maildrop, or a similar tool to produce a complete server-side or client-side spam filtering system.
· Written in C++ for good performance. Uses Peter Graf's PBL ISAM library or Berkeley DB for database access, giving quick startup and fast term count retrieval. Also supports a fast, fixed-size hash file format for maximum speed or when a fixed-size database is essential.
· Recognizes and decodes MIME attachments in quoted-printable and base64 encoding, and automatically skips non-text attachments. MIME decoding lets SpamProbe base its decisions on the words in the emails rather than on base64 gobbledygook.
· Analyzes image attachments to derive useful information from them. This allows SpamProbe to detect spams that contain an image and little or no text.
· Counts two-word phrases as well as single words for higher precision, and can easily be configured to use longer phrases if desired.
· Ignores HTML tags in emails for scoring purposes unless the -h command line option is used. Many spams use HTML and few humans do, so HTML tends to become a powerful recognizer of spam. However, in the author's opinion this also substantially increases the likelihood of false positives when someone sends a non-spam email containing HTML tags. SpamProbe does still pull URLs from inside HTML tags, since those tend to be spammer-specific.
· Locks mboxes and databases using fcntl file locking to avoid problems when multiple emails arrive simultaneously.
· Scores only the Received, Subject, To, From, and Cc headers. All other headers are ignored, which makes it hard for spammers to hide non-spammy words in X- headers to fool the filter. The -H command line option can be used to override this.
· Natively supports the mbox, MBX, and Maildir mailbox formats.
· Supports the Content-Length: field in mbox headers. This can be disabled with the -Y option so that only From_ lines are used to recognize new messages.
· Uses an MD5 hash of each email to recognize reclassification of an already classified email, avoiding distortion of the word counts when emails are reclassified. This way, emails can be kept in a mailbox that SpamProbe scans repeatedly without being counted more than once.
· Provides a date-stamp-based database cleanup command that removes terms from the database if their counts never rise above a threshold value (normally 2).
· Provides an edit-term command that lets users directly modify the counts of individual terms, for example to force a particular term to be considered spammy or good.
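The Paul Graham-style Bayesian scoring described above can be illustrated with a minimal Python sketch of the formula Graham published in his essay "A Plan for Spam": good counts are doubled to bias against false positives, rare terms are ignored, per-term probabilities are clamped to [0.01, 0.99], and the 15 terms farthest from 0.5 are combined. These constants come from Graham's essay, not from SpamProbe's source, whose parameters may differ; the function names `term_probability` and `combined_score` and the sample counts are hypothetical.

```python
from math import prod


def term_probability(good_count, spam_count, n_good, n_spam, min_count=5):
    """Spam probability of one term, or None if the term is too rare.

    n_good and n_spam are the numbers of good and spam messages trained on;
    both are assumed to be positive.
    """
    g = 2 * good_count  # doubling good counts biases against false positives
    b = spam_count
    if g + b < min_count:
        return None  # too little evidence to trust this term
    good_freq = min(1.0, g / n_good)
    spam_freq = min(1.0, b / n_spam)
    p = spam_freq / (good_freq + spam_freq)
    return max(0.01, min(0.99, p))  # never fully certain about a single term


def combined_score(probs, max_terms=15):
    """Naive Bayes combination (equal priors) of per-term probabilities."""
    # Keep only the terms whose probability is farthest from the neutral 0.5.
    interesting = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:max_terms]
    spam = prod(interesting)
    good = prod(1.0 - p for p in interesting)
    return spam / (spam + good)


if __name__ == "__main__":
    # Hypothetical training counts: term -> (good_count, spam_count)
    counts = {"viagra": (0, 40), "meeting": (30, 1), "free": (5, 25)}
    n_good, n_spam = 200, 200
    probs = []
    for term in ["viagra", "free", "tomorrow"]:
        if term in counts:
            p = term_probability(*counts[term], n_good, n_spam)
            if p is not None:
                probs.append(p)
    print(round(combined_score(probs), 3))  # well above 0.9, so spam
```

Graham classified a message as spam when the combined score exceeded 0.9; unknown terms such as "tomorrow" simply contribute nothing in this sketch.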
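The MIME handling described above (decode quoted-printable and base64 text, skip non-text attachments) can be sketched with Python's standard `email` package. This only illustrates the idea and is not SpamProbe's C++ decoder; the helper name `text_parts` is hypothetical.

```python
import email
from email import policy


def text_parts(raw):
    """Return the decoded text of every text/* part of a raw message (bytes)."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    texts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        if part.get_content_maintype() != "text":
            continue  # skip images, archives and other non-text attachments
        try:
            # Undoes base64 / quoted-printable and decodes the charset.
            texts.append(part.get_content())
        except LookupError:
            # Unknown charset name: fall back to the transfer-decoded bytes.
            payload = part.get_payload(decode=True) or b""
            texts.append(payload.decode("latin-1"))
    return texts


if __name__ == "__main__":
    sample = (
        b"Content-Type: text/plain; charset=utf-8\n"
        b"Content-Transfer-Encoding: quoted-printable\n"
        b"\n"
        b"Caf=C3=A9 sp=\n"
        b"ecial\n"
    )
    print(text_parts(sample))  # decoded words, not the =C3=A9 escapes
```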
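Counting two-word phrases alongside single words can be sketched as below. The lowercase alphanumeric tokenizer, the `terms` helper and its `max_phrase` parameter are assumptions for illustration; SpamProbe's real tokenizer and its own setting for longer phrases are not reproduced here.

```python
import re


def terms(text, max_phrase=2):
    """Return single words plus every phrase of up to max_phrase words."""
    words = re.findall(r"[a-z0-9$]+", text.lower())  # simplistic tokenizer
    out = []
    for n in range(1, max_phrase + 1):
        # Slide a window of n consecutive words across the message.
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out


if __name__ == "__main__":
    print(terms("Buy cheap pills"))
```

Phrases such as "cheap pills" can be far more telling than either word alone, at the cost of a larger term database.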
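Restricting scoring to the Received, Subject, To, From, and Cc headers amounts to an allow-list filter applied before tokenizing. A minimal sketch with Python's `email` package follows; the `scored_headers` name is hypothetical, and the -H override is not modeled.

```python
import email

# The only headers whose words are scored, per the feature list above.
SCORED_HEADERS = {"received", "subject", "to", "from", "cc"}


def scored_headers(raw):
    """Return (name, value) pairs for the headers that should be tokenized.

    Every other header, including X- headers, is dropped so that words
    hidden there cannot pull the score toward "good".
    """
    msg = email.message_from_string(raw)
    return [(name.lower(), value) for name, value in msg.items()
            if name.lower() in SCORED_HEADERS]


if __name__ == "__main__":
    sample = "From: a@example.com\nX-Hidden: meeting agenda\nSubject: hi\n\nbody\n"
    print(scored_headers(sample))
```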
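Ignoring HTML tags while still harvesting the URLs inside them can be sketched with Python's `html.parser`. Treating only `href` and `src` attributes as URLs is an assumption, and `TextAndUrls` and `split_html` are hypothetical names rather than SpamProbe's own code.

```python
from html.parser import HTMLParser


class TextAndUrls(HTMLParser):
    """Collects the text between tags and the URLs found inside tags."""

    URL_ATTRS = {"href", "src"}  # assumed set of URL-carrying attributes

    def __init__(self):
        super().__init__()
        self.text = []
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # The tag itself is not scored, but URLs in its attributes are kept.
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                self.urls.append(value)

    def handle_data(self, data):
        self.text.append(data)


def split_html(html):
    """Return (text without tags, list of URLs taken from tags)."""
    parser = TextAndUrls()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.text), parser.urls


if __name__ == "__main__":
    print(split_html('<p>Hi <a href="http://spam.example/buy">click</a></p>'))
```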
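The MD5-based protection against double counting can be sketched as a map from message digest to the label it was trained under: retraining with the same label is a no-op, and a changed label first undoes the old counts. The `Trainer` class and its whitespace tokenizer are hypothetical stand-ins, and exactly which bytes SpamProbe feeds into MD5 is not specified here.

```python
import hashlib
from collections import Counter


class Trainer:
    """Toy trainer that counts each message at most once, under one label."""

    def __init__(self):
        self.counts = {"good": Counter(), "spam": Counter()}
        self.seen = {}  # MD5 hex digest -> label the message was trained as

    @staticmethod
    def _terms(raw):
        # Stand-in tokenizer; real term extraction is far richer.
        return raw.decode("utf-8", errors="replace").lower().split()

    def train(self, raw, label):
        """Record raw (bytes) as label; return False if nothing changed."""
        digest = hashlib.md5(raw).hexdigest()
        previous = self.seen.get(digest)
        if previous == label:
            return False  # already counted under this label: skip
        terms = self._terms(raw)
        if previous is not None:
            # Reclassification: remove the earlier counts before re-adding.
            self.counts[previous].subtract(terms)
        self.counts[label].update(terms)
        self.seen[digest] = label
        return True
```

Because training is keyed by digest, rescanning the same mailbox repeatedly leaves the counts unchanged, which is the behavior the feature list describes.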
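A date-stamp-based cleanup can be sketched as: drop every term whose combined count is at or below the threshold (2 by default, as stated above) and that has not been updated recently. The `max_age_days` age limit, the `TermRecord` layout, and applying the threshold to the good-plus-spam total are assumptions for illustration, not SpamProbe's documented behavior.

```python
import time
from dataclasses import dataclass


@dataclass
class TermRecord:
    good: int
    spam: int
    last_update: float  # date stamp, in seconds since the epoch


def cleanup(db, threshold=2, max_age_days=7, now=None):
    """Delete rare, stale terms from db (a dict) and return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = [term for term, rec in db.items()
             if rec.good + rec.spam <= threshold and rec.last_update < cutoff]
    for term in stale:
        del db[term]
    return stale
```

The date stamp keeps freshly seen terms alive long enough to accumulate counts, while one-off noise words eventually disappear.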
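The fcntl locking mentioned above can be sketched with Python's `fcntl.lockf`, which wraps the POSIX fcntl() record-locking calls. The `locked` context manager is a hypothetical helper, and the sketch runs only on POSIX systems.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked(path, exclusive=True):
    """Open path (creating it if needed) and hold an fcntl lock on it.

    Another process requesting a conflicting lock on the same file blocks
    in lockf() until this lock is released.
    """
    with open(path, "a+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield f
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Writers take the exclusive lock and readers the shared one, so two deliveries arriving at once update the mailbox or database one after the other instead of interleaving.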
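The fixed-size hash file format can be pictured as a fixed array of count buckets indexed by a hash of each term, so the storage never grows and lookups are a single index. This in-memory sketch uses CRC-32 and 65,536 buckets purely as assumptions; SpamProbe's actual on-disk layout and hash function are not described here, and `FixedHashCounts` is a hypothetical name.

```python
import zlib
from array import array


class FixedHashCounts:
    """Fixed number of buckets; colliding terms share a bucket's counts."""

    def __init__(self, buckets=1 << 16):
        self.buckets = buckets
        self.good = array("I", [0]) * buckets  # unsigned counters
        self.spam = array("I", [0]) * buckets

    def _slot(self, term):
        # Deterministic hash, so the same term always maps to the same bucket.
        return zlib.crc32(term.encode("utf-8")) % self.buckets

    def add(self, term, is_spam, n=1):
        counts = self.spam if is_spam else self.good
        counts[self._slot(term)] += n

    def counts(self, term):
        """Return (good_count, spam_count) for the term's bucket."""
        i = self._slot(term)
        return self.good[i], self.spam[i]
```

The trade-off is that hash collisions silently merge the counts of unrelated terms, which is acceptable when speed or a hard size limit matters more than exactness.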

