[luau] SpamAssassin and Procmail question

Thu Sep 26 22:30:01 PDT 2002

On Thu, 2002-09-26 at 22:11, Ben Beeson wrote:
> Aloha,
> 
> 	I am pondering SpamAssassin for my box.  The volume I now get after becoming 
> a moderator on a mailing list is pretty disgusting.  I find it even more 
> depressing that most of the UCE does not have opt-outs.  Since I can't 
> opt-out, I need a better filter.  

Please be aware that you should NEVER opt out!  Opting out may stop
that particular source of spam, but many spammers sell their opt-out
lists to other spammers, because the addresses on them are confirmed to
be active.

This is also the reason why many Linux mail clients do not display
images in incoming mail by default.  These images can easily have
uniquely identifiable codes in the URL that can tell the spammers "I'm
an active legit address.  Spam me more!"
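
To make the tracking-image point concrete, here is a minimal Python sketch
that lists the remote images an HTML message would try to load.  The sample
body, the host name and the "id" token are all made up for illustration;
real mail clients handle this in their own ways.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class RemoteImageFinder(HTMLParser):
    """Collect the src of every <img> tag that points at a remote server."""

    def __init__(self):
        super().__init__()
        self.remote_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # Only http/https images cause a request back to the sender.
        if urlsplit(src).scheme in ("http", "https"):
            self.remote_images.append(src)


def find_remote_images(html_body):
    """Return the URLs of all remote images referenced in an HTML body."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote_images


# Hypothetical message body; the host and the "id" token are invented.
sample = '<p>Buy now!</p><img src="http://spammer.example/pixel.gif?id=a1b2c3">'
print(find_remote_images(sample))
```

Fetching that URL would hand the "id" value back to the sender, which is
why not loading remote images is the safer default.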

> 
> 	I read the fine docs and understand that I need procmail and a few Perl 
> modules etc to work with SpamAssassin.  What I am not sure of is whether or 
> not I need to use a tool like fetchmail to fetch my mail from the ISP before I 
> can filter it.  Anybody familiar with this process?
> 

1) fetchmail downloads mail from your POP3 accounts
2) procmail does general filtering; you can use simple regular
expression matching to sort mail into different mailboxes (one per
mailing list, for example).
3) procmail can then pass every message not caught by your mailing list
rules on to spamassassin.
4) spamassassin uses intelligent analysis of the headers and body of the
message to calculate a spam "score".  If the score is above a certain
configurable threshold it can be filtered by procmail.  I set my
procmail to put messages with a score of 5.5 or higher into my SPAM
folder, which I review every once in a while just to make sure
spamassassin didn't guess wrong.  If you set your spam threshold too
low, it may incorrectly identify legit mail as spam, so be careful.  Most
folks should probably set their threshold to perhaps 10 or 15, and add
Vipul's Razor to spamassassin's checks in order to increase spam
detection accuracy.
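
The score-and-threshold step above can be sketched in Python.  This is not
a procmail recipe, only an illustration of the decision it makes.  It
assumes spamassassin has already tagged the message with a header of the
form "X-Spam-Status: Yes, hits=7.2 required=5.0 ..." (some releases write
"score=" instead of "hits="); the function names and folder names are
hypothetical.

```python
import re
from email import message_from_string

SPAM_THRESHOLD = 5.5  # the cutoff mentioned above; raise it to be more cautious

# Assumed header layout: "X-Spam-Status: Yes, hits=7.2 required=5.0 ..."
# ("score=" is accepted as an alternative spelling of "hits=").
SCORE_PATTERN = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message):
    """Return the score from X-Spam-Status, or None if it is absent."""
    msg = message_from_string(raw_message)
    match = SCORE_PATTERN.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else None


def choose_folder(raw_message, threshold=SPAM_THRESHOLD):
    """Pick a destination folder the way a score-based rule might."""
    score = spam_score(raw_message)
    if score is not None and score >= threshold:
        return "SPAM"
    return "inbox"


# Hypothetical tagged message.
tagged = (
    "From: someone@example.com\n"
    "Subject: Make money fast\n"
    "X-Spam-Status: Yes, hits=7.2 required=5.0\n"
    "\n"
    "Click here.\n"
)
print(choose_folder(tagged))
```

Untagged messages fall through to the inbox, so a failure in the scoring
step never hides mail.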

I've seen perhaps 99.99% effective spam filtering, with only 2 out of
2000 filtered messages being false positives (incorrectly identified as
being spam).
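
As a quick check of the figures, here is the arithmetic in Python, reading
"2 out of 2000" as 2 legitimate messages among 2000 that were filed as spam
(that reading is an assumption; the variable names are only illustrative).

```python
flagged = 2000          # messages the filter put in the SPAM folder
false_positives = 2     # legitimate mail found among them

false_positive_share = false_positives / flagged
precision = 1 - false_positive_share

print(f"{false_positive_share:.1%} of flagged mail was legitimate")
print(f"{precision:.1%} of flagged mail was really spam")
```

Note that this only measures how clean the SPAM folder is.  How much spam
slipped through to the inbox is a separate number that these two counts do
not reveal.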

Nobody should use SpamAssassin without studying how it works and
carefully adjusting settings.  I plan on deploying it site-wide on
several organizations' e-mail servers this year.  I will use more
lenient settings (a higher threshold) that may only be 95% effective in
spam filtering, but that should reduce the chance of false positives to
almost nothing.

http://www-106.ibm.com/developerworks/linux/library/l-spam/?t=gr%2clnxw03=StampSpam
Here's a very helpful IBM article about SpamAssassin, with links to more
helpful information.

http://razor.sourceforge.net/
You should also take a look at how Vipul's Razor works; it is another
very interesting spam filtering system that uses a distributed checksum
network.  Very advanced stuff, and it is fairly easy to add Razor
filtering to SpamAssassin's several checks.
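
The idea behind a checksum network can be sketched in a few lines of
Python.  This is a toy model, not Razor's actual protocol or signature
scheme: the in-memory set stands in for the shared servers, the
plain SHA-1 over a whitespace-normalized body is a simplification, and
all names here are hypothetical.

```python
import hashlib


def body_checksum(body):
    """SHA-1 over the body with case and whitespace normalized (illustrative)."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class ChecksumCatalogue:
    """Stand-in for the shared network: a set of reported spam checksums."""

    def __init__(self):
        self._known = set()

    def report(self, body):
        """Record a message body that some user has identified as spam."""
        self._known.add(body_checksum(body))

    def is_known_spam(self, body):
        """Check whether this body matches a previously reported one."""
        return body_checksum(body) in self._known


catalogue = ChecksumCatalogue()
catalogue.report("Make money FAST!!!\n  Click here.")
print(catalogue.is_known_spam("make money fast!!! click here."))
print(catalogue.is_known_spam("Lunch on Friday?"))
```

Once one recipient reports a message, everyone else who receives the same
body can reject it without analysing it again.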