August 22nd, 2005


Centralized Bayesian Spam Filter Management

I started using this scheme about a week ago, and it has been working like a charm!  It is useful if you check your email from multiple computers and are sick of re-training a local Bayesian filter, like the one in Thunderbird, on each of them; that is why I developed this scheme in the first place. :)

Comment or catch me on IM if you are interested in trying this out.


How It Works

  1. procmail(1) runs SpamAssassin on all incoming mail to tag it as spam or ham.

  2. procmail(1) then delivers messages tagged as spam into the Junk IMAP folder (the default junk mail folder for Thunderbird); messages tagged as ham continue through the filtering chain and usually end up in the main inbox.

  3. I check my mail in both the inbox and the Junk folder.

  4. If I see spam that was mistagged as ham and ended up in the inbox, I move those messages to the X-Spam IMAP folder.

  5. If I see ham that was mistagged as spam and ended up in the Junk folder, I move those messages to the X-Ham IMAP folder.

  6. The IMAP server polls the X-Spam and X-Ham folders every five minutes and uses sa-learn(1) to teach the SpamAssassin Bayesian filter that all messages in X-Spam (which SpamAssassin mistagged as ham) are really spam, and that all messages in X-Ham (which it mistagged as spam) are really ham.

  7. Once the server has taught SpamAssassin about mistagged spam/ham messages, it moves all messages in X-Ham into the inbox, and all messages in X-Spam to Junk.
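
For the curious, steps 6 and 7 could be sketched roughly like this in Python.  This is only an illustration, not the script I actually run: it assumes each IMAP folder is a plain directory on the server holding one file per message (a real Maildir also has cur/new/tmp subdirectories), that sa-learn is on the PATH, and that the inbox directory is called "Inbox".  The function names are made up for this example, and the five-minute schedule is assumed to come from something like cron.

```python
import shutil
import subprocess
from pathlib import Path


def learn_and_refile(src, dest, kind, run=subprocess.run):
    """Teach sa-learn every message in `src` as `kind`, then move it to `dest`.

    `kind` is "spam" or "ham".  Returns the number of messages refiled.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    src, dest = Path(src), Path(dest)
    moved = 0
    for msg in sorted(p for p in src.iterdir() if p.is_file()):
        # sa-learn --spam / --ham trains the Bayesian database on this message.
        run(["sa-learn", f"--{kind}", str(msg)], check=True)
        # Refile only after training succeeded (check=True raises on failure).
        shutil.move(str(msg), str(dest / msg.name))
        moved += 1
    return moved


def poll_once(mail_root, run=subprocess.run):
    """One polling pass: X-Spam is learned as spam and moved to Junk,
    X-Ham is learned as ham and moved to the (assumed) Inbox directory."""
    root = Path(mail_root)
    spam = learn_and_refile(root / "X-Spam", root / "Junk", "spam", run)
    ham = learn_and_refile(root / "X-Ham", root / "Inbox", "ham", run)
    return spam, ham
```

Calling poll_once() on the mail directory every five minutes would give the behavior described above.  The `run` parameter exists only so the sketch can be tried without touching a real SpamAssassin database.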