Still Fighting Spam

Volume 8, Issue 7; 15 Jan 2005; last modified 08 Oct 2010

The game of cat and mouse continues. Herewith a few notes on my most recent attempts to stay ahead of the bastards.

For a year or so, I've been using a combination of SpamAssassin and SpamBayes to fight spam. I also use Nikos K. Kantarakias’ set of procmail recipes, YAVR, to filter out viruses. It was all working, but not without some inconvenience. For one thing, it was pretty CPU-intensive; for another, downloading all the mail to my laptop so that I could throw 80% of it away seemed…suboptimal.

Edd Dumbill’s essay about DSPAM inspired me to try something different.

My cunning plan was to move all the spam processing to the server in my closet. That machine could run fetchmail every few minutes to collect mail from various places and filter it, then I could collect my mail from there and get only “clean” mail on my laptop.

Well, DSPAM recommends MySQL version 4.1 and I initially installed 4.0 by mistake. When I got 4.1 installed, it complained bitterly. I don't recall the details, but Google suggested that upgrading to the 2.6 kernel was the answer, so I fiddled for another day or so. Still, you gotta love the fact that the download, configure, install, reboot process for 2.6.10 was painless and worked perfectly on the first try.

DSPAM is a purely statistical filter, so it has to be trained. Over the course of a few days, I got it mostly trained. What I noticed, however, was that my corporate email was not as well filtered as I imagined. I can't collect that mail on the box in my closet, so I was living with a fair amount of spam after all. That was disappointing.

Then, one night last week, the box in my closet ran out of swap space and dropped my training database all over the floor. The best laid plans of mice and men, as they say, are usually about equal.

So I moved DSPAM processing onto my laptop, which has significantly more memory and horsepower. My new setup is:

1. Download personal mail to the box in my closet every five minutes or so, run it through procmail using YAVR and throw the viruses away. That saves me several megabytes of mail a day.
2. Run that mail through another procmail recipe that discards all mail from mailer-daemon, postmaster and other such places if it isn't to me. That's another couple of megabytes I don't have to see.
3. Finally, download mail from the box in my closet, and sometimes from corporate servers, to my laptop where DSPAM runs. DSPAM is still learning, but it's already doing a pretty good job, and it's a lot less demanding on my CPU.
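
The second step, dropping bounce traffic that isn't addressed to me, is done with a procmail recipe in the real setup. The Python below is only a restatement of that rule to make the logic explicit, not the recipe itself. The address in `MY_ADDRESSES`, the `BOUNCE_SENDERS` tuple, and the function name `should_discard` are all assumptions made up for illustration.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical values; substitute the addresses you actually receive mail at.
MY_ADDRESSES = {"me@example.org"}
BOUNCE_SENDERS = ("mailer-daemon", "postmaster")


def should_discard(raw_message):
    """Return True for bounce-style mail that is not addressed to me."""
    msg = message_from_string(raw_message)

    # Look only at the local part of the From address, case-insensitively.
    _, sender = parseaddr(msg.get("From", ""))
    sender_local = sender.partition("@")[0].lower()
    if sender_local not in BOUNCE_SENDERS:
        return False

    # Keep the message if any To/Cc recipient is one of my addresses.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return not any(addr.lower() in MY_ADDRESSES for _, addr in recipients)
```

A bounce sent to someone else is discarded; a bounce sent to `me@example.org` is kept, as is any mail from an ordinary sender.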

The hardest part was getting DSPAM integrated with exim4, but Odhiambo Washington on the dspam-users mailing list finally got me through it. (I'm using exim4 because that's an MTA supported by DynDns' Mailhop℠ Outbound service.)

The trick was to compile DSPAM with exim4 as the default delivery agent, and to get the dspam_router configuration right:

dspam_router:
   no_verify
   check_local_user
   # When to scan a message :
   # - it isn't already flagged as spam from DSPAM
   # - it isn't already scanned
   condition   = "${if and { \
                           {!def:h_X-FILTER-DSPAM:} \
                           {!eq {$received_protocol}{spam-scanned}} \
                           }\
                           {1}{0}}"
   headers_add  = "X-FILTER-DSPAM: by $primary_hostname on $tod_full"
   driver       = accept
   transport    = dspam_spamcheck

This comes right after the userforward router in the exim4 configuration. The routers to handle spam and ham come next:

dspam_addspam_router:
  driver = accept
  local_part_prefix = spam-
  transport = dspam_addspam

dspam_falsepositive_router:
  driver = accept
  local_part_prefix = ham-
  transport = dspam_falsepositive
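
As I understand it, local_part_prefix makes a router match only addresses whose local part begins with the given prefix, and it strips that prefix before the transport sees $local_part. A rough Python sketch of the resulting dispatch follows. It deliberately ignores check_local_user and the scan condition on dspam_router, and the function name `route`, the `PREFIX_TRANSPORTS` table, and the user name in the tests are assumptions for illustration.

```python
# Prefix -> transport name, taken from the two routers above.
PREFIX_TRANSPORTS = {
    "spam-": "dspam_addspam",
    "ham-": "dspam_falsepositive",
}


def route(local_part):
    """Return (transport, user) for a local part, in simplified form."""
    for prefix, transport in PREFIX_TRANSPORTS.items():
        if local_part.startswith(prefix):
            # The prefix is stripped; ${lc:$local_part} lowercases the rest.
            return transport, local_part[len(prefix):].lower()
    # Anything else goes to the ordinary scanning transport.
    return "dspam_spamcheck", local_part.lower()
```

So mail to a hypothetical `spam-alice` retrains DSPAM for user `alice`, and mail to `ham-alice` reports a false positive for the same user.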

Finally, down in the transports section, we have:

dspam_spamcheck:
  driver = pipe
  command = "/usr/local/bin/dspam --deliver=innocent --user ${lc:$local_part} -f '$sender_address' -bm %u"
  user = Debian-exim
  group = mail
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =

dspam_addspam:
  driver = pipe
  command = "/usr/local/bin/dspam --class=spam --source=error --user ${lc:$local_part} -f '$sender_address' -bm %u"
  home_directory = "/tmp"
  current_directory = "/tmp"
  user = Debian-exim
  group = mail
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =

dspam_falsepositive:
  driver = pipe
  command = "/usr/local/bin/dspam --class=innocent --source=error --deliver=innocent --user ${lc:$local_part} -f '$sender_address' -bm %u"
  home_directory = "/tmp"
  current_directory = "/tmp"
  user = Debian-exim
  group = mail
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =

Don't ask me why that works; I'm really not quite sure. I'm just happy that it does.

I think this configuration is a winner.

Comments

Nice solution. Although I still have not read the DSPAM documentation, I'm curious how you tell DSPAM that it's wrong in either direction (missed spam or a false positive). By the rules I see in the exim configuration, you have to resend the e-mail to ham-username or spam-username, right? (where "username" is your user name). Please tell me if this is the case. Best regards, diego

Exactly. The dspam_addspam_router configuration intercepts any message to spam-xxx and routes it to dspam marking it as spam for user xxx. Since it's all delivered locally, it's fast and convenient. There are mechanisms for feeding dspam a corpus of spam or ham, but I haven't bothered.

What does the following section do?
-f '$sender_address' -bm
I found I had to remove it (at least from the first router section), or else I was getting extra envelope-to addresses (-f@domain.com, -bm@domain.com, as well as duplicates for the user). Tim

I'm sorry, Timothy, I can't explain that. It was folks on the dspam mailing list, mostly Odhiambo, that got me through the configuration process. I don't really grok all the details.