Still Fighting Spam
The game of cat and mouse continues. Herewith a few notes on my most recent attempts to stay ahead of the bastards.
For a year or so, I've been using a combination of SpamAssassin and SpamBayes to fight spam. I also use Nikos K. Kantarakias’ set of procmail recipes, YAVR, to filter out viruses. It was all working, but not without some inconvenience. For one thing, it was pretty CPU-intensive; for another, downloading all the mail to my laptop so that I could throw 80% of it away seemed…suboptimal.
Edd Dumbill’s essay about DSPAM inspired me to try something different.
My cunning plan was to move all the spam processing to the server in my closet. That machine could run fetchmail every few minutes to collect mail from various places and filter it; then I could collect my mail from there and get only “clean” mail on my laptop.
Well, DSPAM recommends MySQL version 4.1 and I initially installed 4.0 by mistake. When I got 4.1 installed, it complained bitterly. I don't recall the details, but Google suggested that upgrading to the 2.6 kernel was the answer, so I fiddled for another day or so. Still, you gotta love the fact that the download, configure, install, reboot process for 2.6.10 was painless and worked perfectly on the first try.
DSPAM is a purely statistical filter, so it has to be trained. Over the course of a few days, I got it mostly trained. What I noticed, however, was that my corporate email was not as well filtered as I imagined. I can't collect that mail on the box in my closet, so I was living with a fair amount of spam after all. That was disappointing.
Then, one night last week, the box in my closet ran out of swap space and dropped my training database all over the floor. The best laid plans of mice and men, as they say, are usually about equal.
So I moved DSPAM processing onto my laptop, which has significantly more memory and horsepower. My new setup is:
- Download personal mail to the box in my closet every five minutes or so, run it through procmail using YAVR, and throw the viruses away. That saves me several megabytes of mail a day.
- Run that mail through another procmail recipe that discards all mail from mailer-daemon, postmaster, and other such places if it isn't addressed to me. That's another couple of megabytes I don't have to see.
- Finally, download mail from the box in my closet, and sometimes from corporate servers, to my laptop, where DSPAM runs. DSPAM is still learning, but it's already doing a pretty good job, and it's a lot less demanding on my CPU.
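The rule in the second step can be sketched in Python. This is only an illustration of the logic, not the actual procmail recipe; the addresses in MY_ADDRESSES and the sender names in DAEMON_SENDERS are placeholders, and treating "addressed to me" as "appears in the To or Cc header" is an assumption.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Placeholder values (assumptions): substitute your own addresses and
# whichever daemon-style sender names you want to drop.
MY_ADDRESSES = {"me@example.com"}
DAEMON_SENDERS = {"mailer-daemon", "postmaster"}


def should_discard(raw_message: str) -> bool:
    """True if the message comes from a daemon-style sender and is not
    addressed (To or Cc) to any of MY_ADDRESSES."""
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From", ""))
    local_part = sender.split("@", 1)[0].lower()
    if local_part not in DAEMON_SENDERS:
        return False  # ordinary mail always passes this step
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = {addr.lower() for _, addr in getaddresses(headers)}
    return not (recipients & MY_ADDRESSES)


# A bounce addressed to someone else is dropped:
bounce = "From: MAILER-DAEMON@mx.example.net\nTo: victim@example.org\n\nUndeliverable"
print(should_discard(bounce))  # True
```

In procmail itself, the built-in ^FROM_DAEMON special expression matches a broader set of daemon-style senders than the two names used here.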
The hardest part was getting DSPAM integrated with exim4, but Odhiambo Washington on the dspam-users mailing list finally got me through it. (I'm using exim4 because that's an MTA supported by DynDNS's Mailhop℠ Outbound service.)
The trick was to compile DSPAM with exim4 as the default delivery agent and to get the dspam_router configuration right:
dspam_router:
  no_verify
  check_local_user
  # When to scan a message :
  #   - it isn't already flagged as spam from DSPAM
  #   - it isn't already scanned
  condition = "${if and { \
                   {!def:h_X-FILTER-DSPAM:} \
                   {!eq {$received_protocol}{spam-scanned}} \
                 }\
                 {1}{0}}"
  headers_add = "X-FILTER-DSPAM: by $primary_hostname on $tod_full"
  driver = accept
  transport = dspam_spamcheck
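The router's condition can be modelled in a few lines of Python to show what it decides. This is a simplified sketch, not exim's behaviour in every detail: the function names, the hostname and the timestamp are hypothetical, and the header is added to the message directly, whereas exim's headers_add takes effect when the message is delivered. The second call stands in for the copy that DSPAM presumably hands back to exim: it now carries X-FILTER-DSPAM, so the router declines and the message does not loop back into DSPAM.

```python
from email import message_from_string
from email.message import Message
from typing import Optional

HEADER = "X-FILTER-DSPAM"


def should_scan(msg: Message, received_protocol: str) -> bool:
    """Both tests from the router's condition must hold."""
    not_yet_tagged = msg.get(HEADER) is None             # {!def:h_X-FILTER-DSPAM:}
    not_rescanned = received_protocol != "spam-scanned"  # {!eq {$received_protocol}{spam-scanned}}
    return not_yet_tagged and not_rescanned


def dspam_router(msg: Message, received_protocol: str,
                 hostname: str, timestamp: str) -> Optional[str]:
    """Return the transport name, or None to decline (exim would then
    try the next router)."""
    if not should_scan(msg, received_protocol):
        return None
    msg[HEADER] = f"by {hostname} on {timestamp}"  # stands in for headers_add
    return "dspam_spamcheck"


msg = message_from_string("Subject: hello\n\nbody")
print(dspam_router(msg, "local", "closet", "Thu, 01 Jan 2004 12:00:00 +0000"))  # dspam_spamcheck
print(dspam_router(msg, "local", "closet", "Thu, 01 Jan 2004 12:00:05 +0000"))  # None
```

Header lookups on a Python Message are case-insensitive, which matches how exim treats header names.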
The dspam_router comes right after the userforward router in the exim4 configuration. The routers to handle spam and ham come next:
dspam_addspam_router:
  driver = accept
  local_part_prefix = spam-
  transport = dspam_addspam

dspam_falsepositive_router:
  driver = accept
  local_part_prefix = ham-
  transport = dspam_falsepositive
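What these two routers do with an address can be sketched in Python. The function name and the user jdoe are hypothetical, and this sketch matches the prefix exactly as written, which may differ from exim's own case handling. An address such as spam-jdoe presumably gets past dspam_router in the first place because its check_local_user test fails: there is no local user called spam-jdoe.

```python
from typing import Optional, Tuple

# local_part_prefix -> transport, as in the two routers above.
RETRAIN_ROUTES = {
    "spam-": "dspam_addspam",
    "ham-": "dspam_falsepositive",
}


def route_retraining(local_part: str) -> Optional[Tuple[str, str]]:
    """Return (transport, remaining local part) for spam-/ham- addresses,
    or None if neither prefix matches."""
    for prefix, transport in RETRAIN_ROUTES.items():
        if local_part.startswith(prefix):
            # As with local_part_prefix, the prefix is stripped: the
            # transport's $local_part (and hence --user) is the remainder.
            return transport, local_part[len(prefix):]
    return None


print(route_retraining("spam-jdoe"))  # ('dspam_addspam', 'jdoe')
print(route_retraining("ham-jdoe"))   # ('dspam_falsepositive', 'jdoe')
print(route_retraining("jdoe"))       # None
```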
Finally, down in the transports section, we have:
dspam_spamcheck:
  driver = pipe
  command = "/usr/local/bin/dspam --deliver=innocent --user ${lc:$local_part} -f '$sender_address' -bm %u"
  user = Debian-exim
  group = mail
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =

dspam_addspam:
  driver = pipe
  command = "/usr/local/bin/dspam --class=spam --source=error --user ${lc:$local_part} -f '$sender_address' -bm %u"
  home_directory = "/tmp"
  current_directory = "/tmp"
  user = Debian-exim
  group = mail
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =

dspam_falsepositive:
  driver = pipe
  command = "/usr/local/bin/dspam --class=innocent --source=error --deliver=innocent --user ${lc:$local_part} -f '$sender_address' -bm %u"
  home_directory = "/tmp"
  current_directory = "/tmp"
  user = Debian-exim
  group = mail
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =
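When debugging oddities such as extra -f or -bm recipients, it can help to see roughly which argument list each pipe transport hands to dspam. The Python sketch below is an approximation under stated assumptions: it splits each command with POSIX shell quoting rules (exim uses its own, similar rules), then expands each argument, and it only models the two whole-argument expansions used here. The %u is not an exim variable, so exim passes it to dspam unchanged; what DSPAM then does with it is outside this sketch.

```python
import shlex
from typing import List

# Command strings copied verbatim from the three pipe transports above.
COMMANDS = {
    "dspam_spamcheck": "/usr/local/bin/dspam --deliver=innocent --user ${lc:$local_part} -f '$sender_address' -bm %u",
    "dspam_addspam": "/usr/local/bin/dspam --class=spam --source=error --user ${lc:$local_part} -f '$sender_address' -bm %u",
    "dspam_falsepositive": "/usr/local/bin/dspam --class=innocent --source=error --deliver=innocent --user ${lc:$local_part} -f '$sender_address' -bm %u",
}


def dspam_argv(transport: str, local_part: str, sender: str) -> List[str]:
    """Approximate the argument list the pipe transport passes to dspam."""
    # Split into arguments first (quotes are removed), then expand each
    # argument separately, the order exim uses for pipe commands.
    args = shlex.split(COMMANDS[transport])
    expansions = {
        "${lc:$local_part}": local_part.lower(),  # lc: lowercases
        "$sender_address": sender,
    }
    return [expansions.get(arg, arg) for arg in args]


print(dspam_argv("dspam_addspam", "JDoe", "alice@example.org"))
# ['/usr/local/bin/dspam', '--class=spam', '--source=error', '--user', 'jdoe', '-f', 'alice@example.org', '-bm', '%u']
```

Because splitting happens before expansion, an empty sender (as on a bounce) still yields a single, empty argument after -f.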
Don't ask me why that configuration works; I'm really not quite sure. I'm just happy that it does.
I think this configuration is a winner.
Comments
Nice solution. Although I still haven't read the DSPAM documentation, I'm curious how you tell DSPAM that it's wrong in either direction (spam or false positive). Judging by the rules in the exim configuration, you have to resend the e-mail to ham-username or spam-username, right? (where "username" is your user name). Please tell me if this is the case. Best regards, diego
Exactly. The dspam_addspam_router configuration intercepts any message to spam-xxx and routes it to dspam, marking it as spam for user xxx. Since it's all delivered locally, it's fast and convenient. There are mechanisms for feeding dspam a corpus of spam or ham, but I haven't bothered.
What does the following section do?
-f '$sender_address' -bm
I found I had to remove it (at least from the first router section); otherwise I was getting extra envelope-to addresses (-f@domain.com and -bm@domain.com, as well as duplicates for the user).
Tim
I'm sorry, Timothy, I can't explain that. It was folks on the dspam mailing list, mostly Odhiambo, that got me through the configuration process. I don't really grok all the details.