<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="pto" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#">
<info>
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
<title>Still Fighting Spam</title><biblioid class="uri">http://norman.walsh.name/2005/01/15/spam</biblioid>
<volumenum>8</volumenum>
<issuenum>7</issuenum>
<pubdate>2005-01-15T16:33:15-05:00</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2004</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>The game of cat and mouse continues. Herewith a few notes on
my most recent attempts to stay ahead of the bastards.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#Email"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#Spam"/>
</info>

<para xml:id="p1">For a year or so, I've been using a combination of
<link xlink:href="http://spamassassin.apache.org/">SpamAssassin</link><indexterm>
<primary>SpamAssassin</primary>
    </indexterm> and
<link xlink:href="http://spambayes.sourceforge.net/">SpamBayes</link><indexterm>
<primary>SpamBayes</primary>
    </indexterm> to fight spam. I also use
<personname>
      <firstname>Nikos K.</firstname>
<surname>Kantarakias</surname>
    </personname>’ set of
procmail recipes,
<link xlink:href="http://agriroot.aua.gr/~nikant/nkvir/">YAVR</link>,
to filter out viruses.
It was all working,
but not without some inconvenience. For one thing, it was pretty
CPU-intensive; for another, downloading all the mail to my laptop so that
I could throw 80% of it away seemed…suboptimal.</para>

<para xml:id="p2"><personname>
<firstname>Edd</firstname>
      <surname>Dumbill</surname>
    </personname>’s
<link xlink:href="http://usefulinc.com/edd/blog/contents/2004/12/03-dspam/read">essay</link>
about
<link xlink:href="http://www.nuclearelephant.com/projects/dspam">DSPAM</link>
inspired me to try something different.</para>

<para xml:id="p3">My cunning plan was to move all the spam processing to the server
in my closet. That machine could run <command>fetchmail</command> every
few minutes to collect mail from various places and filter it, then I
could collect my mail from there and get only “clean” mail on my laptop.</para>
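
<para>A minimal sketch of what such a polling setup might look like in a
<filename>.fetchmailrc</filename>, assuming a single POP3 account handed off to
<application>procmail</application> for filtering. The host name, user name,
and password are placeholders, and the five-minute daemon interval is only
one reading of “every few minutes”; a cron job would serve equally well.</para>

<programlisting># ~/.fetchmailrc (hypothetical host, user, and password)
# Wake up and poll every 300 seconds.
set daemon 300

# Collect mail from one remote mailbox and hand each message
# to procmail for local filtering and delivery.
poll mail.example.org protocol pop3
    user "norm" password "secret"
    mda "/usr/bin/procmail -d %T"</programlisting>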

<para xml:id="p4">Well, <application>DSPAM</application> recommends
<application>mysql</application> version 4.1 and I initially installed 4.0
by mistake. When I got 4.1 installed, it complained bitterly. I don't
recall the details, but Google suggested that upgrading to the 2.6 kernel
was the answer, so I fiddled for another day or so. Still, you gotta love
the fact that the download, configure, install, reboot process for 2.6.10 was
painless and worked perfectly on the first try.</para>

<para xml:id="p5">DSPAM is a purely statistical filter, so it has to be trained.
Over the course of a few days, I got it mostly trained. What I noticed, however,
was that my corporate email was not as well filtered as I had imagined. I can't
collect that mail on the box in my closet, so I was living with a fair amount
of spam after all. That was disappointing.</para>

<para xml:id="p6">Then, one night last week, the box in my closet ran out of swap space
and dropped my training database all over the floor. The best laid plans
of mice and men, as they say, are usually about equal.</para>

<para xml:id="p7">So I moved <application>DSPAM</application> processing onto my
laptop, which has significantly more memory and horsepower. My new
setup is:</para>

<orderedlist>
<listitem>
<para xml:id="p8">Download personal mail to the box in my closet every five minutes
or so, run it through
<application>procmail</application> using <application>YAVR</application>
and throw the viruses away. That saves me several megabytes of mail
<emphasis>a day</emphasis>.</para>
</listitem>

<listitem>
<para xml:id="p9">Run that mail through another <application>procmail</application> recipe
that discards
all mail from <literal>mailer-daemon</literal>, <literal>postmaster</literal>
and other such places if it isn't <emphasis>to</emphasis> me. That's
another couple of megabytes I don't have to see.</para>
</listitem>

<listitem>
<para xml:id="p10">Finally, download mail from the box in my closet, and sometimes
from corporate servers, to my laptop where
<application>DSPAM</application> runs.
<application>DSPAM</application> is still learning, but it's already
doing a pretty good job, and it's a lot less demanding on my CPU.
</para>
</listitem>
</orderedlist>
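
<para>The second step above could be sketched as a
<application>procmail</application> recipe along the following lines. This is
an illustration under stated assumptions, not the recipe actually in use: the
address <literal>norm@example.org</literal> is a placeholder, and the sender
pattern covers only the two names mentioned, not the “other such places”.</para>

<programlisting># Discard bounce-style mail that is not addressed to me.
# norm@example.org is a hypothetical address; substitute your own.
:0
* ^From:.*(mailer-daemon|postmaster)@
* ! ^TO_norm@example\.org
/dev/null</programlisting>

<para>Delivering to <filename>/dev/null</filename> is irreversible, so a
cautious first pass might deliver to a scratch folder instead and switch to
discarding once the patterns have proven themselves.</para>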

<para xml:id="p11">The hardest part was getting <application>DSPAM</application>
integrated with <application>exim4</application>, but
<personname>
      <firstname>Odhiambo</firstname>
<surname>Washington</surname>
    </personname> on the
<literal>dspam-users</literal> mailing list finally got me through it.
(I'm using <application>exim4</application> because that's an MTA
supported by
<link xlink:href="http://www.dyndns.org/">DynDns</link>'s
Mailhop<superscript>SM</superscript>
<link xlink:href="http://www.dyndns.org/services/mailhop/outbound/">Outbound</link>
service.)
</para>

<para xml:id="p12">The trick was to compile <application>DSPAM</application> with
<application>exim4</application> as the default delivery agent and
to get the <literal>dspam_router</literal>
configuration right:</para>

<programlisting>dspam_router:
   no_verify
   check_local_user
   # When to scan a message :
   # - it isn't already flagged as spam from DSPAM
   # - it isn't already scanned
   condition   = "${if and { \
                           {!def:h_X-FILTER-DSPAM:} \
                           {!eq {$received_protocol}{spam-scanned}} \
                           }\
                           {1}{0}}"
   headers_add  = "X-FILTER-DSPAM: by $primary_hostname on $tod_full"
   driver       = accept
   transport    = dspam_spamcheck</programlisting>

<para xml:id="p13">This comes right after the <literal>userforward</literal> router
in the <application>exim4</application> configuration. The routers
to handle spam and ham come next:</para>

<programlisting>dspam_addspam_router:
  driver = accept
  local_part_prefix = spam-
  transport = dspam_addspam

dspam_falsepositive_router:
  driver = accept
  local_part_prefix = ham-
  transport = dspam_falsepositive</programlisting>

<para xml:id="p14">Finally, down in the transports section, we have:</para>

<programlisting>dspam_spamcheck:
  driver = pipe
  command = "/usr/local/bin/dspam --deliver=innocent --user ${lc:$local_part} -f '$sender_address' -bm %u"
  user = Debian-exim
  group = mail
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =

dspam_addspam:
  driver = pipe
  command = "/usr/local/bin/dspam --class=spam --source=error --user ${lc:$local_part} -f '$sender_address' -bm %u"
  home_directory = "/tmp"
  current_directory = "/tmp"
  user = Debian-exim
  group = mail
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =

dspam_falsepositive:
  driver = pipe
  command = "/usr/local/bin/dspam --class=innocent --source=error --deliver=innocent --user ${lc:$local_part} -f '$sender_address' -bm %u"
  home_directory = "/tmp"
  current_directory = "/tmp"
  user = Debian-exim
  group = mail
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =</programlisting>

<para xml:id="p15">Don't ask me why that works; I'm really not quite sure. I'm just
happy that it does.</para>

<para xml:id="p16">I think this configuration is a winner.</para>

</essay>

