Plotting Spam

Volume 6, Issue 80; 05 Sep 2003

Spam, spam, and yet more spam. [Update: Plotting a new threat.]

The price one pays for pursuing any profession or calling is an intimate knowledge of its ugly side.

James Baldwin

No, not a nefarious Ponzi scheme to net $5000 a week for only 20 minutes a day. Rather, a graph of the quantity of spam that crosses my desk.

[Update: 01 Feb 2004] The SoBig threat is but a fading memory while MyDoom remains a pain in the inbox. Back in November, I tweaked my mail collection scripts; now everything goes through procmail and is uniformly instrumented. Here’s a new plot (from 07 Apr 2004):

Quantity of Spam By Day

The spam spike isn’t as dramatic as the incoming mail spike when MyDoom first hit. That’s because most of it slipped through my filters at first. Not anymore. Not as much, anyway.

SoBig History

The rest of this essay is what I wrote back in September when SoBig hit.

Before SoBig, I simply downloaded all of my mail and let SpamAssassin and SpamBayes sort out the good from the bad. But even at broadband speeds, downloading every copy of SoBig was painful.

Figuring it would go away in a day or two, I attacked the problem first by writing pop3-del, a script to delete POP mail. I would download my mail with the threshold set to 80k, scan the headers for all the messages left behind, and then delete them by running pop3-del by hand.
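The pop3-del script itself isn't shown here. As a rough sketch of how such a tool might look, assuming Perl's standard Net::POP3 module, the following deletes messages by number. The helper names (message_numbers, delete_messages) and the host, user, and password arguments are hypothetical, not taken from the original script:

```perl
use strict;
use warnings;
use Net::POP3;

# Keep only the arguments that look like POP3 message numbers
# (positive integers); anything else is silently dropped.
sub message_numbers {
    return grep { /^[1-9]\d*$/ } @_;
}

# Mark the given messages for deletion and return how many the
# server accepted. The server removes them when the session ends
# with QUIT. Host, user and password are placeholders.
sub delete_messages {
    my ($host, $user, $pass, @nums) = @_;
    my $pop = Net::POP3->new($host, Timeout => 60)
        or die "cannot connect to $host\n";
    defined $pop->login($user, $pass)
        or die "login failed for $user\n";
    my $deleted = grep { $pop->delete($_) } @nums;
    $pop->quit;
    return $deleted;
}

# Possible use (not run here):
#   delete_messages('pop.example.com', 'me', $password,
#                   message_numbers(@ARGV));
```

Validating the message numbers separately keeps the network code small and makes the one piece that can be checked offline easy to test.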

After a week, that became tiresome and I tweaked my mail downloading script. If your message trips this filter, it's toast. I never see it, I have no record of it, it didn't happen:

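# Handle SoBig virus and its cousins separately...
# Only messages larger than 95KB are candidates.
if ($msg_size[$msg_num] > 95*1024) {
    # Either the sender looks like a bounce or list rejection...
    if (($name =~ /docbook-reject/
         || $name =~ /Mail Delivery/
         || $name =~ /MAILER-DAEMON/
         || $name =~ /postmaster\@/)
        # ...or the sender name is blank and the subject is one of
        # the stock SoBig subjects, with any number of "Re: " prefixes.
        || ($name =~ /^\s*$/
            && ($subj =~ /^(Re: )*Details$/
                || $subj =~ /^(Re: )*Wicked screensaver$/
                || $subj =~ /^(Re: )*That movie$/
                || $subj =~ /^(Re: )*Your application$/
                || $subj =~ /^(Re: )*Your details$/
                || $subj =~ /^(Re: )*Approved$/
                || $subj =~ /^(Re: )*My details$/
                || $subj =~ /^(Re: )*Thank you\!$/))) {
        # Queue the message for deletion (unless $keep is set),
        # count it, and move on to the next message.
        push (@delete, $msg_num) if !$keep;
        $del_count++;
        $spam = "\#";
        next;
    }
}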

The “docbook-reject” line is in there because I moderate the DocBook mailing lists and so I see all the SoBig spam that goes there too.
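For anyone who wants to try the rule outside the download loop, here is a minimal sketch of the same test as a standalone function. The size floor, sender patterns, and subject list are copied from the filter above; the function name looks_like_sobig and the table-driven layout are illustrative assumptions, not part of the original script:

```perl
use strict;
use warnings;

# Values copied from the filter above.
my $SIZE_FLOOR    = 95 * 1024;
my $BOUNCE_SENDER = qr/docbook-reject|Mail Delivery|MAILER-DAEMON|postmaster\@/;
my @SOBIG_SUBJECTS = (
    'Details', 'Wicked screensaver', 'That movie', 'Your application',
    'Your details', 'Approved', 'My details', 'Thank you!',
);

# Build one anchored pattern: any number of "Re: " prefixes
# followed by exactly one of the stock subjects.
my $alternatives  = join '|', map { quotemeta($_) } @SOBIG_SUBJECTS;
my $SOBIG_SUBJECT = qr/^(?:Re: )*(?:$alternatives)$/;

# Hypothetical helper: true if a message of the given size, sender
# name and subject would trip the filter.
sub looks_like_sobig {
    my ($size, $name, $subj) = @_;
    return 0 unless $size > $SIZE_FLOOR;
    return 1 if $name =~ $BOUNCE_SENDER;
    return 1 if $name =~ /^\s*$/ && $subj =~ $SOBIG_SUBJECT;
    return 0;
}
```

Keeping the subjects in a list means a new variant is a one-line change, at the cost of building the pattern at startup.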

A few days later, when I was still getting thousands of SoBig messages a day, I instrumented the scripts to keep track of the quantity of spam that passes through them.
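The instrumentation itself isn't shown, but the idea can be sketched. Assuming, purely for illustration, that each message handled appends a log line of the form "YYYY-MM-DD spam" or "YYYY-MM-DD ham", a tally per day might look something like this (the log format and the name tally_by_day are assumptions):

```perl
use strict;
use warnings;

# Tally log lines of the assumed form "YYYY-MM-DD verdict", where
# verdict is "spam" or "ham". Returns a hash ref keyed by date,
# each value holding per-verdict counts plus a running total.
sub tally_by_day {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my ($day, $verdict) = $line =~ /^(\d{4}-\d{2}-\d{2})\s+(spam|ham)\s*$/
            or next;    # skip lines that don't fit the assumed format
        $count{$day}{$verdict}++;
        $count{$day}{total}++;
    }
    return \%count;
}

# Print one row per day (date, total, spam), ready for a plotting tool.
sub print_rows {
    my ($count) = @_;
    for my $day (sort keys %$count) {
        printf "%s %d %d\n", $day, $count->{$day}{total},
            $count->{$day}{spam} // 0;
    }
}
```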

Quantity of Spam for a Month

I'll update the graph periodically.

Over the weekend, I added a line for the total quantity of mail I receive as well, just for comparison. In case you're interested, on Saturday 06 Sep 2003, only 3.35% of the mail that I received was “ham”. That's 6,121 pieces of spam caught by my spam filters.
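To put those two numbers together: if every non-ham message that day is counted among the 6,121 pieces of spam (an assumption; the day's total isn't stated), the total works out to roughly 6,333 messages, about 212 of them ham. A small sketch of the arithmetic, with hypothetical function names:

```perl
use strict;
use warnings;

# Percentage of messages that were ham (legitimate mail).
sub ham_percent {
    my ($ham, $total) = @_;
    return $total ? 100 * $ham / $total : 0;
}

# Total mail implied by a spam count and a ham percentage,
# assuming every non-ham message is counted as spam.
sub implied_total {
    my ($spam, $ham_pct) = @_;
    return $spam / (1 - $ham_pct / 100);
}

my $total = implied_total(6121, 3.35);
printf "total: about %.0f, ham: about %.0f\n", $total, $total - 6121;
```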

These statistics apply only to my nwalsh.com addresses. The corporate firewall does a pretty good job on the stuff that comes to me through Sun.COM, and everything else (accounts at my cable ISP, Yahoo and Hotmail accounts, etc.) is 100% spam.