Randal is a coauthor of Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming, as well as a founding board member of the Perl Mongers (perl.org). Randal can be reached at [email protected].
Spam: What a mess. The wonderful Mail::SpamAssassin can catch most of it, looking at various spammy things in headers and bodies of messages, and even check external realtime block lists (RBLs) to help determine if a particular piece of e-mail is indeed a reasonable e-mail or simply someone getting a free ride to promote their own commercial activity. This month, I'd like to look at a recent change I made at the stonehenge.com mail server to help deal with the rapidly increasing amount of spam on the Internet.
Mail for the stonehenge.com domain is handled by blue.stonehenge.com, a server at a rack location provided by Sprocket Data (http://sprocketdata.com/). We're running OpenBSD and Postfix because I like to sleep at night and not worry about the "hack of the week."
Prior to the recent change, I had most of the mail for the stonehenge.com domain (with a few notable exceptions) delivered to my personal merlyn account, rewriting the destination to use the plus-extended address format provided by Postfix. I accomplished this with an /etc/postfix/virtual_regexp entry that looked something like:
    /^stonehenge\.com$/ whatever
    # other stonehenge.com rewrites are here
    # final catch-all:
    /^(.*)@stonehenge.com$/ merlyn+for-stonehenge+$1
I also included this line in /etc/postfix/main.cf:
virtual_maps = regexp:/etc/postfix/virtual_regexp
so that the virtual map was properly specified. As each e-mail came in, procmail would launch and consult my .procmailrc file, which looked something like:
    LOGFILE=$HOME/.procmail.log
    LOGABSTRACT=yes

    ## :0c
    ## $HOME/JustInCase/

    :0w:
    | Sortmail >>SORTMAIL.LOG 2>&1

    LOG="... Sortmail failed, bouncing ... "
    EXITCODE=75
    :0
    /dev/null
The key here is that each mail would be piped to my Sortmail program, but if anything went wrong, Postfix would retain the message in its own queue for a subsequent delivery. This saved my bacon more than once when I was editing Sortmail and forgot to check syntax before writing it out.
Within my Sortmail program, I extracted the very first "Delivered-To" header, and then undid the transformation applied by the virtual rewrite. This got me back to the original stonehenge.com address that had been requested.
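As a rough illustration of undoing that rewrite, the sketch below assumes the Delivered-To value carries the local part produced by the catch-all rule (merlyn+for-stonehenge+ORIGINAL). The function name and the sample hostname are hypothetical; the real Sortmail code is not shown in this column.

```perl
use strict;
use warnings;

# Hypothetical helper: recover the originally requested stonehenge.com
# address from a plus-extended Delivered-To value.
# Assumes the local part looks like "merlyn+for-stonehenge+ORIGINAL".
sub original_address {
    my ($delivered_to) = @_;

    # Capture everything between the fixed prefix and the "@host" part.
    if ($delivered_to =~ /^merlyn\+for-stonehenge\+(.+)\@[^\@]+$/) {
        return "$1\@stonehenge.com";
    }
    return;    # not a rewritten address
}

# prints fred@stonehenge.com
print original_address('merlyn+for-stonehenge+fred@blue.stonehenge.com'), "\n";
```

An address that never went through the catch-all simply yields no result, so the caller can fall back to whatever the header said.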
I then constructed a Mail::Internet and Mail::Audit object from the incoming message:
    my $mi = Mail::Internet->new(\*STDIN);
    my $ma = Mail::Audit->new(data => [$mi->as_string =~ /(.*\n?)/g],
                              noexit => 1,
                              log => '-',
                              loglevel => 2);
I did this because, although I started with Mail::Audit, I later found out it lacked some of the header-access functions that I needed, so I had to punt and use Mail::Internet instead. Eventually, I hope to eliminate Mail::Audit entirely, as I've found it to be too funky and ugly for my needs.
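The data argument in that constructor call is built with a regex match in list context, which turns one big string back into a list of lines. A minimal sketch of the idiom on its own, using a made-up three-line message in place of $mi->as_string:

```perl
use strict;
use warnings;

# Stand-in for $mi->as_string: a tiny message held in one scalar.
my $text = "Subject: test\n\nHello there\n";

# In list context, /(.*\n?)/g returns each line with its newline attached.
# The pattern can also match the empty string at the very end, so this
# sketch filters out zero-length pieces with grep.
my @lines = grep { length } $text =~ /(.*\n?)/g;

print scalar(@lines), " lines\n";
```

Joining the pieces back together reproduces the original string, which is why the idiom is a safe way to hand a message to code that wants an array of lines.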
After sorting through the delivery address to determine how the message would be delivered or autoresponded, I started adding checks using Mail::SpamAssassin to try not to autorespond to spam or deliver spam into my significant inboxes. Anything addressed to me personally that was spammish got dropped into my "ube" folder (unsolicited bulk e-mail). Anything that was not addressed to me got dropped into my "ubetrap" file (as in a "spamtrap" address, but I always pronounced this rhyming with "boobytrap").
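The folder choice described here boils down to two yes/no questions. The sketch below is only an illustration: the function name is hypothetical, the spam verdict is passed in as a flag rather than computed with Mail::SpamAssassin, and 'inbox' stands in for the normal sorting that non-spam goes through.

```perl
use strict;
use warnings;

# Hypothetical routing helper. $is_spam would come from a spam check
# such as Mail::SpamAssassin; $for_me says whether the original address
# was one of my personal addresses. 'inbox' is a placeholder for the
# regular sorting that happens when the message is not spammish.
sub choose_folder {
    my ($is_spam, $for_me) = @_;
    return 'inbox' unless $is_spam;
    return $for_me ? 'ube' : 'ubetrap';
}

# Spam sent to an address that is not mine: prints ubetrap
print choose_folder(1, 0), "\n";
```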
For any message that required an autoresponse, I eventually hooked in a Template Toolkit-based response template, passing the Mail::Internet object, the Mail::Audit object, and the constructed reply headers. A typical template looks like:
    [%
      head.Subject = "Your recent message to $to\n";
      INCLUDE normal_header;
    %]
    Why did you send a message to [% to %]?

    (It's very possible given the current sorry state of Microsoft
    so-called operating systems security that your address has been
    forged. If so, please ignore me. Sorry.)

    Randal L. Schwartz
    [email protected]

    [% INCLUDE signature %]
This template handles "bounces" (addresses that aren't otherwise assigned within stonehenge.com). I send out a human message instead of a normal sendmail-like message because I need to know if they really intended the message for some other domain that was similar to stonehenge.com: It's amazing how many of those are out there. Most people ignore sendmail-like messages, but they'll respond in plain English to this letter.
There are other little parts to the mail system, but I hope you get the sense that it's a bunch of baling wire and duct tape, because it is. And it evolved slowly over time, starting out as procmailrc targets, then evolving to use the Perl-based MailAgent, and then to this bizarre hodgepodge.
Initially, this system worked rather well. I was dealing with about 300 to 500 pieces of e-mail a day, sorting them into mailing list folders, personal mail, and autoresponse mail for our Perl Training services and my legal case, and handling comp.lang.perl.announce postings and a few lightweight mailing list rebroadcasters. Each incoming message triggered just two forks (procmail, then my Perl Sortmail), and then got delivered.
But around the summer of 2003, the various Microsoft virus mailers started hitting. They started sending mail from randomly selected addresses to many targets, carrying their DNA along to infect the next system on the list.
Right away, I noticed that I was getting a lot of "your mail contains a virus" mail, from well-intentioned antivirus programs. Let me make this perfectly clear. If you write an antivirus program, and your antivirus program can recognize that the virus fakes the "From" line, do not send a response to that clearly faked "From." These "your mail contains a virus" mails are worse than the virus mails themselves, at least for me.
But what I also noticed was a steady increase in the MIRVs. I'm using MIRV here in the "multiple independently targetable reentry vehicles" sense. An incoming spam letter would be addressed to a number of "stonehenge.com" addresses, and delivered in one SMTP connection.
The trouble with MIRVs is that they get burst by the local Postfix and delivered as separate messages (although sharing one Message ID) to separate invocations of procmail, and then to separate invocations of my Sortmail. Each Sortmail would eventually get around to wondering if this was spam, and would make all the regex matches against the mail and all the RBL checks out on the Internet, and come to identical conclusions (usually "yes, it's spam").
So a MIRV aimed at 10 addresses caused 20 new processes (a procmail and a Sortmail for each copy) to fire up on my box in the space of about two seconds, beating up on a lot of memory and CPU as those complex regexen were dragged through the mail, and generating a lot of DNS traffic to check the RBLs. Ugh. It was a nice design before MIRV spam, but it was clearly failing now.
But how to fix it? It wouldn't be enough to move to spamc/spamd, which at least would remove the need to fork and reload all of that SpamAssassin code on each mail, because I would still be asking the same question 10 times because of the MIRVs.
But luckily, I recently stumbled across a Slashdot posting that mentioned Amavis (A Mail Virus Scanner), found at http://www.amavis.org/. Unlike mail-user-agent (MUA) tools like spamc or simple Mail::SpamAssassin-based custom tools, I could hook Amavis in at the mail-transfer-agent (MTA) level. Ahh! Before the MIRV has burst! This looked very promising.
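To make the difference concrete, here is a toy model of the two arrangements. Everything in it is made up for illustration (the recipient addresses, the expensive_scan stand-in, the counter); it only counts how many times the costly check runs for one 10-recipient message.

```perl
use strict;
use warnings;

# One incoming message addressed to ten local recipients in a single
# SMTP transaction (hypothetical addresses).
my @recipients = map { "user$_\@stonehenge.com" } 1 .. 10;
my $message    = "Subject: buy now\n\nspam spam spam\n";

my $scan_count = 0;

# Stand-in for the regex matching and RBL lookups; always says "spam".
sub expensive_scan {
    my ($text) = @_;
    $scan_count++;
    return 1;
}

# After the burst: every delivered copy gets its own scan.
expensive_scan($message) for @recipients;
my $scans_per_recipient = $scan_count;

# Before the burst: scan once, then share the verdict with every recipient.
$scan_count = 0;
my $verdict     = expensive_scan($message);
my %verdict_for = map { $_ => $verdict } @recipients;
my $scans_per_message = $scan_count;

print "after burst: $scans_per_recipient scans, before burst: $scans_per_message scan\n";
```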
Even more promising is that Amavis is written in Perl, and uses Mail::SpamAssassin and Net::Server, both technologies with which I was familiar. I figured that if I had any trouble with Amavis, my Perl skills were probably sufficient to either reverse-engineer for understanding or customize the needed features.
Although Amavis can be used as the normal port-25 listener on a server, I didn't want to remove the known reliability and flexibility of having Postfix be my port-25 listener. Luckily, in the installation instructions, I saw how to make Amavis work alongside Postfix, and followed the instructions rather directly.
First, I unpacked Amavis into /opt/amavisd/, and created an etc and sbin directory alongside the now unpacked source directory. I also created an amavis user, allowing the home directory to default to /home/amavis.
Next, I copied amavisd to the sbin directory, and amavisd.conf to the etc directory. I edited the amavisd.conf file (putting it under RCS first) to reflect local preferences. Many of the settings were as recommended by the README.postfix file included with the distribution.
First, I fixed $MYHOME to /home/amavis and $mydomain to stonehenge.com. (I like that the config file is in Perl and not some obscure config language.) Then I set $daemon_user and $daemon_group both to amavis, as I had chosen, and pushed $TEMPBASE into the tmp subdirectory.
Skimming down, I found the "POSTFIX" section, uncommenting the lines for $forward_method and $notify_method there. Finally, I uncommented the line that set @bypass_virus_checks_acl to a single period. Since I didn't care about virus checks, and only about spam, I wanted to keep Amavis single-minded.
I changed the $QUARANTINEDIR to be below /home/amavis for simplicity. And finally, I commented out the $sa_local_tests_only line, causing SpamAssassin to also consider the RBL tests.
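Pulled together, the edits described above might look roughly like the fragment below. Treat it as a sketch rather than a copy of my file: the variable names are the ones mentioned in the text, but the $forward_method value is an assumption inferred from the 127.0.0.1:10025 listener used for reinjection, and the quarantine subdirectory name is made up.

```perl
# Sketch of the amavisd.conf settings described in the text.
# The config file is itself Perl, so these are ordinary assignments.
$MYHOME       = '/home/amavis';
$mydomain     = 'stonehenge.com';
$daemon_user  = 'amavis';
$daemon_group = 'amavis';
$TEMPBASE     = "$MYHOME/tmp";          # working files in the tmp subdirectory

# POSTFIX section: hand scanned mail back to Postfix.
# Assumed value, matching a reinjection listener on 127.0.0.1:10025.
$forward_method = 'smtp:127.0.0.1:10025';
$notify_method  = $forward_method;

# A single period matches every recipient: bypass virus checks for all.
@bypass_virus_checks_acl = qw( . );

# Hypothetical directory name; the text only says "below /home/amavis".
$QUARANTINEDIR = "$MYHOME/quarantine";

# $sa_local_tests_only = 1;   # left commented out so the RBL tests run
```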
After making all these changes, I then proceeded with the testing as indicated in the README.postfix file, double checking every step because my machine was handling live e-mail. Ultimately, my /etc/postfix/master.cf was altered to comment out three lines:
    #AMAVIS# smtp      inet  n  -  -  -   -  smtpd
    #AMAVIS# pickup    fifo  n  -  -  60  1  pickup
    #AMAVIS# cleanup   unix  n  -  -  -   0  cleanup

replacing those with:

    smtp      inet  n  -  -  -   -  smtpd
        -o cleanup_service_name=pre-cleanup
    pickup    fifo  n  -  -  60  1  pickup
        -o cleanup_service_name=pre-cleanup
    cleanup   unix  n  -  -  -   0  cleanup
        -o mime_header_checks=
        -o nested_header_checks=
        -o body_checks=
        -o header_checks=
and adding these new services as well:
    pre-cleanup  unix  n  -  -  -  0  cleanup
        -o virtual_alias_maps=
        -o canonical_maps=
        -o sender_canonical_maps=
        -o recipient_canonical_maps=
        -o masquerade_domains=
    smtp-amavis  unix  -  -  y  -  2  lmtp
        -o smtp_data_done_timeout=1200
        -o disable_dns_lookups=yes
    127.0.0.1:10025  inet  n  -  y  -  -  smtpd
        -o content_filter=
        -o local_recipient_maps=
        -o relay_recipient_maps=
        -o smtpd_restriction_classes=
        -o smtpd_client_restrictions=
        -o smtpd_helo_restrictions=
        -o smtpd_sender_restrictions=
        -o smtpd_recipient_restrictions=permit_mynetworks,reject
        -o mynetworks=127.0.0.0/8
        -o strict_rfc821_envelopes=yes
I also added the startup for amavisd to /etc/rc.local:
    if [ -x /opt/amavisd/sbin/amavisd ]; then
        echo -n ' amavis ';
        sudo -u amavis /opt/amavisd/sbin/amavisd -c /opt/amavisd/etc/amavisd.conf
    fi;
But that's it. It's been working quite well for me, and my load average has been significantly lower. Now the MIRVs get processed once instead of 10 times, and all is well. Almost immediately after making this change, Internet connections on the box were more stable, and my system didn't suddenly spike and freeze up my Emacs session nearly as much as it formerly did. And I'm not dealing with anywhere near the volume of identical spams, so my personal mail volume is also lower.
So, consider adding Amavis to your MTA, and help fight your neverending spam battle in an efficient way. Until next time, enjoy!
TPJ