Anti-Spam

Spam Removal Lists Info
This page explains why you should never submit your email address for removal from spammer lists. Requests that actually go through are generally used to confirm that an address is real, or to run other scams, such as charging you $5 to remove each address from these lists.

Spamhaus provides valuable anti-spam services, such as the Spamhaus Block List (SBL), which lets your mail server deny incoming messages from known spam sources, and the Register Of Known Spam Operations (ROKSO), which provides details on the largest spam sources.
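
Block lists like the SBL are normally checked over DNS: you reverse the octets of the IPv4 address, append the list's zone, and look the name up. Any answer means the address is listed. Here is a rough Python sketch of that idea. The zone name sbl.spamhaus.org and both helper names are my assumptions for illustration, and the real lookup needs network access (Spamhaus may also refuse queries that come through some public DNS resolvers).

```python
import ipaddress
import socket

SBL_ZONE = "sbl.spamhaus.org"  # assumed zone name for the Spamhaus Block List


def dnsbl_query_name(ip, zone=SBL_ZONE):
    """Build the DNS name to look up for an IPv4 address (hypothetical helper)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone=SBL_ZONE):
    """Return True if the list answers for this address, False if it does not."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True   # any A record means the address is on the list
    except socket.gaierror:
        return False  # NXDOMAIN (or a failed lookup) is treated as "not listed"
```

For example, dnsbl_query_name("192.0.2.10") builds the name 10.2.0.192.sbl.spamhaus.org, which is what a mail server would query before accepting a connection from that address.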

Spamhaus was recently the target of an email worm sent out by spammers. This worm is unique in that it sends the victim an email stating that their credit card will be charged and child porn CDs have been mailed to them. According to the email, you can stop this by sending an email to spamhaus.org. Spamhaus did not send the email to you, there are no porn CDs on the way, and sending them emails only interferes with their legitimate anti-spam efforts. http://www.spamhaus.org/cyberattacks/ has more info on these attacks.

Paul Graham's Spam Info
Paul Graham became very popular recently with his article "A Plan for Spam", which brought the concept of Bayesian spam filtering into the public eye. Others, including Microsoft, had already tried using this old statistical method for filtering spam without much luck. Paul did things a little differently and got some very impressive results (99.5% of spam filtered with no false positives).

I truly believe that this is the best filtering method currently available. It works by looking at the words in "good" and "bad" emails, and gathering statistics on how often words appear in each. Unlike a simple "bad word blacklist", a Bayesian filter won't discard a joke email from a friend just because it contains the word "viagra". It works like a human mind, examining all the little bits to determine the validity of the email as a whole.

It does have some drawbacks because it is based on statistics. To be accurate, the filter needs to look at a large number of existing emails to get a list of "good" and "bad" words. The database takes up space, and you need an easy way to classify emails as good or bad so that the filter can generate useful statistics on the words in those emails.
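
To make the word-statistics idea concrete, here is a small Python sketch. It is a simplified illustration and not Paul Graham's exact algorithm: the function names, the 0.4 value for unseen words, the 0.01 to 0.99 clamp, and the choice of the 15 most telling words are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split an email into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def train(good_mails, bad_mails):
    """Count in how many good and bad emails each word appears."""
    good = Counter(w for m in good_mails for w in set(tokenize(m)))
    bad = Counter(w for m in bad_mails for w in set(tokenize(m)))
    return good, bad, len(good_mails), len(bad_mails)


def word_spam_probability(word, model):
    """Estimate how strongly a single word points toward spam."""
    good, bad, n_good, n_bad = model
    if good[word] + bad[word] == 0:
        return 0.4  # assumed near-neutral value for words never seen before
    g = good[word] / max(n_good, 1)
    b = bad[word] / max(n_bad, 1)
    return min(0.99, max(0.01, b / (g + b)))  # clamp so no word is absolute


def spam_probability(text, model, top_n=15):
    """Combine the per-word probabilities into one score for the whole email."""
    probs = [word_spam_probability(w, model) for w in set(tokenize(text))]
    # Keep only the words whose probability is furthest from neutral (0.5).
    probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam, ham = 1.0, 1.0
    for p in probs:
        spam *= p
        ham *= 1.0 - p
    return spam / (spam + ham)
```

With a model trained on a few friendly emails (one of which happens to mention "viagra" in a joke) and a few sales pitches, a new joke email from a friend still scores low because words like "lunch" and "haha" outweigh the one suspicious word. That is the difference from a plain bad-word blacklist.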

SpamAssassin
SpamAssassin is probably the best out-of-the-box spam filter available for mail servers. The newest versions include a Bayesian filter in addition to their regular tests, so it can "learn" from especially obvious spam and non-spam messages. SA also works similarly to a human mind, using a lot of small tests to determine the fate of the whole email. While a Bayesian filter works simply on word statistics, SA has a bunch of hard-coded tests. These tests aren't adaptive like Bayesian filtering (changing them means revising the tests coded into the program), but each one is worth a certain number of points. For example, the word "viagra" in the subject adds 1.878 to 3.482 points (depending on other settings). After being subjected to all the tests, an email with a total score over 5 is considered junk. You can change the score needed for an email to be considered spam, as well as the points assigned by each test. If you're an engineer at Pfizer who studies Viagra all day, you could change the "viagra" test so that it only adds .001 points, effectively neutralizing that rule. You can also call other spam test systems from within SA. Rather than relying on one system to give a yes/no determination, SA adds a certain number of points (just like its other tests) if the other system marks the message as spam.
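
The point system is easy to picture in code. The Python sketch below is only a model of the idea, not SpamAssassin's real rule engine: the rule names, the two extra rules, and their point values are made up for illustration. Only the threshold of 5 and the 3.482 "viagra" score come from the description above.

```python
import re

# Hypothetical rules as (name, test function, points). Real SpamAssassin
# rules and scores differ; these exist only to show how points add up.
RULES = [
    ("SUBJECT_VIAGRA", lambda m: bool(re.search(r"viagra", m["subject"], re.I)), 3.482),
    ("ALL_CAPS_SUBJECT", lambda m: m["subject"].isupper(), 1.5),
    # An outside spam checker contributes points like any other test.
    ("EXTERNAL_CHECK_HIT", lambda m: m.get("external_says_spam", False), 2.0),
]


def score(message, rules=RULES, overrides=None):
    """Run every rule and add up the points of the ones that match."""
    overrides = overrides or {}
    total = 0.0
    hits = []
    for name, test, points in rules:
        if test(message):
            total += overrides.get(name, points)  # per-rule score can be changed
            hits.append(name)
    return total, hits


def is_spam(message, required=5.0, overrides=None):
    """A message is junk once its total goes over the required score."""
    total, _ = score(message, overrides=overrides)
    return total > required
```

A message with the subject "CHEAP VIAGRA" that an outside checker also flags hits all three rules and lands well over 5. Pass overrides={"SUBJECT_VIAGRA": 0.001} (the Pfizer engineer's setup) and the same message drops back under the threshold.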

SpamAssassin's biggest drawback is that it's really designed for Unix mail servers. It's open source, so you can do whatever you like to make it work in your system, but you can't just download it and plug it into Exchange or your local email client. It's easiest to run it on a mail server, or on a client machine that uses an MTA. In these setups, the email message can be passed through SA and filtered. You can have SA alter the subject or add a header so that you and/or your email client can easily sort out the junk. You can also pass a copy of the email to SA and have it return a yes/no answer, so that an external mail processing program (like procmail) can sort it into the proper folder. SA is so customizable that the possibilities are nearly endless.
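
As a rough illustration of sorting on an added header, here is a short Python sketch. It assumes SA has been set up to add an "X-Spam-Flag: YES" header to messages it considers spam. The function name and folder names are hypothetical, and a real setup would more likely do this in procmail or in the mail client's own filter rules.

```python
from email import message_from_string


def choose_folder(raw_message, spam_folder="Junk", inbox="Inbox"):
    """Pick a folder based on the header the filter added (assumed X-Spam-Flag)."""
    msg = message_from_string(raw_message)
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return spam_folder if flag == "YES" else inbox
```

A message that arrives with the flag set goes to "Junk"; anything without the header, or with any other value, stays in "Inbox".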

Distributed spam classification
Vipul's Razor
SpamNet
Pyzor
DCC
Firetrust CFS
These are all separate products, but they work the same way. When someone gets a junk email, they report it as spam. The server part of the system records the email in a database (usually as a unique signature generated from the email, rather than the actual message) along with the fact that it's spam. Other users check incoming messages against the server's database. If an incoming message matches one stored on the server as spam, the client program marks it as spam. The idea is that once one person gets a certain spam, nobody else will have to.
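
The report-and-check cycle can be sketched in a few lines of Python. This is a toy model, not how any of these products actually work: the class and function names are hypothetical, the "server" is just an in-memory set, and the signature is a plain SHA-256 hash of the normalized body. Real systems use fuzzier signatures so that small changes to a spam don't produce a different one.

```python
import hashlib
import re


def signature(body):
    """Hash of the normalized body, standing in for a real spam signature."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SpamDatabase:
    """In-memory stand-in for the shared server (hypothetical)."""

    def __init__(self):
        self._reported = set()

    def report(self, body):
        """A user reports a message; only its signature is stored."""
        self._reported.add(signature(body))

    def is_known_spam(self, body):
        """Other users check incoming mail against the reported signatures."""
        return signature(body) in self._reported
```

Once one user calls report() on a message, is_known_spam() returns True for every other user who receives the same text, even if the whitespace or capitalization differs.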

The biggest problem with this method is that you're having someone else decide for you what is spam and what isn't. The different systems have different ways of dealing with this. SpamNet gives users a trust rating. If someone votes against the majority (e.g. a spammer trying to validate his own spam), the user's rating goes down. The user's future votes then count for less, as they're less "trustworthy". With CFS, users basically just suggest that a certain message is spam. Someone at Firetrust then manually verifies that the message really is spam. This is good, but it requires a lot of manual work.
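
Here is one way a trust rating like that could work, sketched in Python. I don't know SpamNet's actual algorithm, so everything here is an assumption made for illustration: the class name, the starting trust of 1.0, and the rule that voting against the weighted majority halves your trust.

```python
from collections import defaultdict


class TrustedVotes:
    """Toy trust-weighted voting scheme (hypothetical, not SpamNet's)."""

    def __init__(self, initial_trust=1.0):
        self.trust = defaultdict(lambda: initial_trust)
        self.votes = defaultdict(dict)  # signature -> {user: says_spam}

    def vote(self, user, sig, says_spam):
        """Record one user's opinion about one message signature."""
        self.votes[sig][user] = says_spam

    def spam_weight(self, sig):
        """Trust-weighted share of votes calling this message spam (0.0 to 1.0)."""
        ballots = self.votes[sig]
        total = sum(self.trust[u] for u in ballots)
        if total == 0:
            return 0.0
        return sum(self.trust[u] for u, spam in ballots.items() if spam) / total

    def settle(self, sig, penalty=0.5):
        """Cut the trust of everyone who voted against the weighted majority."""
        majority_says_spam = self.spam_weight(sig) >= 0.5
        for user, spam in self.votes[sig].items():
            if spam != majority_says_spam:
                self.trust[user] *= penalty
```

If two ordinary users call a message spam and the spammer votes that it isn't, settling the vote halves the spammer's trust, so his vote on the next message counts for half as much as anyone else's.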

Newsletters are the biggest problem with these types of systems. Many people mark valid newsletters as spam when they don't want them anymore, rather than unsubscribing like they should. I have also run into the problem of valid newsletters having broken unsubscribe functions. I subscribed to one newsletter a while back. When I no longer wanted to receive it, I unsubscribed per their instructions. I still got the newsletter, so I unsubscribed again. And again. Eventually I just accepted the fact that their system doesn't work right, and added them to my spam filter. I'm sure there are many other people who do want this newsletter, which they signed up for (I was in this group originally). But at this point, it's unsolicited commercial email to me. How do you deal with a situation like this in one of these systems? The only sure way is to whitelist the newsletters you want, in case someone else marks them as spam. It's generally not easy for a spam server to determine whether someone is blocking a newsletter because its unsubscribe function is broken, or because the user is just too lazy to unsubscribe properly. In other words, these systems really don't work on newsletters.

A quick overview of the different products... Vipul's Razor is an open source project. SpamNet is the commercial version of Razor. I'm not sure if they share the same server network, or if they just use the same software on two different networks, but the two are related. SpamNet does have a very friendly Outlook plugin, while Razor is again aimed at Linux and similar systems. Pyzor started out as simply a Python clone of the Razor client. But since the rest of the Razor system excluding the client is closed source, they decided to make their own server and protocol and everything. It's essentially an open source clone. DCC is another open source option. It focuses on detecting all bulk mail, and you have to whitelist any valid bulk mail that you do want to receive. Firetrust CFS works as an additional feature inside MailWasher Pro.

invisibill@invisibill.net