Improving SpamAssassin accuracy on cPanel (or any other) mail servers - with statistics

July 25, 2013
For some time we've been frustrated by the amount of spam not caught by our spam filter. We're delighted to say we've developed a simple configuration that results in extremely high accuracy. Though some of the techniques in this article are specific to cPanel servers, the most important points should work with any mail server running SpamAssassin. Keep reading to see the rules we use and why we use them, as well as statistics for actual mail addressed to Mango's personal email address.

We have observed very accurate detection using a combination of DNS blocklists, Bayesian filters, and SPF. SpamAssassin comes pre-configured with several good blocklists. We've chosen to increase the default scores for URIBL.COM, Spamhaus, and Barracuda. These blocklists have extremely low false positive rates and are operated on a "free for most" basis. (We find the "free for most" providers most helpful.)

Here's the standard ~/.spamassassin/user_prefs file that we offer to our users:
score BAYES_40 1
score BAYES_50 2
score BAYES_60 3
score BAYES_80 4
score BAYES_95 5
score BAYES_99 6
score SPF_FAIL 5
score SPF_PASS 0
score SPF_NEUTRAL 0
score URIBL_BLACK 10
describe URIBL_BLACK Contains a URL listed in black.uribl.com
score RCVD_IN_SBL 10
describe RCVD_IN_SBL Rcvd via a relay in Spamhaus SBL (Direct UBE)
score RCVD_IN_XBL 10
describe RCVD_IN_XBL Last ext relay in Spamhaus XBL (exploits)
score RCVD_IN_PBL 10
describe RCVD_IN_PBL Last ext relay in Spamhaus PBL (Non-MTA IPs)
score URIBL_DBL_SPAM 10
describe URIBL_DBL_SPAM Contains a URL listed in the Spamhaus DBL
score RCVD_IN_BRBL_LASTEXT 10
describe RCVD_IN_BRBL_LASTEXT Last external relay in Barracuda RBL
score RCVD_IN_BL_SPAMCOP_NET 0 1.246 0 1.347 # false positives - occasionally blocks Hotmail.  Default was 15.
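To illustrate how these scores interact with the thresholds, here is a minimal Python sketch. It is not SpamAssassin's actual implementation. It assumes a message's total is simply the sum of the scores of the rules it hits, that mail is tagged once the total reaches the required score (5 is SpamAssassin's default), and that the server rejects mail scoring above a separate reject threshold (100 here). The function name `classify` and the sample rule hits are hypothetical.

```python
# A subset of the scores from the user_prefs file above.
SCORES = {
    "BAYES_99": 6.0,
    "SPF_FAIL": 5.0,
    "SPF_PASS": 0.0,
    "URIBL_BLACK": 10.0,
    "RCVD_IN_XBL": 10.0,
}

REQUIRED_SCORE = 5.0    # SpamAssassin's default tagging threshold
REJECT_THRESHOLD = 100  # assumed server-wide reject threshold


def classify(rule_hits, scores=SCORES):
    """Sum the scores of the rules a message hit and pick an action.

    Simplification: rules missing from `scores` count as 0 here,
    whereas a real installation would use their default scores.
    """
    total = sum(scores.get(rule, 0.0) for rule in rule_hits)
    if total > REJECT_THRESHOLD:
        return total, "reject"
    if total >= REQUIRED_SCORE:
        return total, "tag"
    return total, "deliver"


# Hypothetical messages: a single blocklist hit is enough to tag.
print(classify(["URIBL_BLACK"]))              # (10.0, 'tag')
print(classify(["SPF_PASS"]))                 # (0.0, 'deliver')
print(classify(["BAYES_99", "RCVD_IN_XBL"]))  # (16.0, 'tag')

# A per-account override that scores a blocklist at 100 pushes a
# message that also hits BAYES_99 past the reject threshold.
strict = {**SCORES, "URIBL_BLACK": 100.0}
print(classify(["URIBL_BLACK", "BAYES_99"], strict))  # (106.0, 'reject')
```

With a blocklist scored at 10 and tagging at 5, one listing is decisive on its own, which is why only blocklists with very low false positive rates are scored this high.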
Note that Barracuda requires free registration. Besides the RBLs, we've also assigned higher-than-default scores to the Bayesian rules, a moderately high score to SPF_FAIL, and a score of 0 to SPF_PASS and SPF_NEUTRAL. With regard to Bayesian filters, for training we simply use the default autolearn settings. This causes the Bayesian system to learn high-scoring mail as spam and low-scoring mail as ham.

Now, if you have root access to your cPanel server, let's tweak a few settings in WHM. It's extremely important to use a caching nameserver for DNS blocklists to work properly. Most public DNS servers are quickly blocked by the blocklists because of excessive queries. From within WHM, navigate to Nameserver Selection. BIND is an excellent choice and may already be selected and working. Once BIND is working, navigate to Resolver Configuration. Enter the public IP address of your server as the Primary Resolver and complete the wizard. It is worth noting that we observed unexpected results when using 127.0.0.1.

Now, navigate to Exim Configuration Manager. You may wish to configure SpamAssassin™ reject spam score threshold. The advantage of setting this to a low number like 15 is that the sender will know that the mail has not arrived. It's also theoretically possible that a spammer could clean their database by removing email addresses that bounce. The advantage of setting this to a high number like 100 is that users can manage their spam in whatever way they require. As for our preference, by default we reject spam with a score greater than 100. This delivers most spam to the user (with the subject tagged ***SPAM***) and allows the user to configure filtering in their email client or their own cPanel account. Mango has configured his personal cPanel account to assign a score of 100 to the above RBLs so that he rejects spam, for his account only.

We typically turn off RBLs from within Exim as we prefer to configure them within SpamAssassin. This allows you to have special configuration for individual cPanel accounts. If your server is powerful enough, we recommend setting SpamAssassin™: message size threshold to scan to a high number like 1024. Finally, we recommend that you SSH into your server as root and run sa-update to be sure you have the latest spam detection rules.

Here are our statistics from the month of May 2013 for mail sent to Mango's personal email address (before he started rejecting spam).

Ham vs. Spam

324 legitimate mail
1,478 spam mail

Blocklist Performance (chart)

Highest and Lowest Scores

3 highest legitimate score
-6.9 lowest legitimate score
88.1 highest spam score
5.2 lowest spam score

Blocklists vs Other Rules (chart)

Accuracy

3 false negatives
99.8% correctly tagged
0 false positives
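The 99.8% figure can be reproduced from the counts above. The following Python sketch assumes the three false negatives are included in the 1,478 spam messages. The variable names are illustrative only.

```python
ham = 324             # legitimate messages
spam = 1478           # spam messages
false_negatives = 3   # spam that was not tagged
false_positives = 0   # legitimate mail tagged as spam

total = ham + spam
correct = total - false_negatives - false_positives
accuracy = correct / total

print(f"{total} messages, {correct} tagged correctly")
print(f"accuracy: {accuracy:.1%}")  # accuracy: 99.8%
```

If the three false negatives were instead counted separately from the 1,478, the total would be 1,805 and the result would still round to 99.8%.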
Tyler
October 2, 2014
With this config, what threshold are most of your users set on? 5?
Mango
October 2, 2014
We lowered it to 4.5.
Jessica
April 12, 2015
If we place these options into ~/.spamassassin/user_prefs will they be overwritten when cPanel auto-updates/restarts, or when the SpamAssassin preferences are re-saved via cPanel?
Jessica
April 12, 2015
Never mind -- just tried placing the rules in there and re-visited the SpamAssassin settings in cPanel, and see that they are all listed in the fields there, so they won't be deleted when the settings are saved. Awesome!