Amavis and Spamassassin howto
Revision as of 17:44, 5 December 2015
per-user spam configuration
- make sure the local user has a ~/.spamassassin directory
global configuration
- /etc/mail/spamassassin/local.cf
- after adjusting global rules, restart spamassassin service to make the changes take effect
smoke test
Running these tests requires that the spamassassin service be running.
/usr/share/doc/spamassassin/examples$ spamc -R <sample-nonspam.txt
0.0/5.0
Spam detection software, running on the system "weasel.finninday.net", has
identified this incoming email as possible spam. The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email. If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview: -----BEGIN PGP SIGNED MESSAGE----- TBTF ping for 2001-04-20:
Reviving T a s t y B i t s f r o m t h e T e c h n o l o g y F r o n t [...]

Content analysis details: (0.0 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------

/usr/share/doc/spamassassin/examples$ spamc -R <sample-spam.txt
1000.0/5.0
Spam detection software, running on the system "weasel.finninday.net", has
identified this incoming email as possible spam. The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email. If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview: This is the GTUBE, the Generic Test for Unsolicited Bulk Email
If your spam filter supports it, the GTUBE provides a test by which you can
verify that the filter is installed correctly and is detecting incoming spam.
You can send yourself a test mail containing the following string of characters
(in upper case and with no white spaces and line breaks): [...]

Content analysis details: (1000.0 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
1000 GTUBE                  BODY: Generic Test for Unsolicited Bulk Email
-0.0 NO_RECEIVED            Informational: message has no Received headers
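If the packaged sample files are not at hand, a GTUBE message can be written by hand for the same smoke test. This is only a sketch: the function name make_gtube_message and the example.com addresses are placeholders, not part of the original setup.

```shell
# Write a minimal mail message containing the GTUBE test string.
# make_gtube_message and the example.com addresses are illustrative placeholders.
make_gtube_message() {
    out="$1"
    {
        printf 'From: sender@example.com\n'
        printf 'To: recipient@example.com\n'
        printf 'Subject: GTUBE smoke test\n'
        printf '\n'
        # The GTUBE string must appear on one line, unbroken and in upper case.
        printf '%s\n' 'XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X'
    } > "$out"
}

# Example (requires a running spamassassin service):
#   make_gtube_message /tmp/gtube.txt && spamc -R < /tmp/gtube.txt
```

A working setup should report the GTUBE rule, as in the sample-spam.txt output.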
test a real mail sample
- ctrl-u in thunderbird to view the full source of an email
- copy and paste to a text file
- feed to spamc
$ spamc -R <spam.txt
9.9/5.0
Spam detection software, running on the system "weasel.finninday.net", has
identified this incoming email as possible spam. The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email. If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview: FtP://tbk.dOWnsizEWherevEr.NeT/index.html [...]

Content analysis details: (9.9 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
  10 NEWMAN_FROM_RULE       Stop mail from yahoo that uses my facebook contacts
 0.0 FREEMAIL_FROM          Sender email is freemail (bishtalpanaghx[at]yahoo.com)
-0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at http://www.dnswl.org/, low trust
                            [98.138.229.72 listed in list.dnswl.org]
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from author's domain
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily valid
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
Be aware that the spam threshold reported by spamc (the 5.0 in 9.9/5.0) does not decide what happens to delivered mail, since amavis overrides this setting with its own. http://www.ijs.si/software/amavisd/#faq-spam
Still, the spamc command is the best test method.
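Because the threshold printed by spamc is not the one amavis applies, it can help to compare the reported score against a threshold of your own choosing. A minimal sketch, assuming the first line of the report has the score/threshold form shown in the output above; score_exceeds is a made-up helper, and the value passed to it should be whatever amavis is actually configured with (for example the level set in $sa_kill_level_deflt, if that variable is in use here).

```shell
# Read `spamc -R` output on stdin and compare the reported score with a
# threshold supplied by the caller. Only the first line ("9.9/5.0") is used.
# score_exceeds is an illustrative helper, not part of spamc.
score_exceeds() {
    awk -F/ -v t="$1" '
        NR == 1 {
            printf "score=%s threshold=%s\n", $1, t
            ok = ($1 + 0 >= t + 0)
        }
        END { exit ok ? 0 : 1 }'
}

# Example (6.0 is a placeholder threshold):
#   spamc -R < spam.txt | score_exceeds 6.0 && echo "amavis would likely treat this as spam"
```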
How to tune bayes classification
When a message was wrongly classified as ham
Rerun sa-learn on the message with the correct option (--spam here, or --ham for the opposite mistake) to correct it.
# sa-learn --dbpath /var/spool/amavisd/.spamassassin --spam ~rday/spam.txt
Learned tokens from 0 message(s) (1 message(s) examined)
Note: using the sa-learn command takes a lock on the spamassassin database, so any scripted tasks should be done serially.
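One way to keep scripted training runs serial is to wrap each call in flock(1) from util-linux, so that a second caller waits instead of colliding. This is a sketch rather than anything from the original setup: the lock file path and the run_serialized name are placeholders.

```shell
# Run a command while holding an exclusive lock, so concurrent callers queue up.
# Uses flock(1) from util-linux. The lock file path is a placeholder.
SA_LEARN_LOCK="${SA_LEARN_LOCK:-/tmp/sa-learn.lock}"

run_serialized() {
    flock "$SA_LEARN_LOCK" "$@"
}

# Example: two cron jobs can both call this safely.
#   run_serialized sa-learn --dbpath /var/spool/amavisd/.spamassassin --spam ./cur
```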
To see how many items are in the database
# sa-learn --dbpath /var/spool/amavisd/.spamassassin --backup | wc
233525 1032008 8614945
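The wc line count lumps every kind of record together. A rough breakdown can be had by tallying the first field of each line. This sketch assumes that field is a one-letter record type (v for metadata, t for tokens, s for seen messages); look at a few lines of real --backup output before trusting it. tally_backup is a made-up name.

```shell
# Tally `sa-learn --backup` output by the first field of each line.
# Assumption: that field is a record type such as v, t or s.
tally_backup() {
    awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' | sort
}

# Example:
#   sa-learn --dbpath /var/spool/amavisd/.spamassassin --backup | tally_backup
```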
To train on ham stored in mbox format (note the --mbox option)
# sa-learn --ham --dbpath /var/spool/amavisd/.spamassassin --mbox --progress ./2005-09
98% [=================================================================================================== ] 6.89 msgs/sec 00m36s DONE
Learned tokens from 252 message(s) (252 message(s) examined)
or, for a directory of individual message files
# sa-learn --dbpath /var/spool/amavisd/.spamassassin --ham ./cur --progress
To train on a directory of spam
# sa-learn --dbpath /var/spool/amavisd/.spamassassin --spam ./cur --progress
100% [=====================================================================================================] 9.49 msgs/sec 00m02s DONE
Learned tokens from 20 message(s) (21 message(s) examined)
Prevent local spam reports from being classified as spam
TBD
URIBL was blocked
Running spamc on a test message takes a long time (over 4 seconds) and includes this message:
0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [URIs: no1trader.com]
I should run a caching nameserver to be more efficient.
As of Jan 30, 2015, a caching nameserver is in service.
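A quick way to confirm that lookups really go through a local caching nameserver is to check which resolver is listed first. A minimal sketch, assuming the resolver is configured through a resolv.conf-style file and that a local cache listens on a loopback address; uses_local_resolver is a made-up helper.

```shell
# Report whether the first nameserver in a resolv.conf-style file is a
# loopback address, which is what a local caching nameserver looks like.
# Defaults to the conventional /etc/resolv.conf.
uses_local_resolver() {
    conf="${1:-/etc/resolv.conf}"
    first=$(awk '$1 == "nameserver" { print $2; exit }' "$conf")
    case "$first" in
        127.*|::1) echo "local resolver: $first"; return 0 ;;
        *)         echo "remote resolver: ${first:-none}"; return 1 ;;
    esac
}

# Example:
#   uses_local_resolver || echo "URIBL queries may still be going through a shared resolver"
```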
Flow of control
Amavis is consulted when a message is delivered:
not spam
Jan 30 15:25:54 weasel amavis[29157]: (29157-14) ESMTP::10024 /var/lib/amavis/tmp/amavis-20150130T114744-29157-OJ30GQP6: <someone@gmail.com> -> <localaccount@finninday.net> SIZE=2167 Received: from weasel.finninday.net ([127.0.0.1]) by localhost (weasel.finninday.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP for <localaccount@finninday.net>; Fri, 30 Jan 2015 15:25:54 -0800 (PST)
Jan 30 15:25:54 weasel amavis[29157]: (29157-14) Checking: OCmIOOajiWvk [209.85.220.43] <someone@gmail.com> -> <localaccount@finninday.net>
Jan 30 15:25:55 weasel amavis[29157]: (29157-14) FWD from <someone@gmail.com> -> <localaccount@finninday.net>,BODY=7BIT 250 2.0.0 from MTA(smtp:[127.0.0.1]:10025): 250 2.0.0 Ok: queued as BBD3C1238CEF
Jan 30 15:25:55 weasel amavis[29157]: (29157-14) Passed CLEAN {RelayedInbound}, [209.85.220.43]:53109 [67.189.73.245] <someone@gmail.com> -> <localaccount@finninday.net>, Queue-ID: E63A8123818B, Message-ID: <54CC12FF.9090402@gmail.com>, mail_id: OCmIOOajiWvk, Hits: -0.302, size: 2166, queued_as: BBD3C1238CEF, dkim_sd=20120113:gmail.com, 1726 ms
spam
Jan 30 15:27:40 weasel amavis[14059]: (14059-07) ESMTP::10024 /var/lib/amavis/tmp/amavis-20150130T143841-14059-KJt3rAKZ: <LoanOfficerLeague@successrefinancefor.org> -> <localaccount@finninday.net> SIZE=5178 BODY=8BITMIME RET=HDRS Received: from weasel.finninday.net ([127.0.0.1]) by localhost (weasel.finninday.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP for <localaccount@finninday.net>; Fri, 30 Jan 2015 15:27:40 -0800 (PST)
Jan 30 15:27:41 weasel amavis[14059]: (14059-07) Checking: sYjuE0v8n77d [69.65.46.172] <LoanOfficerLeague@successrefinancefor.org> -> <localaccount@finninday.net>
Jan 30 15:27:43 weasel amavis[14059]: (14059-07) local delivery: <LoanOfficerLeague@successrefinancefor.org> -> spam-quarantine, mbx=/var/lib/amavis/virusmails/s/spam-sYjuE0v8n77d.gz
Jan 30 15:27:43 weasel amavis[14059]: (14059-07) Blocked SPAM {DiscardedInbound,Quarantined}, [69.65.46.172]:50512 [69.65.46.172] <LoanOfficerLeague@successrefinancefor.org> -> <localaccount@finninday.net>, quarantine: s/spam-sYjuE0v8n77d.gz, Queue-ID: AF431123808D, Message-ID: <f77586e5b50510cfdb616afe4f20267d.23349111.13759887@successrefinancefor.org>, mail_id: sYjuE0v8n77d, Hits: 11.771, size: 5169, 2552 ms
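The verdict lines in these log entries are regular enough to summarize with awk. A rough sketch that relies only on the "Passed CLEAN", "Blocked SPAM" and "Hits:" text seen in the excerpts; summarize_amavis_log is a made-up name, and the location of the mail log depends on the syslog configuration.

```shell
# Count amavis verdicts in a log read from stdin and print the score ("Hits")
# of each blocked message.
summarize_amavis_log() {
    awk '
        / Passed CLEAN / { clean++ }
        / Blocked SPAM / {
            spam++
            # "Hits: " is 6 characters long; keep only the number after it.
            if (match($0, /Hits: [-0-9.]+/))
                print "blocked hits=" substr($0, RSTART + 6, RLENGTH - 6)
        }
        END { printf "clean=%d spam=%d\n", clean, spam }'
}

# Example (the log path is an assumption; adjust to the local syslog setup):
#   grep 'amavis\[' /var/log/mail.log | summarize_amavis_log
```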
To route mail through amavis, postfix is configured to use it as a content filter in main.cf:
content_filter = smtp-amavis:[localhost]:10024
Amavis is configured to listen on that port in /etc/amavis/conf.d by default.
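To check what postfix is set to use, `postconf -h content_filter` prints the effective value on a live system. The sketch below is a simplified stand-in that reads a main.cf-style file directly and ignores continuation lines; content_filter_of is a made-up helper.

```shell
# Print the content_filter value from a Postfix main.cf-style file.
# Simplified: continuation lines and later overrides are not handled.
content_filter_of() {
    awk '/^content_filter[ \t]*=/ {
        sub(/^[^=]*=[ \t]*/, "")   # drop "content_filter =" and keep the value
        print
        exit
    }' "$1"
}

# Example (the path is the conventional location, an assumption here):
#   content_filter_of /etc/postfix/main.cf
```

The port at the end of the value should match the one amavis listens on.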
Cleaning out the quarantine
/var/spool/amavisd/quarantine
/var/spool/amavis/quarantine# find . -type f | wc
33895 33895 839415
When there are a lot of messages in the quarantine and you want to look for ham that fell in accidentally, it helps to clear out the obvious spam first. But we want to keep the spam long enough to do bayesian training with it. So move everything into a temporary folder, and unzip it all.
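The move-and-unzip step can be sketched as a small function. Both directories are passed as arguments and the working directory is assumed to live outside the quarantine tree; stage_quarantine is a made-up name, and the only detail taken from this setup is the .gz suffix seen in the amavis log.

```shell
# Move every quarantined file into a flat working directory and decompress
# the gzip'd ones. The working directory must not be inside the source tree.
stage_quarantine() {
    src="$1"
    work="$2"
    mkdir -p "$work" || return 1
    find "$src" -type f -exec mv {} "$work"/ \; || return 1
    find "$work" -type f -name '*.gz' -exec gunzip {} +
}

# Example (the working directory name is a placeholder):
#   stage_quarantine /var/spool/amavis/quarantine /tmp/quarantine-work
```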
Then train on the bad-headers spam and delete it.
sa-learn --spam --dbpath /var/spool/amavisd/.spamassassin --progress ./badh*
rm -f badh*
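Since the files are deleted right after training, it may be safer to delete them only when sa-learn exits successfully. A sketch under that assumption; learn_and_remove_badh and the SA_LEARN override (handy for pointing at a different binary while testing) are inventions of this example.

```shell
# Learn the badh* files in a directory as spam and delete them only if the
# learner exits successfully. SA_LEARN defaults to sa-learn.
learn_and_remove_badh() {
    dir="$1"
    set -- "$dir"/badh*
    # An unmatched glob stays literal, so check that the first name exists.
    [ -e "$1" ] || { echo "no badh* files in $dir" >&2; return 0; }
    "${SA_LEARN:-sa-learn}" --spam --dbpath /var/spool/amavisd/.spamassassin \
        --progress "$@" && rm -f -- "$@"
}

# Example:
#   learn_and_remove_badh .
```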
Sort the remaining spam by spam score to look for things that are close to the borderline.
grep Spam-Score * | sort -k 2 -nr
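To narrow the list to the borderline cases only, the scores can be filtered to a band before sorting. A minimal sketch, assuming each quarantined message carries a header line of the form "X-Spam-Score: 11.771"; borderline_scores and the band limits in the example are placeholders.

```shell
# List messages whose X-Spam-Score header falls inside a band, highest score
# first. Only the first X-Spam-Score header in each file is considered.
borderline_scores() {
    lo="$1"; hi="$2"; shift 2
    awk -v lo="$lo" -v hi="$hi" '
        FNR == 1 { seen = 0 }
        !seen && /^X-Spam-Score:/ {
            seen = 1
            if ($2 + 0 >= lo + 0 && $2 + 0 <= hi + 0) print $2, FILENAME
        }' "$@" | sort -nr
}

# Example: show everything scoring between 5 and 8.
#   borderline_scores 5 8 ./*
```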
Order of service restart
Restart the services in this order:
service spamassassin restart
service amavisd restart
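The same order can be wrapped in a function that stops at the first failure. This is a sketch: restart_mail_filters and the SERVICE_CMD hook (set it to echo for a dry run) are not part of the original setup.

```shell
# Restart spamassassin, then amavisd, stopping at the first failure.
# SERVICE_CMD defaults to service; SERVICE_CMD=echo gives a dry run.
restart_mail_filters() {
    for svc in spamassassin amavisd; do
        "${SERVICE_CMD:-service}" "$svc" restart || {
            echo "restart of $svc failed; stopping" >&2
            return 1
        }
    done
}

# Example:
#   SERVICE_CMD=echo restart_mail_filters   # dry run
#   restart_mail_filters                    # real restart, run as root
```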
References
- General setup and testing
http://www.stearns.org/doc/spamassassin-setup.current.html
- Postfix, amavis, spamassassin integration
http://wiki.apache.org/spamassassin/IntegratedInPostfixWithAmavis
- may want to use this RBL at some point