Amavis and Spamassassin howto

From Finninday
Revision as of 01:50, 14 December 2020 by Rday (Talk | contribs)

per-user spam configuration

  • make sure the local user has a ~/.spamassassin directory
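A minimal sketch of that check, assuming it is run as the user in question so the ownership comes out right. The helper name ensure_sa_dir is made up for illustration; user_prefs is the per-user preferences file SpamAssassin looks for in that directory.

```shell
# Sketch: make sure a user's SpamAssassin directory exists.
# ensure_sa_dir is a hypothetical helper; pass the user's home directory.
ensure_sa_dir() {
    home_dir=$1
    sa_dir="$home_dir/.spamassassin"
    # -p makes this a no-op when the directory is already there
    mkdir -p "$sa_dir" || return 1
    # Create an empty user_prefs only if none exists, so existing rules are kept
    [ -e "$sa_dir/user_prefs" ] || : > "$sa_dir/user_prefs"
}
```

For example: ensure_sa_dir "$HOME"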

global configuration

  • /etc/mail/spamassassin/local.cf
  • after adjusting global rules, restart the spamassassin and amavisd services so that the changes take effect

test configuration

 spamassassin -D --lint 2>&1 | less
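The lint check can also gate the restart, so a broken local.cf never reaches the running services. This is only a sketch: reload_spam_services is a made-up helper name, and it assumes that --lint reports problems through a non-zero exit status and that the services are called spamassassin and amavisd as elsewhere on this page.

```shell
# Sketch: only restart the services when the configuration lints cleanly.
# reload_spam_services is a hypothetical helper name.
reload_spam_services() {
    # Assumption: --lint exits non-zero when it finds a configuration problem
    if ! spamassassin --lint; then
        echo "lint failed; services left running with the old configuration" >&2
        return 1
    fi
    # spamassassin first, then amavisd
    service spamassassin restart &&
    service amavisd restart
}
```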

smoke test

Running these tests requires that the spamassassin service be running.

/usr/share/doc/spamassassin/examples$ spamc -R <sample-nonspam.txt 
0.0/5.0
Spam detection software, running on the system "weasel.finninday.net", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview:  -----BEGIN PGP SIGNED MESSAGE----- TBTF ping for 2001-04-20:
   Reviving T a s t y B i t s f r o m t h e T e c h n o l o g y F r o n t [...]
   

Content analysis details:   (0.0 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------

/usr/share/doc/spamassassin/examples$ spamc -R <sample-spam.txt 
1000.0/5.0
Spam detection software, running on the system "weasel.finninday.net", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview:  This is the GTUBE, the Generic Test for Unsolicited Bulk Email
   If your spam filter supports it, the GTUBE provides a test by which you can
   verify that the filter is installed correctly and is detecting incoming spam.
   You can send yourself a test mail containing the following string of characters
   (in upper case and with no white spaces and line breaks): [...] 

Content analysis details:   (1000.0 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
1000 GTUBE                  BODY: Generic Test for Unsolicited Bulk Email
-0.0 NO_RECEIVED            Informational: message has no Received headers
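For a scripted smoke test, the first line of the report ("score/threshold") is enough to decide pass or fail. The sketch below reads a spamc -R report on standard input; is_spam_report is a made-up helper name, not part of SpamAssassin.

```shell
# Sketch: succeed when a "spamc -R" report on stdin reached the spam threshold.
# is_spam_report is a hypothetical helper name.
is_spam_report() {
    # The first line of the report has the form "score/threshold", e.g. "1000.0/5.0"
    awk -F/ '
        NR == 1 { spam = ($1 + 0 >= $2 + 0); seen = 1; exit }
        END     { exit !(seen && spam) }   # empty input counts as "not spam"
    '
}
```

For example: spamc -R <sample-spam.txt | is_spam_report && echo "GTUBE detected"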

test a real mail sample

  • press Ctrl-U in Thunderbird to view the full source of an email
  • copy and paste it into a text file
  • feed the file to spamc (this won't apply all of the amavis rules)
$ spamc -R <spam.txt 
9.9/5.0
Spam detection software, running on the system "weasel.finninday.net", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview:  FtP://tbk.dOWnsizEWherevEr.NeT/index.html [...] 

Content analysis details:   (9.9 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
  10 NEWMAN_FROM_RULE       Stop mail from yahoo that uses my facebook contacts
 0.0 FREEMAIL_FROM          Sender email is freemail (bishtalpanaghx[at]yahoo.com)
-0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at http://www.dnswl.org/, low
                            trust
                            [98.138.229.72 listed in list.dnswl.org]
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from author's
                            domain
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily valid
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature

Be aware that the spam threshold reported when testing through the spamc command is irrelevant, because amavis applies its own thresholds instead. See http://www.ijs.si/software/amavisd/#faq-spam

Another option is to resend the message like this:

[root@mail spamassassin]# sendmail -i me@myaddress.net < /var/spool/amavisd/quarantine/spam-Iy2P0CAFFxBK

Neither way is perfect, because it is difficult to replicate the original delivery path and headers exactly. Still, the spamc command is the best test method.

How to tune bayes classification

In the case of a message being wrongly classified as ham

Rerun sa-learn on the message, specifying --spam or --ham as appropriate, to correct the mistake.

# sa-learn --dbpath /usr/local/spamassassin/.spamassassin --spam ~rday/spam.txt 
Learned tokens from 0 message(s) (1 message(s) examined)

Note: using the sa-learn command takes a lock on the spamassassin database, so any scripted tasks should be done serially.
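One way to keep scripted training serial is a plain loop that waits for each sa-learn run to finish before starting the next. This is a sketch only: learn_serially and the SA_LEARN override variable are made up for illustration, and the --dbpath value is the one used on this page.

```shell
# Sketch: feed several mail folders to sa-learn strictly one after another.
# learn_serially and SA_LEARN are hypothetical names.
learn_serially() {
    class=$1   # "spam" or "ham"
    shift
    for folder in "$@"; do
        # No "&" and no parallel xargs: each run finishes before the next starts
        "${SA_LEARN:-sa-learn}" --dbpath /usr/local/spamassassin/.spamassassin \
            "--$class" "$folder" || return 1
    done
}
```

For example: learn_serially spam ./cur ./new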

To see how many items are in the database

# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      16365          0  non-token data: nspam
0.000          0       1470          0  non-token data: nham
0.000          0     388963          0  non-token data: ntokens
0.000          0 1441110161          0  non-token data: oldest atime
0.000          0 1449342088          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1449336747          0  non-token data: last expiry atime
0.000          0    2764800          0  non-token data: last expire atime delta
0.000          0      14505          0  non-token data: last expire reduction count


# sa-learn --dbpath /usr/local/spamassassin/.spamassassin --backup | wc
 233525 1032008 8614945
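The nspam and nham lines of the --dump magic output are the ones worth watching, since with default settings SpamAssassin does not use Bayes scores until it has learned at least 200 messages of each kind. A small sketch that pulls out just those two numbers (bayes_counts is a made-up helper name):

```shell
# Sketch: summarise "sa-learn --dump magic" output read from stdin.
# bayes_counts is a hypothetical helper name.
bayes_counts() {
    # Column 3 holds the value; the last field names it (nspam, nham, ntokens, ...)
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham = $3 }
         END { printf "nspam=%d nham=%d\n", spam, ham }'
}
```

For example: sa-learn --dump magic | bayes_counts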

To train on a directory of ham

# sa-learn --ham --dbpath /usr/local/spamassassin/.spamassassin --mbox --progress ./2005-09
 98% [===================================================================================================  ]   6.89 msgs/sec 00m36s DONE
Learned tokens from 252 message(s) (252 message(s) examined)

or

# sa-learn --dbpath /usr/local/spamassassin/.spamassassin --ham ./cur --progress

To train on a directory of spam

# sa-learn --dbpath /usr/local/spamassassin/.spamassassin --spam ./cur --progress
100% [=====================================================================================================]   9.49 msgs/sec 00m02s DONE
Learned tokens from 20 message(s) (21 message(s) examined)

Tidy up token database

[root@mail temp]# sa-learn --force-expire
expired old bayes database entries in 11 seconds
139695 entries kept, 93719 deleted
token frequency: 1-occurrence tokens: 64.36%
token frequency: less than 8 occurrences: 19.31%

Prevent local spam reports from being classified as spam

TBD

URIBL was blocked

Running spamc on a test message takes a long time (over 4 seconds) and includes this message:

 0.0 URIBL_BLOCKED          ADMINISTRATOR NOTICE: The query to URIBL was blocked.
                            See
                            http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
                             for more information.
                            [URIs: no1trader.com]

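This typically happens when DNS queries go out through a shared upstream resolver that exceeds URIBL's free query limit. I should run a local caching nameserver, which avoids that and is also more efficient.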

As of Jan 30, 2015, a caching nameserver is in service.

Flow of control

Amavis is consulted when a message is delivered:

not spam

Jan 30 15:25:54 weasel amavis[29157]: (29157-14) ESMTP::10024 /var/lib/amavis/tmp/amavis-20150130T114744-29157-OJ30GQP6: <someone@gmail.com> -> <localaccount@finninday.net> SIZE=2167 Received: from weasel.finninday.net ([127.0.0.1]) by localhost (weasel.finninday.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP for <localaccount@finninday.net>; Fri, 30 Jan 2015 15:25:54 -0800 (PST)
Jan 30 15:25:54 weasel amavis[29157]: (29157-14) Checking: OCmIOOajiWvk [209.85.220.43] <someone@gmail.com> -> <localaccount@finninday.net>
Jan 30 15:25:55 weasel amavis[29157]: (29157-14) FWD from <someone@gmail.com> -> <localaccount@finninday.net>,BODY=7BIT 250 2.0.0 from MTA(smtp:[127.0.0.1]:10025): 250 2.0.0 Ok: queued as BBD3C1238CEF
Jan 30 15:25:55 weasel amavis[29157]: (29157-14) Passed CLEAN {RelayedInbound}, [209.85.220.43]:53109 [67.189.73.245] <someone@gmail.com> -> <localaccount@finninday.net>, Queue-ID: E63A8123818B, Message-ID: <54CC12FF.9090402@gmail.com>, mail_id: OCmIOOajiWvk, Hits: -0.302, size: 2166, queued_as: BBD3C1238CEF, dkim_sd=20120113:gmail.com, 1726 ms

spam

Jan 30 15:27:40 weasel amavis[14059]: (14059-07) ESMTP::10024 /var/lib/amavis/tmp/amavis-20150130T143841-14059-KJt3rAKZ: <LoanOfficerLeague@successrefinancefor.org> -> <localaccount@finninday.net> SIZE=5178 BODY=8BITMIME RET=HDRS Received: from weasel.finninday.net ([127.0.0.1]) by localhost (weasel.finninday.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP for <localaccount@finninday.net>; Fri, 30 Jan 2015 15:27:40 -0800 (PST)
Jan 30 15:27:41 weasel amavis[14059]: (14059-07) Checking: sYjuE0v8n77d [69.65.46.172] <LoanOfficerLeague@successrefinancefor.org> -> <localaccount@finninday.net>
Jan 30 15:27:43 weasel amavis[14059]: (14059-07) local delivery: <LoanOfficerLeague@successrefinancefor.org> -> spam-quarantine, mbx=/var/lib/amavis/virusmails/s/spam-sYjuE0v8n77d.gz
Jan 30 15:27:43 weasel amavis[14059]: (14059-07) Blocked SPAM {DiscardedInbound,Quarantined}, [69.65.46.172]:50512 [69.65.46.172] <LoanOfficerLeague@successrefinancefor.org> -> <localaccount@finninday.net>, quarantine: s/spam-sYjuE0v8n77d.gz, Queue-ID: AF431123808D, Message-ID: <f77586e5b50510cfdb616afe4f20267d.23349111.13759887@successrefinancefor.org>, mail_id: sYjuE0v8n77d, Hits: 11.771, size: 5169, 2552 ms

To make this happen, postfix is configured in main.cf to use amavis as a content filter:

content_filter = smtp-amavis:[localhost]:10024

By default, amavis listens on that port; the setting lives in the files under /etc/amavis/conf.d.
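To confirm that both sides agree on the port, the value can be read back from postfix and compared with the listening sockets. A rough sketch, where filter_port is a made-up helper that simply takes whatever follows the last colon of the content_filter value:

```shell
# Sketch: pull the port out of a content_filter value such as
# "smtp-amavis:[localhost]:10024". filter_port is a hypothetical helper name.
filter_port() {
    # Strip everything up to and including the last colon
    printf '%s\n' "${1##*:}"
}

# Possible use on the mail host (not run here):
#   port=$(filter_port "$(postconf -h content_filter)")
#   ss -ltn | grep ":$port "
```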


Cleaning out the quarantine

/var/spool/amavisd/quarantine

/var/spool/amavis/quarantine# find . -type f | wc
  33895   33895  839415

When there are a lot of messages in the quarantine and you want to look for ham that fell in accidentally, it helps to clear out the obvious spam first. But we want to keep the spam long enough to do bayesian training with it. So move everything into a temporary folder and decompress it all.
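The move-and-decompress step might look like the sketch below. stage_quarantine is a made-up helper and both directory arguments are placeholders; it assumes the quarantined files are gzipped, as the spam-*.gz name in the amavis log above suggests.

```shell
# Sketch: move quarantined mail into a scratch directory and decompress it.
# stage_quarantine is a hypothetical helper; both arguments are placeholders.
stage_quarantine() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # Move regular files only; find copes with tens of thousands of entries
    find "$src" -type f -exec mv {} "$dst"/ \; || return 1
    # Decompress whatever arrived gzipped, leaving other files alone
    find "$dst" -type f -name '*.gz' -exec gunzip {} +
}
```

For example: stage_quarantine /var/spool/amavisd/quarantine ./temp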

Then train on the bad-headers spam and delete it.

sa-learn --spam --dbpath /usr/local/spamassassin/.spamassassin --progress ./badh*
rm -f badh*

Move the obvious spam aside for training

# quote the pattern so the shell does not strip the backslashes; -l prints only the matching file names
grep -l '^From.*\.faith\>' * | xargs -I{} mv {} spam/

Sort the remaining spam by spam score to look for things that are close to the borderline.

grep Spam-Score * | sort -k 2 -nr
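To narrow the list to the borderline cases only, the score can be filtered by a window before sorting. This is a sketch: borderline is a made-up helper, and it assumes each message carries an "X-Spam-Score: <number>" header, which is what the grep above relies on.

```shell
# Sketch: print "score filename" for messages whose score falls inside a window,
# highest score first. borderline is a hypothetical helper name.
borderline() {
    low=$1
    high=$2
    shift 2
    awk -v low="$low" -v high="$high" '
        FNR == 1 { seen = 0 }
        # Only the first score header of each file is considered
        !seen && /^X-Spam-Score:/ {
            seen = 1
            if ($2 + 0 >= low + 0 && $2 + 0 <= high + 0) print $2, FILENAME
        }' "$@" | sort -nr
}
```

For example: borderline 4 7 *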

Order of service restart

Restart the services in this order:

service spamassassin restart
service amavisd restart    # or: service clamd@amavisd restart

postscreen config

I started running postscreen in order to better deal with my haproxy setup. The config looks like this:

postscreen_upstream_proxy_protocol = haproxy
postscreen_access_list = permit_mynetworks
postscreen_greet_banner =
postscreen_dnsbl_threshold = 2
postscreen_dnsbl_sites = zen.spamhaus.org*2 bl.spamcop.net*1 b.barracudacentral.org*1

But at first I wasn't actually allowing postscreen to drop any spammers, because postscreen_dnsbl_action and postscreen_greet_action both default to ignore. After running for a while with it configured to pass all connections along to postfix, I gave it the power to drop connections that it determines to be from spam zombies. I also added a blacklist that I can use to easily drop IPs that are scanning me for relay access.

postscreen_upstream_proxy_protocol = haproxy
postscreen_access_list = permit_mynetworks, cidr:/etc/postfix/postscreen_access.cidr
postscreen_greet_banner =
postscreen_dnsbl_threshold = 2
postscreen_dnsbl_sites = zen.spamhaus.org*2 bl.spamcop.net*1 b.barracudacentral.org*1
postscreen_dnsbl_action = enforce
postscreen_greet_action = enforce
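Entries in the CIDR file are one "pattern action" pair per line, for example "201.164.202.227/32 reject". A sketch of a helper for adding IPv4 addresses to it follows; block_ip is a made-up name, and the default path is the one from postscreen_access_list above.

```shell
# Sketch: add a single IPv4 address to the postscreen access list.
# block_ip is a hypothetical helper name.
block_ip() {
    ip=$1
    file=${2:-/etc/postfix/postscreen_access.cidr}
    entry="$ip/32 reject"
    # Skip the append when exactly this entry is already listed
    grep -qxF "$entry" "$file" 2>/dev/null && return 0
    printf '%s\n' "$entry" >> "$file"
}
```

After changing the file, a postfix reload should be needed before postscreen picks up the new entries, since CIDR tables are read when the process starts.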

Now the logs show spam getting dropped by postscreen like this:

Jan 23 12:28:45 mail postfix/postscreen[19285]: CONNECT from [201.164.202.227]:53722 to [64.184.245.226]:25
Jan 23 12:28:45 mail postfix/dnsblog[19287]: addr 201.164.202.227 listed by domain bl.spamcop.net as 127.0.0.2
Jan 23 12:28:45 mail postfix/dnsblog[19286]: addr 201.164.202.227 listed by domain b.barracudacentral.org as 127.0.0.2
Jan 23 12:28:45 mail postfix/postscreen[19285]: DNSBL rank 2 for [201.164.202.227]:53722
Jan 23 12:28:48 mail postfix/postscreen[19285]: NOQUEUE: reject: RCPT from [201.164.202.227]:53722: 550 5.7.1 Service unavailable; 
 client [201.164.202.227] blocked using bl.spamcop.net; from=<Bryant_Edmund@wellholm.com>, to=<4a8b71ff.9010301@finninday.net>, 
 proto=SMTP, helo=<customer-GDL-202-227.megared.net.mx>

log parsing

So now I have multiple levels of spam checks. If I want to find all blocked spam in my logs, I need to look for different patterns to catch them all.

  • postscreen blocks: "postscreen.*NOQUEUE"
  • amavis blocks: "amavis.*Blocked SPAM"

And then general failures to deliver:

  • postfix not delivered: "smtpd.*NOQUEUE"
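The three patterns above can be rolled into one quick tally. A minimal sketch, with count_blocks as a made-up helper name and the log file passed in as an argument, since its location varies between systems:

```shell
# Sketch: count the three kinds of rejection in a mail log.
# count_blocks is a hypothetical helper; pass the log file to inspect.
count_blocks() {
    log=$1
    # grep -c prints the number of matching lines (0 when there are none)
    printf 'postscreen %s\n' "$(grep -c 'postscreen.*NOQUEUE' "$log")"
    printf 'amavis %s\n'     "$(grep -c 'amavis.*Blocked SPAM' "$log")"
    printf 'smtpd %s\n'      "$(grep -c 'smtpd.*NOQUEUE' "$log")"
}
```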

References

  • General setup and testing

http://www.stearns.org/doc/spamassassin-setup.current.html

  • Postfix, amavis, spamassassin integration

http://wiki.apache.org/spamassassin/IntegratedInPostfixWithAmavis

  • may want to use this RBL at some point

http://www.barracudacentral.org/rbl