Sunday 1 June 2008

Add a new regex string and keep your fingers crossed!?

WARNING: The information presented in this article is provided without warranty. Use at your own risk! Do not implement any features without full understanding of the implications. Using these measures incorrectly MAY prevent e-mail from reaching your server.

Whilst testing a complex regex string is a must, you can never be sure that it is correct, or that Postfix won't throw a hissy fit once it is live. Hard as you try, you never know whether a string will accidentally block legitimate mail.

I run my own personal server for experimentation, and the only users are family members. If you can, put a new string on a test server first. Always keep an eye on your logs to see what is getting blocked by a new string. If unwanted mail is still reaching your mail client, fine-tune the string, but keep a copy of the previous string below it.

For example:
# rolex or cartier
/([rR][-\. ]?[oO0][-\. ]?[lL][-\. ]?[eE][-\. ]?[xX][\.\? ]?)|([cC][-\. ]?[aA4][-\. ]?[rR][-\. ]?[tT][-\. ]?[iI1][-\. ]?[eE][-\. ]?[rR][\.\? ]?)/ REJECT Message body rejected [101]

# rolex
/[rR][-\. ]?[oO0][-\. ]?[lL][-\. ]?[eE][-\. ]?[xX][\?\. ]?/ REJECT Message body rejected [102x]
/[rR][oO0][lL][eE][xX]/ REJECT Message body rejected [102]

# cartier
/[cC][-\. ]?[aA4][-\. ]?[rR][-\. ]?[tT][-\. ]?[iI1][-\. ]?[eE][-\. ]?[rR][\.\? ]?/ REJECT Message body rejected [104x]
/[cC][aA4][rR][tT][iI1][eE][rR]/ REJECT Message body rejected [104]
   


Here, I have amalgamated my rolex and cartier strings into one. Until I am happy with it, I keep the separate strings below it. In fact, whilst I was testing the new cartier string, I kept a basic string below it, just to make sure that I still blocked some junk! Note that anything after the REJECT is just free text that ends up in your logs. Numbering the strings helps when checking your logs, because it shows which strings are working best.
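As a rough way to see which strings are doing the work, the bracketed numbers can be tallied from a log. The following is a minimal sketch in Python, not part of Postfix. It assumes only that each rejection line contains the text "rejected [NNN]" as in the rules above; the function name tally_rules and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the bracketed rule number in the REJECT text, e.g. "[101]" or "[102x]".
RULE_ID = re.compile(r"rejected \[([0-9]+x?)\]")

def tally_rules(lines):
    """Count how often each numbered rule appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = RULE_ID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample: only the REJECT text is taken from the rules above.
sample = [
    "Message body rejected [101]",
    "Message body rejected [102x]",
    "Message body rejected [101]",
    "some unrelated log line",
]

# Print the rules from most hits to fewest.
for rule, hits in tally_rules(sample).most_common():
    print(rule, hits)
```

In practice the lines would come from your mail log, whose location depends on your system.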


header_checks, body_checks and regexp examples

There are plenty of tutorials on setting up filtering in Postfix, and regular expressions are a useful tool for it. But I've found it really difficult to learn how to create regex strings. They've driven me crazy in my pursuit to block spam cretins!

This is not meant to be a tutorial, just notes, if you like, on what has worked for me.

I found a Quick Regular Expressions reference webpage that was understandable and gave me a start.

I found a useful free tool for experimenting with and testing regular expressions: The Regex Coach, at http://www.weitz.de/regex-coach/.

As well as the above tool, you can test your Postfix filtering from the CLI, using postmap.

postmap -q 'Even if you have no erection problems Cialis would help you to make' regexp:/etc/postfix/body_checks


This returns the following for my working body_checks:

REJECT Message body rejected [056]
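Away from the server, the same kind of lookup can be roughly imitated in a few lines of Python. This is only a sketch under stated assumptions: Python's re module stands in for the POSIX regex library that Postfix uses (the two differ in places), the table contains only simple "/pattern/ action" lines, the first matching rule wins, and matching is case-insensitive, as it is by default in Postfix regexp tables. The names load_rules and lookup are made up for this example.

```python
import re

# A simple "/pattern/ action" line; table features such as flags or if/endif are ignored.
RULE_LINE = re.compile(r"^/(.*)/\s+(\S.*)$")

def load_rules(text):
    """Parse rule lines into (compiled pattern, action) pairs, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parsed = RULE_LINE.match(line)
        if parsed:
            pattern, action = parsed.groups()
            rules.append((re.compile(pattern, re.IGNORECASE), action))
    return rules

def lookup(rules, text):
    """Return the action of the first rule that matches, or None."""
    for pattern, action in rules:
        if pattern.search(text):
            return action
    return None

TABLE = """
# rolex
/[rR][oO0][lL][eE][xX]/ REJECT Message body rejected [102]
# cartier
/[cC][aA4][rR][tT][iI1][eE][rR]/ REJECT Message body rejected [104]
"""

rules = load_rules(TABLE)
print(lookup(rules, "Genuine R0LEX at low prices"))
print(lookup(rules, "See you on Sunday"))
```

A pass here is not a guarantee that Postfix will behave the same way, so postmap -q against the real file remains the final check.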



body_checks examples

The following will reject permutations of the word Viagra. I'm not saying it is a perfect string; it is a work in progress. Here are a few of the variants it is meant to catch. To save space I show only lower-case, though upper-case is tested too:

viagra v1agra vi4gra vi@gra viagr4 viagr@ v-iagra v i a g r a v-i-a-g-r-a vijagra vjiagra

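# viagra
# The hyphen comes first in each bracket so that it is taken literally. POSIX brackets treat a backslash as an ordinary character, so no escaping is needed there.
/(VP[-]RX|[vV][-j!. ]?[iI1][-j!. ]?[aA4@][-j!. ]?[gG][-j!. ]?[rR][-j!. ]?[aA4@][-j!. ]?)/ REJECT Message body rejected [301]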
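A list of variants like the one above can be checked mechanically. The sketch below uses Python's re module as a stand-in for Postfix's POSIX regex, which is an assumption. To keep the two engines in agreement, the pattern is written with the hyphen first in each bracket and no backslashes inside brackets, and it accepts "@" for the final "a" as well. It also tries one innocent phrase, to show how easily a loose pattern catches legitimate text.

```python
import re

# Optional separator allowed between letters: hyphen, "j", "!", "." or space.
SEP = "[-j!. ]?"

VIAGRA = re.compile(
    "(VP[-]RX|[vV]" + SEP + "[iI1]" + SEP + "[aA4@]" + SEP
    + "[gG]" + SEP + "[rR]" + SEP + "[aA4@]" + SEP + ")"
)

# The variants listed in the text above.
variants = "viagra v1agra vi4gra vi@gra viagr4 viagr@ v-iagra v-i-a-g-r-a vijagra vjiagra".split()
variants.append("v i a g r a")

# Any variant the pattern fails to catch ends up in this list.
missed = [v for v in variants if not VIAGRA.search(v)]
print("missed:", missed)

# "via gra..." fits the pattern because a space is an allowed separator.
print("innocent text caught:", bool(VIAGRA.search("sent via graphite")))
```

The second check is the kind of accidental hit worth looking for in the logs after a new string goes live.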


I'll show a few more drug-related strings below.

# valium
/[vV][-j.! ]?[aA4@][-j.! ]?[lL|1][-j.! ]?[iI|1][-j.! ]?[uU][-j.! ]?[mM][-j.! ]?/ REJECT Message body rejected [401]

# levitra
/[lL|1][-j.! ]?[eE][-j.! ]?[vV][-j.! ]?[iI|1][-j.! ]?[tT][-j.! ]?[rR][-j.! ]?[aA][-j.! ]?/ REJECT Message body rejected [303]

# cialis
/[cC][-j.! ]?[iI|1][-j.! ]?[aA4@][-j.! ]?[aA]?[lL|1]*[-j.! ]?[iI|1][-j.! ]?[sS5][-j.! ]?/ REJECT Message body rejected [304]

# MaxGain+
/[mM][-\. ]?[aA4][-\. ]?[xX][-\. ]?[gG][-\. ]?[aA4][-\. ]?[iI1][-\. ]?[nN][+]?[!\.\? ]?/ REJECT Message body rejected [210]

# Replica goods - watches, shoes.
/[wW]atch(es)? [rR]eplica[!\.,\?s ]?/ REJECT Message body rejected [013]
/[rR]eplica(s)? ([wW]atch(es)?|[lL]eather|[sS]hoes|[bB]oots|[fF]ootwear)[\.\? ]?/ REJECT Message body rejected [049]
/([pP]opular|[eE]xquisit[e])? [rR]eplica(s)?[\.\? ]?/ REJECT Message body rejected [055]
/[lL]ux(ury|urious|)[\., ]? ([lL]eather|[sS]hoes|[bB]oots|[fF]ootwear)[!\.\? ]/ REJECT Message body rejected [043]
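Where the brackets go around a list of alternatives matters, because the vertical pipe splits the whole expression unless it is enclosed in a group. A minimal sketch of the difference, using Python's re module as a stand-in for Postfix's regex engine (an assumption) and a shortened replica pattern made up for the purpose:

```python
import re

# Without a group, each "|" starts a completely separate alternative.
ungrouped = re.compile(r"[rR]eplica(s)? [wW]atch(es)?|[lL]eather|[sS]hoes")

# With a group, "replica" must come before any of the alternatives.
grouped = re.compile(r"[rR]eplica(s)? ([wW]atch(es)?|[lL]eather|[sS]hoes)")

innocent = "I left my shoes at your house"
spam = "Replica shoes at low prices"

for name, pattern in (("ungrouped", ungrouped), ("grouped", grouped)):
    print(name, bool(pattern.search(innocent)), bool(pattern.search(spam)))
```

The ungrouped form rejects any message that merely mentions shoes or leather, while the grouped form needs the word "replica" in front.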

# rolex or cartier
/([rR][-\. ]?[oO0][-\. ]?[lL][-\. ]?[eE][-\. ]?[xX][\.\? ]?)|([cC][-\. ]?[aA4][-\. ]?[rR][-\. ]?[tT][-\. ]?[iI1][-\. ]?[eE][-\. ]?[rR][\.\? ]?)/ REJECT Message body rejected [101]

# Here is one that is doing the rounds. The vertical pipe symbol acts as an OR. The alternatives (a space, "you" or "me") are inside brackets, with a question mark afterwards. This means match the group 0 or 1 times, so any one of them may, or may not, be in the string. Useful!
/[cC]aught( |you|me)?naked[\.\? ]?/ REJECT Message body rejected [038]
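It is worth checking exactly which spellings an optional group lets through. The sketch below uses Python's re module in place of Postfix's regex engine (an assumption) and compares the rule above with a hypothetical variant, not taken from the original file, that expects spaces around the optional word.

```python
import re

# The rule as written: the optional group is a single space, "you" or "me".
original = re.compile(r"[cC]aught( |you|me)?naked")

# Hypothetical variant: optional " you" or " me", then a space before "naked".
spaced = re.compile(r"[cC]aught( you| me)? naked")

samples = ["caught naked", "caughtnaked", "caughtyounaked", "caught you naked"]
for text in samples:
    print(repr(text), bool(original.search(text)), bool(spaced.search(text)))
```

The rule as written catches the run-together spellings and "caught naked", but not "caught you naked" with spaces. The variant behaves the other way round, so which one to use depends on what the junk in your logs looks like.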


I could go on, with lots of examples. I'll list my body_checks file soon.

header_checks examples

I'll list a few for now...

# rolex or cartier
/^Subject:.*(([rR][-\. ]?[oO0][-\. ]?[lL][-\. ]?[eE][-\. ]?[xX][\.? ]?)|([cC][-\. ]?[aA4][-\. ]?[rR][-\. ]?[tT][-\. ]?[iI\|1][-\. ]?[eE][-\. ]?[rR][\.? ]?))/ REJECT Message header rejected [065x]

# Forged mail. One of my domains is taurus2.plus.com, so if I receive mail with these headers, they are forged!
/^Received:.* +from +(taurus2\.plus\.com) +/ REJECT Forged client name in Received: header: $1
/^Received:.* +from +[^ ]+ +\(([^ ]+ +[he]+lo=|[he]+lo +)(taurus2\.plus\.com)\)/ REJECT Forged client name in Received: header: $2
/^Message-ID:.*@(taurus2\.plus\.com)/ REJECT Forged domain name in Message-ID: header: $1
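The $1 and $2 in the reject text are filled in from the bracketed groups, counted by their opening brackets from left to right, so an extra pair of brackets earlier in the pattern shifts the numbering. A small sketch of that effect, with Python's re module standing in for Postfix's regex engine (an assumption) and a made-up Received: header:

```python
import re

# An extra (.*) group in front pushes the domain into group 2.
with_extra_group = re.compile(r"^Received:(.*) +from +(taurus2\.plus\.com) +")

# Without it, the domain is group 1.
without_extra_group = re.compile(r"^Received:.* +from +(taurus2\.plus\.com) +")

# Hypothetical header; 192.0.2.1 is an address reserved for documentation.
header = "Received: from taurus2.plus.com (unknown [192.0.2.1])"

m1 = with_extra_group.search(header)
m2 = without_extra_group.search(header)
print(repr(m1.group(1)), repr(m1.group(2)))
print(repr(m2.group(1)))
```

If the reject text in the log shows an empty or unexpected value where the domain should be, a miscounted group is a likely cause.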

# viagra
# Looks similar to my body_checks entry, with just '^Subject:.*' prefixing the string.
/^Subject:.*(VP[-]?RX|[vV][ j_\-]?[iI1][ j_\-]?[aA4@][ j_\-]?[gG][ j_\-]?[rR][ j_\-]?[aA4@])/ REJECT Message header rejected [073]

# I don't bank with any of the following, so mail claiming to involve them is spam
# Most banks won't email their customers in any case, so maybe add your bank too!
/^(To|From|Cc|Reply-To):.*@tcfbank\.com/ REJECT Message header rejected [050]
/^(To|From|Cc|Reply-To):.*@lloydstsb\.com/ REJECT Message header rejected [051]
/^(To|From|Cc|Reply-To):.*@natwest\.com/ REJECT Message header rejected [052]
/^(To|From|Cc|Reply-To):.*efoxpay/ REJECT Message header rejected [053]
/^(To|From|Cc|Reply-To):.*@bankofscotland\.co\.uk/ REJECT Message header rejected [054]

# Block non-English text (Chinese, Japanese, Korean and Russian character sets)
/^Content-Type:.*charset=.*(big5|euc-kr|gb2312|iso-.*-jp|ks_c_5601-1987)/ REJECT Message header rejected [066]
/^Subject:.*=\?(big5|euc-kr|gb2312|iso-.*-jp|ks_c_5601-1987)\?/ REJECT Message header rejected [067]
/^Subject:.*=\?KOI8/ REJECT Message header rejected [060]
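An encoded Subject: header announces its character set as "=?charset?", so the question mark needs a backslash to be matched literally. Left unescaped, it just makes the "=" optional. A minimal sketch of the difference, using Python's re module as a stand-in for Postfix's regex engine (an assumption) and two made-up headers:

```python
import re

# "=?" unescaped: an optional "=", so plain "KOI8" anywhere in the subject matches.
loose = re.compile(r"^Subject:.*=?KOI8", re.IGNORECASE)

# "=\?" escaped: a literal "=?" must come before the charset name.
strict = re.compile(r"^Subject:.*=\?KOI8", re.IGNORECASE)

# Hypothetical headers for illustration.
encoded = "Subject: =?KOI8-R?B?0NLJ18XU?="
plain = "Subject: KOI8 versus UTF-8"

for name, pattern in (("loose", loose), ("strict", strict)):
    print(name, bool(pattern.search(encoded)), bool(pattern.search(plain)))
```

Only the escaped form leaves the plain-text subject alone.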

# Block mail with really old dates
/^Date:.*200[0-7]/ REJECT Message header rejected [058] - Date too old!
/^Date:.*19[0-9][0-9]/ REJECT Message header rejected [059] - Date too old!


I could go on, with lots of examples. I'll list my header_checks file soon.

I'm still experimenting!