Anti-Spam Tactics

7 January, 2007

Isn't spam great? Every day my email brings fresh inspiration for Spamusement (which seems a bit neglected lately), and recently, attempts to inspire me via comments on my site have started getting through. Trouble is, I've already got a big willy, my breasts are ample, the cupboard is stocked with Viagra, my insurance isn't up for renewal, and I don't trust stock tips from anyone other than the group of monkeys with typewriters I use as consultants for that purpose.

What can be done

There are lots of things that can be done, and what I'm about to present is intended to complement, rather than replace, an existing anti-spam system.

As far as I know, there are three general techniques employed (often together) against spam:

Identify malicious submissions by IP address, malformed headers (e.g. headers missing altogether) or any other data you can pinpoint as incorrect
Identify form submissions that didn't load the source form in the first place, using the referrer (unreliable) or some kind of token in the form data
Check the content of a form submission against some criteria to decide whether it's spam
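The token approach from the second item can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than CakePHP code, assuming a dict-like server-side session; the function names and the session key `form_token` are hypothetical.

```python
import hmac
import secrets


def issue_form_token(session: dict) -> str:
    # Create a one-off random token, remember it server-side and return it
    # so it can be embedded in the form as a hidden field.
    token = secrets.token_urlsafe(16)
    session["form_token"] = token
    return token


def token_is_valid(session: dict, submitted) -> bool:
    # The token is popped (consumed) so a captured form can't be replayed.
    expected = session.pop("form_token", None)
    if expected is None or submitted is None:
        return False  # form never served to this session, or token missing
    # Constant-time comparison; compare bytes so any submitted text is accepted.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A bot that posts straight to the form's action URL never received a token, so it fails the check without any inspection of its content.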

An extension to the second item is to include a hidden bait input field that a human user (whose browser applies the page's CSS) won't see or edit, but that a bot will fill with something inspirational. I recently started thinking about randomly renaming all form fields, so that it would be much more difficult to know which inputs expect which type of data, and whilst thinking/researching I came across the term Comment Flak. Short of checking the content of a submission (which, if automated, can give false positives or false negatives), hidden input fields seem the best defense, and for some applications may be more than sufficient.
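The random-renaming idea could look something like this. It is a hypothetical Python sketch, not the demo's code: each page view maps the real field names to random aliases, keeps the reverse map in the session, and rejects any submission using a name it never issued.

```python
import secrets
from typing import Dict, List, Optional


def obfuscate_fields(session: dict, fields: List[str]) -> Dict[str, str]:
    # Give every real field a random alias for this page view.
    aliases = {name: "f" + secrets.token_hex(6) for name in fields}
    # Keep alias -> real name server-side so the submission can be decoded.
    session["field_aliases"] = {alias: name for name, alias in aliases.items()}
    return aliases  # real name -> alias, used when rendering the form


def restore_fields(session: dict, posted: Dict[str, str]) -> Optional[Dict[str, str]]:
    # Returns None if no aliases were issued or any posted name is unknown.
    reverse = session.pop("field_aliases", None)
    if reverse is None:
        return None
    restored = {}
    for alias, value in posted.items():
        if alias not in reverse:
            return None  # a guessed name such as "email": likely a bot
        restored[reverse[alias]] = value
    return restored
```

Because the aliases change on every load, a bot can't hard-code which input is the comment box and which is the email address.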

What is Comment Flak?

The principle is quite simple: for any textarea field, include a number of hidden textarea fields. If any hidden textarea is submitted with text in it, you know the submitter is not a normal web user.
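In code, the Comment Flak check itself is tiny. This is a hypothetical Python sketch assuming the server already knows which field names were the hidden decoys; treating whitespace-only input as empty is a design choice of this sketch, not something the technique requires.

```python
from typing import Dict, Iterable


def is_flak_spam(posted: Dict[str, str], decoy_names: Iterable[str]) -> bool:
    # Decoy textareas are hidden with CSS, so a human never types in them.
    # Any decoy that arrives with (non-whitespace) text marks the poster as a bot.
    return any(posted.get(name, "").strip() for name in decoy_names)
```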

Applied to CakePHP

I figured: why stop there? Rather than go into too much code detail here, take a look at the online demo (the source can be downloaded from the demo tab).

The technique as presented uses a component, a helper and a dynamic CSS file, and can be used with any type of input field. For each input there is a random number (configured in the demo as between 5 and 10) of duplicate form inputs, one of which, chosen at random, is the 'live' input. An uncached CSS file hides each of the dummy inputs, so web users are unaware that the form has any more than its 4 visible fields.
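Generating the copies and the matching CSS might look roughly like this. It is a Python sketch, not the demo's component/helper: the function names are hypothetical, counting the live copy within the 5 to 10 range is an assumption, and each copy (label plus input) is assumed to be wrapped in an element whose id equals the copy's name.

```python
import random
import secrets
from typing import List, Tuple


def make_copies(field: str, rng: random.Random, low: int = 5, high: int = 10) -> Tuple[List[str], str]:
    # Total copies for this field (live one included, by assumption).
    count = rng.randint(low, high)
    # Random suffixes, so a name reveals nothing about which copy is live.
    names = [f"{field}_{secrets.token_hex(4)}" for _ in range(count)]
    live = rng.choice(names)
    return names, live


def hiding_css(names: List[str], live: str) -> str:
    # One rule per dummy copy. The stylesheet must be served uncached,
    # because the set of hidden ids changes on every form load.
    return "\n".join(f"#{n} {{ display: none; }}" for n in names if n != live)
```

Keeping the hiding rules in a separate, per-request stylesheet (rather than inline `style` attributes) means a bot reading only the form's HTML sees several identical-looking inputs.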

Its use is fairly transparent: a simple call in the controller to initialize the component, and a call to the helper in the view (syntactically identical to the 1.2 Form helper). Data validation is handled as you might expect, with some intelligence added to the way the form inputs work: the hidden input fields behave exactly like the actual form field, including validation. If you enter text in a form field and submit the form, and the submission passes the spam checks but fails validation, every duplicate is returned with exactly the same structure and the same contents, and a different, randomly chosen instance becomes the live one. The goal is to make it as difficult as possible for anything other than a human to know which form fields to edit.
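The re-rendering step after a failed validation could be sketched like this. It is a hypothetical Python illustration, not the demo's helper: every copy is refilled with the submitted value and a new live copy is picked, excluding the previous one whenever there is more than one copy.

```python
import random
from typing import Dict, List, Tuple


def rerender_copies(names: List[str], previous_live: str, submitted_value: str,
                    rng: random.Random) -> Tuple[Dict[str, str], str]:
    # Prefer a copy other than the previous live one, if one exists.
    candidates = [n for n in names if n != previous_live] or list(names)
    new_live = rng.choice(candidates)
    # Every copy shows identical contents, so the re-rendered page gives no hint
    # about which input was live before or which one is live now.
    values = {n: submitted_value for n in names}
    return values, new_live
```

The refilled values then become the expected "default text" for the dummies on the next submission.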

If you are frantically clicking the submit button and don't see anything changing, that's what's supposed to happen (visibly, that is) :). To make it clearer what's going on, I added 3 modes to the demo form:

normal (only live inputs shown)
bot (all inputs shown)
debug/clarify (all inputs shown, live inputs with unique labels)
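The three display modes boil down to a visibility and label decision per copy. This Python sketch is an assumption about how such modes could work, not the demo's implementation; in particular, the " (live)" label suffix is hypothetical.

```python
from typing import Dict, List


def render_mode(names: List[str], live: str, label: str, mode: str) -> Dict[str, Dict[str, object]]:
    # normal: only the live copy is visible
    # bot:    every copy is visible with the same label
    # debug:  every copy is visible; the live copy gets a distinct label
    if mode not in ("normal", "bot", "debug"):
        raise ValueError(f"unknown mode: {mode}")
    out: Dict[str, Dict[str, object]] = {}
    for n in names:
        visible = mode != "normal" or n == live
        text = f"{label} (live)" if mode == "debug" and n == live else label
        out[n] = {"visible": visible, "label": text}
    return out
```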

Note that clicking the link to change mode also submits the form data.

Detecting Spam

Using the technique presented, spam is detected if any of the following are true:

No session data
A form field that should be present is missing
A form field that should not be present is submitted
A dummy field is submitted that doesn't contain the default text
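The four checks above can be sketched as a single function. This Python version is illustrative only and assumes a hypothetical session layout for the served form: `live` holds the live input names, and `dummy` maps each dummy name to the text it was rendered with.

```python
from typing import Dict, Optional


def detect_spam(session: Optional[dict], posted: Dict[str, str]) -> Optional[str]:
    # Returns a reason string for spam, or None if the submission passes.
    if not session or "live" not in session or "dummy" not in session:
        return "no session data"
    expected = set(session["live"]) | set(session["dummy"])
    submitted = set(posted)
    if expected - submitted:
        return "missing field"      # a field that should be present is absent
    if submitted - expected:
        return "unexpected field"   # a field that was never rendered
    for name, default in session["dummy"].items():
        if posted[name] != default:
            return "dummy field changed"
    return None
```

Returning a reason rather than a bare boolean makes it easier to log why submissions are rejected.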

As such, there can be reasonable confidence that a submission passing these tests was made by someone sitting in front of their PC, preferably wearing some Cake Schwag :).

Wrapping Up

Making use of the component, helper and dynamic CSS file from the demo will make it significantly more difficult for spam to get through an online form. This technique can complement (or be complemented by) detecting successive failed submissions, IP blacklisting and third-party spam-detection services such as Akismet, to make a very effective defense against comment, forum and web-form spam. If you have any ideas about how this technique can be improved, I'd be glad to hear them. Bake On!
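Of the complementary measures mentioned, detecting successive failed submissions is the simplest to sketch. This hypothetical Python class counts recent failures per client IP in a sliding time window; the class name, the default limit of 5 and the 10-minute window are assumptions for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class FailureTracker:
    # Flags a client IP once it has `limit` failures within `window` seconds.

    def __init__(self, limit: int = 5, window: float = 600.0):
        self.limit = limit
        self.window = window
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, ip: str, now: float) -> Deque[float]:
        # Drop failure timestamps that have fallen outside the window.
        q = self._failures[ip]
        while q and now - q[0] > self.window:
            q.popleft()
        return q

    def record_failure(self, ip: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._prune(ip, now).append(now)

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return len(self._prune(ip, now)) >= self.limit
```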