Anti Comment Spam System in ColdFusion

This is a system I've been working on for probably a couple of years now in total. It runs on a fairly busy site, and is meant to prevent comment spam. The comment system itself is a custom-coded ColdFusion system, built to order, that stores comments in flat files.

I'm not a huge fan of ColdFusion - it seems like everything I try to do with it is about twice as difficult as it should be - perhaps I'm just too used to PHP and Perl. Anyways, I'm going to give a general overview of the system. I can't post code, because the code belongs to the client. But perhaps the general idea of the system will be enough to be helpful to someone out there.

So, when someone leaves a comment, it's immediately visible and an email is sent to the admin, so he knows and can take action if the comment is spam. That worked fine for far longer than you'd expect, but eventually the spammers found it and started bombarding him with crap.

So, the first line of defense we added was quite simple - it blocked people trying to submit URLs using common forum codes, e.g. "[url]some spammy site[/url]". This was an easy fix: the site is custom code and nowhere does it tell users to add a URL with such a code, so there wasn't much chance of getting false positives.
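The real check is in ColdFusion and I can't share it, but the idea can be sketched in a few lines of Python. The pattern and the function name here are my own assumptions for illustration, not the client's code:

```python
import re

# Assumed pattern: BBCode-style link tags such as [url], [url=...], [/url], [link].
# The actual list of forum codes the site blocks is not given in this post.
FORUM_LINK_CODE = re.compile(r"\[/?(?:url|link)\b[^\]]*\]", re.IGNORECASE)


def contains_forum_link_code(text: str) -> bool:
    """Return True if the text contains a forum-style link tag."""
    return FORUM_LINK_CODE.search(text) is not None


if __name__ == "__main__":
    print(contains_forum_link_code("[url]some spammy site[/url]"))
    print(contains_forum_link_code("Great article, thanks!"))
```

Any field that trips this check gets the whole comment rejected.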

That helped quite a bit, but eventually we needed more. Some spammers just post straight links, and some post gibberish for no apparent reason - seemingly random strings of pill names and other crap. So we added the second line of defense: a customizable list of terms, maintained by the admin, that blocks any comment containing one of those terms in any of its fields.
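A term filter like this is roughly the following, again as a Python sketch rather than the real ColdFusion. The field names and the case-insensitive substring matching are assumptions on my part:

```python
from typing import Iterable, Mapping


def is_blocked(fields: Mapping[str, str], blocked_terms: Iterable[str]) -> bool:
    """Return True if any form field contains any blocked term (case-insensitive)."""
    # Ignore blank entries so an empty line in the admin's list blocks nothing.
    terms = [t.strip().lower() for t in blocked_terms if t.strip()]
    for value in fields.values():
        haystack = value.lower()
        if any(term in haystack for term in terms):
            return True
    return False


if __name__ == "__main__":
    comment = {"name": "Bob", "comment": "Cheap VIAGRA here"}  # hypothetical fields
    print(is_blocked(comment, ["viagra", "casino"]))
```

Plain substring matching is crude: a short term can match inside an innocent word, so the admin has to pick terms with some care.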

Now when a comment is blocked by the system, it is basically ignored - nothing is logged, it never shows up on the site, no email is sent to the admin.

Eventually, the client needed a third level of protection - spammers were still posting links. As a temporary fix, he blocked "http", but spammers would still post bare domain names, etc. On top of that, legit users might actually want to post links as they discuss the articles on the site, so he knew he needed a better system.

We talked about using a CAPTCHA (those visual puzzles where you have to type the letters in the picture) but decided against it, both for accessibility reasons (lots of people have problems with those, plus how is a blind user to use it?) and because I'm not sure they provide much protection any more - after all, the Gmail CAPTCHA has been broken for quite a while!

So I decided to implement one of those simple question and answer systems. Basically, the user must answer a simple arithmetic problem before posting. The assumption is that bots would not be smart enough to answer such a problem, at least not without custom programming by the spammers - and they'd rather just move on to an easier target - of which there are many!

To implement it, the system picks two random one-digit positive numbers, and asks the user to enter the sum. The tricky part is that the answer must be encoded in the form itself, so that the system knows what the correct answer is. So, the actual correct answer is also stored in a hidden field in the form. But obviously this makes it easy for a bot to "see" and use, so it must be obfuscated somehow. To do this, I hash the correct answer, using a key which changes daily.

A "hash" basically "makes a hash" of a string - turning it into a bunch of gibberish that can't practically be turned back into the original, using the key that's passed to it. The same string and the same key always produce the same gibberish. The system builds the key from a number of factors including things like the date, the IP of the user, the article URL, etc. The idea is you need a key that is fairly hard to guess, but that the form processing script can also come up with on its own, without the key having to be passed to it somehow.

So, the correct answer is hashed and put into a hidden field. When the user submits the form, the comment script takes the user's answer and hashes it with that same key, then compares the result to the hashed correct answer submitted in the hidden field. If the answer is correct, then the comment is posted (assuming it passed the older tests as well) - if not, the user is told to go back and try again.
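Here's a minimal Python sketch of that round trip. It is not the client's ColdFusion code: the use of HMAC-SHA256, the `SERVER_SECRET` value, and all the function names are my assumptions. The post only says the key is built from things like the date, the user's IP and the article URL.

```python
import hashlib
import hmac
import random
from datetime import date
from typing import Tuple

# Assumption: a server-side secret mixed into the key. Not mentioned in the
# post, but without it anyone who knows the recipe could rebuild the key.
SERVER_SECRET = b"change-me"


def daily_key(ip: str, article_url: str, day: date) -> bytes:
    """Derive a key both the form page and the form handler can recompute."""
    material = f"{day.isoformat()}|{ip}|{article_url}".encode("utf-8")
    return hmac.new(SERVER_SECRET, material, hashlib.sha256).digest()


def hash_answer(answer: str, key: bytes) -> str:
    """Keyed hash of an answer, as a hex string suitable for a hidden field."""
    return hmac.new(key, answer.strip().encode("utf-8"), hashlib.sha256).hexdigest()


def make_challenge(ip: str, article_url: str, day: date) -> Tuple[int, int, str]:
    """Pick two one-digit positive numbers and return them with the hashed sum."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    hidden = hash_answer(str(a + b), daily_key(ip, article_url, day))
    return a, b, hidden


def check_answer(user_answer: str, hidden: str, ip: str, article_url: str, day: date) -> bool:
    """Hash the submitted answer with the same key and compare to the hidden field."""
    expected = hash_answer(user_answer, daily_key(ip, article_url, day))
    return hmac.compare_digest(expected.encode("utf-8"), hidden.encode("utf-8"))


if __name__ == "__main__":
    today = date.today()
    a, b, hidden = make_challenge("203.0.113.5", "/articles/example", today)
    print(f"What is {a} + {b}?")
    print(check_answer(str(a + b), hidden, "203.0.113.5", "/articles/example", today))
```

One thing this sketch makes obvious: because the key changes daily, a form loaded just before midnight and submitted just after will fail the check, so a real handler might also want to try yesterday's key.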

So far this system is working. I still have some tricks up my sleeve for when it needs to be updated again - one that looks promising is taking advantage of the fact that bots tend to fill out all the fields in a form - even fields that a real user can't see. For example, you could create some dummy fields and hide them via CSS. A real user wouldn't see them, while a bot reading the source code of the page would. So if you see text in one of those dummy fields, you know it's a bot - and block it.
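I haven't built this yet, so the following is only a rough Python sketch of how the dummy-field check might look. The field names are made up; in the page itself those inputs would be hidden with CSS.

```python
from typing import Mapping

# Hypothetical dummy field names. Real users never see these inputs,
# so any text in them suggests the form was filled in by a bot.
DUMMY_FIELDS = ("website2", "email_confirm")


def looks_like_bot(form: Mapping[str, str]) -> bool:
    """Return True if any hidden dummy field was filled in."""
    return any(form.get(name, "").strip() for name in DUMMY_FIELDS)


if __name__ == "__main__":
    print(looks_like_bot({"comment": "hi", "website2": "http://spam.example"}))
    print(looks_like_bot({"comment": "hi", "website2": ""}))
```

Whitespace-only values are treated as empty here, and a missing dummy field counts as not filled in.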

I also haven't done anything with IP addresses - so adding a simple IP blocking system may be worth pursuing at some point.
