Anti Comment Spam System in ColdFusion

This is a system I've been working on for probably a couple of years now in total. It runs on a fairly busy site, and is meant to prevent comment spam. The comment system itself is a custom-coded ColdFusion system, built to order, that stores comments in flat files.

I'm not a huge fan of ColdFusion - it seems like everything I try to do with it is about twice as difficult as it should be - perhaps I'm just too used to PHP and Perl. Anyways, I'm going to give a general overview of the system. I can't post code, because the code belongs to the client. But perhaps the general idea of the system will be enough to be helpful to someone out there.

So, when someone leaves a comment, it's immediately visible and an email is sent to the admin, so he knows and can take action if the comment is spam. That worked fine for far longer than you'd expect, but eventually the spammers found it and started bombarding him with crap.

So, the first line of defense we added was quite simple - it blocked people trying to submit URLs using common forum codes, e.g. "[url]some spammy site[/url]". This was an easy fix: the site is custom code and nothing on it tells users to add a URL that way, so there wasn't much chance of getting false positives.
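A rough sketch of that first check, written in Python rather than ColdFusion since the real code can't be posted. The function name and the exact pattern are assumptions for illustration; the idea is just to reject any comment that contains a forum-style link tag.

```python
import re

# Matches forum-style link codes such as [url], [/url] or [url=http://...].
# The pattern is an assumption; the real system may match differently.
FORUM_URL_CODE = re.compile(r"\[/?url\b[^\]]*\]", re.IGNORECASE)


def contains_forum_url_code(text: str) -> bool:
    """Return True if the text contains a [url]-style forum tag."""
    return FORUM_URL_CODE.search(text) is not None


if __name__ == "__main__":
    print(contains_forum_url_code("[url]some spammy site[/url]"))
    print(contains_forum_url_code("Great article, thanks!"))
```

Since real users have no reason to type these tags on a site that doesn't support them, a plain pattern match like this is about as far as the check needs to go.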

That helped quite a bit, but eventually we needed more. Some spammers will just post straight links, and some post what looks like random gibberish for no apparent reason - strings of pill names and other crap. So we added the second line of defense: a customizable list of terms. The admin adds terms to the list, and any comment that has one of those terms in any of its fields is blocked.
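The term list boils down to something like the following Python sketch (again, not the client's ColdFusion code). The field names and the case-insensitive substring match are assumptions; the post only says a comment is blocked if a listed term appears in any of its fields.

```python
def is_blocked(fields: dict, blocked_terms: list) -> bool:
    """Return True if any field contains any blocked term (case-insensitive)."""
    # Normalise the admin's list and ignore blank entries.
    terms = [t.strip().lower() for t in blocked_terms if t.strip()]
    return any(
        term in str(value).lower()
        for value in fields.values()
        for term in terms
    )


if __name__ == "__main__":
    # Hypothetical comment fields and a hypothetical admin term list.
    comment = {"name": "Bob", "email": "bob@example.com", "body": "Cheap PILLS here"}
    print(is_blocked(comment, ["pills", "casino"]))
```

Substring matching is blunt: a short term will also match inside longer, innocent words, so the list needs to be chosen with some care.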

Now when a comment is blocked by the system, it is basically ignored - nothing is logged, it never shows up on the site, no email is sent to the admin.

Eventually, the client needed a third level of protection - spammers were still posting links. As a temporary fix, he blocked "http", but spammers would still post bare domain names, etc. Plus, legit users might well want to post links as they discuss the articles on the site, so he knew he needed a better system.

We talked about using a CAPTCHA (those visual puzzles where you have to type the letters in the picture) but decided against it, both for accessibility reasons (lots of people have problems with those, plus how is a blind user to use it?) and because I'm not sure they provide much protection any more - after all, the Gmail CAPTCHA has been broken for quite a while!

So I decided to implement one of those simple question and answer systems. Basically, the user must answer a simple arithmetic problem before posting. The assumption is that bots would not be smart enough to answer such a problem, at least not without custom programming by the spammers - and they'd rather just move on to an easier target - of which there are many!

To implement it, the system picks two random one-digit positive numbers, and asks the user to enter the sum. The tricky part is that the answer must be encoded in the form itself, so that the system knows what the correct answer is. So, the actual correct answer is also stored in a hidden field in the form. But obviously this makes it easy for a bot to "see" and use, so it must be obfuscated somehow. To do this, I hash the correct answer, using a key which changes daily.

A "hash" basically "makes a hash" of a string - generating a bunch of gibberish that depends on both the string and the key that's passed to it. The system picks a key using a number of factors including things like the date, the IP of the user, the article URL, etc. The idea is you need a key that is fairly hard to guess, but that the form processing script can also come up with on its own, without the key having to be passed to it somehow.
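Here's one way the key could be put together, sketched in Python rather than ColdFusion. The function name, the separator, the use of SHA-256, and especially the server-side secret are all assumptions and not taken from the real system; the post only says the key is built from things like the date, the user's IP and the article URL.

```python
import hashlib
from datetime import date


def derive_key(secret: str, day: date, ip: str, article_url: str) -> bytes:
    """Build a key that both the form page and the processing script can recompute."""
    # "secret" is an assumed server-side value that never appears in the page;
    # without something like it, every ingredient would be known to the bot.
    material = "|".join([secret, day.isoformat(), ip, article_url])
    return hashlib.sha256(material.encode("utf-8")).digest()


if __name__ == "__main__":
    key = derive_key("change-me", date.today(), "203.0.113.5", "/articles/42")
    print(key.hex())
```

Because the date is one of the ingredients, a form loaded just before midnight and submitted just after would fail; a real implementation would probably want to accept yesterday's key as well.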

So, this correct answer is hashed and put into a hidden field. When the user submits the form, the comment script takes the user's answer and hashes it with that same key, then compares the result to the hashed correct answer submitted in the hidden field. If the answer is correct, then the comment is posted (assuming it passed the older tests as well) - if not, the user is told to go back and try again.
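The whole round trip looks roughly like this in Python (a sketch only, not the client's ColdFusion code). The post doesn't say which hash function is used, so HMAC-SHA256 is my stand-in for "hash with a key", and the function names are made up for illustration. The key is passed in as bytes, however it was derived.

```python
import hashlib
import hmac
import random


def hash_answer(key: bytes, answer: str) -> str:
    """Keyed hash of an answer, as a hex string suitable for a hidden field."""
    return hmac.new(key, answer.strip().encode("utf-8"), hashlib.sha256).hexdigest()


def make_challenge(key: bytes):
    """Pick two one-digit positive numbers; return them plus the hidden-field token."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    return a, b, hash_answer(key, str(a + b))


def check_answer(key: bytes, submitted_answer: str, token: str) -> bool:
    """Hash the user's answer with the same key and compare it to the submitted token."""
    expected = hash_answer(key, submitted_answer)
    # Compare as bytes so odd characters in the submitted token can't raise an error.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


if __name__ == "__main__":
    key = b"example key for illustration only"
    a, b, token = make_challenge(key)
    print(f"What is {a} + {b}?")
    print(check_answer(key, str(a + b), token))
```

When rendering the form, `a` and `b` go into the visible question and `token` goes into the hidden field; on submit, `check_answer` gets the user's typed answer and the token that came back with the form.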

So far this system is working. I still have some tricks up my sleeve for when it needs to be updated again - one that looks promising is taking advantage of the fact that bots tend to fill out all the fields in a form - even fields that a real user can't see. For example, you could create some dummy fields and hide them via CSS. A real user wouldn't see them, while a bot reading the source code of the page would. So if you see text in one of those dummy fields, you know it's a bot - and block it.
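That hidden-field trick might look something like this, sketched in Python rather than ColdFusion. The dummy field names are hypothetical; any fields that are hidden from real users via CSS would do.

```python
# Hypothetical names of dummy fields that are hidden from real users via CSS.
HONEYPOT_FIELDS = ("website2", "email_confirm")


def looks_like_bot(form: dict) -> bool:
    """Return True if any hidden dummy field came back with text in it."""
    return any(str(form.get(name, "")).strip() for name in HONEYPOT_FIELDS)


if __name__ == "__main__":
    human = {"name": "Ann", "body": "Nice post", "website2": ""}
    bot = {"name": "x", "body": "spam", "website2": "http://spam.example"}
    print(looks_like_bot(human), looks_like_bot(bot))
```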

I also haven't done anything with IP addresses - so adding a simple IP blocking system may be worth pursuing at some point.
