What to do with a metric ton of spam?

I have a site hosted on one of my servers that has a phpBB installation. The admin for the board had basically walked away from the site.

Anyone who's worked with phpBB, or any forum software for that matter, knows that it's like crack cocaine for spammers: they love that crap. So while moving sites off this old server, I noticed the board was still there. I logged in and HOLY SH^T, it's the spamocalypse in there. I think just one of the sections had something like 8k new posts. Of course, any legitimate users had no doubt been driven away.

So I closed it down. But just before I did the rm -rf to dump the boards, I thought: perhaps there's something useful here? So instead I just put an .htaccess password on it, to keep the spammers out and to deny them any SEO benefit they might get from Google spidering all that crap.

At the very least there's a wealth of IP addresses and email addresses used by spammers. And perhaps much more. After all, many of the posts are basically the same thing over and over, usually porn, so it seems like it might be useful as raw material for some sort of anti-spam tool. And it doesn't cost me anything to hang onto it, at least for now, although the MySQL dump file was over 1 GB in size, which gives you an idea of just how much spam is in that puppy.
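As a rough idea of what mining that dump for addresses might look like, here is a minimal Python sketch. It is not something I've actually run against the board. It assumes the dump is plain text and that IPs appear in dotted-quad form, which depends on the phpBB version. The function names and the regexes are made up for illustration, and the email pattern is deliberately loose.

```python
import re
from collections import Counter

# Loose patterns, assumed good enough for a first pass over spam data.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_valid_ip(candidate):
    """Reject dotted quads with any octet above 255."""
    return all(int(part) <= 255 for part in candidate.split("."))


def harvest(lines):
    """Count IPv4 and email addresses seen in an iterable of text lines."""
    ips, emails = Counter(), Counter()
    for line in lines:
        ips.update(ip for ip in IP_RE.findall(line) if is_valid_ip(ip))
        emails.update(addr.lower() for addr in EMAIL_RE.findall(line))
    return ips, emails


def harvest_file(path):
    """Stream a dump file line by line so a 1 GB file never sits in memory."""
    with open(path, encoding="utf-8", errors="replace") as dump:
        return harvest(dump)
```

Calling `harvest_file("forum_dump.sql")` (a hypothetical filename) and then `ips.most_common(20)` on the first result would list the twenty most frequent addresses, which seems like a reasonable place to start looking for repeat offenders.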

So, if you have any ideas, let me know. Otherwise it'll probably just take up a bit of disk space until I decide I'll never get around to using it for anything.
