Musings
October 15, 2003
Software Monoculture
A lot of the blogs I read have been deluged in recent days with robots posting comments.
I got a couple of robot comments posted over the weekend, before I took steps to deal with the problem. Since then, I have not gotten any (though I have seen numerous attempts).
The problem is simple. All MovableType blogs have a comment-entry CGI script, mt-comments.cgi, and, by default, the comment-entry template makes no attempt to prevent search engines from indexing it. The result? If you go to Google and search for mt-comments.cgi, you’ll get millions of hits for MT comment-entry forms. Write a 'bot to post comments to that form and sit back and enjoy watching your Google PageRank explode. How could any spammer resist?
This needed fixing and, after the first shot over my bow, I wasted no time in taking the following steps to put a stop to it.
- Add a <meta name="robots" content="noindex,nofollow" /> line to the comment-entry template, so that the comment forms don’t get indexed in the future.
- Change the name of the CGI script so that the previously-indexed one is inaccessible and spammers can’t go after the new one with a shot-in-the-dark URL.
- Point to the new script in mt.cfg with the line CommentScript somenewname.cgi and rebuild your blog pages.
- Sit back and enjoy watching spammers hammer away, attempting to access the old location of the comment-entry CGI script (adding their IP addresses to your IP Ban List).
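For the last step, here is a rough sketch of one way to pull the offending addresses out of a server log. It is not the method used on this blog. It assumes an Apache-style access log (common or combined format), and the function name `offenders` and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache common/combined log format, e.g.
# 192.0.2.10 - - [16/Oct/2003:09:00:01 -0500] "POST /cgi-bin/mt-comments.cgi HTTP/1.0" 410 300
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def offenders(lines, old_script="mt-comments.cgi"):
    """Count requests per IP address that target the old comment script."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        # Drop any query string, then compare the last path segment.
        path = match.group("path").split("?", 1)[0]
        if path.rsplit("/", 1)[-1] == old_script:
            hits[match.group("ip")] += 1
    return dict(hits)


# Hypothetical sample data (documentation-only IP ranges).
sample = [
    '192.0.2.10 - - [16/Oct/2003:09:00:01 -0500] "POST /cgi-bin/mt-comments.cgi HTTP/1.0" 410 300',
    '192.0.2.10 - - [16/Oct/2003:09:00:02 -0500] "POST /cgi-bin/mt-comments.cgi HTTP/1.0" 410 300',
    '198.51.100.7 - - [16/Oct/2003:09:01:00 -0500] "GET /cgi-bin/mt-comments.cgi?entry_id=236 HTTP/1.1" 410 300',
    '203.0.113.5 - - [16/Oct/2003:09:02:00 -0500] "POST /cgi-bin/somenewname.cgi HTTP/1.1" 200 512',
]

banned = offenders(sample)
```

The resulting addresses could then be pasted into the IP Ban List by hand. Anything that asks for the old script name after the rename is almost certainly a robot working from a stale search-engine result.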
But what of the future?
Once spammers tire of this little game (I give 'em another month, maybe), there are several directions they can go. Needless to say, I think I’m ready. But I’m not going to give the game away just yet. Check back in a few months to read about the next stage in the arms race.
Update (10/16/2003): Ben Trott weighs in:
We’ve all seen that comment spam is becoming a serious problem. Particularly on Movable Type weblogs, where the generated pages are all very similar in structure and semantics, …
Yeah, Ben, that’s the problem, which is why content-based filtering is not really the solution. The real solution is to make robot-posting (regardless of content) infeasible. The above suggestions are the first step in that direction. I’ve implemented some further safeguards on this blog (which can be revealed by some assiduous viewing of source) and I’ve a few more tricks waiting in reserve for when the chickenboners wise up.
Update (10/16/2003): In a comment to this entry, I wrote “I, personally, prefer the CGI script to simply go ‘404’.” That, of course, is silly. What I really want is for the CGI script to go “410” (permanently gone). That’s a one-line addition to the mod_rewrite rules for the MovableType CGI directory (which have been modified to reflect the new comment script location):
RewriteRule ^mt-comments - [G]
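To see which requests that pattern catches, here is a small Python sketch that mimics the match. It assumes the rule lives in the per-directory rules for the CGI directory, so the pattern is tested against the script name with the directory prefix already stripped. The helper `status_for` is hypothetical and only imitates the [G] flag; it is not part of Apache.

```python
import re

# Same pattern as in the RewriteRule above.
PATTERN = re.compile(r"^mt-comments")


def status_for(script_name):
    """Return 410 if the rule would fire for this name, else None."""
    # The leading ^ anchors the pattern, so only names that *start*
    # with "mt-comments" are matched.
    return 410 if PATTERN.search(script_name) else None


old = status_for("mt-comments.cgi")      # old location: Gone
new = status_for("somenewname.cgi")      # renamed script: rule does not fire
other = status_for("mt-tb.cgi")          # other MT scripts: untouched
```

Since the pattern is only a prefix match, any name beginning with mt-comments gets the 410, which is another reason to pick a new script name that does not start that way.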
Update (11/17/2003): One month later, and still spam-free. Read this followup article for some further thoughts.
Posted by distler at October 15, 2003 10:02 AM
TrackBack URL for this Entry: https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/236
- R.I.P. Osirusoft —Aug 30, 2003
- The Spam Legacy —Jun 29, 2003
- Spam Comments —Jun 13, 2003
Re: Software Monoculture
Why shouldn’t comments be indexed? Are they not valid content? I think all comment spam solutions should follow a Hippocratic rule. By the way, new problem: if I’m in this comment box and I press tab twice, the window closes. Moz 1.5 RC2.
Re: Software Monoculture
I’ve had a bit of the spam problem on my site, but far worse for me has been that Google has only been indexing my comment and trackback forms. So until my new design, which doesn’t make use of those links, is finished, I’m stuck with some unwanted links in Google. Or so I thought.
Your META tag just made me realize the solution to my problem. Now my comment and trackback forms all have this tag at the top. Hopefully my search results can return to some state of normalcy in the near future.
Cheers!
Re: Software Monoculture
Okay, so I took your advice, changed the name of the .CGI program …
… then took step 3 … with a twist.
After I record their IP, user agent, etc …, I then redirect them to the URL they’re advertising.
If they’re launching a large automated distribution against my site (like they did this weekend) … then they will now launch a denial of service attack against themselves … or at least gobble up some extra bandwidth.
Regarding your update
Content-based filtering is the answer because this isn’t like email spam. Comment “spam” is intent on improving ranking for keywords, so those keywords can be targeted and the comment can be dealt with as appropriate from there, most likely with some sort of moderation.
Read the post Comment Spam: Fuck to YOU!
Weblog: Black Coffee
Excerpt: My problem with spams being added to my comments has recently escalated. They're coming more regularly, and instead of promoting stupid zipcode websites, they're promoting 'lolita' sites. That's too much. Must stop. When I searched Google it was clear ...
Tracked: October 16, 2003 9:48 AM
Read the post comment spam-preventing measures
Weblog: fuddland
Excerpt: having not ever received a single spam comment, i'm in a fairly unique position amongst mt users to try out jacques distler's preventative measures before any robots figure out where my mt-comments.cgi script is. once they've got its location, the...
Tracked: October 16, 2003 10:30 AM
Re: Software Monoculture
It is interesting that such a high proportion of (tech-oriented) blogs use Movable Type. I assume this is because it is the most feature-laden weblog-type CMS available (although I really don’t know, since I haven’t ever installed anything more complex than Blosxom). However, this undoubtedly contributes to the comment spam problem in many ways. Apart from the obvious fact that many blogs have identical default setups, which makes robot-based attacks easier, the closed*-source nature of Movable Type and the somewhat restrictive license that entails mean that this type of problem does not get fixed as quickly as it might in an Open Source system.
In fact, the lack of a free open-source weblogging application to match Movable Type is a bit strange. On the surface it looks like it should be an ideal open-source project, as:
- There are many potential users, lots of whom are technically inclined and so likely to submit patches
- The software is not especially complex, so there is a small learning curve for those wishing to contribute.
- Lots of the people using current commercial offerings are strong advocates of the Open Source concept
My best guess at the problem is that there are many people who have a custom-built weblogging system with a very narrow range of features (I have lots of ideas and a little code for a meta-data strong system that uses a lot of RDF, for example), but few people who are prepared to customise an existing project to suit their needs rather than start building from the ground up. It would be nice if this situation were to change.
*That is, closed as in ‘not open source’, as opposed to closed as in ‘not available’.
Read the post grab-bag
Weblog: Snapping Links II (The Revenge)
Excerpt: fisheye menus (neat concept), business blogging, distance learning accessibility, monocultures and spam, and the return of a list apart.
Tracked: October 22, 2003 6:25 PM
Read the post gathered all in one place
Weblog: Snapping Links II (The Revenge)
Excerpt: all the anti-spam resources I've gathered so far.
Tracked: October 27, 2003 7:12 PM
Read the post How much spam did you get daily?
Weblog: kurcula.com
Excerpt: Just today, we've received 6 spam comments. That's it, I'm disabling posting comments. Just kiddin'. I took some steps to...
Tracked: November 12, 2003 6:19 PM
Read the post Spam in Blog Comments
Weblog: Stratified
Excerpt: My old weblog gets spam in the comments every couple of days, mostly having to do with enlarging a certain part of the male anatomy. With the increased adoption of MovableType, most weblogs operate on a very similar architecture. In
Tracked: November 14, 2003 9:13 AM
Read the post Reducing comment spam
Weblog: Raw
Excerpt: Experience a D'oh! moment as you read Jacques Distler's little trick for reducing comment spam in MT. 5 minute job....
Tracked: January 24, 2004 10:58 AM
Read the post Reducing Comment Spam
Weblog: Ranting and Roaring
Excerpt: Here's a word for you: Monoculture. Anyway, I'm just posting this to remind myself to do this some day (or get Kathy to do it for me). Via Danny....
Tracked: January 27, 2004 12:11 PM
Read the post Stepping Stones to a Safer Blog
Weblog: Burningbird
Excerpt: In the last few weeks, I've been hit not only by comment spammers, but a new player who doesn't seem to like our party: the crapflooders, people who use automated applications (you may have heard of MTFlood or some variation) to literally flood comment...
Tracked: January 28, 2004 7:14 PM
Read the post Comment Sp*m
Weblog: Blogged
Excerpt: Weblog publishers who utilise the Movable Type system are particularly susceptible to comment sp*m. Until Six Apart release an updated version of Movable Type containing fixes for the current vulnerabilities, the only way to counteract comment...
Tracked: April 2, 2004 12:13 PM
Re: Software Monoculture
Jay Allen has software to avoid comment spam. Have a look at it.
Read the post Good day yesterday
Weblog: I sound like a camel
Excerpt: Yesterday was good. Thanks for lunch Ren. First day in a while that I had some time of schedule. Btb,...
Tracked: October 5, 2004 7:03 PM
Read the post Die spammers die!
Weblog: Laurabelle's Blog
Excerpt: Using tricks from Parker and Dorothea, I've grown my own referer-spam-fighting fu. In addition, I've translated my old bot-fighting rules...
Tracked: January 18, 2005 2:12 AM
Re: Software Monoculture
This is now considered a bit of an old technique. If anyone comes across this page looking for information on robot spam, it’s now best practice to give all comment links a rel="nofollow" attribute in the a tag. For more information, google nofollow and read the Wiki page. Cheers.
Re: Software Monoculture
Great information on what you did to deal with bots posting comments on your blog. Marking your blog as nofollow makes it less “valuable” to people who are trying to use your blog as a countable link for ranking their website, hopefully discouraging those folks from posting. I like your approach of looking ahead at the bigger picture to remedy the issue of bot posting.