Using 301 vs. RewriteRule

A few weeks ago, I wrote out a series of steps aimed at Fighting Bots Via Their Bad Requests. After watching my logs since then, I’ve noticed I made an incredibly stupid mistake. Bad bots do not follow 301 redirects! What does that mean?

If a regular browser requests a web page, image, or document and is told by the web server that the item has been moved (via a 301 response code), the browser will then ask for the item at its new location. That is the redirect. But it is up to the browser to ask for the item at its new location! Bad bots don’t care if the item has been moved; they’re just looking for vulnerabilities to exploit.

So I have removed the R=301 flag from my rewrite rules. Now if a bad bot asks for something like http://www.planetmike.com/2007/02//include/scripts/export_batch.inc.php?DIR= it will not be redirected to my robot-trap, but instead will be served the robot-trap directly! Poof! Instantly it is blocked. I was making things more complicated than they needed to be.
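
To make the difference concrete, here is a toy model in Python. It is only a sketch: the `serve`, `bad_bot`, and `browser` functions, the 403 status returned by the trap, and the client names are all assumptions made for illustration, not the real Apache setup or the real robot-trap script.

```python
# Toy model only: the functions, status codes, and client names are
# hypothetical stand-ins, not the actual server configuration.
TRAP_PATH = "/robot-trap/"
blocked = set()


def serve(client, path, use_redirect):
    """Return (status, location) for one request from `client`."""
    if path == TRAP_PATH:
        blocked.add(client)              # the trap blocks whoever reaches it
        return 403, None                 # assumed status, for illustration
    if "export_batch.inc.php" in path:   # a known probe
        if use_redirect:
            return 301, TRAP_PATH        # R=301: the client must ask again
        return serve(client, TRAP_PATH, use_redirect)  # internal rewrite
    return 200, None


def bad_bot(client, path, use_redirect):
    """A bot that never follows redirects: one request, then it moves on."""
    status, _ = serve(client, path, use_redirect)
    return status


def browser(client, path, use_redirect):
    """A well-behaved client that follows a 301 to the new location."""
    status, location = serve(client, path, use_redirect)
    if status == 301:
        status, _ = serve(client, location, use_redirect)
    return status


probe = "/2007/02//include/scripts/export_batch.inc.php?DIR="
bad_bot("bot-a", probe, use_redirect=True)    # gets a 301 and never comes back
bad_bot("bot-b", probe, use_redirect=False)   # lands in the trap immediately
print(sorted(blocked))  # ['bot-b']
```

With the redirect, only a client polite enough to follow the 301 ever reaches the trap, which is exactly the client that did not need trapping.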

Cancelled the Washington Post

This week my wife and I looked at the online options for the weekly TV listings, something the Washington Post suggested we do since they wanted to stop including the TV Week magazine with the Sunday edition. The online option is fairly useful. And since we only watch a handful of shows regularly, it’s not a huge hassle. This morning I looked for the online versions of the WaPo’s comics. Not a bad selection. I wish they would deliver to my inbox, but OK.

So I called the Washington Post (since you can’t cancel your subscription online). Only a couple of minutes on hold. I told the representative I would like to cancel, and she of course asked why. When I mentioned the TV Week change forced me to look at my online options, she quickly said she would add that to my subscription so I’d keep getting the magazine each week. She seemed honestly surprised when I said I’d prefer to read the online version.

Then she offered to let me keep paying my promotional rate of 49 cents per week. I said, no thanks. Then I asked for a refund of my outstanding issues. She said she would do that, and that’s it. I’ll be getting tomorrow’s paper, then no more.

I hope the editors and publishers at the Post aren’t surprised. It was, after all, their own suggestion that I look online for the TV listings.

Barnes & Noble Opt-Out Broken, No Response to Privacy Emails

A few months ago I signed up at barnesandnoble.com so I could download some free audiobooks. I of course gave them a uniquely generated email address so I could track if they sold my email address to other companies. I registered, confirmed my account, got my downloads, no problem.

Then I started getting B&N’s weekly newsletter. After a couple of weeks, I decided I didn’t want it. So I followed the unsubscribe link, which had my email address embedded in the URL. I went to that page in my browser (Safari 3 under OS X), my email address was already filled into their form, I hit the unsubscribe button and wham! “We were unable to process your request. Please try again.”

Ok, maybe they’re having tech problems. Try again a few minutes later. Same error message. So I checked out their Privacy Policy, and sent a nice email to them at privacy@barnesandnoble.com.

Good morning,

A few weeks ago I created an account at the Barnes & Noble web site so I could download some of your free audiobooks. That “purchase” worked just fine. I was able to register my account and I received the download instructions in an email message. The email address I used was the same one I am using to send you this message.

Now I am receiving your weekly email newsletter. I read it, and decided I don’t want to receive it. So I clicked on the opt-out link. (link redacted)

But when I try to submit that link I am told the attempt was unsuccessful. “We were unable to process your request. Please try again.”

It seems to me that if you can allow me to create an account with a long email address, which is entirely valid, you’d allow me to unsubscribe that email address from your promotional mailing list. Please investigate your system, and fix it to allow people with long email addresses to unsubscribe.

Thanks very much for your help,
Michael

I received no response at all from Barnes & Noble.

Two weeks (and two newsletters) later I sent the note to B&N again. And again, no response at all.

If you’re going to have a privacy policy, you need to follow it. Part of that is actually monitoring the email address you give the public for privacy concerns. A complaint has been filed with the Federal Trade Commission.

(Update: I tried submitting the opt-out form with Firefox, which apparently ignores the maxlength attribute on the form’s input field. And apparently I’ve been opted out. Regardless, B&N needs to actually have someone assigned to do something with their privacy email address. I doubt I would ever shop at bn.com again.)

The Incredible Shrinking Washington Post

I currently subscribe to the Washington Post, Sundays only. Three years ago I could read through the entire paper in a couple hours. Now, with the dropping of a few sections (TV Week), and combining sections together (Book World, Editorials, Business section), and reducing the amount of original writing available in the printed paper, I can finish the entire printed newspaper in less than 45 minutes. I am 99% sure I will be canceling our subscription, if the Post will refund me the balance on my account. Of course, the ombudsman let it slip that most people threaten to, but don’t really, cancel the paper. If they won’t refund my outstanding balance, I’ll simply not renew.

Well, at 50 cents per Sunday, the printed version is a good value. But the equation of value is being changed by the Post. Comics? Available online with a larger selection. TV listings? Online in a custom grid showing only the stations I actually watch. Comments by people interested in an article’s topic? Online. Am I saying the Post should be charging less for the Sunday paper now that they’ve been reducing its size? Probably. Will they do that? Probably not. Will they make the weekday issues less expensive as well? I don’t know; I haven’t looked at a weekday paper in months.

Regardless, it’s a chicken or egg situation. Am I canceling because the Post is smaller than it used to be? Is it smaller because the Internet forced the paper to be smaller? Either way, the printed version of the Post won’t be coming to our home next month.

Facebook Username

Facebook has allowed you to set up a username for your account. So you can now see my Facebook pages at http://facebook.com/planetmike.

Facebook, Meet 2009. 2009, Meet Facebook.

I agree with Kevin Burton’s thoughts at Facebook Pages are Just Blogs. I just wish I had written out the part about Facebook lacking RSS or other feeds first. I hate Monday mornings when I load up Facebook and have to hit the silly “Older Posts” button a bunch of times to catch up with everyone’s weekend. Putting all that into a feed of some sort would make that much more efficient. I would almost pay for a feed of the posts on my wall.

Actually, is there a way to export all of my Facebook posts so I can move into another blogging system or CMS? The Help pages don’t mention a method. As Matt Cutts said, “Not trapping users’ data = GOOD.” Robert Scoble wrote about Facebook disabling his account back in January of 2008. Also, check out DataPortability.org. And ZDNet wrote about the issue in December 2006: “Do ordinary users care about data portability? And if not, should they? Four social networks respond.”

Fighting Bots Via Their Bad Requests

Last week I started looking through my “page not found” (404) errors. It’s been interesting to say the least, with nearly 800 bad requests since then, most of which have been various attacks or attempts to mislead people. I address most of these attacks or probes by re-routing the request to a robot block system based on Daniel Webb’s Bot-trap – A Bad Web-Robot Blocker. The code blocks below show a sample of the commands I have entered into my site’s .htaccess file. Warning: Messing with your .htaccess can break your web site. Be Careful!
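
Before the rules themselves, here is a rough sketch of how the 404s could be tallied from an access log. It assumes the log uses Apache’s Common or Combined Log Format; the `count_404s` helper and the sample lines (with made-up addresses from the documentation range 192.0.2.0/24) are hypothetical, not taken from my actual logs.

```python
import re
from collections import Counter

# Matches the client address, request line, and status code of a
# Common/Combined Log Format entry. Assumed format, for illustration.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def count_404s(lines):
    """Count how often each requested path produced a 404."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts


# Made-up sample entries, not real log data.
sample = [
    '192.0.2.10 - - [10/Jun/2009:10:00:00 -0400] "GET /errors.php HTTP/1.1" 404 209 "-" "-"',
    '192.0.2.11 - - [10/Jun/2009:10:00:02 -0400] "GET /2007/02/favicon.ico HTTP/1.1" 404 209 "-" "-"',
    '192.0.2.10 - - [10/Jun/2009:10:00:03 -0400] "GET /errors.php HTTP/1.1" 404 209 "-" "-"',
    '192.0.2.12 - - [10/Jun/2009:10:00:05 -0400] "GET /index.html HTTP/1.1" 200 5120 "-" "-"',
]

print(count_404s(sample).most_common())
# [('/errors.php', 2), ('/2007/02/favicon.ico', 1)]
```

Sorting by count puts the most frequently probed paths first, which is a reasonable order in which to decide what deserves a rule.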

  1. Scanning for Remote File Include Vulnerabilities. There are many bugs in many different programs out there on the web. So when someone scans my server for one of these things, they aren’t up to any good. Common requests include errors.php, contact.php, advanced1.php. There are (were) compromised servers at
    • http://www.eyepro.net//assets/images/id1.txt
    • http://www.eyepro.net//assets/images/master-id.txt
    • http://www.graal-plus.zp.ua//images/roxx.jpg
    • http://grupowh.com/sqli/fx29id.txt
    • http://www.ecobook.or.kr/ecobook/data/ecobook/1132289642/copyright.txt
    • http://www.centrsoft.ru/logo.jpg
    • http://www.cookieez.com/image.jpg
    • http://largeface.com/gnuboard4/style/sid.txt
    • http://harvestusa.org///administrator//includes/id1.txt

    Someone at these IP addresses was scanning:

    • 80.67.20.178
    • 62.193.227.12
    • 212.193.241.25
    • 70.245.218.25
    • 203.185.28.194
    • 200.30.136.59
    • 74.55.117.34
    • 74.36.117.160
    • 94.23.200.54
    • 211.189.18.73

    These probes are now being dealt with in real time.
    See Fx29ID cmd for a little more information.


    # Serve the robot trap directly (no redirect) for any request containing "errors.php".
    RewriteRule errors\.php /robot-trap/ [L]

  2. Requests for my favicon.ico in a directory other than the root. The only thing the requests have in common is that they all use GoogleToolbar 6.1; Windows 6.0; MSIE 7.0. This is a silly request. Is there ever a reason for a favicon to be anywhere other than the root of the site? I’ve now dealt with these as well.

    # Answer a favicon.ico request in any subdirectory with the root favicon.
    RewriteRule .+/favicon\.ico$ /favicon.ico [L]
  3. There were a very few requests for a malformed URL that was actually part of my server’s file system. That’s been fixed.

    RewriteRule var/www/html / [L]
  4. I discovered a Java-based browser (probably a spam bot seeking email addresses) that kept tripping one URL that wasn’t being rewritten correctly. So I fixed my rewrite rule.
    • 213.93.203.217, Java/1.6.0_04
    • 24.132.227.22, Java/1.6.0_13
    • 77.211.115.58, Java/1.6.0_04
    • 82.192.63.216, Java/1.6.0_13
    • 84.124.194.76, Java/1.6.0_13

    Now to decide how to deal with these “web browsers.” Based on the traffic activity (one request every 1-2 seconds, no images or style sheets requested at all) should I even allow these browsers to access my web site? I’ve also decided to block access to my web site by Java user agents. See How To Block Java User-Agents for someone else’s similar approach to the Java problem.


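    # Refuse any request whose User-Agent contains "Java" with a 403 Forbidden.
    # With [F] the substitution is not needed, so "-" leaves the URL alone.
    RewriteCond %{HTTP_USER_AGENT} Java
    RewriteRule ^ - [F]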

  5. For some bizarre reason, the MSNBot (shouldn’t that be the BingBot now?) is sending requests including an anchor. Their requests are showing up as “GET /url-stuff-in-here/#respond HTTP/1.1”. I can’t see how the hash symbol (pound symbol) is being logged at all, and I can’t reproduce the problem. Helpfully, MSN does include in its user agent string a URL that can be used for help: “msnbot/2.0b (+http://search.msn.com/msnbot.htm)”

    Just try visiting that msn.com URL. You get bounced over to a site at live.com, which is now Microsoft’s Bing search engine. Using Firefox 3, OmniWeb, or Opera, after “signing in” I kept getting into a loop where it would ask me to join the community. I couldn’t get past that point. I finally fired up VirtualBox and used IE8 under Windows 7. Sigh. And Microsoft wonders why people loathe them so. (Remember Bing stands for But It’s Not Good.)

  6. Speaking of Microsoft, some poor souls are still using IE 6, with the Discussion bar turned on. Discussion bar requests (/_vti_bin/ or /MSOffice/) are now being redirected to the IE6 page at BrowserUpgrade.info.


    RewriteRule _vti_bin http://www.browserupgrade.info/ie6/ [R=301,L]

  7. A few web browsers were badly broken, and were requesting things like http://www.planetmike.comhttp://www.planetmike.com/2006/08/. I fixed this.


    # Strip the duplicated host name and redirect to the path that follows it.
    RewriteRule www\.planetmike\.com(.*) $1 [R=301,L]

  8. A few bots were attempting to poison my referral logs, or to otherwise do Bad Things, by repeatedly requesting files on my site with a space in the file name. And to make things look legitimate, they set the referrer to be a search at Google.


    RewriteCond %{HTTP_REFERER} google\.com
    RewriteCond %{REQUEST_URI} .*\ .*
    RewriteRule ^.*$ /robot-trap/space.php [L]

  9. A few requests for labels.rdf. This is a standard for labelling how family-friendly your web site is. See the Family Online Safety Institute Labelling Page for information. This is the next generation site labelling method after the PICS method which was used in the ’90s. For now I’ll let these requests return a 404.
  10. Bad BrowserAgents. No browser agent at all? Denied! An agent of “anonymous” is not allowed either. Especially when your IP address is registered to Korea or Brazil.

    BrowserMatchNoCase "^$" spambot=1
    BrowserMatchNoCase "^anonymous$" spambot=1
    Order deny,allow
    deny from env=spambot
  11. And there have been a few links on my own site that were misspelled, or had other stupid mistakes. Those have been fixed.
  12. This is a start, I’ll post updates as more unique cases appear.
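
Since a typo in .htaccess can take a site down, it can help to try patterns against sample requests before deploying them. The sketch below does that in Python with patterns modelled on the rules above (dots escaped). Python’s `re` module only approximates how Apache matches these simple expressions, and the `RULES` table, `first_match`, and `is_blocked_agent` are hypothetical helpers written for illustration.

```python
import re

# (pattern, destination label) pairs modelled on the rewrite rules above.
# The labels are descriptive only; they are not Apache syntax.
RULES = [
    (re.compile(r"errors\.php"), "robot-trap"),
    (re.compile(r".+/favicon\.ico"), "root favicon"),
    (re.compile(r"var/www/html"), "home page"),
    (re.compile(r"_vti_bin"), "IE6 upgrade page"),
]


def first_match(url_path):
    """Return the label of the first rule matching the path, or None.

    In .htaccess the leading slash is removed before matching, so it is
    removed here as well. Patterns are unanchored, hence search().
    """
    path = url_path[1:] if url_path.startswith("/") else url_path
    for pattern, label in RULES:
        if pattern.search(path):
            return label
    return None


def is_blocked_agent(user_agent):
    """Mirror the user agent checks: empty, "anonymous", or containing Java."""
    agent = user_agent or ""
    return agent == "" or agent.lower() == "anonymous" or "Java" in agent


for path in ["/2007/02/errors.php", "/2007/02/favicon.ico",
             "/favicon.ico", "/_vti_bin/owssvr.dll", "/2006/08/"]:
    print(path, "->", first_match(path))

for agent in ["", "Anonymous", "Java/1.6.0_13", "Mozilla/5.0"]:
    print(repr(agent), "blocked:", is_blocked_agent(agent))
```

Note that the favicon pattern requires at least one directory before the file name, so a request for the root /favicon.ico is left alone and the rule cannot loop on itself.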

Publishing Your Blog on Kindle: The Agreement

While looking through the agreement for publishing your blog on a Kindle, I found the Terms of Service include 3,994 words. It’s 11 pages when I copy and paste the text into a new text document in OpenOffice.

Reading through the document, it actually looks like Amazon will create a new feed to be used exclusively for the Kindle. You have to give Amazon the rights to re-publish your content through their system. So is this similar to the publishers of the old days? Amazon keeps 70% of the revenues.

While there are more publications available than I thought there would be, some of the comments for The New York Times were enlightening. Read them at The NYT Kindle page.

Comment Policy

The comment policy on most of my blogs is: use a real name (first and last names), and if you include a link to a web site, the web site shouldn’t be pure advertising drivel. Something like “Jones Travel Blog” or “Something Tips and Tricks” would not be approved, as those sites are just wanting link juice. And I don’t use nofollow.

Jesper Rønn-Jensen at Time To Revise Our Comment Policy came up with five categories of comments. His categories of A, B, and E would be marked as spam on my sites. Categories C and D don’t matter to me, as I don’t count the gravatar as a positive or a negative, since the commenter can change their gravatar at any time.

Google Wave

Google set up an email list for people who want to know more about the Google Wave system. The Google Wave team asks for something interesting (Haikus, sonnets and ASCII art all accepted). Here’s my haiku:

What is Google Wave?
Is it email? IM? Both?
Is it the future?

There’s also a video of the announcement:

Yes, it’s an hour and twenty minutes long.