Since the Google PageRank storm last October, I have changed how I place ads on my sites. Some of my sites, like PlanetMike.com, have no advertising at all. Others, like ChristmasMusic247.com, do have advertising. But every ad on my sites now carries the overly broad nofollow tag. In the past six months, three different ticket brokers have contacted me about placing ads on my site, and all of them backed away because they aren't interested in sites that put the nofollow tag on ads. So if a search engine wants to prune bad actors out of the online ad space, all it needs to do is look at the sites that link to the large ticket brokers. It's also obvious that the ticket brokers aren't really interested in supporting small web sites, or in building their customer base from niche web sites. They want to game the search engines with the link text in the ads they would run. And that's exactly what Google is trying to fight.
Evolving PlanetMike.com: Chapter 4 – Copying the Database and Redirecting
Ok, I’ve moved a whole bunch of posts from PlanetMike.com over to my personal site, MichaelClark.name. While tedious, it was a pretty straightforward process.
- Install WordPress on your new domain. Make sure it is the same version as the one on the old domain!
- Copy the active theme to the new site’s theme directory. This should be under wp-content/themes/.
- Copy all of your plugins to the new site’s plugins directory. This should be under wp-content/plugins/.
- Copy the WordPress database to a new database. Ideally, make a backup of the old site's database and then import it into a new database. At this point you have two WordPress databases set up with 100% duplicated content, users, and passwords.
- Use a MySQL editor to change the "siteurl" and "home" values in the wp_options table. I used Navicat, but any MySQL editor, such as phpMyAdmin or the command-line client, will do.
- Then edit the wp-config.php file on the new site so it reads the new database (DB_NAME, plus DB_USER and DB_PASSWORD if those changed).
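The wp_options change in the steps above boils down to a single UPDATE. Here is a sketch that runs it against an in-memory SQLite stand-in, assuming a simplified two-column wp_options table and placeholder URLs (the real table lives in MySQL and has more columns). The same UPDATE statement is valid MySQL.

```python
import sqlite3

# Assumption: a simplified stand-in for WordPress's wp_options table.
# Only the two columns this step touches are modeled; values are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_options (option_name TEXT PRIMARY KEY, option_value TEXT)")
conn.executemany(
    "INSERT INTO wp_options VALUES (?, ?)",
    [
        ("siteurl", "http://www.planetmike.com"),
        ("home", "http://www.planetmike.com"),
        ("blogname", "PlanetMike.com"),
    ],
)

def point_copy_at_new_domain(conn, new_url):
    """Rewrite the two options that tie a WordPress copy to its domain."""
    conn.execute(
        "UPDATE wp_options SET option_value = ? "
        "WHERE option_name IN ('siteurl', 'home')",
        (new_url,),
    )
    conn.commit()

point_copy_at_new_domain(conn, "http://www.michaelclark.name")
for name, value in conn.execute(
    "SELECT option_name, option_value FROM wp_options ORDER BY option_name"
):
    print(name, value)
```

Every other option (blogname here) is left alone, which is what you want when the rest of the copy should stay identical.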
Tada! You now have a complete copy of your old WordPress blog on a new domain.
I took this opportunity to change the permalink structure on the new site: I removed the dates from my posts' URLs.
Important! If you have any photos, style sheets, images, included files, or other content referenced from the old site, those links won’t work on the new site until you copy or move that content over. And since I was splitting out my personal information, I had a lot of other info to move around.
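My goal was to have my photos, family stuff, jokes, etc. all on my personal site, so I moved those directories to their new location. But for now I don't want those pages to break if someone visits the old URL, so I created a symbolic link in the old site's directory that points to the content's new home. For example, run from the new site's document root (the link target is an absolute path, because a relative target is resolved from the link's own directory, and a bare photos would make the link point at itself):
ln -s "$PWD/photos" ../www.planetmike.com/photos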
I did this for a bunch of directories. End users can't tell that anything has changed. But I now have the same content on two domains, which could draw a duplicate-content penalty from search engines at some point. I had so many directories because PlanetMike.com has been around since 1997, back before content management systems were readily available.
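Doing this for a bunch of directories is easy to script. A Python sketch, assuming a Unix-like system and placeholder directory names (link_moved_dirs is a made-up helper, not part of any tool); like the ln example, it uses absolute targets. Apache also has to be allowed to follow symlinks (Options FollowSymLinks) for the links to be served.

```python
import os
import tempfile

def link_moved_dirs(new_root, old_root, names):
    """In old_root, create a symlink for each moved directory pointing at its new home."""
    for name in names:
        # Absolute target, so the link resolves no matter where it lives.
        target = os.path.abspath(os.path.join(new_root, name))
        link = os.path.join(old_root, name)
        if not os.path.lexists(link):  # don't clobber anything already there
            os.symlink(target, link)

# Demo in a scratch directory; "new-site" and the directory names are placeholders.
base = tempfile.mkdtemp()
new_root = os.path.join(base, "new-site")
old_root = os.path.join(base, "www.planetmike.com")
for d in ("photos", "jokes"):
    os.makedirs(os.path.join(new_root, d))
os.makedirs(old_root)

link_moved_dirs(new_root, old_root, ["photos", "jokes"])
print(os.readlink(os.path.join(old_root, "photos")))
```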
Now another tedious step: removing duplicate content from the blogs. On the new site I went through and removed a bunch of posts and categories that will not be used there. On the old site I didn’t have to do anything with those posts, since they aren’t changing.
And the final tedious step: on the old site, look at each post in the categories that belong to the new site. A few of these posts could be deleted entirely, from both blogs, but most needed to be redirected to the new site. I made a list of the old site's URLs that should redirect to the new site, working category by category to keep it in manageable chunks, and reclassified a few posts into other categories along the way. Then I added one RewriteRule per URL to the blog's .htaccess file:
RewriteRule 2005/09/15/virginia-tech-photographs/? http://www.michaelclark.name/virginia-tech-photographs [R=301,L]
This rule says that if the requested URL contains 2005/09/15/virginia-tech-photographs (with an optional trailing slash), the page has permanently moved (301) to http://www.michaelclark.name/virginia-tech-photographs. As I added each category's URLs to the .htaccess file, I deleted those posts from the old blog, so anyone who requests an old URL is transparently bounced to the new site. That might still be confusing to people who visit during the transition.
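With a few hundred of these, a short script can at least do the typing. A Python sketch, assuming old permalinks of the form year/month/day/slug and a new site that uses only the slug; rewrite_rule and SAFE_PATH are names made up for illustration. It refuses paths containing anything beyond letters, digits, slashes, underscores and hyphens, since other characters could mean something in a regex.

```python
import re

NEW_SITE = "http://www.michaelclark.name"
# Assumption: slugs contain only characters that are literal in a regex.
SAFE_PATH = re.compile(r"[A-Za-z0-9/_-]+")

def rewrite_rule(old_path):
    """Build one 301 rule in the same style as the hand-written one above."""
    path = old_path.strip("/")
    if not SAFE_PATH.fullmatch(path):
        raise ValueError(f"needs escaping before use in a RewriteRule: {old_path!r}")
    slug = path.rsplit("/", 1)[-1]  # new permalinks drop the date segments
    return f"RewriteRule {path}/? {NEW_SITE}/{slug} [R=301,L]"

for p in ["2005/09/15/virginia-tech-photographs/"]:
    print(rewrite_rule(p))
```

You would still review each generated line, which keeps the per-URL audit.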
I ended up with over 270 lines in the .htaccess file. I could have used some regex (regular expression) magic, but I liked actually touching each URL: doing them one at a time let me audit the entire site.
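For comparison, the regex route would look something like this sketch: one anchored rule that accepts any date prefix followed by one of the moved slugs. It assumes the blog sits at the document root (so Apache hands the .htaccess rules the path without its leading slash) and that every slug is lowercase letters, digits and hyphens; "family-reunion" is a hypothetical second slug. Python's re is used to check which paths the pattern would catch; for a pattern this simple it behaves the same as Apache's PCRE.

```python
import re

NEW_SITE = "http://www.michaelclark.name"

def combined_rule(slugs):
    """One RewriteRule covering every moved slug, whatever its date prefix."""
    for s in slugs:
        if not re.fullmatch(r"[a-z0-9-]+", s):
            raise ValueError(f"unexpected characters in slug: {s!r}")
    pattern = r"^[0-9]{4}/[0-9]{2}/[0-9]{2}/(" + "|".join(slugs) + r")/?$"
    return pattern, f"RewriteRule {pattern} {NEW_SITE}/$1 [R=301,L]"

def redirect_for(pattern, path):
    """Where the rule would send a request path, or None if it doesn't match."""
    m = re.match(pattern, path)
    return f"{NEW_SITE}/{m.group(1)}" if m else None

pattern, rule = combined_rule(["virginia-tech-photographs", "family-reunion"])
print(rule)
print(redirect_for(pattern, "2005/09/15/virginia-tech-photographs/"))
print(redirect_for(pattern, "2005/09/15/some-post-that-stayed/"))
```

The trade-off is the one described above: a single rule is shorter, but a 270-way alternation is hard to read, and you lose the one-URL-at-a-time audit.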
I finished this step a week ago, and have been watching my server logs and my Google webmaster account for problems. One problem I'm seeing now is that some of my category and monthly archives used to span multiple pages, but now that I've pruned the posts, the highest-numbered pages no longer exist. I'll leave those as 404 errors so that the search engines drop them from their indices.
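Spotting those vanished archive pages in the server logs can be automated. A Python sketch, assuming Apache's common/combined log format and WordPress paged archive URLs ending in /page/N/; missing_archive_pages is a hypothetical helper, and the sample log lines are made up.

```python
import re
from collections import Counter

# Request path and status code from a common/combined log format line.
LOG_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3}) ')
# Assumption: paged archives look like .../page/N/ with your permalink setup.
PAGED_RE = re.compile(r"/page/\d+/?$")

def missing_archive_pages(lines):
    """Count 404 responses for paged archive URLs."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and m.group(2) == "404" and PAGED_RE.search(m.group(1)):
            hits[m.group(1)] += 1
    return hits

# Made-up sample lines for illustration.
lines = [
    '192.0.2.10 - - [20/Jan/2008:10:00:00 -0500] "GET /category/jokes/page/4/ HTTP/1.1" 404 5120 "-" "Mozilla/5.0"',
    '192.0.2.11 - - [20/Jan/2008:10:05:00 -0500] "GET /category/jokes/page/4/ HTTP/1.1" 404 5120 "-" "Googlebot/2.1"',
    '192.0.2.12 - - [20/Jan/2008:10:06:00 -0500] "GET /2007/05/page/2/ HTTP/1.1" 200 8000 "-" "Mozilla/5.0"',
    '192.0.2.13 - - [20/Jan/2008:10:07:00 -0500] "GET /no-such-post/ HTTP/1.1" 404 5120 "-" "Mozilla/5.0"',
]
print(missing_archive_pages(lines))
```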
Akismet Misses “Paris Hilton Sex Tapes” Comment Spam
As much as I love using Akismet to get rid of comment spam on my blogs, it is extremely frustrating that, for some oddball reason, Akismet rarely catches spam about Paris Hilton's sex antics. I dutifully mark them as spam, but Akismet still won't learn that comments about Ms. Hilton's shenanigans aren't real comments.
The same thing happens with email under SpamAssassin. Just this morning, one of my users complained to me about a bunch of messages that were slipping through. I watched the logs and tweaked the SpamAssassin rules a bit, so hopefully that will help that user.
I really like that SpamAssassin's rules can be tweaked and modified as needed. Akismet, on the other hand, is closed to the world: we throw data into it, and whatever comes out is what comes out. I see that in 2005 someone worked on integrating SpamAssassin into WordPress, but development stopped. I wonder if it should be taken up again? Or maybe all that's needed is a new plugin that lets WordPress blog owners add their own custom rules. I don't think the built-in "Comment Blacklist" is enough.
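To make the "custom rules" idea concrete, here is a Python sketch of SpamAssassin-style scoring for comments: each rule has a name, a score and a test, and a comment is spam once its total reaches a threshold. The rules, scores and threshold are hypothetical and are not SpamAssassin's own rules; a real WordPress plugin would be written in PHP.

```python
import re

# Hypothetical rules in the spirit of SpamAssassin: (name, score, test).
RULES = [
    ("CELEB_TAPE", 4.0,
     lambda text: re.search(r"paris\s+hilton", text, re.I) is not None
     and re.search(r"\b(sex|tape|video)\b", text, re.I) is not None),
    ("MANY_LINKS", 2.0,
     lambda text: len(re.findall(r"https?://", text, re.I)) >= 3),
]
SPAM_THRESHOLD = 5.0  # assumption: a total score at or above this means spam

def score_comment(text):
    """Return the total score and the names of the rules that fired."""
    fired = [(name, score) for name, score, test in RULES if test(text)]
    return sum(score for _, score in fired), [name for name, _ in fired]

def is_spam(text):
    return score_comment(text)[0] >= SPAM_THRESHOLD

spam = "Paris Hilton sex tape http://a.example http://b.example http://c.example"
print(score_comment(spam))
print(is_spam("Great post, thanks!"))
```

Weighting rules instead of blocking on any single match means one borderline hit (just "Paris Hilton video") doesn't sink a legitimate comment.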
Extra Fields in Trackback Spam
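One of my blogs just caught a spammer from 69.31.80.66 trying to submit trackbacks to the blog with extra fields stuffed into the "Name" field. It looks like an SQL injection attempt: the payload closes the name string, supplies the rest of the comment's values, tacks on a second row of values (one of them 'spam'), and opens a /* comment to swallow whatever follows.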
Gen Drebery','deber@gmail.com','','63.2.12.45','2008-01-25 13:43:30','2008-01-25 13:43:30','','0','Internet Explorer','comment','0','0'),('0', '', '', '', '', '2008-01-26 13:43:30', '2008-01-26 13:43:30', '', 'spam', '', 'comment', '0','0' ) /*
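A payload like this only works if the application pastes the name straight into its SQL string. A Python sketch using the standard sqlite3 module and a toy comments table (not WordPress's real schema) shows why bound parameters defuse it; the payload is a shortened version of the one above, trimmed to fit the toy table.

```python
import sqlite3

# Assumption: a three-column toy table, not WordPress's wp_comments schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, email TEXT, approved TEXT)")

# Shortened injection payload in the same shape as the one in the logs.
payload = "Gen Drebery','deber@gmail.com','0'),('x','y','spam') /*"

# The driver sends the whole string as a single value, so its quotes and the
# trailing comment marker are stored as plain text instead of running as SQL.
conn.execute(
    "INSERT INTO comments (author, email, approved) VALUES (?, ?, ?)",
    (payload, "deber@gmail.com", "0"),
)

rows = conn.execute("SELECT author FROM comments").fetchall()
print(len(rows))  # one row: no second row was smuggled in
```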
The web server logs showed him trying a trackback on a specific post, working up the directory tree until /wp-trackback.php answered, and then going after the first post (p=1). Could this be an attempt to fingerprint my blog?
69.31.80.66 - - [25/Jan/2008:08:43:28 -0500] "POST /2006/10/30/post-slug-here/wp-trackback.php HTTP/1.0" 404 19104 "-" "Python-urllib/1.17"
69.31.80.66 - - [25/Jan/2008:08:43:28 -0500] "POST /2006/10/30/wp-trackback.php HTTP/1.0" 404 19123 "-" "Python-urllib/1.17"
69.31.80.66 - - [25/Jan/2008:08:43:28 -0500] "POST /2006/10/wp-trackback.php HTTP/1.0" 404 19104 "-" "Python-urllib/1.17"
69.31.80.66 - - [25/Jan/2008:08:43:29 -0500] "POST /2006/wp-trackback.php HTTP/1.0" 404 19104 "-" "Python-urllib/1.17"
69.31.80.66 - - [25/Jan/2008:08:43:29 -0500] "POST /wp-trackback.php HTTP/1.0" 200 135 "-" "Python-urllib/1.17"
69.31.80.66 - - [25/Jan/2008:08:43:29 -0500] "GET /wp-trackback.php?p=1 HTTP/1.0" 302 - "-" "Python-urllib/1.17"
69.31.80.66 - - [25/Jan/2008:08:43:30 -0500] "GET /wp-login.php?action=logout HTTP/1.0" 302 - "-" "Python-urllib/1.17"
69.31.80.66 - - [25/Jan/2008:08:43:30 -0500] "POST /wp-trackback.php?p=1 HTTP/1.0" 200 78 "-" "Python-urllib/1.17"
69.31.80.66 - - [25/Jan/2008:08:43:31 -0500] "POST /wp-trackback.php?p=1 HTTP/1.0" 500 600 "-" "Python-urllib/1.17"
69.31.80.66 - - [25/Jan/2008:08:43:31 -0500] "POST /wp-trackback.php?p=1 HTTP/1.0" 500 600 "-" "Python-urllib/1.17"
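The first five requests fit a simple pattern: append wp-trackback.php to the post's path, then to each parent directory in turn, until something answers. A Python sketch that reproduces that sequence; trackback_candidates is a hypothetical name, and this is a guess at the script's logic based on the log, not its actual code.

```python
def trackback_candidates(post_path):
    """URLs tried by appending wp-trackback.php to a path and each of its parents."""
    parts = [p for p in post_path.strip("/").split("/") if p]
    candidates = []
    while True:
        prefix = "/" + "/".join(parts) + "/" if parts else "/"
        candidates.append(prefix + "wp-trackback.php")
        if not parts:
            break
        parts.pop()  # move up one directory
    return candidates

for url in trackback_candidates("/2006/10/30/post-slug-here/"):
    print(url)
```

Only the last candidate, /wp-trackback.php at the blog root, returned 200 in the log, which is enough to tell a script that it is talking to WordPress.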
Renting movies from Apple via iTunes
I wrote an article about my experience with renting a movie via the Apple iTunes store. It’s published on TheAppleBlog.com.
Creating a “GoTo” URL For Your WordPress-Powered Site
For one of my other sites, I'll be doing some postal mailings that need to include the URLs of some of my posts. I don't want to force people to retype those horribly long URLs. I could use a service like TinyURL.com, but I'm not happy giving a third party control over portions of my web site. So I've made it easier by combining Apache's rewrite rules with WordPress's post ID numbers. Instead of mailing out a URL like:
http://www.showbizradio.net/2008/01/10/community-theater-schedule-wallpaper/
I can include this one, which will redirect to the same post, and is much easier to type, or read over the phone:
http://www.showbizradio.net/goto/2133
To do this, create a new folder under your WordPress directory. You can call it anything you like, but shorter is better. I’ve called the directory goto, although go would also work well.
Inside that directory, create a file called .htaccess. The leading dot is important!
Put these lines in the .htaccess file:
RewriteEngine On
RewriteRule ^([0-9]+)$ /index.php?p=$1 [R=301,L]
RewriteRule (.*) / [R=301,L]
The first line turns on Apache's rewrite engine (mod_rewrite) for this directory. Your host must allow rewrite directives in .htaccess files (AllowOverride FileInfo) for any of this to work.
The second line says that if the part of the request after /goto/ is nothing but digits, redirect to index.php with those digits as the post ID (the p parameter). R=301 tells web browsers and search engines that the redirect is permanent, and L means this is the last rule to process.
The third line catches any other request (such as http://www.showbizradio.net/goto/heck) and redirects it to your site's home page.
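To check the logic before touching the server, here is a small Python model of what the two rules do, assuming the digits-only rule is anchored (^[0-9]+$) so it fires only on a pure number. goto_redirect and goto_url are hypothetical helpers; this mimics the rules, it doesn't run inside Apache.

```python
import re

def goto_redirect(path_after_goto):
    """Where the goto rules would send a request, given the part after /goto/."""
    m = re.fullmatch(r"([0-9]+)", path_after_goto)
    if m:
        return f"/index.php?p={m.group(1)}"  # second rule: post ID lookup
    return "/"                               # third rule: everything else goes home

def goto_url(site, post_id):
    """Build the short URL to print in a mailing."""
    return f"{site}/goto/{post_id}"

print(goto_url("http://www.showbizradio.net", 2133))
print(goto_redirect("2133"))
print(goto_redirect("heck"))
```

With an unanchored pattern, a request like /goto/abc123 would also be treated as post 123; anchoring keeps the short URLs unambiguous.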
And that's all there is to it. Let me know if you have any problems with this. I've tested it only on WP 2.3.2, running under Apache. It should work fine even if you have customized your site's permalink structure.
Evolving PlanetMike.com: Chapter 3 – Moving Content Between Domains
I've made a good start on splitting out personal stuff from PlanetMike.com and moving it to MichaelClark.name. Jokes, photo galleries, and a handful of categories from WordPress have been moved. I'm moving a few more categories today, and then I'll share some of the lessons I've learned. Hopefully you won't see too many problems on any of the pages during this transition. I'm using linked directories and Apache's rewrite rules to gracefully move content around.
Announcing CodeQuote
I've just released a WordPress plugin that disables smart quotes in text inside a <code> block. Smart quotes, also known as curly quotes or fancy quotes, don't mix well with code: if someone copies and pastes code containing smart quotes, they have to fix it before they can use it, which I think everyone will agree is a waste of time. More information and the download are available on the CodeQuote page. Please send me your feedback; I especially need to know about other characters inside code that WordPress is "fixing."
Update, 4:42pm: I've already made several bug fixes to CodeQuote. Less-than symbols now appear to work correctly, and it no longer matters if you have a blank line in front of the opening code tag. Let me know if you see any other weirdness, and send me the code you're entering into a post so I can experiment with it here.
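The idea behind the plugin can be sketched in a few lines, though this is not CodeQuote's actual code: CodeQuote stops WordPress from curling quotes in code, while this Python sketch repairs text after the fact by straightening typographic quotes found only inside <code>...</code>. The character mapping is an illustrative subset, and straighten_code is a hypothetical name.

```python
import re

# Illustrative subset of typographic characters mapped back to ASCII.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # left/right single quotes
    "\u201c": '"', "\u201d": '"',  # left/right double quotes
    "\u2032": "'", "\u2033": '"',  # prime and double prime
})

CODE_BLOCK = re.compile(r"(<code\b[^>]*>)(.*?)(</code>)", re.S | re.I)

def straighten_code(html):
    """Straighten quotes inside <code> elements, leaving other text alone."""
    return CODE_BLOCK.sub(
        lambda m: m.group(1) + m.group(2).translate(CURLY_TO_STRAIGHT) + m.group(3),
        html,
    )

print(straighten_code("He said \u201chi\u201d <code>echo \u2018x\u2019;</code>"))
```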
Overview of Web Site Traffic Analysis Tools
The first of the year is a time for looking back at what was accomplished in the past year and setting goals for the coming year. If you run a web site, one way to do that is to look at your site's traffic and see what happened. I run my own server, so I have access to my complete web server logs. Historically I've always used Webalizer to generate my stats.
But is Webalizer still the tool I should be using? There are other tools out there, so I looked at six applications that generate reports from server logs: Analog, AWStats, PWebStats, Visitors, Webalizer, and W3Perl.
| Tool | My Comments | Final Grade |
|---|---|---|
| Analog | | |
| AWStats | | |
| PWebStats | | |
| Visitors | | |
| Webalizer | | |
| W3Perl | | |
Notes:
- I was testing offline processing. Some of these tools can be installed to run directly from a web site’s cgi-bin (W3Perl apparently prefers that setup).
- I ran these reports on the log of one of my rarely used domains, which I also use for trying out software. The log file had 7,746 records for 2007.
- I hand-edited the sample reports to remove spam referral links, as I don't want to link to "bad places."
- Processing speed can be important, but generally reporting tools like these are scripted and run overnight, so I did not track processing time. When I ran these programs on my larger sites’ logs (PlanetMike.com, ChristmasMusic247.com and ShowBizRadio.net) they all finished in an acceptable amount of time.
- The sample reports show default options. Read the tool’s docs for details on customizing your reports.
- The Final Grade is entirely subjective: it's my own opinion.
Keep in mind that reading server logs is a black art, and each of these tools makes many assumptions. A key one is how to define a visit. To one tool, requests from the same IP address and user agent within 30 minutes of each other make up one visit; another tool may count the same data as two visits. Look at this example: a user from the IP address 257.258.259.260 visits a site at 10:00pm, reads that page, and then follows links on the site, requesting pages at 10:10pm, 10:50pm, 11:00pm, 11:10pm, 11:45pm, 12:05am and 12:10am. Webalizer would report one site and three visits, since the gaps from 10:10 to 10:50 and from 11:10 to 11:45 both exceed its 30-minute visit timeout. Visitors would report two unique visitors, presumably because the requests fall on two different days.
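Here is that example worked through in Python. It assumes a 30-minute inactivity timeout for visits (a gap longer than the timeout starts a new visit) and, as a guess at why Visitors reports two, that it counts one unique visitor per IP and user agent per calendar day. The date is a placeholder; the times are the ones from the example.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # assumption: Webalizer-style timeout

def count_visits(times, timeout=VISIT_TIMEOUT):
    """A new visit starts whenever the gap since the previous request exceeds timeout."""
    visits, last = 0, None
    for t in sorted(times):
        if last is None or t - last > timeout:
            visits += 1
        last = t
    return visits

def count_daily_unique(times):
    """One 'unique visitor' per calendar day with at least one request."""
    return len({t.date() for t in times})

# Requests from the example; the date itself is a placeholder.
times = [
    datetime(2007, 6, 1, 22, 0), datetime(2007, 6, 1, 22, 10),
    datetime(2007, 6, 1, 22, 50), datetime(2007, 6, 1, 23, 0),
    datetime(2007, 6, 1, 23, 10), datetime(2007, 6, 1, 23, 45),
    datetime(2007, 6, 2, 0, 5), datetime(2007, 6, 2, 0, 10),
]
print(count_visits(times), count_daily_unique(times))
```

Same eight log lines, two different but equally defensible answers.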
And that raises the issue of defining terms. Someone (or something) that accesses a site may be called a visitor, a host, or a site. Hits may be called hits, requests, or accesses. Some tools count only HTML pages as hits; others count anything served from the server. So you can't meaningfully compare stats from one tracking package with another's until you understand what each is reporting.
Another major issue is removing robots, spiders and crawlers from your reports. Most webmasters aren't interested in how many automated critters are devouring their site; they only want to know how many people are reading their articles. That's where third-party embedded tools come into play, and I will discuss those tools next week. But sometimes you do want to know about automated traffic: when you're trying to identify Bad Things. Spammers, thieves, crackers and other abusers are out there leaving fingerprints in your log files, and third-party embedded tools can't help you find them, because most bots never run the embedded tracking code.
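Log analyzers usually separate robots from people by user agent. A Python sketch of that heuristic, assuming a deliberately short, hypothetical list of marker substrings; real bot lists are much longer, and bots that pretend to be browsers will still slip through.

```python
# Hypothetical marker list; "slurp" is Yahoo's crawler, "python-urllib" the
# library behind the trackback probe shown earlier.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp", "python-urllib", "curl", "wget")

def looks_like_bot(user_agent):
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

def split_traffic(records):
    """Split (ip, user_agent) records into (people, bots) by user agent."""
    people, bots = [], []
    for ip, ua in records:
        (bots if looks_like_bot(ua) else people).append((ip, ua))
    return people, bots

records = [
    ("192.0.2.1", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"),
    ("192.0.2.2", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("69.31.80.66", "Python-urllib/1.17"),
]
people, bots = split_traffic(records)
print(len(people), len(bots))
```

Flip the filter around and the bots list is exactly where to look for the Bad Things.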
Conclusion
Each of these tools has a place in a webmaster's toolbox. I will continue to use Webalizer for my public traffic report for PlanetMike.com. And for my own knowledge of how my sites are doing, I'll probably use Webalizer, Visitors, and AWStats.
Update: September 8, 2009: I’ve removed the sample reports I generated. They were attracting a huge amount of attention from referer log spammers.
Usage Statistics for PlanetMike.com: 2007

Summary by Month

| Month | Avg Hits/Day | Avg Files/Day | Avg Pages/Day | Avg Visits/Day | Sites (month) | Total KBytes | Total Visits | Total Pages | Total Files | Total Hits |
|---|---|---|---|---|---|---|---|---|---|---|
| Dec 2007 | 15,151 | 13,329 | 8,748 | 2,681 | 42,654 | 7,180,572 | 83,124 | 271,191 | 413,207 | 469,697 |
| Nov 2007 | 15,551 | 13,817 | 8,359 | 2,690 | 43,339 | 7,285,245 | 80,720 | 250,796 | 414,536 | 466,559 |
| Oct 2007 | 14,355 | 12,382 | 7,932 | 2,829 | 41,912 | 5,712,332 | 87,721 | 245,918 | 383,861 | 445,030 |
| Sep 2007 | 16,362 | 14,760 | 8,595 | 3,121 | 44,531 | 6,048,530 | 93,651 | 257,851 | 442,803 | 490,879 |
| Aug 2007 | 19,132 | 17,161 | 9,987 | 3,957 | 53,589 | 9,012,449 | 122,670 | 309,605 | 532,000 | 593,098 |
| Jul 2007 | 19,288 | 16,980 | 10,382 | 4,510 | 54,278 | 7,551,001 | 139,820 | 321,848 | 526,400 | 597,931 |
| Jun 2007 | 19,756 | 17,743 | 10,586 | 4,326 | 55,907 | 5,610,407 | 129,796 | 317,585 | 532,290 | 592,688 |
| May 2007 | 19,309 | 17,420 | 10,541 | 3,538 | 51,775 | 5,939,830 | 109,689 | 326,784 | 540,050 | 598,607 |
| Apr 2007 | 23,801 | 21,295 | 11,800 | 3,880 | 62,231 | 7,530,408 | 116,417 | 354,002 | 638,869 | 714,044 |
| Mar 2007 | 18,459 | 15,847 | 10,354 | 3,159 | 46,489 | 6,788,780 | 97,932 | 320,974 | 491,286 | 572,233 |
| Feb 2007 | 20,256 | 17,948 | 9,690 | 3,415 | 57,104 | 6,743,685 | 95,634 | 271,332 | 502,559 | 567,168 |
| Jan 2007 | 21,433 | 18,710 | 11,511 | 2,954 | 46,985 | 6,808,148 | 91,595 | 356,869 | 580,011 | 664,446 |
| Totals | | | | | | 82,211,387 | 1,248,769 | 3,604,755 | 5,997,872 | 6,772,380 |
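The daily averages in the table appear to be each month's total hits divided by the days in that month, rounded down. The Python check below covers only this table's figures, not Webalizer's source code. It also confirms the yearly hit total.

```python
import calendar

# (total hits, "Avg Hits/Day") per month of 2007, copied from the table above.
MONTHLY_HITS = {
    1: (664_446, 21_433), 2: (567_168, 20_256), 3: (572_233, 18_459),
    4: (714_044, 23_801), 5: (598_607, 19_309), 6: (592_688, 19_756),
    7: (597_931, 19_288), 8: (593_098, 19_132), 9: (490_879, 16_362),
    10: (445_030, 14_355), 11: (466_559, 15_551), 12: (469_697, 15_151),
}

# calendar.monthrange returns (weekday of the 1st, number of days in month).
mismatches = [
    month for month, (total, avg) in MONTHLY_HITS.items()
    if total // calendar.monthrange(2007, month)[1] != avg
]
year_total = sum(total for total, _ in MONTHLY_HITS.values())
print(mismatches, year_total)
```

Floor division matters: November's 466,559 hits over 30 days is 15,551.97, which would round to 15,552, but the table shows 15,551.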