Moving a blog

Posted by David on Saturday, January 10, 2004 at 5:27 PM.

At the risk of being painfully self-referential, I thought it might be useful to the community as a whole to talk about some of the issues around my recent blog move. Those of you looking for the usual photos and weirdness, just move along, you don't care about this.

Why the move?

Last year, I was paying for two Earthlink starter sites -- this site, and my photography portfolio site (sadly in need of updating). Earthlink isn't a bad host, but there were a few things I wasn't happy with.

One was that they didn't have all of the Perl libraries that Movable Type (the software that drives this and many other blogs) needs to be totally happy -- in particular, I wasn't able to send automated "trackback" pings to let various sites know that I had updated my site, and some image handling stuff didn't work either. When I contacted Earthlink's customer support, they told me that (although the individual support techs were sympathetic) there was no way Earthlink was going to update their Perl libraries for me.

The other issue that concerned me about Earthlink is their policy on bandwidth overages. If you're hosting a site with Earthlink, and you get slashdotted, they'll just keep serving up your pages as you pass your allotted bandwidth -- and then bill you for it at your overage rate. As some people have found (including Tom Tomorrow, to the tune of $3800), those overage charges can be hefty.

Then a friend pointed me to Lunarpages. Without being an actual ad for them, I'll just say that their deal (800 MB of storage and 40 GB of bandwidth a month, for $7.95 a month) was much better than Earthlink's (200 MB of storage and 10 GB of bandwidth, for $19.95 a month).

The two other benefits Lunarpages had over Earthlink were these: First, if I ever manage to exceed their very generous monthly bandwidth allotment, they won't just bill me -- they cut off the site. If I want to pay for more bandwidth, I get to make that choice and do that explicitly. No surprise $3800 bills. (Not to mention that their next higher hosting package, with 80 GB of bandwidth, is only $22.95 a month.)

Second, they allow add-on domains -- essentially a second web site sharing the storage space of the first site -- for $2.50 a month. So instead of paying $39.90 a month to Earthlink, I'm now only paying $10.45. Saving $353.40 a year on my hobbies was very attractive.
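The numbers check out. As a quick sanity check, here is the same arithmetic in Python, using only the prices quoted above (kept in cents so nothing gets lost to floating-point rounding):

```python
# Prices quoted in this post, in cents so the arithmetic stays exact.
EARTHLINK_STARTER = 1995   # per starter site, per month
LUNARPAGES_BASE = 795      # basic hosting package, per month
LUNARPAGES_ADDON = 250     # one add-on domain, per month


def dollars(cents: int) -> str:
    """Format a whole number of cents as dollars, e.g. 1045 -> "$10.45"."""
    return f"${cents // 100}.{cents % 100:02d}"


earthlink_monthly = 2 * EARTHLINK_STARTER                 # two separate sites
lunarpages_monthly = LUNARPAGES_BASE + LUNARPAGES_ADDON   # one site plus an add-on domain
yearly_savings = 12 * (earthlink_monthly - lunarpages_monthly)

print("Earthlink: ", dollars(earthlink_monthly), "a month")
print("Lunarpages:", dollars(lunarpages_monthly), "a month")
print("Savings:   ", dollars(yearly_savings), "a year")
```

That prints $39.90 and $10.45 a month, and $353.40 a year saved.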

What don't I like about Lunarpages? A couple of minor nitpicks: logging in to and using their "control panel" is a little obtuse, and their usage statistics reports aren't as nicely designed as Earthlink's Urchin reports. They also only allow one mailing list per account, instead of the 30 or so that Earthlink allowed me. Still: minor issues.

The problem

Moving davidadam.com to Lunarpages was super easy -- copy over files, re-establish e-mail accounts and mailing list, etc.

But when I tried to move Noise to Signal over, I ran into a significant snag.

The underlying problem was that my Movable Type (MT) database had somehow gotten slightly corrupted -- not so corrupted that I was seeing problems on the actual site, but corrupted enough that I wasn't going to be able to do a clever under-the-hood database transfer. There was another way for me to transfer the records -- using MT's "export/import" feature -- but there was also a major problem with that method.

When I started Noise to Signal, I used MT's default archiving style, where each individual page is saved as something like http://www.noise-to-signal.com/archives/000001.html, where the 000001 refers to the actual record number in the MT database.

The problem was that if I used the clumsy "export/import" method of moving the MT database, there was no way to ensure that each entry would keep the same record number. Which meant that when I built the new site, what had previously been 000237.html might end up as 000235.html.
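To see how that kind of shift can happen, here is a purely hypothetical Python illustration. It assumes (this is not taken from MT's actual import code) that the importer hands out record numbers 1, 2, 3, ... in order, and that the old database had gaps left by two deleted entries; every entry after a gap then slides down:

```python
def renumber(old_ids):
    """Map each surviving old record number to the number a sequential
    re-import would assign (hypothetical importer behaviour)."""
    return {old: new for new, old in enumerate(sorted(old_ids), start=1)}


def archive_name(record_number: int) -> str:
    """MT's default numbered archive file name, e.g. 237 -> 000237.html."""
    return f"{record_number:06d}.html"


deleted = {17, 102}  # hypothetical entries deleted before the move
old_ids = [n for n in range(1, 241) if n not in deleted]
mapping = renumber(old_ids)

print(archive_name(237), "->", archive_name(mapping[237]))
```

With two gaps before it, entry 237 comes back as 235, so 000237.html silently turns into 000235.html while Google still points at the old name.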

And the problem with that, my friends, is Google. Say you were searching for "haupia bread pudding". Record #237 on my old site, it turns out, is the first hit. So you click on the result in Google, but because I've moved my databases, instead of a recipe for a tasty dessert you might get the former record 235, photos of an anti-Bush protest. Or if you had actually been searching for those photos of an anti-Bush rally, you might get a picture of my cats. They're cute and all, but not necessarily what you were looking for.

And that was it. I was stumped. Dead in the water for four months. Still paying $19.95 a month to Earthlink that I didn't want to be paying.

The solution

I ran across the solution while wandering around the TypePad site looking for some info for a friend who was considering using their service. TypePad, briefly, is an all-in-one blogging service, built by the same people who built Movable Type, with a slightly newer UI and without the hassle of installing software on a server (which is more than most people want to do).

On the TypePad site was a list of external resources, including an article by the very clever David Ely on Redirecting your Movable Type permalinks to TypePad. As I read that article, I realized that David had solved my problem, as well.

I'll let you read his article for the full technical writeup, but in brief the solution is this: while the default archiving style for MT is the previously mentioned /archives/recordnumber system, you can define your own. TypePad's default style is /year/month/first fifteen characters of the title (lowercased, with spaces turned into underscores), so this post would be /2004/01/moving_a_blog.html. MT can do the same thing. Switching to this kind of URL scheme gets rid of the whole 000237.html vs. 000235.html problem: whatever the record number turns out to be, the titles stay the same.
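For the curious, here is roughly what that title-based naming does, sketched in Python. The `dirify` function below is an approximation of MT's dirify filter written for this example (an assumption, not MT's own code); the real filter may treat odd punctuation or accented characters a little differently:

```python
import re
import unicodedata
from datetime import date


def dirify(title: str) -> str:
    """Rough approximation of MT's dirify filter (an assumption, not MT's code):
    strip HTML tags, fold accents to ASCII, lowercase, drop punctuation,
    and turn runs of whitespace into underscores."""
    text = re.sub(r"<[^>]*>", "", title)
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", "_", text.strip())


def archive_path(posted: date, title: str, trim_to: int = 15) -> str:
    """Python counterpart of the year/month/title archive style:
    year and month from the entry date, then the dirified title cut to trim_to characters."""
    return f"{posted:%Y/%m}/{dirify(title)[:trim_to]}.html"


print(archive_path(date(2004, 1, 10), "Moving a blog"))
# The day below is made up; only the year and month end up in the path.
print(archive_path(date(2003, 8, 15), "Haupia Bread Pudding"))
```

That gives 2004/01/moving_a_blog.html and 2003/08/haupia_bread_pu.html, matching the URLs in this post. The record number never enters into it, which is the whole point.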

The second part of David's solution was to use MT to automatically generate redirect pages for each of the old numbered pages. So, for example, if you follow that Haupia link from Google -- http://www.noise-to-signal.com/archives/000237.html -- you will briefly see a redirect page, and then you'll jump right over to the page in the new, slick archiving style: http://www.noise-to-signal.com/2003/08/haupia_bread_pu.html. Brilliant.
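David's trick does this inside MT, by rebuilding every entry through a redirect template (the steps are in the next section). Anyone migrating without MT's help could do the same thing with a small script. Here is a hypothetical Python sketch; the input format (a dict from old record number to new site-relative path) and the page markup are assumptions made for illustration, not MT's output:

```python
from __future__ import annotations

import html
import tempfile
from pathlib import Path

# Assumed site root, taken from the URLs in this post.
SITE = "http://www.noise-to-signal.com/"

# Minimal stand-in for the redirect template (hypothetical markup).
REDIRECT_PAGE = """<html>
<head>
<meta http-equiv="refresh" content="2; url={url}" />
<meta name="robots" content="noindex,follow" />
<title>This entry has moved</title>
</head>
<body>
<p>The new address of this entry is: <a href="{url}">{url}</a></p>
</body>
</html>
"""


def write_redirect_stubs(new_paths: dict[int, str], archive_dir: Path) -> list[Path]:
    """Write one redirect page per old numbered archive (000237.html, ...).

    new_paths maps an old record number to its new site-relative path,
    e.g. {237: "2003/08/haupia_bread_pu.html"} (a hypothetical input format).
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record_number, new_path in sorted(new_paths.items()):
        url = html.escape(SITE + new_path, quote=True)  # safe inside HTML attributes
        stub = archive_dir / f"{record_number:06d}.html"
        stub.write_text(REDIRECT_PAGE.format(url=url), encoding="utf-8")
        written.append(stub)
    return written


# Example run in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    stubs = write_redirect_stubs({237: "2003/08/haupia_bread_pu.html"}, Path(tmp) / "archives")
    example_name = stubs[0].name
    example_page = stubs[0].read_text(encoding="utf-8")

print(example_name)
```

The old file name is rebuilt from the record number and the new URL comes from the title, so the two naming schemes only have to meet once, inside this mapping.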

I was delighted. I offered to rename myself after David (until I realized we already have the same name).

The gnarly technical bits (as if the preceding paragraphs weren't dull and technical enough)

So here's the process I followed:

  1. Safely back up my current template for the individual pages.
  2. Replace the template with the redirect template.
  3. Rebuild the entire site with the redirect template in place.
  4. Restore the normal template for the individual pages.
  5. Change the archive style in the MT config page.
  6. Rebuild the entire site again with the normal page template, but the new archive structure.
  7. Copy the old archives directory from Earthlink to Lunarpages.
  8. Export the posts from my Earthlink MT site and import them into the Lunarpages site.

There were two key bits of the redirect template. First, a pair of meta tags in the <head> section:

<meta
   http-equiv="REFRESH"
   content="2; url=http://www.noise-to-signal.com/<$MTEntryDate format="%Y/%m"$>/<$MTEntryTitle dirify="1" trim_to="15"$>.html" />

<meta
   name="robots" content="noindex,follow" />

The first meta tag is the actual redirect; it tells the browser to wait two seconds and then send the user to this new URL, which is autogenerated by MT based on the date and title of the entry.

The second meta tag tells search engine spiders not to index this redirect page itself, but to follow the link on it through to the page I'm redirecting them to.
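The refresh value packs two things into one attribute: a delay in seconds and the target URL. The Python parser below is a simplified illustration of how "2; url=..." breaks apart (real browsers follow the more lenient parsing rules in the HTML specification, so treat this as a sketch, not a reference implementation):

```python
import re
from typing import Optional, Tuple

# Delay, then an optional ";" or "," and an optional "url=" prefix before the target.
_REFRESH = re.compile(r"^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*?))?\s*$", re.IGNORECASE)


def parse_refresh(content: str) -> Tuple[int, Optional[str]]:
    """Split a meta refresh value such as "2; url=http://..." into
    (delay in seconds, target URL or None if it only reloads the page)."""
    match = _REFRESH.match(content)
    if match is None:
        raise ValueError(f"not a refresh value: {content!r}")
    delay = int(match.group(1))
    url = match.group(2) or None
    if url and len(url) >= 2 and url[0] == url[-1] and url[0] in "'\"":
        url = url[1:-1]  # drop optional quotes around the URL
    return delay, url


print(parse_refresh("2; url=http://www.noise-to-signal.com/2003/08/haupia_bread_pu.html"))
```

A value with no URL at all (just "5", say) means "reload this same page after five seconds", which is why the url= part matters here.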

Then, in the actual text of the redirect page, I used the same technique to tell people where they're about to go, and give them a link to the page if their browser doesn't support redirects:

The new address of this entry is:

<a href="http://www.noise-to-signal.com/<$MTEntryDate format="%Y/%m"$>/<$MTEntryTitle dirify="1" trim_to="15"$>.html">http://www.noise-to-signal.com/<$MTEntryDate format="%Y/%m"$>/<$MTEntryTitle dirify="1" trim_to="15"$>.html</a>

The MT code in there should look just like the code in the refresh meta tag, above.

Finally, the MT archive file template for individual archives (which tells MT where to store each entry's page) looks like this:

<$MTEntryDate format="%Y/%m"$>/<$MTEntryTitle dirify="1" trim_to="15"$>.html

Which should again look familiar.
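rfkj raises a fair question in the comments below: what if two titles share their first fifteen characters in the same month? One way to check an existing blog before switching is to compute every entry's new path and look for duplicates. This is a hypothetical Python sketch; `slug` is a simplified stand-in for MT's dirify (ASCII titles only), and both the titles (borrowed from that comment) and the dates are made up:

```python
import re
from collections import defaultdict
from datetime import date


def slug(title: str, trim_to: int = 15) -> str:
    """Simplified stand-in for dirify + trim_to (ASCII titles only; an assumption)."""
    text = re.sub(r"[^\w\s]", "", title.lower())
    return re.sub(r"\s+", "_", text.strip())[:trim_to]


def find_collisions(entries, trim_to: int = 15):
    """entries: iterable of (posted_date, title).
    Returns {path: [titles]} for every archive path more than one entry would share."""
    by_path = defaultdict(list)
    for posted, title in entries:
        by_path[f"{posted:%Y/%m}/{slug(title, trim_to)}.html"].append(title)
    return {path: titles for path, titles in by_path.items() if len(titles) > 1}


entries = [
    (date(2003, 8, 14), "Haupia Bread Pudding"),
    (date(2003, 8, 15), "Haupia Bread Puppies"),
    (date(2003, 9, 2), "Haupia Bread Pushups"),
]
collisions = find_collisions(entries)
print(collisions)
```

The two August posts would overwrite each other as 2003/08/haupia_bread_pu.html, while the September one is safe because the month differs. Calling `find_collisions(entries, trim_to=20)` returns no collisions for these titles.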

Lessons learned

I suppose the biggest lesson I learned from this was that if I were setting up a new Movable Type install for someone, I would definitely use the year/month/title archiving style; if I had done that to begin with, it would have obviated all of this hassle, because I would never have been tied to numbered entries in the first place.


I hope all of this turns out to be useful for someone else; if this has helped you to solve a similar problem, I'd love to hear about it!


d, on Saturday, January 10, 2004 at 6:17 PM:

Coincidentally, as an Earthlink customer, I also looked for another host for the same two reasons you mentioned. In fact, I am spending much of this weekend switching various blogs over. I chose Pair after a long and hefty search online.

I host several blogs too, and your archiving tip is something I hadn't considered doing until now.

Thank you for your post. Oddly comforting to read someone else has recently gone through the same process as myself.


rfkj, on Sunday, January 11, 2004 at 6:39 AM:

Is the process of generating new page names automated, or do you have to do it yourself? How does it deal with non-unique names? I'm not so sure that fifteen letters is unique enough; what if, the day after the Haupia Bread Pudding recipe, you got a cute loaf pan in the shape of a dog, and you wrote about Haupia Bread Puppies? Or you discovered that the bread pudding was great when integrated with calisthenics, and wrote about Haupia Bread Pushups? The examples are contrived, but the question is serious. Having the year and the month be part of the URL obviates a lot of that problem, but it's still conceivable that you'd duplicate the first fifteen letters of a title within a thirty-day period.

Still, a text-based indexing system is a lot better than numbered records. Keeping things organized in some sort of hierarchy is an improvement over a single flat file. It makes things so much easier to find.

Database migrations suck. You haven't lived until you've tried to migrate a huge NetWare installation from NetWare 3.x to NetWare 4.x, which used different structures entirely. Good story!


David Adam Edelstein, on Sunday, January 11, 2004 at 7:56 AM:

It's definitely a valid point. I could have gone with 20 letters or more, and I probably would if I were setting up a blog for someone else... but I know that my titles are weird and inconsistent enough that I'm unlikely to repeat one within a 30-day period -- in fact, I'm pretty sure I haven't repeated the first 15 letters yet, at all.


David Adam Edelstein, on Thursday, March 25, 2004 at 11:15 PM:

Update: Google, of course, has updated, so my example above (the search on Haupia bread pudding) doesn't work any more -- now Google is pointing to the name-based URL, instead of the number-based one.

Too darn efficient! :-)