More on comments and blog spam
All right, here's the summary. Regular readers, feel free to skip :-)
A couple of months ago, I received e-mail from my host, Lunarpages, saying that the script Movable Type uses for comments was chewing up nearly 90% of the CPU on the server I'm stored on. Because of that, they were disabling that script, killing comments on my site.
Since I was getting thousands of pieces of blog spam a week -- mostly blocked by the wonderful MT-Blacklist -- that didn't surprise me.
I conferred with Lunarpages, and after I closed comments on most of my old posts and upgraded to the latest version of Movable Type, they re-enabled the script.
Earlier today, I received another e-mail from Lunarpages, saying that I was still using 90% of the CPU on the server with the same script, so they shut it down again.
After doing a bit of research, I realized that although I had shut down comments, the blogspammers were still hitting that script with thousands of requests -- and even though it wasn't responding, it did have to wake up long enough to deny the request.
The solution came in the form of this excellent post, which (in summary) recommends changing the name of the mt-comments.cgi script to... something else. No, I'm not going to say what I changed it to. You never know who's listening.
Anyway, what that should do is prevent the blogspammers who have that URL in their database from being able to find it again easily, thus preventing server load, thus keeping both my esteemed hosts and you, faithful readers, happy.
Comments are back
Looks like Lunarpages shut down commenting on my site because of the high server loads the comment script was generating. We have some disagreement about what's causing that, but they have graciously restored the script as long as it doesn't become too much of a load.
In an attempt to give blog spammers a smaller "surface" to work with, I've installed David Raynes' simple, to-the-point plugin, mt-close.
All it does is allow you to close comments on all entries older than a certain date. I, for example, decided on 30 days. That means that on my site, out of ~850 entries, comments are only possible on the last 12. Which should make my blogspam drop off considerably, reducing my server load.
The only thing it doesn't do, unfortunately, is run automatically. But hey, that's a small price to pay.
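The cutoff rule described above is simple enough to sketch in a few lines of Python. This is only an illustration of the logic, not the plugin's actual code; the `Entry` class, the `close_old_comments` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Entry:
    """Hypothetical stand-in for a blog entry; not Movable Type's data model."""
    title: str
    posted: date
    comments_open: bool = True


def close_old_comments(entries, today, max_age_days=30):
    """Close comments on entries older than max_age_days; return how many stay open."""
    cutoff = today - timedelta(days=max_age_days)
    for entry in entries:
        if entry.posted < cutoff:
            entry.comments_open = False
    return sum(1 for entry in entries if entry.comments_open)


# Made-up sample data: one entry per day for the last 40 days.
today = date(2005, 8, 4)
entries = [Entry(f"post {n}", today - timedelta(days=n)) for n in range(40)]
still_open = close_old_comments(entries, today)
print(still_open)
```

Because nothing here runs on a schedule, the function would have to be re-run by hand whenever newer entries age past the cutoff, which mirrors the limitation mentioned above.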
David Adam Edelstein, on Thursday, August 4, 2005 at 6:49 AM:
Hey, look, here one is now.
Andrew, on Thursday, August 4, 2005 at 8:25 AM:
A while ago I was having a spam problem and eventually found what is theoretically an update to mt-close, imaginatively entitled mt-close2. It's available from this site: http://thedeadone.net/
However, since Movable Type 3, I've been using Jay Allen's MT-Blacklist plugin with great success - to the point where I can afford to leave old posts open these days.
david adam edelstein, on Thursday, August 4, 2005 at 9:16 AM:
Well, MT-blacklist was doing a fine job of keeping blog spam from appearing on my site; the problem as near as I can tell is that the blog spam was still triggering the mt-comments script again and again and again, and putting too much load on my server's CPU.
Either that or Lunarpages has a vendetta against MT, which I've heard as well.
Uncle Vinny, on Thursday, August 4, 2005 at 8:59 PM:
Comment allez vous?
Search strings
You may or may not know that when you get to a site by searching for it on many of the popular search engines, the search engine passes on to the site the exact search string you had entered.
I'm not actually sure why the search engines do this, but it provides for some pretty disheartening reading on my part:

The number one way people get to my site is by searching for "cute animals"?!?! Boy, are they going to be disappointed.
(That chart, by the way, is part of the logging package that comes with my totally awesome site host, Lunarpages.)
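For the curious: the search string typically reaches a site through the HTTP Referer header, because the address of the results page you clicked from contains the query. A minimal Python sketch of pulling it back out follows. The parameter name `q` and the example referrer are assumptions for illustration; other engines may use a different parameter name.

```python
from urllib.parse import urlparse, parse_qs


def search_terms(referrer, param="q"):
    """Return the search string carried in a referrer URL, or None if absent."""
    query = parse_qs(urlparse(referrer).query)
    values = query.get(param)
    return values[0] if values else None


# Hypothetical referrer of the kind that shows up in a server log.
example = "http://www.google.com/search?q=cute+animals&hl=en"
print(search_terms(example))
```

A logging package would presumably tally these strings across all requests to produce a chart like the one described above.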
Uncle Vinny, on Thursday, May 6, 2004 at 12:35 AM:
What's *really* scary is that noise-to-signal.com isn't in the top 500 results for "cute animals" -- I know, I checked. So that means that the 252 people (per day?!) coming to your site based on this request are dissatisfied with the other 500 "cute animal" pages they visited...
Richard Beers, on Thursday, May 6, 2004 at 12:54 PM:
I like the 'light transmitting concrete' option myself... it's how I usually get here... I just never remember those URL thingys...
Andrew, on Thursday, May 13, 2004 at 8:34 AM:
Me too. I always just google for "orthodox jews" to get here. Do-it-n-doo-doo feelin luckeeeeee.
Beth, on Saturday, May 7, 2005 at 5:17 AM:
I found you by searching for "fritty plowers". I was looking for references or song lyrics to the Jim Henson 1972 made-for-TV-movie "Frog Prince". Obviously your posts have nothing to do with that, but I have found many other delicious items here instead! (I did not notice any cute animals.) Your blog rocks. Hopefully you are not plagued by the Frog Prince as I am. "N'Im Einteen, nim einteen, by mirthday's dotay..."
An experiment
rfkj, on Thursday, January 15, 2004 at 9:20 PM:
Does the phone have a built in camera? That would be really neat: see something cool, take a snap with the phone and upload it instantly.
What input method are you using? I've grown accustomed to using Transcriber on my Jornada. It's a far cry from, say, the Newton or Graffiti. The only real problem I've had with it so far is that it's got difficulty with differentiating between F and f, and C and c, understandably. Does the PPC Phone Edition even support Transcriber, or do you have to hunt and peck on that infinitesimal virtual keyboard?
David Adam Edelstein, on Friday, January 16, 2004 at 8:43 AM:
Nope, no built-in camera. The phone is the fairly crappy Siemens SX56 (note the $300 instant rebate on it) with a bad screen and poor ergo. But on the other hand the only reason I have one is because I need to use it for work, so if I wanted a better one I could go get one my own damn self.
I tried using transcriber, and got pretty good with it, but I'm much faster with the virtual keyboard and auto-complete.
This phone and a folding keyboard might not be a bad sub-notebook roaming setup -- a bit of web browsing, e-mail, and being able to post to this site. It's a small package, for a pretty small cost now -- the device is $99 after the rebate, a keyboard would be, what, another $100, and with T-Mobile's $19.95 unlimited data plan, it'd be pretty reasonable.
On the other hand, a 12" laptop would be 1000% more flexible, and only about 2x the size of the pda/keyboard combo.
Moving a blog
At the risk of being painfully self-referential, I thought it might be useful to the community as a whole to talk about some of the issues around my recent blog move. Those of you looking for the usual photos and weirdness, just move along, you don't care about this.
Why the move?
Earlier this year, I was paying for two Earthlink starter sites -- this site, and my photography portfolio site (sadly in need of updating). Earthlink isn't a bad host, but there were a few things I wasn't happy with. One was that they didn't have all of the Perl libraries that Movable Type (the software that drives this and many other blogs) needs to be totally happy -- in particular, I wasn't able to send automated "trackback" pings to let various sites know that I had updated my site, and some image handling stuff didn't work either. When I contacted Earthlink's customer support, they told me that (although the individual support techs were sympathetic) there was no way Earthlink was going to update their Perl libraries for me.
The other issue that concerned me about Earthlink is their policy on bandwidth overages. If you're hosting a site with Earthlink, and you get slashdotted, they'll just keep serving up your pages as you pass your allotted bandwidth -- and then bill you for it at your overage rate. As some people have found (including Tom Tomorrow, to the tune of $3800), those overage charges can be hefty.
Then a friend pointed me to Lunarpages. Without being an actual ad for them, I'll just say that their deal (800mb of storage and 40gb of bandwidth a month, for $7.95 a month) was much better than Earthlink's (200mb of storage and 10gb of bandwidth, for $19.95 a month).
The two other benefits Lunarpages had over Earthlink were these: First, if I ever manage to exceed their very generous monthly bandwidth allotments, they won't just bill me -- they cut off the site. If I want to pay for more bandwidth, I get to make that choice and do that explicitly. No surprise $3800 bills. (Not to mention that their next higher hosting package, with 80gb of bandwidth, is only $22.95 a month.)
Second, they allow add-on domains -- essentially a second web site sharing the storage space of the first site -- for $2.50 a month. So instead of paying $39.90 a month to Earthlink, I'm now only paying $10.45. Saving $353.40 a year on my hobbies was very attractive.
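The arithmetic behind those figures can be checked with a short Python sketch. It uses only the prices quoted in this post; the variable names are just for illustration.

```python
from decimal import Decimal

# Monthly prices quoted in the post.
earthlink_monthly = 2 * Decimal("19.95")                # two starter sites
lunarpages_monthly = Decimal("7.95") + Decimal("2.50")  # base plan plus one add-on domain

monthly_saving = earthlink_monthly - lunarpages_monthly
yearly_saving = 12 * monthly_saving

print(earthlink_monthly, lunarpages_monthly, yearly_saving)
```

`Decimal` is used instead of floats so the cents come out exact.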
What don't I like about Lunarpages? A couple of minor nitpicks -- logging in to and using their "control panel" is a little obtuse; also, their usage statistics reports aren't as nicely designed as Earthlink's Urchin reports. They also only allow one mailing list per account, instead of the 30 or so that Earthlink allowed me. Still: minor issues.
The problem
Moving davidadam.com to Lunarpages was super easy -- copy over files, re-establish e-mail accounts and mailing list, etc.
But when I tried to move Noise to Signal over, I ran into a significant snag.
The underlying problem was that my Movable Type (MT) database had somehow gotten slightly corrupted -- not so corrupted that I was seeing problems on the actual site, but corrupted enough that I wasn't going to be able to do a clever under-the-hood database transfer. There was another way for me to transfer the records -- using MT's "export/import" feature -- but there was also a major problem with that method.
When I started Noise to Signal, I used MT's default archiving style, where each individual page is saved as something like http://www.noise-to-signal.com/archives/000001.html, where the 000001 refers to the actual record number in the MT database.
The problem was that if I used the clumsy "export/import" method of moving the MT database, there was no way to ensure that the records would maintain the same number in the database. Which meant that when I built the new site, what had previously been 000237.html might end up as 000235.html.
And the problem with that, my friends, is Google. Say you were searching for "haupia bread pudding". Record #237 on my old site, it turns out, is the first hit. So you click on the result in Google, but because I've moved my databases, instead of a recipe for a tasty dessert you might get the former record 235, photos of an anti-Bush protest. Or if you had actually been searching for those photos of an anti-Bush rally, you might get a picture of my cats. They're cute and all, but not necessarily what you were looking for.
And that was it. I was stumped. Dead in the water for four months. Still paying $19.95 a month to Earthlink that I didn't want to be paying.
The solution
I ran across the solution while wandering around the TypePad site looking for some info for a friend who was considering using their service. TypePad, briefly, is an all-in-one blogging solution, built by the same people who built Movable Type, with a slightly newer UI and without the hassles of installing software on a server, which is more than most people want to do.
On the TypePad site was a list of external resources, including an article by the very clever David Ely on Redirecting your Movable Type permalinks to TypePad. As I read that article, I realized that David had solved my problem, as well.
I'll let you read his article for the full technical writeup, but in brief the solution is this: While the default archiving style for MT is the previously mentioned /archives/recordnumber system, you can define your own style. TypePad's default style, for example, is to use /year/month/first fifteen letters of title -- so this post, for example, would be /2004/01/moving_a_blog.html. And this is possible in MT, as well. Switching to this kind of URL system gets rid of the entire problem of 000237.html vs. 000235.html -- whatever the record number is, the titles stay the same.
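To make the naming scheme concrete, here is a rough Python sketch of how such a path could be derived from an entry's date and title. It is an approximation under stated assumptions, not Movable Type's implementation: `simple_dirify` and `archive_path` are hypothetical names, the real dirify filter handles more cases (HTML, accented characters), and trimming before dirifying is an assumed order. The title "Haupia Bread Pudding" is also assumed from the search phrase mentioned earlier.

```python
import re


def simple_dirify(title):
    """Rough approximation of a 'dirify' filter: lowercase, drop punctuation, spaces to underscores."""
    text = title.lower()
    text = re.sub(r"[^\w\s]", "", text)  # drop anything that is not a word character or whitespace
    return re.sub(r"\s+", "_", text.strip())


def archive_path(year, month, title, trim_to=15):
    """Build a /year/month/title.html path from the first trim_to characters of the title."""
    return f"/{year:04d}/{month:02d}/{simple_dirify(title[:trim_to])}.html"


print(archive_path(2004, 1, "Moving a blog"))         # /2004/01/moving_a_blog.html
print(archive_path(2003, 8, "Haupia Bread Pudding"))  # /2003/08/haupia_bread_pu.html
```

The point of the scheme shows up in the function signature: the path depends only on the date and the title, never on the record number.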
The second part of David's solution was to use MT to automatically generate redirect pages for each of the old numbered pages. So, for example, if you follow that Haupia link from Google -- http://www.noise-to-signal.com/archives/000237.html -- you will briefly see a redirect page, and then you'll jump right over to the page in the new, slick archiving style: http://www.noise-to-signal.com/2003/08/haupia_bread_pu.html. Brilliant.
I was delighted. I offered to re-name myself after David (until I realized we already have the same name).
The gnarly technical bits (as if the preceding paragraphs weren't dull and technical enough)
So here's the process I followed:
- Safely back up my current template for the individual pages.
- Replace the template with the redirect template.
- Rebuild the entire site with the redirect template in place.
- Restore the normal template for the individual pages.
- Change the archive style in the MT config page.
- Rebuild the entire site again with the normal page template, but the new archive structure.
- Copy the old archives directory from Earthlink to Lunarpages.
- Export the posts from my Earthlink MT site and import them into the Lunarpages site.
There were two key bits of the redirect template. First, a pair of meta tags in the <head> section:
<meta http-equiv="REFRESH" content="2; url=http://www.noise-to-signal.com/
<$MTEntryDate format="%Y/%m"$>/
<$MTEntryTitle dirify="1" trim_to="15"$>.html" />
<meta name="robots" content="noindex,follow" />
The first meta tag is the actual redirect; it tells the browser to wait two seconds and then send the user to this new URL, which is autogenerated by MT based on the date and title of the entry.
The second meta tag tells search engine spiders not to index this redirect page, but to follow its link so that they update their records with the page I'm redirecting them to.
Then, in the actual text of the redirect page, I used the same technique to tell people where they're about to go, and give them a link to the page if their browser doesn't support redirects:
The new address of this entry is:<a href="http://www.noise-to-signal.com/
<$MTEntryDate format="%Y/%m"$>/
<$MTEntryTitle dirify="1" trim_to="15"$>.html">
http://www.noise-to-signal.com/
<$MTEntryDate format="%Y/%m"$>/
<$MTEntryTitle dirify="1" trim_to="15"$>.html
</a>
The MT code in there should look just like the code in the refresh meta tag, above.
Finally, the MT archive "template" -- where it should store individual archives -- looks like this:
<$MTEntryDate format="%Y/%m"$>/
<$MTEntryTitle dirify="1" trim_to="15"$>.html
Which should again look familiar.
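For readers who don't use Movable Type, the page those template pieces produce can be approximated with a small Python sketch. This is an illustration of the output, not MT's rendering code: `redirect_page` is a hypothetical helper, and it takes the already-built new URL as its input.

```python
from html import escape


def redirect_page(new_url, delay_seconds=2):
    """Return a minimal HTML redirect page pointing at new_url."""
    safe_url = escape(new_url, quote=True)
    return (
        "<html>\n<head>\n"
        f'<meta http-equiv="REFRESH" content="{delay_seconds}; url={safe_url}" />\n'
        '<meta name="robots" content="noindex,follow" />\n'
        "</head>\n<body>\n"
        f'The new address of this entry is: <a href="{safe_url}">{safe_url}</a>\n'
        "</body>\n</html>\n"
    )


page = redirect_page("http://www.noise-to-signal.com/2003/08/haupia_bread_pu.html")
print(page)
```

One such page would sit at each old numbered address (for example 000237.html), so old links keep working regardless of what record numbers the new database assigns.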
Lessons learned
I suppose the biggest lesson I learned from this was that if I was setting up a new Movable Type install for someone, I would definitely use the year/month/title archiving style; if I had done that to begin with, it would have obviated all of this hassle, because I would never have been tied to numbered entries in the first place.
I hope all of this turns out to be useful for someone else; if this has helped you to solve a similar problem, I'd love to hear about it!
d, on Saturday, January 10, 2004 at 6:17 PM:
Coincidentally, being an Earthlink customer - I also looked for another host for the same two reasons you mentioned. And in fact I am spending much of this weekend switching various blogs over. I chose Pair after a hefty and long search online.
I host several blogs too, and your archiving tip is something I hadn't considered doing until now.
Thank you for your post. Oddly comforting to read someone else has recently gone through the same process as myself.
rfkj, on Sunday, January 11, 2004 at 6:39 AM:
Is the process of generating new page names automated, or do you have to do it yourself? How does it deal with non-unique names? I'm not so sure that fifteen letters is unique enough; what if, the day after the Haupia Bread Pudding recipe, you got a cute loaf pan in the shape of a dog, and you wrote about Haupia Bread Puppies? Or you discovered that the bread pudding was great when integrated with calisthenics, and wrote about Haupia Bread Pushups? The examples are contrived, but the question is serious. Having the year and the month be part of the URL obviates a lot of that problem, but it's still conceivable that you'd duplicate the first fifteen letters of a title within a thirty day period.
Still, a text-based indexing system is a lot better than numbered records. Keeping things organized in some sort of hierarchy is an improvement over a single flat file. It makes things so much easier to find.
Database migrations suck. You haven't lived until you've tried to migrate a huge NetWare installation from NetWare 3.x to NetWare 4.x, which used different structures entirely. Good story!
David Adam Edelstein, on Sunday, January 11, 2004 at 7:56 AM:
It's definitely a valid point. I could have gone with 20 letters or more, and I probably would if I was setting up a blog for someone else... but I know that my titles are weird and inconsistent enough that I wouldn't repeat in a 30 day period -- in fact, I'm pretty sure I haven't repeated the first 15 letters yet, at all.
David Adam Edelstein, on Thursday, March 25, 2004 at 11:15 PM:
Update: Google, of course, has updated, so my example above (the search on Haupia bread pudding) doesn't work any more -- now Google is pointing to the name-based URL, instead of the number-based one.
Too darn efficient! :-)
The joys of having a public web site
I seem to get a lot of e-mails like this, regarding either this site or my photography portfolio site.
From: Randy Smith
To: dae@davidadam.com
Sent: Thursday, October 02, 2003 12:15 PM
Subject: Request for linking to us from your resource page.
Good Afternoon:
Please consider adding a link to us on your resources page:
http://www.noise-to-signal.com/
We are Foodservicedirect - a leading online supplier of restaurant supplies including cookware, glassware, bar supplies and paper supplies.
Thanks for your consideration!
Randy Smith
Marketing Services
Foodservicedirect.com
27 Forest Ave.
Locust Valley, NY 11560
Telephone: 516-759-9000
email: randy@foodservicedirect.com
url: http://no.i'm.not.going.to.link.to.you.com
They've clearly harvested a bunch of e-mails and sites and haven't bothered to target it in the least -- semi-personalized spam asking me to give them free advertising.
Which probably sounds good to a testosterone-drunk marketeer, but makes me think they're just an idiot.
Does anyone else get these asinine e-mails?
Joshua Edelstein, on Friday, October 3, 2003 at 8:36 AM:
Oh yeah. I get them all the time. And since I run about five different sites, I get them in extra helpings! Yeesh.
Still alive...
Well, it's taking a little longer than I had hoped to move to my new site. The crux of the biscuit is that one of the database files that contain the content for this site seems to refuse to upgrade, which suggests possible corruption in the database.
I do have ways of exporting the data, and re-importing it on the new site; however, the problem is that I can't be sure that the numbers of the entries will be consistent between sites. This is a fairly annoying problem, since the "permalink" for each entry is based on the entry number in the database.
So if the entry numbers changed, someone could have a link on their site to my post on the Designer's eye, and that link could end up being a dorky picture of Edgar.
All of this will work out, I'm sure, and I'll write an extended post about the process -- as well as all of the other things that have been going on (lots of work in the yard today!).
I'd like to put in a plug for my new host while I'm at it: Lunarpages. I'm only on the $7.95 a month plan and their tech support people have still been working hard to solve my problem. Thanks guys!
Anyway... Sooner or later the problem will be solved, I will transfer the site, and this will all be behind us. Until then... I beg your indulgence for a little while longer. Keep checking back!
Audblog
Hey, this is kind of cool.
Audblog allows you to post an audio clip to your blog from a telephone. Talk about letting your voice be heard by the world...