{"id":78,"date":"2010-02-21T12:36:44","date_gmt":"2010-02-21T17:36:44","guid":{"rendered":"http:\/\/blog.law.cornell.edu\/tbruce\/2010\/02\/21\/outage\/"},"modified":"2010-02-21T12:52:56","modified_gmt":"2010-02-21T17:52:56","slug":"outage","status":"publish","type":"post","link":"https:\/\/blog.law.cornell.edu\/tbruce\/2010\/02\/21\/outage\/","title":{"rendered":"Outage!"},"content":{"rendered":"

I have had a miserable couple of days, here at the keyboard, working through the effects of the Great LII Outage of 2010.\u00a0 I spent a lot of time on repairs, and on measures that sharply decrease the chances of another.\u00a0 But this is the Internet, after all,\u00a0 and a highly complex system, and we know that sooner or later this will happen again.\u00a0 We had a good run. The last unintended outage we had was about six years ago.\u00a0 We experience slowdowns two to four times a year, usually the result of some perfect storm of network traffic that confuses our clustering software, or of a fault in the database back end.\u00a0 But nothing like this last one, ever, and I am hoping that a decade will pass before there is another.\u00a0 It went on for a little over 48 hours.<\/p>\n

Chances are that the next outage, like this one, will be self-inflicted.<\/p>\n

We brought this on ourselves.\u00a0 We assumed that there was such a thing as an innocent change in a heavily-used system as big and complicated as ours.\u00a0 There isn’t, and we should have anticipated that.\u00a0 We should have had an easier way to back the changes out once they were in place.\u00a0 We should have been more methodical in our diagnosis.\u00a0 What followed was the predictable result of hubris, confusion, and a really bizarre technical problem… but it’s not my point to talk about that here. We’ll fix the technical stuff and put all kinds of traps and wires in place to prevent a recurrence, and we’ll change our deployment procedures.\u00a0 Next time, we’ll say more to our users about what happened, and we’ll say it sooner.<\/p>\n

[Geek note: for those interested in root causes, it turns out that Perl doesn’t deal with tail recursion very well, especially inside mod_perl,\u00a0 and that a 750 ms change in the time it takes to generate a dynamic page can bring a site to its knees, even if it’s running on a good-sized cluster. Also, if you change a lot of content, the reaction from crawlers is just indescribable.\u00a0 “Feeding frenzy” doesn’t even come close.]<\/p>\n

Again, the point of this post is not to review the usual lessons learned, but to point out some others.\u00a0 Mostly these are about people.<\/p>\n

We have a remarkably loyal and patient group of users.\u00a0 I talked to, or e-mailed with, a number of them over the last few days (yes, it’s often me who answers the phone; we keep telling you guys that there are only five of us here, and that number does not include a receptionist. I still owe some e-mail responses, and will for a few days yet).\u00a0 All were courteous; all told us how much they depend on us; all wanted us back online five minutes ago. And this is probably a good time to thank all of you from firms and libraries who tweeted or wrote us to say that WEXIS is no better, at least in your institutions.<\/p>\n

Many who called or wrote were worried that higher education budget cuts had put us off the air for good.\u00a0 Nope, not so. I have to say that the relief these people expressed (often with an explosive “oh thank God”) was probably the brightest spot of the last few days; we felt really appreciated.\u00a0 We get core support — about two-thirds of our budget last year, hopefully less this year —\u00a0 from the Cornell Law School. While we are hardly central to their mission of providing legal education, they have been, and continue to be, generous in their support.\u00a0 We are working to reduce our dependency on the School,\u00a0 but it will be a few years before we are fully self-sustaining.<\/p>\n

But I think the most interesting contact came from someone in the far reaches of a large organization (I won’t say where in order to protect the innocent, and some of the guilty, too — we’ll call him Fred).\u00a0 Fred was very worried about the outage, because some months ago he had recommended that we be made the standard go-to source for US statutes within his extended workgroup.\u00a0 Apparently Fred has taken a good bit of flak for that decision.\u00a0 The critics are, he says, much more vocal at the times of year when we run fundraising notices.<\/p>\n

Fred just wanted to know what to expect, and to get some kind of a track record on our outages so he could answer his critics.\u00a0 I cannot imagine a more loyal advocate than this guy.\u00a0 I would guess there are a lot more like him out there;\u00a0 I sure hope there are. No doubt they will be hearing about this from their co-workers, too, and I’m sorry for that\u00a0 (repeat after me: First time in six years. Two to four slowdowns a year. Fewer once we have stuff in the cloud, slated for this summer).\u00a0 Fred, and all of you like him:\u00a0 my thanks for your belief in us, and your advocacy on our behalf.<\/p>\n

Fred’s co-workers, well… them I’m a little less happy with.\u00a0 We have 90,000 visitors each and every day of the week.\u00a0 I have no idea what the aggregate number has been over the last several years, but it’s certainly a lot more than 90,000.\u00a0 We have 6,000 active donors.\u00a0 That is a lot of free riders.\u00a0 I think a fundraising solicitation that pops up no more than once across your visits in December and June (assuming you click the thing that turns it off after the first time) is not a heavy price to pay.\u00a0 I don’t think I’ve ever seen a sarcastic review of the LII that says it’s worth every penny you pay for it, and I hope I never do; nor do I mean to suggest that those who don’t pay for a service are barred from criticism.\u00a0 Far from it.\u00a0 I hope they’ll write to us directly and tell us what it is they would like us to improve.\u00a0 Oh, and about the insufferable burdens of being asked to contribute, too.<\/p>\n

We have deliberately chosen to avoid give-money-or-we-shoot-the-dog appeals of the kind used by many advocacy organizations, despite the fact that most fundraisers find them highly effective.\u00a0 I think they are unbearably shrill, and as much about manufacturing crisis as solving problems. That’s why we won’t be turning the servers off once a year to make a point, I guess. Besides, that would be childish.<\/p>\n

But I have to say that it looked awfully tempting along about hour 17 of the outage, when Fred’s e-mail came in and Dan Nagy and I were rewriting code and juggling servers on our noses.\u00a0 Picture Tom, with a little devil perched on his shoulder, whispering: Pssst….you know…we could turn the lights out for 24 hours every year, predictably and with advance notice. Maybe on Bentham’s Birthday….\u00a0 It’s rumored that Paul Ginsparg<\/a> pulled a stunt like that with the physics arXiv<\/a> when he was still at Los Alamos.\u00a0 Tempting, so tempting, especially after the <\/a>Tim Stanley <\/a>diet-cola-consumption limit is only a distant memory and you’ve lapsed into twitchy irritability.<\/p>\n

All of this to say that the psychological dimensions of something like this outweigh the technical ones, at least for us.\u00a0 There is, of course, the usual set of platitudes about doing things better — all of them severely devalued in this year of our Lord 2010 by having them pushed into our faces in prime time by Domino’s (“our pizza sucks but we’re fixing it”)\u00a0 and Toyota (“you’ve always trusted us and naturally we’re fixing your cars so you don’t hit the guardrail at 75 MPH”).\u00a0 Well, our pizza doesn’t suck.\u00a0 And we are fixing the brakes.\u00a0 And we are very, very grateful to those of you who have borne with us through this.\u00a0 It’ll happen again, but with luck and (mostly) skill, it won’t happen soon.<\/p>\n

Tb.<\/p>\n

Oh, and a final word:\u00a0 there is a very special place in Hell reserved for people who have put up web crawlers and have no idea how to operate them.\u00a0 The commercial indexers like Google, Yahoo and their ilk are actually quite respectful of robots.txt files, offer rate-limiting apparatus, and so on.\u00a0 The horde of people who have put up search appliances on college campuses and elsewhere without any idea of the effect they’re having on the world are another matter.\u00a0 I wish them an eternity staked out under a heavy, random shower of red-hot air-gun pellets; that seems about right.<\/p>\n

<\/p>\n","protected":false},"excerpt":{"rendered":"

I have had a miserable couple of days, here at the keyboard, working through the effects of the Great LII Outage of 2010.\u00a0 I spent a lot of time on repairs, and on measures that sharply decrease the chances of another.\u00a0 But this is the Internet, after all,\u00a0 and a highly complex system, and we […]<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.law.cornell.edu\/tbruce\/wp-json\/wp\/v2\/posts\/78"}],"collection":[{"href":"https:\/\/blog.law.cornell.edu\/tbruce\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.law.cornell.edu\/tbruce\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/tbruce\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/tbruce\/wp-json\/wp\/v2\/comments?post=78"}],"version-history":[{"count":0,"href":"https:\/\/blog.law.cornell.edu\/tbruce\/wp-json\/wp\/v2\/posts\/78\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.law.cornell.edu\/tbruce\/wp-json\/wp\/v2\/media?parent=78"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/tbruce\/wp-json\/wp\/v2\/categories?post=78"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/tbruce\/wp-json\/wp\/v2\/tags?post=78"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}