
West Group’s edit of Bob Berring’s remarks on free access to law leads with the interesting assertion that government should get out of the legal information business because “every time government has tried to take over the provision of legal information it’s failed”.  That is worth a little discussion, in part because it so patently reveals West Group’s worst nightmares, and in part because it’s not clear which variety of government activity West is talking about.

No doubt some government forays into the legal infosphere have not been all that they might be.  A couple of years back, I served on an ABA committee that looked at, the forum for interaction between information systems and the notice-and-comment rulemaking process.  The executive summary of that committee report offers a view of how and why things can go wrong in government legal information projects, even (or especially) when many talented people are operating with good intentions.  In retrospect, the report reminds me of Michael Flynn’s excruciatingly meticulous tale of starship disaster, The Wreck of the River of Stars. Nobody’s at fault, everybody’s doing the best they can with what they have, all intentions are good, and the results, well, the results are not so good. Like Flynn’s science-fiction story, the real-life story of government information services is a nuanced one, with a lot more to it than budget cuts and caricatures of incompetence.  A very thoughtful view of those problems comes from Ed Felten’s group at Princeton, which eloquently makes the case for government to get out of the business of web site building and into the business of bulk data provision a la  And by the way, has improved immensely since the report came out.

I don’t completely agree with Felten and company’s idea that government should restrict itself to wholesaling data. There are certainly some basic legal-information services that it ought to be providing directly to the public.  But no single entity, public or private, can possibly service the needs of all of the niche consumers of legal information, a fact I first remarked in 1995; even earlier, well before the Web, Hank Perritt had pointed out the unique position of government at the head of a chain of potential value-adders in the legal information field.  Then as now economic arguments — even those with a free-market slant — strongly favor the idea that government should act in ways that reduce rather than reinforce entry barriers in the legal information market.  The balance between government services, the private sector, and non-profit third parties in law publishing is a difficult one, with many constraints (some of which I outlined here, in 2000).  But it is difficult to imagine any legal information ecology in which government is not providing legal information, at least some of it retail.

And such a complete exclusion of government is not what Bob Berring is talking about, and it is not what West Group wants.  Government also provides legal information services at the wholesale level.  Like the LII, the New York Times, and various of its competitors, West Group gets its Supreme Court cases from Project Hermes, the Supreme Court’s electronic distribution system that releases opinions on the day of decision.  Hermes has been in operation since 1991.  It has always been speedy, highly reliable, and built on open standards.  In the minds of many, the much larger and better-known PACER system is less of a success story — but it is also a major feeder for West Group and other private-sector legal publishers, who pay the millions of dollars in user fees that have allowed PACER to accumulate a $150 million surplus.  I don’t think that West wants Hermes or PACER to go away.  They’re a real bargain, compared to running around to all those courthouses.  But, like any other business, West would like to keep the barriers as high as possible for its competitors.

I don’t think that West is being as silly as Rick Santorum was when he called for the government to get out of the meteorology business in favor of the better-funded private sector — only to find out that the private sector was getting all its raw data from the National Weather Service.  And perhaps the NWS is not a bad model to think about here.  But we all need to remember that West’s concern is not that the government is inept at providing retail legal information — it’s that government is in fact very good at wholesaling, and there are a lot of potential consumers and competitors in the retail legal information business.

[ Note: this piece is part of a trilogy on the West video:  1, 2, and 3 .  Kind of like the Lord of the Rings, only longer, with a less confusing plot, and very few cute hobbits.] 

Some people just can’t let go of things. Just yesterday, I was confronted by a progressive friend who corrected me when I referred to a certain airport in Washington, DC as “Reagan”.  For better or worse, they changed the name over a decade ago, and no amount of beating could possibly make that horse any deader.  I didn’t think much of Reagan, but I also don’t think any debate benefits from the sort of childish pushback-at-all-costs that seems to characterize so much public discussion in the US these days.  I don’t want to be one of those single-issue people.

Nevertheless…. there’s more to say about the West marketing video that features Bob Berring.  That’s an awkward way to refer to it, but it’s deliberate.  I think we can assume that West’s much-vaunted talent for selecting, ordering, and presenting information applies to video footage as much as caselaw.  In Paul Lomio’s class at Stanford the other day, I remarked that one thing the LII and West videos have in common is that neither of the front men was completely in control of the use and presentation of his words.  And in fact what I have to say here is more about the music than the lyrics, and the music would seem to be entirely West’s. So let’s at least move on from “Berring kerfuffle” to “West video” – while, I guess, remaining firmly inside “tempest in a teapot”, which is how one law librarian characterized it.  Teapot or no, they’ve opened the door for long-overdue and important discussion of a number of issues that have been waiting for attention, so… let the discussion commence.

I can think of at least three points raised by the video that need some serious attention from the library profession.  (That, folks, is a cheap rhetorical gimmick. On a sunny afternoon in November, with the leaves unraked, I only have time for one today. You’ll just have to remain in suspense on the other two until I can get around to them in some future post).  Today I want to talk a little about one of  the video’s anthems: the virtues of the free market, and the presumed triumph of muscular, well-funded capitalism over a bunch of uploadin’ hippies with short attention spans and no money. Heroic uploadin’ hippies, that is.

An historian might find in that song some odd resonances with the West of the mid-90’s, the one where Vance Opperman talked about “copyists” in much the same way that the video talks about “volunteers”, the West that wrapped itself in the flag as the “last American caselaw publisher” shortly before selling out to a Canadian company.  There is nothing so American, after all, as the free market, and few things as virtuous.  An ironist might wonder how sure the fire of that particular pitch is, nowadays.

But West lives in a distorted economic arena.  It sells its goods in a cul-de-sac separated from the agora of the free market by a series of barriers. Before the Internet came along, and even until quite recently, the hurdles raised by the difficulty of collecting source material were protection enough from competitors.  Government release of raw, bulk legal data threatens to remove many obstacles that discourage private-sector competition with West.  And despite what Professor Berring says, government is quite competent to do that, and has done so for many years in other venues, including some that feed West the grist for its mills. Second, West has historically protected itself from competition by vigorous pursuit of expansive claims of copyright in the apparatus of citation. Third, it has profited hugely from the conversion of the law creator’s “soft” natural monopoly in legal information into rigid commercial advantage via the economic alchemy of official publication  status (or, as it is now known, authenticity).   Fourth, in a turn our ironist would find amusing given recent events in the “free” market, West has benefited from the inattention of those who are supposed to regulate the market in which it operates. Finally, it exists in what is for all intents and purposes a duopoly market where pricing advantages depend hugely on a lack of transparency.  That does not suggest the same open competition normally associated with a “free” market.

There’s little more to be said on the first or second points. The relative ease of collecting legal data these days is evident, as is the scrappiness of potential West competitors like FastCase.   West’s use of copyright claims in official citation to prevent market entry by competitors is a matter well known to legal information professionals.  So is their bartering around official status; Peter Martin published quite a good paper that deals with it (among other topics) in 2000, and you can read one such contract from 2001 here (the interesting stuff appears under “other considerations”, and similar headings).  In 2009, the law library profession still struggles to find a position on authenticity that will avoid hardening the natural monopoly of legal information that law creators enjoy into rigid commercial monopoly by legal publishers.

The fourth point — that West has benefited from the inattention of regulators, particularly at the time of the West-Thomson merger — is perhaps more controversial.  But, as a 1997 article in the Connecticut Law Tribune explained, that merger got remarkably little scrutiny for one that, in the words of one amicus, converted a three-giant industry to a two-giant one.  The products that West and Thomson divested, most of which lay outside the integrated systems of books and databases that provide real utility to practitioners, were all sold to Lexis.

Which segues nicely to a final point.  To be sure, low-cost legal information providers like FastCase, VersusLaw, and LoisLaw are doing pretty well, and open-access providers chip away at the edges of West’s business. But for all intents and purposes commercial legal information is a duopoly, as it has been for many years. And duopoly markets have very good reasons to avoid price wars, since theoretically these end with profits spiralling down toward marginal cost; that is the classic “gas war” between gas stations on opposite corners of an intersection. The price of peace is that actual pricing has to be kept secret from the community of buyers and from the competition.  Otherwise, each competitor will attempt to undercut the other until the actual marginal cost is reached and there is no profit. This may explain, first, why West’s pricing agreements with large law firms are as highly confidential as they are, and second, why West would refuse participation in AALL’s price index, even at the expense of being barred as a sponsor of AALL’s annual meeting.
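For readers who like to see the arithmetic, here is a toy sketch of that undercutting spiral. It is the textbook price-war story and nothing more: the function name `price_war`, the starting price, the marginal cost, and the size of each price cut are all invented for illustration, and prices are in whole cents to keep the arithmetic exact.

```python
def price_war(start_price, marginal_cost, step=1):
    """Return the sequence of posted prices (in cents) as two sellers
    take turns undercutting each other by `step`.

    Undercutting stops when one more cut would push the price
    below marginal cost, i.e. when selling would lose money.
    """
    prices = [start_price]
    price = start_price
    while price - step >= marginal_cost:
        price -= step          # the rival posts a slightly lower price
        prices.append(price)
    return prices


# Invented numbers: both stations pay 250 cents a gallon and open at 300.
path = price_war(start_price=300, marginal_cost=250, step=10)
margin = path[-1] - 250        # profit per unit once the war is over
```

With visible prices, every cut is matched, and the sequence only stops at cost. Keep the prices secret and neither seller knows what to undercut, which is the point of the paragraph above.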

Bob Berring believes in the market system.  So does West, for as long as the market system in play is one with externalities that protect it from the peskier aspects of competition.  And in such a scheme it is important to keep market barriers high by, for instance, restricting your competitors’ access to the raw materials needed to create competitive products.  Government may not be able to create the finished legal-research systems that West does.  But it can certainly release bulk data to those who can produce products that will compete with West, and in time it will.  Is the West video a sort of legal-info-Harry-and-Louise?  No.  But it would be naive to say that it is unaware of, or unresponsive to, serious competitive threats that West will face in the near future, from people who are not volunteering at all.


Lately, I’ve been tempted toward complicated prose that urges rethinking of legal-information fundamentals.  Why? Because the idea of public access to law in a global digital society makes some fundamental rethinking necessary.   It would be swell to explore those notions in some longwinded way, but I’m both lazy and out of time.  So instead I am going to offer two really simple propositions:

1) Simple fairness demands that the public have free access to legal-information systems that embed the same functionality and quality as the most advanced systems commonly available within the public body that creates or issues that legal information.

2) Authority in legal text ought to be judged simply (and exclusively) on the basis of accuracy, currency, and other objective quality measures.  The barter in “official status”  is unnecessary.

Please discuss in the comments.  The fun, of course, lies in cataloging all the ways in which our current situation does not match those ideals, and why.  Assuming, of course, that you think they are ideals.

Over in VoxPopuLII this week, Dan Dabney makes a number of good points about the proper role of LIIs and other public legal information providers. In his view, our useful purpose is to drive innovation up a ladder of value-added legal information providers.  West Group, unsurprisingly, occupies the top rung of his ladder.  I agree with him.  Duopolies are, in many ways, a terrible environment for innovation, because innovation is too often seen as a weapon to be used against the competitor rather than as a way of answering customer needs (I adhere to Guy Kawasaki’s view that your customers don’t care about eradicating your competition).   I am proud that the LII has contributed substantially to breaking the intellectual and engineering stranglehold that West and Lexis had on legal information twenty years ago.  And I think that we and our colleagues in the public sector have driven a great deal of innovation since.  You ain’t seen nothin’, yet.

Dan successfully makes the case — if I can equally reasonably shape his views into a different metaphor — that an average public-transit bus will never win a race against a Porsche Carrera GT.  And indeed it is true that LIIs have neither the holdings nor the editorial depth of Thomson Global.  Nor, I think, would we outpace them in any particular slice of American legal information. Though I might be willing to take him on in a street race with the US Code, for which I think we have better Web architecture, even though our edition is less speculatively up-to-the-minute than West’s.  But that is not my point. West is unquestionably winning the Grand Prix. The LIIs are just trying to help a lot of people get to work.

A couple of weeks ago, I started a draft article for this space with a self-conscious echo of Fred Rodell:

There are two things wrong with how we think about legal information. One is that we are not thinking about how it is produced, and the other is that we are not thinking about how it is consumed.  That about covers the ground. 

I want to concentrate on that second idea, because it would be fair to say that West and the LIIs are thinking about very different kinds of research consumers.

Dan Dabney imagines — as most law-school instruction in legal research seems to — that the aim of research is to support argument in high-stakes litigation, or in some other setting where potential hazard justifies the expense of a high-end service.  This echoes the position that John West himself took over a hundred years ago: the idea that legal information provides the lawyer with insurance against the loss of his case.  West was arguing for the superiority of comprehensiveness over selectivity in the publication of cases.  This is unquestionably true, and as West said the general policy of insurance is the best one.  It is equally true, however, that most people insure only to the value of the goods.  Few will throw $5000 worth of research at a $500 case.

Ken Svengalis has made himself endlessly popular with commercial publishers by making that point in the form of a buyer’s guide that stresses practicality over comprehensive acquisition.  And of course services like Findlaw and LexisOne are tacit nods to that principle, as are well-established, lower-cost commercial services like LoisLaw and FastCase. When we talk about those services, we are talking about services primarily intended for lawyers.  The aim of an LII — or at least this LII, for my colleagues elsewhere do very different things — is to provide legal information for everyone, something that they do by using technology in innovative ways.

And that means that we serve a type of legal research that is very different — not naive, necessarily, but different.  We primarily serve people whose aim is to manage risk using information, and to take bearings on the advice offered them by professionals.  In that respect, use of the LIIs closely resembles responsible use of something like WebMD.  And it continues a belief in responsible self-help that we have seen from diverse sources in the past — citizens advice bureaux, publishers like Nolo Press, trade- or interest-specific guides to the law, and so on.  This is not pro se representation, and it is not intended to take bread from the mouths of lawyers (my own belief, and Richard Susskind’s, is that it will increase the demand for legal services by lowering entry barriers).  It is simply a different activity (indeed, it was Dan who first clued me in to how different it really can be). It is aimed at those whose use of primary legal material is less rigorous because their aim is, perhaps, to get general orientation, or to make sense of the competing advice of professionals and pundits, or to fortify themselves for an initial encounter with a professional who, in their minds, represents a legal system that is scary and incomprehensible.

Of course, there are those with more sophisticated needs who cannot afford more highly priced services than ours.  In fact, most of our users are people making use of legal information in a professional context, not people having traumatic, episodic encounters with the legal system. An example I often use is that of a hospital administrator with a day-to-day need to know about public benefits law.  Many of our most supportive users are government lawyers, and I understand that in some agencies access to commercial services is at least limited by seniority, if not altogether barred by the budget.  And quite recently the Permanent Bureau of the Hague Convention on Private International Law has become concerned with the high cost of legal question-answering across borders.  That is the problem with the Porsche:  it is fast as hell, well-made, and pretty, but far too expensive for a lot of people who need more ordinary transportation.

In a very recent paper, legal anthropologist Annelise Riles points out the existence of something she calls “the polycentric model of legal expertise”.  The idea, so far as I understand it, is that there are benefits to be had from the distribution of a species of legal literacy throughout a population. She attributes some important aspects of Japanese legal culture to this phenomenon.  This, I think, is what LIIs are about in the long run: the promotion of polycentric legal knowledge.

That, and building shinkansen, which go almost as fast as Porsches, and carry a lot more people.  We’re starting to lay the track.

Over the last several months, I have spent an awful lot of time travelling. I met with a lot of people who work in legal information, both here in the US and abroad.  And I had every intention of filling this blog with posts about interesting things I’d seen and heard — a kind of travelogue of legal informatics.  It’s been slow in coming.  Actually, I’m not convinced that a travelogue per se is what’s needed.  Event-by-event reporting is easy.  Drawing a map of everything that’s going on out there in the world of legal information is not.

It’s a big world now.  People are doing legal informatics in a lot of places.  And that phrase — “doing legal informatics” — now includes a breathtaking number of disciplines and perspectives.  It used to be that we thought of ourselves as situated at the corner of law publishing and computer science.  Now we need to add big chunks of information science (itself a composite field), legal bibliography, digital librarianship, e-government studies, political science, and sociology of the professions.  At various times during the last six months I’ve had fascinating discussions with representatives from each of those academic disciplines, and from the practical side of librarianship, government, and publishing.  Each was working in a distinct context leading to different kinds of insights and solutions. Each worked within a different legal regime.  Each had mapped a different part of the world.

We badly need communication across borders — national, disciplinary, institutional. It’s important that we do that now.  We have opportunities — and challenges — of unprecedented scale and scope.  We can act on those most effectively if we can stitch together all the little maps, overlay them, get a more complex and mutually-informed view of the world. And stop reinventing the wheel in each of its out-of-the-way corners.

It has taken well over a decade to reach this point.  Legal and government information showed up on the Web in the early 90’s — our own efforts here at the LII and Carl Malamud’s liberation of the EDGAR database were leading examples.  Those were quickly followed by open-access projects in Canada and Australia.  Digital-government projects began in the US around 1995 or 1996, including many self-publication projects in courts and legislatures.  These efforts created significant pools of data based on open standards, and the availability of that data made it possible for information-science researchers to pay far more attention to legal data than they could when it was behind proprietary barriers.  Now we’re seeing a lot more work on legal data by computer scientists working with language technologies, database specialists, semantic-Web engineers, and others. In Europe, work on integration of government information was propelled (and, ultimately, funded) by the requirements of unification.  Everywhere, more and more courts, legislatures, and agencies are putting information on the Internet in more and better ways using improved technologies.

A condensed narrative like the preceding demands oversimplification, and I apologize if I’ve slighted anyone out of sheer middle-aged forgetfulness.  And this tale no doubt has its beginnings much earlier — you could, for example, point at the long cooperation between the statistical arms of various government agencies and academia and industry as part of the story.  But as with so many other things the rise of the Web was the start of a new wave.  That long, slow groundswell — the product of many individual efforts over a decade and a half — is now peaking.

The American press first saw fit to remark on it about a year ago, with the release of extensive caselaw datasets by — Carl Malamud’s latest effort.  A community has started to form.  Just within the last year or so, we’ve seen:

It’s time.

But … we need to be talking to each other much, much more.  We need the kind of efficiency that we can only get by learning from one another.  We need to make informed choices between inexpensive automated approaches that work by brute force and the hand-crafted, highly-accurate approaches of legal bibliography that are not always scalable or affordable.  We need to recalibrate what we mean by “authority”, and begin to think about measures of quality and reliability for legal text that avoid the creation of  unnatural monopolies in legal information.

Okay, so I admit I was having a kind of Tom-Friedman-sings-kumbaya moment there, and I’m over it now.  Really.

We do all need to be talking more, and this week the LII starts a modest effort in that direction. Our new guest blog, VoxPopuLII,  is designed to help the conversation along with biweekly posts from folks you may not have heard from before.  They’re from all different tribes in all different places on the intellectual and global map. We’ve asked for their big ideas — and if you’ve got big ideas of your own, I’d invite you to get in touch with me about writing something for us. And of course we invite your comments and suggestions about what you find there.


A recent tweet reminded me that, almost 15 years ago, Peter Martin and I spent the day with members of the Bar Association of the City of New York.  As I recall, the best moment of the day was an extended peroration from Chris Locke (a/k/a rageboy) on the subject of lawyers and the Internet which, in his mind at least, had something to do with dinosaurs calling to one another in a swamp (yeah, I know, and for the life of me I can’t remember what it had to do with the subject at hand, either — but one of Chris’ great virtues is that he can suspend that kind of disbelief, apparently by holding his mouth right).

Second best (sorry, Peter) was Peter Martin’s presentation on why lawyers belong on the Internet.  Perhaps it might have better been titled “What the Internet offers lawyers”.  Peter mentioned five things:

  1. clients and potential clients are there
  2. other law firms are establishing themselves on the Net (there were only two, at the time)
  3. conversation among lawyers and maybe clients is taking place there
  4. cost-effective access to (legal) information
  5. cost-effective global communication of data of all sorts

These may seem obvious now.  At the time, they weren’t.  And maybe they’re not so obvious even today, or maybe each new technology that comes along makes us revisit these same arguments:

  1. clients and potential clients? Kevin O’Keefe gets rhapsodic about LinkedIn (6/2008)
  2. other law firms? Muzeview’s law firm Internet presence rankings for December are here.
  3. conversation among lawyers and maybe clients?  See Justia’s LegalBirds, LexTweet, and maybe just plain old Twitter itself.
  4. cost effective access to legal information? o hai, westlawz [1, 2, 3….].  And there are over a million inbound links to the LII alone.
  5. cost effective global communication of data? heh.

So… these things just keep coming around again and again, getting stronger in each cycle.  Fifteen years from now?  (kthxbye, westlawz….)

I have a couple of hobbies. Actually (as those close to me would tell you) I have an endless series of momentary obsessions. But a few have met the chronological challenge and persisted, so they’re hobbies. For one, I deal in antique woodworking tools. I also ride around on a bike. It’s a nice thing to do in Ithaca during the lighter months. Saturday, I was intent on both — a fellow up near Syracuse wanted to buy some planes and a set of auger bits, and I wanted to take a bike ride around Cazenovia, which is a pretty area, and fairly flat for this neck of the woods.

My tool customer was a guy named Sean Murphy, who works for (probably owns) a company called CWR Manufacturing in Syracuse. They make cold-formed parts for, well, pretty much anything. He mentioned automobiles, small electric appliances, casement windows, and electric motors among many other things. He is just getting into woodworking — and built his own dust collection system, whose cyclone chamber he welded up with a MIG welder, from plans he found on the Web. I am guessing he is a very good craftsman.

More to the point, he’s a happy, and habitual, LII user.

Sean uses us for information about employment law, intellectual property law, and as he said “the sort of thing that a guy in manufacturing needs to know to run a business”. I noticed that he didn’t blanch when I referred to the “CFR”. And he told me about how he had heard a lecture on intellectual property in a course he was taking (working toward an MBA), and started to wonder how he could use that to protect his company’s work product — none of which is patentable. As a result, he’s introduced the use of the non-disclosure agreement to his industry.

The large companies he deals with often do a kind of technology transfer — Sean’s crew is hired to produce a new part using cold-forming technology, and the company that hires them tries to learn as much as possible about what goes into the design and manufacture of the part. CWR does that for a couple of years, and then the client figures out how to use his own equipment and engineers to replicate the know-how that Sean’s crew has provided in designing and making the new part. Or the client passes the design and the knowledge on to another supplier who will make the part more cheaply, maybe offshore, leaving CWR behind. Not so good, that. After hearing the lecture, Sean consulted with the lawyer who had been brought in to give it — and as a result, now makes signing of an NDA a standard part of his arrangement with new customers. Big deal, sez you — everyone in the software industry signs five NDAs before lunch. That’s right — and now everyone in manufacturing will too, and the situation for small shops like Sean’s will improve as a result, because the client will no longer be able to walk away with CWR’s real work product: the know-how involved in re-engineering the part for cold-forming manufacture.

I get two very warm and fuzzy things from this: first, another anecdote to add to the many that tell us that the audience for legal information goes way, way beyond lawyers, and second, an indication that maybe what Richard Susskind said about a more transparent legal information regime increasing, rather than decreasing, the need for legal services is proving itself. If — as we’ve long thought here — more accessible legal information means that people are less apprehensive about approaching the legal system (or feel better prepared when they do), then more people like Sean will do so. And the result will be an increase in the use of legal services in a preventive way. Nothing new about that, of course — but I’d like to think the numbers are going up as legal information gets more and more available.

These days, if you say something is “like a legal version of WebMD”, people are inclined to think in terms of consumer law, bankruptcy, divorce. But it’s also about small business, entrepreneurship, and a healthy economy. That is, if you’re like Sean.

And he’s a good businessman — I know because he wouldn’t pay my ridiculous prices for a #8 jointer plane and a set of auger bits.

Along with LII Editorial Boss Sara Frug, I spent yesterday morning with the folks at the Sunlight Foundation — an organization with a compelling mission and a growing set of activities that reflect it. Founded with the idea of using Web 2.0 techniques to bring transparency to Congress, Sunlight is now becoming a rallying point for a diverse community of folks who share the idea of making government better by making the information it generates and consumes more accessible.

That’s an idea we find really attractive. We’ve been amazed — shocked, really — at how little access government has to its own work product (never mind the public). We recently learned that some branches of the Federal courts limit access to the commercial legal information services based on seniority; we understand that the same is true of Federal agencies, where junior people don’t have access to Lexis and Westlaw [note to legal research teachers: “junior” would pretty much describe our recent graduates, wouldn’t it? Think we should be teaching them more about free online sources? ] Our e-mail is chock-full of questions and testimonials from government attorneys who rely on our edition of the US Code and our Federal rules collections. Our most successful projects over the last fifteen years have involved improving or re-mixing the presentation of Federal data to make it more easily used and understood by a broad audience.

So our question for John Wonderlich at Sunlight was “how can we help?”.

Turns out there are a number of ways. We have a lot of expertise in the arcana of Federal data online, experience with data standards, software tools that have remained in-house because we didn’t think anyone else had any use for them, and so on. There are a lot of ways that the LII can and will participate in the growing community of technologists who want to “hack government”. One of the best ways we thought of involves some help from you… particularly if you are a law librarian, legal scholar, or anyone else with experience working with government documents.

We know from experience that some online documents are especially useful to people building new services on top of government information. Here are some examples of these “linchpin” documents:

These are documents that provide important information about the context and structure of government, or that link isolated pools of legal information together. For example, the classification tables form the basis of our US Code updating features; we parse them into a database that is then used to power both clickable update links and RSS feeds. As published by the government, they are difficult to use; they would be near the top of our list of documents the government should be publishing in easily processed XML form. Put another way, they are the documents that are most useful in building online services that make legal information more transparent. They would be a good focus for our efforts here at the LII, for the growing community of government-transparency hackers, and for lobbying of (and cooperation with) the GPO’s FDSys effort.

We think that it should be possible to build a list of (let’s say) 100 such documents: the Big Docs that those who develop these kinds of services would most like to see placed online in a form that is both easily processed by machines (i.e., XML) and continuously updated by the official body that creates them. What would your suggestions be? Put ’em in the comments, please.

[editorial note: this week’s post is a bit later than usual because of my stay at the 2008 CALI conference, about which more next time. The posting schedule will no doubt continue to be spotty throughout the summer — I’m travelling and talking more than usual — Tb.]

Long ago, in a universe far, far away, David Mamet told me about his theory of jokeless punchlines. Some punchlines, he said, were so good in and of themselves that no actual buildup is required; the receiver can mentally compose the joke himself. He’s right (hell, he’s David Mamet, fer chrissakes). A few examples:

  • “For a nickel, I will”
  • “Your paycheck”
  • “Two pounds of Metamucil and a tuba”

Normally you wouldn’t think of the US Code as a place to find punchlines like that. At least you wouldn’t unless you were a legal-information blogger desperate for a catchy opening. But consider our old friend 4 USC 1 (“Flag, stripes and stars on”), which in its most current version states:

The flag of the United States shall be thirteen horizontal stripes, alternate red and white; and the union of the flag shall be forty-eight stars, white in a blue field.

As I said, this is the most current version. There are extenuating circumstances, of course, and we’ll get to them. But this little gem is good for a small but relentless stream of letters to the LII questioning either our currency or our sanity, usually the latter. And so the first thing to notice is that it exactly fits a popular prejudice about law and lawyers — namely, that they use fancy language and tortuous logic to reach facially ridiculous conclusions. About half the letters we get — no doubt sent by the large percentage of the population that hates lawyers anyway — contain some sentiment along these lines. Just like a bunch of lawyers to get something so obvious so wrong. Don’t you guys ever update your stuff?

Well, yes, we do — whenever the House Office of the Law Revision Counsel does. Of course there’s a trick to this. Two tricks, actually. First, the missing stars can be found in the notes to 4 USC 1; they were added by executive order. This is still a little confusing given the way things are presented in the notes — only the most recent executive order (admitting Hawaii) is fully spelled out; the earlier order admitting Alaska is listed as superseded, but unless you’re paying close attention, you’ll wonder where the other star went.

Second, the one-star-per-state algorithm is given in 4 USC 2. These just confirm conventional statutory-research wisdom: always check the notes, and always check the sections adjacent to the one that seems most interesting. These are both good advice, of course — but they are advice that no member of the general public will have heard, or have had any reason to hear. And they give the lie to the idea that somehow the general public can’t do — or shouldn’t do — legal research. Just as well tie somebody to a chair and then complain that they can’t dance.

Some further observations:

  • No doubt there are statutes that the public would find difficult to understand. And no doubt there are some where the unwary researcher would need to pay close attention to interpretive caselaw in order to find out what’s really going on. The number of stars on the flag falls into neither of these categories.
  • It would be reasonable to ask why 4 USC 1 has never been amended to reflect current reality. Must be because 4 USC 2 is deemed to cover the case.
  • This represents at least two failures of information design, and probably three. The first is the LII’s — unlike many sites, we choose to present the Notes separately from the text of the section. The second is an accident of section-ordering or of draftsmanship; why not give the algorithm first, or combine it with 4 USC 1 so that the present state of affairs (circa 1947) and the way forward (the algorithm) are stated side by side? The third is an inevitable consequence of search engines — the places people end up after clicking a search engine result link tend to be seen as the only places there are. As a web designer, it’s sometimes hard to figure out what to do about that.

Is this whole thing just a straw man? You bet it is. Anyone actually typing “number of stars on the flag” into Google would be rewarded with an article in Wikipedia that sets out the entire history of the flag and its many alternate versions. Several articles, actually. But I’ve always loved it as an example of what can go wrong with legal research, precisely because the fault is so obviously with the organization and presentation of the information rather than with any lack of knowledge on the part of the information-seeker. And that leads me to ask: how often is that the case, and what can be done about it?

Pretty often, and not a lot, at least with primary materials, and at least as we’ve thought about the problem in the past. A lot has been done with so-called “plain English” laws, but these generally reach only the words used and the typography with which they are presented. And the issue is usually presented as a consumer-protection matter, conceived mostly with respect to contracts and other documents created by private parties, and not legislation itself (though regulations do get a fair amount of attention). Seldom if ever is anything said about structure, availability of secondary sources, or information design. I’d be the first to admit that the latter would be very hard to spell out, but if the statutes related to pornography can be interpreted on an “I-know-it-when-I-see-it” basis, why not good information design too?

Taken together, I find all this a powerful argument for the creation of the kind of secondary materials that we make here at the LII — intended for lay persons with problems they are trying to understand, and sophisticated enough for (say) a practitioner entering an unfamiliar area. And figuring how we can connect these kinds of contextual or explanatory documents with the primary materials they explain is a matter for ongoing exploration and engineering. The Web makes it very easy to build structures that say “show me more detail on this” or “let me drill down”, and a lot harder to automate the process of providing meaningful context for the user who is puzzled by something he sees on a particular page. That’s the business of the Semantic Web, of advanced text-categorization tools, and of essay-writers and information architects — all things you can find here at the LII, on a good day.

Ouch. I’ve just been looking over the last blog post on interoperability, which has all the charm of an underfed seagull on crystal meth. Squawk, squawk, squawk. Amid all the screeching in the last post, it’s a little hard to figure out what the point was. So I’ll just say it: folks, the future does not lie in putting up huge, centralized collections of caselaw. It lies in building services that can work across many individual collections put up by lots of different people in lots of different institutional settings. Let me say that again: the future does not lie in putting up huge, centralized collections of caselaw. It lies in building services that can work across many individual collections put up by lots of different people in lots of different institutional settings. Services like site-spanning searches, comprehensive current-awareness services, and a scad of interesting mashups in which we put caselaw, statutes and regulations alongside other stuff to make new stuff.

There are some services like that. AltLaw is one; so is the Public Library of Law; so are the Legal Research Engines that the Cornell Law Library runs; and I’m sure I’m omitting many more, including some we built here at the LII more than five years ago. Most are either “framers” (who put a wrapper around multiple sites operated by others) or “spiders” (who, like Google, crawl other sites and federate the content in interesting ways). These are fairly blunt instruments (they don’t show much in the way of law-specific metadata), and the spiders in particular are hard to maintain. And there are really very few of them. There has not been very much building of distributed services in the legal-information world.

Why is that? It’s partly because trying to build and maintain site-spanning applications in the absence of standards is insane. Source material moves around. Sites disappear and reappear. Firewalls suddenly block your robot, then just as surprisingly stop, after you’ve spent two weeks finding an e-mail address for a webmaster who is wisely concealing her identity. Robots.txt files suddenly sport new policies. The subdirectory on site X that holds the decisions for the month of December changes its name from dec2007 to december2007. The name of the judge writing the opinion gets moved from the third line after the second <H3> in the document to the fourth line. And so on. And on. And on. The whole thing is a house of cards, because there are few common practices among sites and very little consistent practice on any site. This makes it very difficult to automate things, and things that can’t be automated won’t scale. In such an unstable environment, building services that remain reliable over the long term is very difficult. And (I speak from experience here) it’s mind-numbingly annoying, because the things that (frequently) break such services are trivial and preventable and numberless and arbitrary. For a programmer, it’s like being nibbled to death by ducks.
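For the programmers in the audience, here is a minimal sketch of the kind of positional scraping described above, and of how quietly it fails. Everything in it is an assumption made up for illustration: the function name `judge_by_position`, the two sample pages, and the idea that the judge's name sits on a fixed line after the second `<h3>`. It is not code from any real LII robot.

```python
import re


def judge_by_position(html: str, line_offset: int = 3) -> str:
    """Return the Nth non-empty line after the second <h3> heading.

    This is deliberately brittle: it encodes one site's layout on one day.
    """
    # Collect the end position of every closing </h3> tag, ignoring case.
    heading_ends = [m.end() for m in re.finditer(r"</h3\s*>", html, flags=re.IGNORECASE)]
    if len(heading_ends) < 2:
        raise ValueError("expected at least two <h3> headings")
    tail = html[heading_ends[1]:]
    lines = [ln.strip() for ln in tail.splitlines() if ln.strip()]
    return lines[line_offset - 1]


# Hypothetical page as the site published it last month.
OLD_LAYOUT = """<h3>Court of Appeals</h3>
<h3>Doe v. Roe</h3>
Decided December 3, 2007
No. 07-1234
Opinion by Judge Smith
"""

# The same hypothetical page after someone added one line of front matter.
NEW_LAYOUT = """<h3>Court of Appeals</h3>
<h3>Doe v. Roe</h3>
Decided December 3, 2007
No. 07-1234
Argued October 1, 2007
Opinion by Judge Smith
"""

print(judge_by_position(OLD_LAYOUT))  # the judge line
print(judge_by_position(NEW_LAYOUT))  # no error, just the wrong line
```

The second call does not crash. It returns the "Argued" line as if it were the judge, which is why this sort of breakage tends to be noticed late and fixed by hand.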

These are not new problems. Librarians and others concerned with long-term information availability have been discussing these issues since about fourteen minutes after the first Web site appeared. Reporters of decisions and court clerks have long settled similar issues in print publication, and are beginning to do so on the Web. But much more is needed, and faster. The Web is not going away, and Web publication of legal information should not be thought of as a kind of unfunded mandate delivered as a sop to those who don’t buy the books.

Sorry, the seagull started screaming again. Persistent little bugger.

Difficulty and tedium aside, few Internet legal-information providers have been interested in building distributed services. How come? It’s partly because we’re brainwashed by centralized models that are the legacy of many years’ reliance on Westlaw and Lexis. It’s partly because law people deliberately confuse that kind of branded centralization with authority — easy to do when those who grant “official status” use it as a form of barter, chiefly with those who operate large, centralized systems. It’s partly because, up until now, pulling everything together in one big heap has been easier than creating interoperability. And it’s mostly because we haven’t been paying attention.

More than a decade ago, the digital-library community began solving this problem, systematically and effectively. They were mostly dealing with another kind of heavily cross-referenced essay: not the judicial opinion, but the scientific pre-print. Many approaches were tried; some (like Dienst, which Brian Hughes and I built into a law-journal repository system a decade ago) were glorious failures. But ultimately these folks were successful because they realized several things:

You can unbundle services from repositories. This is what Google does. It doesn’t hold everything — it just indexes it. The same thinking applies to things like current-awareness services that need input from multiple sources. You can do that without holding everything yourself. Indeed, services like large-scale search will only work if you unbundle them from repositories. Early on, there were many attempts at federating search services that failed because the whole system was held to the performance of the weakest participant. As a practical matter, scaling past 100 sites just would not work no matter what.

Services can be made a lot better if they have metadata available to them, particularly metadata about where to find the documents the service addresses. This is the basis of Google Sitemaps and the other related site-mapping standards. As an idea, it goes back to at least 1992 and Archie, the system for discovering anonymous-FTP sites. An important side effect is that participation in such schemes gives the repository operator greater exposure for her information; in a way it’s a form of marketing.
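As a rough illustration of what "metadata about where to find the documents" buys a service builder, here is a small sketch that reads a sitemap in the sitemaps.org format and lists the documents changed since a given date. The court site, its URLs, and the function name `documents_changed_since` are invented for the example, and the sketch assumes every `lastmod` value is a plain `YYYY-MM-DD` date so that string comparison is enough.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap published by an imaginary court web site.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://courts.example.org/opinions/2007/12/doe-v-roe.html</loc>
    <lastmod>2007-12-03</lastmod>
  </url>
  <url>
    <loc>https://courts.example.org/opinions/2008/01/smith-v-jones.html</loc>
    <lastmod>2008-01-15</lastmod>
  </url>
</urlset>
"""


def documents_changed_since(sitemap_xml: str, cutoff: str) -> list[str]:
    """Return the URLs whose <lastmod> is on or after cutoff (YYYY-MM-DD)."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        # Entries without a lastmod compare as "" and are skipped.
        if loc and lastmod >= cutoff:
            changed.append(loc.strip())
    return changed


print(documents_changed_since(EXAMPLE_SITEMAP, "2008-01-01"))
```

Nothing here depends on how the site lays out its pages or names its directories; the repository says where its documents are, and the service just listens.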

Issuing metadata in a standard format makes a lot of things easier — like developing harvesting tools, services, and anything that has to process the metadata. XML is a really good vehicle for this, because it can be validated and reliably processed. This makes new services much, much cheaper to build. And, if your metadata standard can be extended by well-understood technical means, so that communities can effectively customize it — well, you get a lot of leverage in the form of standardized toolsets and the like.

Most important, all of this can take place independently of administrative structures, institutional gaps, or any other incidental barrier. It doesn’t matter who the repository operators are, or where they are, or what sort of institution they’re affiliated with. No consortium or other administrative apparatus is needed. It is up to the service provider to decide what makes a useful aggregation. And that is a very scalable idea.

Let’s hope it’s salable as well as scalable, because of course it depends on network effects for most of its value. It’s working that way in the digital-libraries world, where OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) is now a basic standard used by hundreds if not thousands of sites. Creators of legal data can do likewise. The protocol is easy to implement, and at the LII we are making it even easier by building OAI implementations that can easily be bolted onto existing case-management systems and otherwise fed from existing repositories. If you’re interested, take a look at, where we’re starting to put things together (including a reference implementation you can tour, and for which we will shortly release code that all can use).
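To give a feel for what harvesting looks like from the service side, here is a minimal sketch of an OAI-PMH `ListRecords` exchange using unqualified Dublin Core (`oai_dc`), the metadata format every OAI-PMH repository is required to support. The repository address, the sample record, and the helper names `list_records_url` and `parse_records` are assumptions for illustration only; this is not the LII reference implementation, and it ignores resumption tokens, error responses, and deleted records.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 and its Dublin Core container.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request; from_date limits the harvest to newer records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical response from an imaginary repository, trimmed to one record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-06-20T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:opinion-07-1234</identifier>
        <datestamp>2007-12-03</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Doe v. Roe</dc:title>
          <dc:date>2007-12-03</dc:date>
          <dc:identifier>https://repository.example.org/opinions/07-1234.html</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(response_xml):
    """Extract identifier, datestamp, title, and document URL from each record."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "id": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": record.findtext("oai:header/oai:datestamp", namespaces=NS),
            "title": record.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS),
            "url": record.findtext("oai:metadata/oai_dc:dc/dc:identifier", namespaces=NS),
        })
    return records


print(list_records_url("https://repository.example.org/oai", from_date="2007-12-01"))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would fetch that URL over HTTP and keep following `resumptionToken` values until the repository runs out of records, but the core of the work is about this small.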

Oh, and interoperability? Well, it turns out that it takes the form of a lot of really geeky and scary-looking XML. But it could just be the best thing to happen to the free exchange of legal information since the death of Law French.

Comments, please.