Interoperability — it’s a big topic, and an important one in the evolving legal infosphere. I’ll try to make it a little more manageable by breaking it up into a series of posts that will appear over the next several weeks.
In general, the idea is that collections of online legal information should work together (and with the audience, and with other legal-information authors and providers) in a way that makes it easier to develop and use information services. Services that span collections offered by different providers, such as cross-site search and current-awareness services, are especially attractive. Interoperability is created at several levels of technical implementation, but the underlying ideas are simple: similar repositories should be transparent in the way they expose their information to the rest of the world, and if possible, they should do so in reliable, standardized ways that span a community of interest. Sometimes this involves standard-setting within the community; often, though, it’s just a series of small, sensible design decisions.
Here at the LII, our first stab at interoperability was the use of “head-compatible” URLs — the idea that a document’s address should be easily guessable by a would-be linker or author. For example, the Wild Horse Annie Act, 18 USC 47, becomes http://www.law.cornell.edu/uscode/18/47.html . Grutter v. Bollinger, 539 US 306, becomes http://www.law.cornell.edu/supct-cgi/get-us-cite?539+306 (though these days we’d probably change that “supct” to “scotus” just to be like everybody else). The idea is to make it easy for other people to link to your stuff, whether they are authoring manually or building something by automated means. Simple enough.
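To make the "guessable address" idea concrete, here is a minimal sketch in Python of how a would-be linker might turn a citation into a URL. It uses only the two URL shapes quoted above; the function name, the regular expressions, and the set of citation spellings they accept are my own assumptions for illustration, not a description of how the LII's servers actually work.

```python
import re

BASE = "http://www.law.cornell.edu"

# Assumed citation spellings: "18 USC 47", "18 U.S.C. 47", "539 US 306", "539 U.S. 306".
USC_RE = re.compile(r"^\s*(\d+)\s+U\.?\s?S\.?\s?C\.?\s+(\w+)\s*$", re.IGNORECASE)
US_RE = re.compile(r"^\s*(\d+)\s+U\.?\s?S\.?\s+(\d+)\s*$", re.IGNORECASE)


def citation_to_url(citation):
    """Return a guessable URL for a citation, or None if the form is not recognized."""
    m = USC_RE.match(citation)
    if m:
        title, section = m.groups()
        # US Code: title and section become path segments.
        return f"{BASE}/uscode/{title}/{section}.html"
    m = US_RE.match(citation)
    if m:
        volume, page = m.groups()
        # US Reports: volume and page become the query string.
        return f"{BASE}/supct-cgi/get-us-cite?{volume}+{page}"
    return None


print(citation_to_url("18 USC 47"))
print(citation_to_url("539 US 306"))
```

The point of the sketch is how little it needs to know: if the address follows the citation, a few lines of pattern matching are enough for anyone to build links, by hand or by machine.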
Interoperability is built on a series of seemingly trivial decisions like that, decisions that favor common sense, transparency, and ease of use for those who want to build other things on top of information you’re providing. Good practice also involves a commitment to transparency and maintenance over time. We’ve changed the way we handle New York Court of Appeals decisions at least three times since 1996, when we first offered them. All our old systems of document addressing still work as well as they ever did (take a look at the two links to Wild Horse Annie in the last paragraph). The same is true of all of our old “captive-search” URLs as well (for example, the one that will return Supreme Court decisions on employment discrimination). There, interoperability rests on a 50-line Apache server module that translates old search-engine search strings into whatever we are using now. It replaces hundreds of lines of mod_rewrite code piled up over five generations of LII search engines.
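The translation layer described above is a small piece of code, and the idea can be sketched in a few lines. What follows is a Python illustration, not the Apache module itself. Every path and parameter name in it (`/cgi-bin/old-search`, `/search-v2/query`, `terms`, `q`, `/search`) is invented for the example and should not be read as a real LII address.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# Hypothetical current search endpoint (an assumption, not a real address).
CURRENT_SEARCH = "/search"

# Hypothetical legacy endpoints, each mapped to the query parameter it used.
LEGACY_ENDPOINTS = {
    "/cgi-bin/old-search": "terms",
    "/search-v2/query": "q",
}


def translate_legacy_url(url):
    """Map an old captive-search URL to the current one; None if it is not legacy."""
    parts = urlsplit(url)
    param = LEGACY_ENDPOINTS.get(parts.path)
    if param is None:
        return None
    values = parse_qs(parts.query).get(param)
    if not values:
        return None
    # Re-encode the old search terms under the current parameter name.
    return CURRENT_SEARCH + "?" + urlencode({"q": values[0]})


print(translate_legacy_url("/cgi-bin/old-search?terms=employment+discrimination"))
```

Keeping the mapping in one table is what makes the commitment cheap to honor: each new generation of search engine adds a row rather than another layer of rewrite rules.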
Interestingly, years of debate over “persistent document identifiers” — pURLs, DOIs, and all that stuff — have finally resulted in some recognition that the problem is not one of creating complicated, all-embracing alternative schemes for the naming of information resources. Rather, it’s a matter of consistent, transparent practices by people who run web sites. And there’s a lesson in that — no matter what technical schemes and formal standards are used to create interoperability, it ultimately rests on non-technical practices and concern for quality.
So interoperability — even the simplest kind — takes thinking and a fair bit of work. Even so, you’d think that the First Commandment of Legal Information Interoperability — “Thou shalt make thy URIs harmonious with well-known document identifiers like citations” — would be honored more often than it is. Good luck with that.
You can’t find opinions on the sites of the Circuit Courts of Appeals that way, for instance, nor can you link directly to sections of the Code of Federal Regulations in the new(ish) e-CFR collection at NARA. The reasons in each case are instructive. The Circuit Courts don’t do it at least in part because print citation information, unfortunately still the best-known (and usually the only) set of addresses for these documents, isn’t available at the time the decisions are put online. It’s not clear why they don’t return to the decision and add the print citations when they become available; probably it’s a matter of time and expense in courts that issue thousands of opinions every year. Vendor-neutral citation, which numbers decisions according to the order in which they are issued rather than using page numbers from a bound volume, would do better, but very few courts use it in a way that is reflected in their URLs (the court system in Ohio is a notable exception). In NARA’s case, the reason given is that there is just too much churn in the numbering of sections to allow reliance on section-level URLs (we note that the GPO has no such misgivings, but then they are flighty types, and not archivists at all). You’ll see these same problems (accidental fallout from reliance on printed books, and a concern for stability at the expense of practicality) oddly spattered across the legal-information landscape. It’s a weird product of time and place. If reporters of decisions, and others who publish legal information, brought the same meticulous and conscientious pursuit of standards and interoperability to cyberspace that they have to print, we wouldn’t be talking about interoperability at all. We’d have it. But those folks are, for the most part, not yet as comfortable in the electronic world as they are in the world of print. I’ll say more about why that is in a minute.
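To see why vendor-neutral citation suits the web better than print citation, consider a rough sketch in Python. It assumes a citation shaped as year, court designator, and sequence number joined by hyphens. The specific citation used, the `/opinions/...` path layout, and both function names are invented for illustration and do not describe any court's actual site.

```python
import re

# Assumed shape: four-digit year, court designator, sequence number.
NEUTRAL_RE = re.compile(r"^(\d{4})-([A-Za-z]+)-(\d+)$")


def next_citation(year, court, last_number):
    """Assign the next citation at the moment of issuance; no bound volume is needed."""
    return f"{year}-{court}-{last_number + 1}"


def neutral_citation_to_path(citation):
    """Map a vendor-neutral citation to a hypothetical URL path, or None if malformed."""
    m = NEUTRAL_RE.match(citation.strip())
    if m is None:
        return None
    year, court, number = m.groups()
    return f"/opinions/{court.lower()}/{year}/{int(number)}"


cite = next_citation(2003, "Ohio", 1233)  # illustrative numbers only
print(cite, neutral_citation_to_path(cite))
```

Because the number is assigned when the decision issues, the address exists on day one, which is exactly what a page number in a not-yet-printed volume cannot offer.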
But I’ve skipped over a couple of points on my way here, and they’re crucially important:
First, why would technical discussions about citation and URIs pass anybody’s who-cares test? Simply: document addresses always matter in retrieval systems, as does the granularity of the documents they address. Importantly, this is one place among many where legal-information systems have a very different character from other kinds of text databases, especially in common-law jurisdictions. Document addresses matter because it is crucially important that everyone look at the same authoritative information when resolving a dispute (that means “same” as in “same text”, not as in “same web site” — it is no more necessary to put all electronic legal text in a single repository than it is to put all the world’s books in a single building). So we need a common way of referring to things.
Second, why do we need any kind of interoperability between collections at all? Wouldn’t it be easier — and more authoritative — to just put everything in one place and give people access to it? Old-timers on the legal information scene will recognize this as the “why don’t we build a public-domain Westlaw?” question. That question has been asked many times, always in the interval before a newly-arrived group of legal-information enthusiasts discovers the dimensions of what it has taken on. It is a particularly thorny issue, and I’ve walked up to it before:
End users have been conditioned by training, experience, and careful marketing campaigns to value particular aspects of familiar systems like LEXIS and WESTLAW. Those systems are strongly branded, and have a deservedly high reputation among those who have been able to use them. It is not at all surprising (though it is at times dismaying) that experienced users find it difficult to recognize those same virtues when they are produced by new and unfamiliar implementations. At the core of this phenomenon is a bias induced by thirty years’ experience with older computer systems and older modes of industrial organization: centrality equals reliability. The Internet approach stands in sharp contrast as it argues the contrary: decentralization equals reliability, attainability, and scalability. On some profound but subliminal level this is news that shocks and bewilders. New, distributed models of computing that are reflected in distributed information systems and distributed models of business organization must seem inherently anarchic and therefore inherently suspect, no matter their virtues. That suspicion will subside in time, to a degree. But it will never vanish entirely until we become more discerning than we are about what was necessary about older ways of doing things and what was merely incidental.
I don’t want to re-fight that battle, particularly in something that is meant to be a quick run through the issues. But quickly: distributed systems provided by diverse actors allow for audience-oriented customization, encourage greater innovation, lower barriers for new entrants into the market, encourage greater care and authority (as when the publisher is also the law creator), conform better to our existing systems of organization for courts and legislatures, and can scale much larger over a more diverse collection of actors. In an environment as jurisdictionally and administratively complex as the United States, a distributed model may well be the only approach that works. Other, smaller jurisdictions provide counterexamples, notably Canada and Australia. Even there, the systems make no attempt to take in all the legal information that might be produced by agencies, municipal government, local commissions, and so on. There will always be a need for federation. That need becomes all the greater as multiple niche providers of legal information emerge to serve a diverse variety of audiences. We are on the brink of such a system, if we are not there already.
The price of such a federated, distributed system is the development of standards. In a system where there are only a few, supposedly comprehensive legal publishers, standards are whatever those publishers say they are. That is the system we had in the US until relatively recently, and it is one that commercial publishers fight very hard to maintain. Interestingly, it is also a system that has a deeply symbiotic relationship with the source publishers of legal information (reporters of decisions) I was talking about a moment ago.
That profession, whose leading light once referred to his staff as “double revolving peripatetic nit-picker[s]”, is accustomed to looking for nits that might accidentally find their way onto the pages of the bound, authoritative volumes produced by an official publisher. That focus on print was not neglect on their part; it was what professional practice demanded at the time, and still demands today. But the world of published legal information is a much bigger, more digital place now, and they are adapting to it, but slowly. The websites of many courts and legislatures are not crafted with the same attention to detail as their printed output. A new understanding of best practices is needed, one that brings to the electronic realm the same meticulous concern with consistent metadata, citation, and organization that reporters of decisions have always brought to print. Such practices are emerging, but slowly. And there are many costs associated with a lack of workable standards, though they may only be immediately apparent to those who try to build resources that span multiple court web sites without themselves housing the information.
Citation and, more generally, systems of legal document identifiers are just one example of legal metadata we should make transparent and interoperable. It is probably the most urgent case. Next time, I’ll move on to how broader interoperability might be done. Interestingly, many of the problems have already been solved.