The LII's 20th anniversary -- that's right, it's been that long -- is an occasion to look over the world of open-access legal publishing, to think and talk about what's changed and what hasn't. I could rattle on about a number of things -- and will, in future posts -- but right now I'm thinking about where the original LII is headed technically, and why.
For most of the past 20 years, people who promote open access to legal information have been preoccupied with case law. That's understandable, for reasons that have little to do with the great inherent value of putting case law out where people can see it. In the early days, most of us were found in legal academia, a culture obsessed with the study of appellate edge cases. And -- no matter what institutional base an LII was operating from -- those were days when publishing the decisions of the highest appellate court available offered the benefits of respectability-by-association, in a professional culture where novelty is suspect. That early bias toward case law has skewed our ideas about what we are and who we serve. Even here at the LII, where we make a lot of noise about serving a population beyond lawyers, we have spent most of our time and intellectual effort building systems for which lawyers are the primary audience.
By contrast, much of the world sees case law only as an interpretive layer that exists on top of codes, statutes and regulations. Many also believe that there is such a thing as the straightforward application of a statute or regulation, without the need for professional interpretation. People calculate risk and decide on courses of action all the time without help from lawyers. They doubtless do that more often than they should. Interestingly, the calculation they are making is often one about how much help they need, and it often works out in favor of seeking professional assistance. That is why other consultative online services like WebMD work, and it is one basis for Richard Susskind's ideas about the "latent legal market". There are many situations in which interpretation is needed. But there is also much that is straightforward. The size of a truck tire or the width of a wheelchair ramp is apparent to anyone. If everyone who is touched by a regulation waited for interpretation by lawyers -- much less law professors -- before doing anything, economic life would grind to a halt.
But I digress. The point is, it did take most open-access publishers outside government rather longer to get to statutes, and longer still to get to regulations. That was partly because of the bias toward case law, and partly because statutes and regulations are difficult for under-financed publishers to wrangle -- hard to parse, hard to keep current, and hard environments in which to build anything that depends on editorial conformity. But we did get there, and in time, an array of stovepipes -- case law, statutes, regulations -- was built within each of many jurisdictions by many groups and organizations acting more or less independently, though with a great deal of mutual awareness.
From the beginning, it was worthwhile to break down stovepipes between those corpora by linking cross-references, building common search mechanisms such as WorldLII, and so on. Only in the last few years -- largely propelled by developments in the EU -- have we begun thinking about using standards and interoperability to break down barriers and facilitate tool development among national collections in a global world (oddly -- at least until recently -- there's been much less thinking about breaking down state/Federal barriers inside the US). In the debate surrounding the Thomson-West merger in the mid-1990s, John Lederer remarked that those charged with evaluating the potentially anticompetitive effects of the merger were ignoring an important fact: lawyers don't buy books -- they buy systems of books. For our part, we've been building systems of legal databases. We connect legal information with other legal information of a different flavor, and to the same flavors of legal information in other jurisdictions. Legal information to legal information to more legal information. We are like a guy who, for 20 years, has been sitting on a barstool talking to himself.
Linked Open Data is a way of relating data to data, of assembling statements about things in the world from different sources. I believe that the next few years, at least, will be about building data architectures that link law not to itself but to the rest of the world. That is much easier to understand if we think about regulations than it is if we think about case law. After all, case law squints at concrete objects in a way that blurs them into abstractions; one lengthy passage in Llewellyn's Bramble Bush talks about the way that case analysis rejects irrelevant facts and turns the remainder into inhabitants of more abstract categories and concepts. As Dan Dabney has pointed out, the problem in information retrieval for case law is often how to get from mangy dogs to the implied warranty of merchantability.
Regulations are very often about mangy dogs and not about legal concepts. They are about things -- things that carry legal burdens and requirements that are important to people who use, work with, live with, pay for, manufacture, grow, create or are otherwise affected by those things. How many things? Pretty much every thing. The other day, we gave a presentation on our use of Semantic Web technologies with the CFR to a bunch of information-science faculty and grad students. They were sitting on office chairs in a seminar room eating bagels. I discovered that all those things are mentioned in the Code of Federal Regulations. As was most everything else in the room, including the air. That's ubiquity. That something is an object of regulation is an important fact. And the objects themselves are multifaceted in the way they relate to the world, and often defined differently in non-regulatory contexts than they are in the regulations themselves.
The Semantic Web -- and Linked Data -- are very much about things, and about the ability to relate things that are not on the Web with things that are. In a post on VoxPopuLII last year, John Sheridan of legislation.gov.uk talked about "accountable systems" -- systems that embed knowledge of the legal requirements surrounding the objects they contain. That's one class of applications we can create. We can also make regulatory information more accessible by simply relating it to the world of information that exists around the objects being regulated. There is room -- lots of room -- for us to consume as well as publish linked data.
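To make "relating data to data" a little more concrete, here is a minimal sketch in plain Python. It assumes statements are stored as (subject, predicate, object) triples, and every identifier and predicate name in it is invented for illustration; none of it is the LII's actual data model or a real vocabulary. The only point is that a legal source and a non-legal source can be joined because they name the same thing.

```python
from collections import defaultdict

# Hypothetical statements a legal publisher might make: a section regulates a thing.
# The section and thing identifiers are placeholders, not real CFR citations.
legal_triples = [
    ("cfr:example-section-A", "regulates", "thing:acetaminophen"),
    ("cfr:example-section-B", "regulates", "thing:office-chair"),
]

# Hypothetical statements from an unrelated, non-legal source about the same things.
external_triples = [
    ("thing:acetaminophen", "label", "Acetaminophen"),
    ("thing:acetaminophen", "soldAs", "Tylenol"),
    ("thing:office-chair", "label", "Office chair"),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under the subject they describe."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def describe_regulated_things(legal, external):
    """For each regulating section, attach what the external source says
    about each thing that section regulates."""
    facts = index_by_subject(external)
    described = {}
    for section, predicate, thing in legal:
        if predicate == "regulates":
            described.setdefault(section, {})[thing] = facts.get(thing, [])
    return described


if __name__ == "__main__":
    for section, things in describe_regulated_things(legal_triples, external_triples).items():
        print(section, things)
```

A real system would use shared URIs and an RDF store rather than tuples in a list, but the join on a common identifier is the same idea.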
We just began offering one example -- a trivial one, really -- at the LII:
- Go anywhere within Title 21 of our newly-offered Code of Federal Regulations.
- Enter the word "tylenol" in the "Search CFR" input field in the toolbox at the upper right, and push the button.
Your results will be meager -- one CFR section that mentions Tylenol in passing. But you'll also get a list of suggested search terms that we pulled from the DrugBank collection of linked pharmaceutical data, and if you hover your mouse over each one you can see its definition from DrugBank. The list may seem a little strange and expansive -- it includes all of the active ingredients in all the Tylenol-branded products, including Tylenol cold medicines and sinus and allergy formulations. In time we'll figure out how to break it out by product. But seeing "acetaminophen" in the list helps remind the user that drugs are regulated under generic names, and that it is usually the components of a mixture that are regulated rather than the mixture itself. Try the same thing with "Nyquil", and you'll find all the terms that relate to its components.
That rather simple exercise in expanding search terms by using Linked Data from another domain bridges a major disconnect between the way average people think about what's being regulated and the way that regulators express themselves. There are no doubt many other ways to do similar things in other topical areas. Right now, the CKAN database of Linked Data collections contains a few fewer than 3,500 entries. It may be a little hard to figure out what use we might make of the Greater Manchester Bus Timetable, but easier to see how the UN Classification of the Functions of Government or the various agricultural vocabularies might prove useful in connecting primary legal information to things in the real world.
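For the curious, the Tylenol exercise described above can be sketched roughly as follows. This is not our production code. The lookup table is a small hand-written stand-in for what would come from a linked-data source such as DrugBank, the one-line definitions are my own paraphrases, the section texts are made up rather than quoted from the CFR, and the function names are hypothetical.

```python
# Hand-written stand-in for brand-to-ingredient data from a linked-data source.
# Entries and definitions are illustrative only, not DrugBank records.
BRAND_VOCABULARY = {
    "tylenol": {
        "acetaminophen": "Analgesic and antipyretic.",
        "dextromethorphan": "Cough suppressant.",
    },
}

# Made-up section texts keyed by placeholder identifiers (not real CFR text).
SECTIONS = {
    "sec-1": "Products containing acetaminophen must carry a warning.",
    "sec-2": "Tylenol is mentioned here only in passing.",
}


def expand_query(term, vocabulary):
    """Return the user's term followed by related terms from the vocabulary."""
    key = term.strip().lower()
    related = vocabulary.get(key, {})
    return [key] + sorted(related)


def search(sections, terms):
    """Naive substring search: map each matching section to the terms it matched."""
    hits = {}
    for term in terms:
        for section_id, text in sections.items():
            if term in text.lower():
                hits.setdefault(section_id, set()).add(term)
    return hits


if __name__ == "__main__":
    terms = expand_query("Tylenol", BRAND_VOCABULARY)
    print("Suggested terms:", terms)
    print("Matches:", search(SECTIONS, terms))
```

Searching on the brand name alone finds only the section that mentions it in passing; the expanded terms reach the section written in the regulator's vocabulary. That is the whole trick.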
For 20 years we've been "opening" access to legal information without doing nearly as much as we could to situate that information in a world that is inhabited by non-lawyers. At its most fundamental, information retrieval is a transaction in which a user uses something she knows to get something she doesn't. Moving from simple availability of legal information to real access involves making those trades easier for the user. We can do that by linking primary legal information to the things it regulates, the things that people encounter in their environment. That should be our business now. It may be unglamorous, but it will meet a lot of people where they live.
[ Hopefully, it won't be two years before my next post appears here; it's easy to drop blogging when other things get busy, and they have surely been busy lately. For what it's worth, I'm also writing about more technical stuff over at Making Metasausage. ]