Bruce Thomas

Thomas R. Bruce is co-founder and director of the Legal Information Institute at the Cornell Law School, the first legal-information web site in the world. He was the author of the first Web browser for Microsoft Windows, and has been the principal technical architect for online legal resources ranging from fourteenth-century law texts to the current decisions of the United States Supreme Court. Mr. Bruce has consulted on Internet matters for numerous commercial, governmental, and academic organizations on four continents. He has been a fellow of the Center for Online Dispute Resolution at the University of Massachusetts, and a Senior International Fellow at the University of Melbourne Law School. He is an affiliated researcher in Cornell's program in Information Science, where he works closely with faculty and students who experiment with the application of advanced technologies to legal texts. He currently serves on the ABA Administrative Law Section Special Committee on e-Rulemaking and on Cornell's Faculty Advisory Board for Information Technology, and is a longtime member of the board of directors of the Center for Computer-Assisted Legal Instruction. He has been known to play music at high volume.

Somewhere between 13 and 22 seconds after the first offering of free content on the Web, the publisher asked herself, “How am I going to pay for this?”  And web publishers have been asking that question ever since.  The current economic meltdown makes it a more urgent question, but it’s always been there.  We sum it up in a slogan:  this service is free, but it is not costless.  We spend a lot of time and effort trying to resolve that conflict between our aspirations and the need to buy groceries.

The open-access-to-law community (particularly in the US) has had trouble with this.  There are a variety of solutions, few if any complete in themselves.  Most open-access providers originally depended — as we have — on grant funding, and on extensive support from a parent institution or a consortium.  Most have added consulting income to the mix.  And many get income from commercial partnerships, often based on the sale of back-end bulk data services.  The most stable model is CanLII’s, which is financed by a head tax on Canadian lawyers.  An excellent paper by Graham Greenleaf (abstract here, slides here), offered at the recent Law via the Internet conference in Florence, describes one prominent free-access provider’s experience in keeping the doors open.

In any case, open access to law presents some unusual sustainability problems.  And those problems vary a lot from place to place. Institutional and political settings are very different, particularly in transition countries (some of our problems here at the LII would no doubt be seen as high-quality problems by others).  Here’s a brief catalog — full treatment would need a very long article indeed:

First, open-access providers don’t really do research, in the sense of either basic science, or quantitative social science, or any of the things funded by research-oriented outfits like the NSF.  We can sometimes make a plausible case for ourselves as testbeds or helpers on grants that go to others (as the LII has with its participation in the CeRI project).

Second, we are continually faced with rising costs of innovation. The new legal information products and services we imagine, and hope to build, are significantly more expensive to produce than the things we imagined when we began 16 years ago. This is partly the result of the Web’s technical evolution and partly the result of more sophisticated needs and wants — a slow but steady revolution of rising expectations that we share with our users.  In 1993, we could significantly raise the sights of everyone in the legal information world by spending a snowy afternoon putting Two Pesos v. Taco Cabana into HTML — a one-person task that created the first web-published judicial opinion.  These days, it takes many more people a lot longer to come up with something that interesting and useful.

Like other not-for-profit projects, we have more trouble finding operating money than we do finding startup funds.  A lot of people would like to see their law (or someone else’s) put online. Very few are willing to pay to maintain it.  This is a particular problem with legislation, which requires frequent updating.  It also distinguishes legislation and regulations from scholarly publishing, and from many open-access repositories, which (like judicial opinions) gather material that is relatively static once mounted.

Some deep-rooted reluctance surrounds the funding of legal information, perhaps based on the idea that free legal information is just lawyer subsidy, or only answers the information problems of the rich (as Dan Dabney once put it).  Who wants to feed the sharks?

Open-access providers could do more than they have to dispel this distrust.  For the most part, we’ve made the case for open access in highfalutin’ normative terms:  support the rule of law, level the playing field for human rights, force the state to meet the transparency/publishing obligations implied by the idea that ignorance of the law is no excuse.  That’s a regime where success is hard to measure, if only because assessment is often self-referential. We declare our results to be good because our intentions are noble. We are Doing the Right Thing, and who would question that?  We are now starting to see some work on evaluation of open access in much more hardheaded terms — how does it contribute to lawyer competence, support economic development, level the playing field between one-shot litigants and repeat players?  These are important questions that will, if we are able to answer them rigorously, provide us with a strong case for support.

And then there is the “Tweed Ring” challenge — illustrated handsomely by this Thomas Nast cartoon.  Everybody thinks that free legal information is the next guy’s problem. Private foundations think government should do it. Government thinks that it is doing it (via weak services like PACER), and that anything it isn’t doing must be some kind of value-added service that the legal profession should pay for, or a matter of interest only to academic researchers.  The legal profession would like to pass research costs on to the clients, and increasingly can’t (small and solo law offices probably never could).  And so on.

Nobody thinks that free legal information is a bad thing, or unworthy.  It’s just not at the top of anyone’s list.  The US is a particularly severe case because (unlike the countries of the European Union, where partnerships like ITTIG are fairly common) we have never thought that work in legal information was a matter for cooperation between government and academia. Partnerships have been almost exclusively with private industry, though private industry sometimes relied on basic research produced by information scientists.

This circle may break soon.  Government transparency-by-web will no doubt get a lot of attention from the new administration; expectations among the “hack-the-gov” crowd are already very high, and there is good reason for this optimism.  We are seeing what amount to open-access legal information projects of this kind demanded by assessments like the recent ABA committee report on e-rulemaking and realized in the efforts of people like the Sunlight Foundation (GovTrack.us is a good example).  And certainly the economic crisis is pushing us toward greater awareness of global interdependency and with it the need for global transparency of regulatory and other legal regimes.

At the LII, we are moving toward sustainable self-support; we’re not there yet. We get about 20% of our current budget from the generous contributions of private individuals who think we’re worthwhile (and if you’re one of them, let me thank you again on behalf of all six of us here).  We recently added Google ads to many of our pages; we’re seeking sponsors for others.  And this week marks the launch of a new service offered in partnership with our friends at Justia.com: a lawyer directory service that offers great value, and an opportunity for us to give reciprocal benefits to those who help us out either financially or with donations of their expertise in the form of wiki content and other things that benefit our audience.

We have great hopes for it.  Talk it up.

[ Guilt-inspired author note:  apologies for the long absence.  It’s been an unusually busy six or seven months, even by LII standards.  With luck, this will be the first in a series of posts reporting on what I’ve seen and learned in that time.]

I have a couple of hobbies. Actually (as those close to me would tell you) I have an endless series of momentary obsessions. But a few have met the chronological challenge and persisted, so they’re hobbies. For one, I deal in antique woodworking tools. I also ride around on a bike. It’s a nice thing to do in Ithaca during the lighter months. Saturday, I was intent on both — a fellow up near Syracuse wanted to buy some planes and a set of auger bits, and I wanted to take a bike ride around Cazenovia, which is a pretty area, and fairly flat for this neck of the woods.

My tool customer was a guy named Sean Murphy, who works for (probably owns) a company called CWR Manufacturing in Syracuse. They make cold-formed parts for, well, pretty much anything. He mentioned automobiles, small electric appliances, casement windows, and electric motors among many other things. He is just getting into woodworking — and built his own dust collection system, whose cyclone chamber he welded up with a MIG welder, from plans he found on the Web. I am guessing he is a very good craftsman.

More to the point, he’s a happy, and habitual, LII user.

Sean uses us for information about employment law, intellectual property law, and as he said “the sort of thing that a guy in manufacturing needs to know to run a business”. I noticed that he didn’t blanch when I referred to the “CFR”. And he told me about how he had heard a lecture on intellectual property in a course he was taking (working toward an MBA), and started to wonder how he could use that to protect his company’s work product — none of which is patentable. As a result, he’s introduced the use of the non-disclosure agreement to his industry.

The large companies he deals with often do a kind of technology transfer — Sean’s crew is hired to produce a new part using cold-forming technology, and the company that hires them tries to learn as much as possible about what goes into the design and manufacture of the part. CWR does that for a couple of years, and then the client figures out how to use his own equipment and engineers to replicate the know-how that Sean’s crew has provided in designing and making the new part. Or the client passes the design and the knowledge on to another supplier who will make the part more cheaply, maybe offshore, leaving CWR behind. Not so good, that. After hearing the lecture, Sean consulted with the lawyer who had been brought in to give it — and as a result, now makes signing of an NDA a standard part of his arrangement with new customers. Big deal, sez you — everyone in the software industry signs five NDAs before lunch. That’s right — and now everyone in manufacturing will too, and the situation for small shops like Sean’s will improve as a result, because the client will no longer be able to walk away with CWR’s real work product: the know-how involved in re-engineering the part for cold-forming manufacture.

I get two very warm and fuzzy things from this: first, another anecdote to add to the many that tell us that the audience for legal information goes way, way beyond lawyers, and second, an indication that maybe what Richard Susskind said about a more transparent legal information regime increasing, rather than decreasing, the need for legal services is proving itself. If — as we’ve long thought here — more accessible legal information means that people are less apprehensive about approaching the legal system (or feel better prepared when they do), then more people like Sean will do so. And the result will be an increase in the use of legal services in a preventive way. Nothing new about that, of course — but I’d like to think the numbers are going up as legal information gets more and more available.

These days, if you say something is “like a legal version of WebMD”, people are inclined to think in terms of consumer law, bankruptcy, divorce. But it’s also about small business, entrepreneurship, and a healthy economy. That is, if you’re like Sean.

And he’s a good businessman — I know because he wouldn’t pay my ridiculous prices for a #8 jointer plane and a set of auger bits.

Along with LII Editorial Boss Sara Frug, I spent yesterday morning with the folks at the Sunlight Foundation — an organization with a compelling mission and a growing set of activities that reflect it. Founded with the idea of using Web 2.0 techniques to bring transparency to Congress, Sunlight is now becoming a rallying point for a diverse community of folks who share the idea of making government better by making the information it generates and consumes more accessible.

That’s an idea we find really attractive. We’ve been amazed — shocked, really — at how little access government has to its own work product (never mind the public). We recently learned that some branches of the Federal courts limit access to the commercial legal information services based on seniority; we understand that the same is true of Federal agencies, where junior people don’t have access to Lexis and Westlaw. [Note to legal research teachers: “junior” would pretty much describe our recent graduates, wouldn’t it? Think we should be teaching them more about free online sources?] Our e-mail is chock-full of questions and testimonials from government attorneys who rely on our edition of the US Code and our Federal rules collections. Our most successful projects over the last fifteen years have involved improving or re-mixing the presentation of Federal data to make it more easily used and understood by a broad audience.

So our question for John Wonderlich at Sunlight was “how can we help?”

Turns out there are a number of ways. We have a lot of expertise in the arcana of Federal data online, experience with data standards, software tools that have remained in-house because we didn’t think anyone else had any use for them, and so on. There are a lot of ways that the LII can and will participate in the growing community of technologists who want to “hack government”. One of the best ways we thought of involves some help from you… particularly if you are a law librarian, legal scholar, or anyone else with experience working with government documents.

We know from experience that some online documents are especially useful to people building new services on top of government information. Here are some examples of these “linchpin” documents:

These are documents that provide important information about the context and structure of government, or that link isolated pools of legal information together. For example, the classification tables form the basis of our US Code updating features; we parse them into a database that is then used to power both clickable update links and RSS feeds. As published by the government, they are difficult to use; they would be near the top of our list of documents the government should be publishing in easily processed XML form. Put another way, they are the documents that are most useful in building online services that make legal information more transparent. They would be a good focus for our efforts here at the LII, for the growing community of government-transparency hackers, and for lobbying of (and cooperation with) the GPO’s FDSys effort.
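For the technically inclined, here is a rough sketch of the parse-then-publish idea described above. It is not our production code. The pipe-delimited row format, the placeholder citations, and the names `parse_rows` and `build_rss` are all assumptions made up for illustration; the classification tables as actually published look quite different and need far more cleanup.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified rows (NOT the real published format, NOT real citations):
# USC title | USC section | public law | Statutes at Large page
SAMPLE_ROWS = """\
99|101|000-001|000 Stat. 1
99|101|000-002|000 Stat. 2
98|7|000-003|000 Stat. 3
"""


def parse_rows(text):
    """Group classification rows by (title, section) so each Code section
    can be looked up to see which new laws affect it."""
    updates = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        title, section, public_law, stat_page = [f.strip() for f in line.split("|")]
        updates[(title, section)].append(
            {"public_law": public_law, "stat_page": stat_page}
        )
    return dict(updates)


def build_rss(updates, feed_title="US Code updates (illustrative)"):
    """Emit a minimal RSS 2.0 document with one item per (section, law) pair."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    for (title, section), laws in sorted(updates.items()):
        for law in laws:
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = (
                f"{title} U.S.C. sec. {section} affected by Pub. L. {law['public_law']}"
            )
            ET.SubElement(item, "description").text = law["stat_page"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss(parse_rows(SAMPLE_ROWS)))
```

The same grouped records could just as easily drive a clickable "updates" link on each section page. The point is only that once the table is in structured form, every downstream service gets cheaper to build.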

We think that it should be possible to build a list of (let’s say) 100 such documents — the Big Docs that those who develop these kinds of services would most like to see placed online in a form that is both easily processed by machines (i.e., XML) and continuously updated by the official body that creates them. What would your suggestions be? Put ’em in the comments, please.

[editorial note: this week’s post is a bit later than usual because of my stay at the 2008 CALI conference, about which more next time. The posting schedule will no doubt continue to be spotty throughout the summer — I’m travelling and talking more than usual — Tb.]

This is probably not the time for it, but honesty compels me: very few people can turn off a car radio as fast as I can when an NPR fundraiser is in progress. I’ve sent them money, and two cars. But my local NPR outlet manages to combine self-righteous patronizing with mind-numbing repetition in a way that causes me to tune out very, very quickly. I hasten to say that they’re far, far below average in this respect. I know of no other NPR station that would call the Triple Cities the “Treble Cities”, or thank its donors on-air in a tone usually used to give positive reinforcement to a slightly retarded Labrador retriever who has just managed “roll over” for the first time. These shenanigans are a strong deterrent for somebody who has to make the case — to a broad swath of the public — that our work at the LII deserves support.

It does.

That “broad swath” is one reason. We put law — well organized and well explained — in front of a lot of people. We had close to a million and a half unique visitors last month, from over 200 countries — that’s about 20 million page views, maybe slightly more. Some of those visitors are lawyers, many are private citizens, and the majority of them are making use of law in some professional way. Those that are not lawyers are doing risk management — figuring out whether the advice of a lawyer is good, checking the implications of particular courses of action, trying to figure out what they’re expected to do. Among those who are lawyers, a large number serve the public, either as government officials or as workers for non-profits and public service organizations of all kinds. Our LIIBULLETIN subscription list is fascinatingly democratic — it includes schoolteachers, professors, insurance guys, cops, the members of the Supreme Court practice group at Akin, Gump — and many, many others.

I like to avoid platitudes when I can. In a place where the justice system has no access to the precedents on which it is supposedly built, the argument for open access to law is all too obviously an argument for a fundamental human right. But, even (or perhaps especially) in developing countries, it is also a practical argument about how difficult it is for people to discover the legal framework for starting a business, buying and selling things, hiring and firing, and so on. No matter where or who you are, finding out what you’re expected to do can be expensive and troublesome. One compelling reason for open access to law is that it should be neither of those things. People in all sorts of places, for widely different reasons, have a right to legal information and practical, powerful reasons to use it to make their situation better.

These are things we’ve said before. They’re compelling and true. But they don’t really show the LII as the uniquely valuable activity that it is. That’s a much more difficult case to make. It lacks the overtly emotional grab-points that cause people to give reflexively, and that fundraisers love. Let’s try something a little more thoughtful instead. It goes like this:

Increasing numbers of people and institutions can publish law, many people and institutions already do, and soon many more will. Some of them are late to a party they should have joined long ago — particularly those courts that are only now offering open access to their decisions. But there they are nonetheless. Their efforts at self-publication are being substantially assisted by other open-access providers like AltLaw and public.resource.org. We applaud these guys, and we help them when we can. We’re also different from them in one important respect: we’re looking ahead to what will need to emerge once everyone is publishing what they should. Within cramped resource constraints, we’re working to figure out what a seamless, highly usable, multinational legal information commons would look like and how it might be built from the isolated, individual collections that now exist. We’re constantly thinking about how to integrate caselaw, statutes, regulations, and explanatory material in ways that make the whole greater than the sum of the parts.

We like synergy because we’re made of it. The LII is unique in its location: affiliated with a graduate law program, inside a research university where, among many other things, you can find many of the digital librarians who did the fundamental architectural work on the emerging global structure that supports electronic publishing, world-class researchers in language technologies, and — significantly — very smart students who don’t hesitate to jump the boundaries between these disciplines. In the fifteen years that we have reliably provided public access to legal information, we have made great use of the efforts and insights of a wide variety of people in creating pathbreaking, innovative services and techniques now used by many.

As time has gone on, we have found this harder and harder to do. Most of you know that we do this with very few people: just six of us for three quarters of a million web pages. That staff is adequate to maintain and improve what we’re doing now — but we have little time free to do what we do best, which is to conceive and develop next-generation services. That is so for two reasons. Our maintenance responsibilities grow with each new database and service. And, after 15 years of Web development, innovation has become an expensive effort demanding time and talent in increasing amounts. In 1992, my co-founder Peter Martin could spend a snowy afternoon marking up the text of Two Pesos v. Taco Cabana in a new thing called HTML and produce something the world had never seen before — the first Web-published judicial opinion. Projects with comparable impact now take teams of people working for months. We have many more such projects on our list than we can possibly do. We would like to narrow the gap between our reach and our grasp, and with your help, we will. While we’re not (yet) announcing anything as complicated or ponderous as a traditional fundraising “campaign”, we would like to be raising an amount equal to one dollar each year for every repeat visitor we had to the site last month.

If you’ve taken the trouble to read this much of this blog post — cheerfully self-serving as it is — then I’m going to assume that you’re fairly committed to us and our cause. I’d like to impose on you by asking you to do more — not an increased dollar contribution, but something much harder. Please help us persuade others to join you in supporting the LII. Many of our supporters are lone individuals among many in an organization who know and value what we do. We need volunteer “recruiters” to bring others into the fold, and you can help. Just ask somebody you know who uses the service whether they’ve thought about how it’s done, who does it, and whether they think those LII folks drive fancy cars. (I don’t, personally, and you already know what I do with the old ones).

Direct support by our users is, for us, the best kind of support. It is free of partisanship and of imposed research agendas. It is a vote for what we are, and what we might become, and not what somebody else wants us to be. We need more of those who use the service to support us — and we need those of you who already make donations to know how very much all of us here value them.

Thanks again, from all six of us. We look forward to hearing from more people like you.

Hanging out at the dg.o digital government conference this week has got me thinking about the relationship between a legal-information infrastructure-builder like the LII and digital government projects generally. The LII moved into e-government work quite easily. That happened because both are concerned with increasing public access to legal process, which is often the same problem as increasing access to government process. More important, perhaps, was that the move reinforced the LII’s longstanding role as a convenor of conversations between information scientists and legal specialists. Much of my role in the project has been translating between the two. It’s not as simple as one-speaks-math-and-the-other-speaks-administrative-procedure. The two academic cultures are very different in the way they go about things, for one thing. But one of the great benefits of the LII’s location in a great research university is that we can bring world-class experts together around interesting problems, and provide real solutions to the public.

One obvious relationship between digital government and legal information is simply that the LII is a reference library for all manner of digital-government activities. You can see this in our link-census, if you look for government sites that link to the LII. Many of them are using us to provide background information for web pages that support citizen interaction with government. And I have long believed that citizen interaction can be a valuable justification for a legal-information service in places that are inclined to resist exposure of the legal system. The clear benefits of increased efficiency and effectiveness in government interaction with citizens could potentially be seen as outweighing the risks of transparency.

Digital-government research — especially the part of it that involves the computational processing of regulations or other legal text — offers valuable techniques for us. The same advanced language technologies that can be used to automatically categorize and structure responses in notice-and-comment rulemaking can help us organize and navigate judicial opinions. The algorithm that looks promising for extracting metadata from SEC filings can do the same thing with Circuit Court opinions, and so on. So you can get a lot of ideas for techniques and services at a conference like this. And we have our own research and techniques to offer too. Most important, we have a very deep well of experience with non-lawyer professionals who are trying to find out about the law.

Which brings me to the first of two slightly deeper observations. Non-lawyers trying to find and understand the law face two barriers that look a lot like the barriers for ordinary citizens who try to participate in e-government. Neither has anything to do with barriers to Internet access per se, or with the usual run of “digital divide” issues. The first is that they have to somehow figure out how to map a set of terms or nouns found in a description of their problem or interest against a set of more formal concepts that map things in the way that legal or governmental agencies understand the world to be. At the LII we’ve frequently joked that we’re trying to build systems that will find the article labelled “nuisance” when the user submits “barking dog” to the search engine. This is not exactly a new idea; Dan Dabney of West Group was making the point about differences in search style several years ago, though unfortunately none of his remarks were ever published. Similar problems apply when ordinary citizens try to figure out, say, regulations put forward by a particular agency. They have their own way of talking about things.

And of organizing and doing things, too. The second shared barrier or threshold has to do with the naive user’s lack of understanding of how the government, or the legal system, is organized. The farmer who grows peaches is not necessarily going to know which agency regulates their importation or distribution, just as he is unlikely to have much understanding of jurisdiction or civil procedure when starting to pursue a purely legal question. So the question of how we get inexperienced users past barriers rooted in a lack of structural or procedural understanding is one that is shared between these two areas of investigation. And it’s an important one to solve. Public education is part of the answer, as are improved search and navigation systems, as well as tutorial systems that help users frame questions.
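To make the first of those barriers concrete, here is a toy sketch of the "barking dog" problem: expanding a lay query with the formal concepts a legal collection is actually indexed under. The "barking dog" to "nuisance" pair comes from our own running joke. The other table entries, and the names `LAY_TO_LEGAL` and `expand_query`, are hypothetical, and a real system would need something much richer than a hand-built lookup table.

```python
# Hand-built, illustrative mapping from everyday phrases to legal concepts.
# Only the first entry comes from the text; the rest are made-up examples.
LAY_TO_LEGAL = {
    "barking dog": ["nuisance"],
    "fired": ["wrongful termination", "employment law"],
    "security deposit": ["landlord-tenant law"],
}


def expand_query(query, mapping=LAY_TO_LEGAL):
    """Return the normalized query followed by any legal concepts whose
    lay phrase appears in it (simple substring match, no stemming)."""
    normalized = " ".join(query.lower().split())  # lowercase, collapse whitespace
    terms = [normalized]
    for phrase, concepts in mapping.items():
        if phrase in normalized:
            for concept in concepts:
                if concept not in terms:
                    terms.append(concept)
    return terms


if __name__ == "__main__":
    print(expand_query("Barking  dog next door"))
```

A search front end could run every expanded term against the index and merge the results. The hard part, of course, is building and maintaining the mapping, which is exactly where the language-technology research comes in.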

By the way, there are 17 separate Federal agencies whose activities touch the problem of food safety. And literally hundreds of regulations that apply to the use of dry-cleaning chemicals — something of vital concern to 85,000 small businesses in the United States, most of which are family owned. That’s a lot of help to try to give people.

I said there were two deeper observations, though perhaps this second one is more intuition than anything else. Ideas about citizen participation in the rulemaking process tend to fall somewhere on a spectrum defined at the outside by two equally impractical views. At one end, we find extreme direct-democracy advocates who believe that any attempt to channel or structure citizen input into governmental process either is now or could become a form of censorship. At the other end, some within agencies believe that increased access to process (via, say, e-mailed submissions) inevitably devolves into spam, anarchy or both. Each can point with some validity toward the excesses of the other. Both are wrong. Completely unstructured, unmoderated input is too confusing and noisy to be of much real use. Anybody who is sufficiently socialized to know that people take turns talking in face-to-face conversations has absorbed that lesson. At the same time, narrow channeling and control of input from a set of stakeholders limited to those government already knows about is equally problematic; it misses a lot. So a better question is where and how to build reasonable structures for conversation. Likely, it’s somewhere in the middle. And many of the techniques that we create to do that will be equally useful for structuring conversations about what other kinds of law should be.

So there are lots of reasons for the open-access-to-law community and e-government researchers to be talking, and here in Montreal it’s a good discussion. As usual, the LII staff is dreading my return, because I’ll be chock full of ideas that someone will then have to build…

Privacy is the single most difficult issue confronting legal-information providers today. There is no resolution of competing concerns that will satisfy everyone. There is tension between the belief that it is important for the business of the legal system to take place in public view, on the one hand, and the need for individual privacy on the other. That’s at the center. On the edges you’ll find a collection of other agendas that complicate things no end; and even at the center, what we are talking about is a little more complicated than it first appears.

That “need for individual privacy” is not one thing; it’s a bundle. There are crimes that inappropriately shame the victim, such as rape, or where danger to the victim still exists should their identity or location become known. There are crimes where publicity serves to inappropriately extend the punishment of a perpetrator. There are dramatic circumstances in which it is dangerous to be known as witness or whistleblower, and many more less-dramatic settings in which the public nature of proceedings discourages the exercise of rights. And above all there is the threat of identity theft, which is now the main reason for public concern about public-records privacy in general.

And then there’s the question: “privacy in what?” Not everything is a judicial opinion; courts generate a lot of other material that goes into the public record. Records of plea agreements are a good example of this. In January, the LII hosted a session with Peter Winn of the US Attorney’s Office in Seattle that talked about, among other things, the reaction to whosarat.com — a web site that exposes informants in criminal cases (Peter also had a lot of good things to say about policy in this area, and the video of the session is well worth viewing). In a recent paper, my colleague and co-founder Peter Martin discusses the aftermarket for court-created information among data-mining services — a less dramatic story, perhaps, but one that might tell of greater harm in the long run.

Many causes of harm are beyond the reach of policy, and a practical policy would account for that — very likely with an educational component involving judges and court administrators. These days, there is a lot of “collateral damage” — exposure of personal information that is unnecessary to the business at hand. This includes myopic court-administration practices like the recycling of personal ID numbers into document or case identifiers. In the texts of the opinions themselves, we often find unnecessary personal information about parties or (worse) others involved in the case. I’ve seen many instances where the violation of privacy has occurred in what amount to dicta, and involves someone who is not a party to the case and is not even particularly important to its story, much less its conclusion. These careless, inadvertent exposures will be the hardest to eradicate in the long run. And of course the “back file” of decisions written before widespread electronic distribution was even dreamt of is potentially expensive and difficult to redact.

Public legal-information providers are far from unanimous in their views on this subject, as recent discussion in an online group shows. The LII is a little idiosyncratic in this respect. First, we run into this much less frequently than many providers do, simply because we deal in the opinions of high-level appellate courts where problems are less likely to occur (by contrast with, say, the local Family Court). Second, we’ve taken an avowedly hard-line approach. We believe that the courts (Federal courts, in particular) have a strong obligation to deal with the issue. We won’t shield them from that. As a result, we never withdraw or suppress information that the court has made public. Our sympathies are very much with the victims — but the best route to relief for many is unrelenting pressure on courts to change their policies.

Not all courts are indifferent to these issues. Many, many state courts are making real progress, and some policies are masterful in the way they balance competing concerns. You can find many of them at sites operated by groups like the National Center for State Courts and EPIC. It’s instructive to see who was involved in discussion of the policies — take a look at the New York State Commission’s roster. You’ll find a reporter, the executive director of a domestic-violence shelter, an insurer, a publisher, a county clerk, and a Reporter of Decisions, among others.

The policies are best where the process has been open and representative. The worst policy, in the long run, is what we seem to be getting by default in some jurisdictions. If courts take the attitude that they simply decide cases, and will not concern themselves with the mechanics of promulgation, then privacy policy will be determined by publisher-vigilantes. This is unhealthy. However high their aspirations, or benign their actions, publishers represent no-one but themselves. It is not their business to determine what the de facto policy of the courts should be. Anything as difficult as balancing the fundamental principle of open court with the individual right to privacy should be decided in the open, with vigorous public involvement.

Long ago, in a universe far, far away, David Mamet told me about his theory of jokeless punchlines. Some punchlines, he said, were so good in and of themselves that no actual buildup is required; the receiver can mentally compose the joke himself. He’s right (hell, he’s David Mamet, fer chrissakes). A few examples:

  • “For a nickel, I will”
  • “Your paycheck”
  • “Two pounds of Metamucil and a tuba”

Normally you wouldn’t think of the US Code as a place to find punchlines like that. At least you wouldn’t unless you were a legal-information blogger desperate for a catchy opening. But consider our old friend 4 USC 1 (“Flag, stripes and stars on”), which in its most current version states:

The flag of the United States shall be thirteen horizontal stripes, alternate red and white; and the union of the flag shall be forty-eight stars, white in a blue field.

As I said, this is the most current version. There are extenuating circumstances, of course, and we’ll get to them. But this little gem is good for a small but relentless stream of letters to the LII questioning either our currency or our sanity, usually the latter. And so the first thing to notice is that it exactly fits a popular prejudice about law and lawyers — namely, that they use fancy language and tortuous logic to reach facially ridiculous conclusions. About half the letters we get — no doubt sent by the large percentage of the population that hates lawyers anyway — contain some sentiment along these lines. Just like a bunch of lawyers to get something so obvious so wrong. Don’t you guys ever update your stuff?

Well, yes, we do — whenever the House Office of the Law Revision Counsel does. Of course there’s a trick to this. Two tricks, actually. First, the missing stars can be found in the notes to 4 USC 1; they were added by executive order. This is still a little confusing given the way things are presented in the notes — only the most recent executive order (admitting Hawaii) is fully spelled out; the earlier order admitting Alaska is listed as superseded, but unless you’re paying close attention, you’ll wonder where the other star went.

Second, the one-star-per-state algorithm is given in 4 USC 2. These two tricks just confirm conventional statutory-research wisdom: always check the notes, and always check the sections adjacent to the one that seems most interesting. Both are good advice, of course — but they are advice that no member of the general public will have heard, or have had any reason to hear. And they give the lie to the idea that somehow the general public can’t do — or shouldn’t do — legal research. Just as well tie somebody to a chair and then complain that they can’t dance.
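For the programmers in the audience, the 4 USC 2 algorithm is short enough to write down. What follows is only a sketch of my reading of the section (one star per newly admitted state, taking effect on the fourth of July next succeeding admission). The function names are invented for illustration, and treating an admission that falls exactly on July 4 as waiting until the following year is an assumption on my part, not something the statute spells out.

```python
from datetime import date


def star_effective_date(admitted: date) -> date:
    """Return the July 4 'next succeeding' a state's admission date."""
    july_fourth = date(admitted.year, 7, 4)
    # Assumption: an admission on July 4 itself waits until the next year.
    if admitted < july_fourth:
        return july_fourth
    return date(admitted.year + 1, 7, 4)


def stars_on(day: date, base_stars: int, admissions: list[date]) -> int:
    """Count stars on a given day: the base count plus one per effective admission."""
    return base_stars + sum(
        1 for admitted in admissions if star_effective_date(admitted) <= day
    )


# Admission dates for the two states added after the 48-star text was written.
ALASKA = date(1959, 1, 3)
HAWAII = date(1959, 8, 21)

if __name__ == "__main__":
    for day in (date(1959, 7, 3), date(1959, 7, 4), date(1960, 7, 4)):
        print(day.isoformat(), stars_on(day, 48, [ALASKA, HAWAII]))
```

Start from the forty-eight stars in 4 USC 1, feed in the two admissions, and you arrive at the flag people actually expect to see.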

Some further observations:

  • No doubt there are statutes that the public would find difficult to understand. And no doubt there are some where the unwary researcher would need to pay close attention to interpretive caselaw in order to find out what’s really going on. The number of stars on the flag falls into neither of these categories.
  • It would be reasonable to ask why 4 USC 1 has never been amended to reflect current reality. Must be because 4 USC 2 is deemed to cover the case.
  • This represents at least two failures of information design, and probably three. The first is the LII’s — unlike many sites, we choose to present the Notes separately from the text of the section. The second is an accident of section-ordering or of draftsmanship; why not give the algorithm first, or combine it with 4 USC 1 so that the present state of affairs (circa 1947) and the way forward (the algorithm) are stated side by side? The third is an inevitable consequence of search engines — the places people end up after clicking a search engine result link tend to be seen as the only places there are. As a web designer, it’s sometimes hard to figure out what to do about that.

Is this whole thing just a straw man? You bet it is. Anyone actually typing “number of stars on the flag” into Google would be rewarded with an article in Wikipedia that sets out the entire history of the flag and its many alternate versions. Several articles, actually. But I’ve always loved it as an example of what can go wrong with legal research, precisely because the fault is so obviously with the organization and presentation of the information rather than with any lack of knowledge on the part of the information-seeker. And that leads me to ask: how often is that the case, and what can be done about it?

Pretty often, and not a lot — at least with primary materials, and at least as we’ve thought about the problem in the past. A lot has been done with so-called “plain English” laws, but these generally reach only the words used and the typography with which they are presented. And it is usually presented as a consumer-protection matter, conceived mostly with respect to contracts and other documents created by private parties, and not legislation itself (though regulations do get a fair amount of attention). Seldom if ever is anything said about structure, availability of secondary sources, or information design. I’d be the first to admit that the latter would be very hard to spell out — but if the statutes related to pornography can be interpreted on an “I-know-it-when-I-see-it” basis, why not good information design too?

Taken together, I find all this a powerful argument for the creation of the kind of secondary materials that we make here at the LII — intended for lay persons with problems they are trying to understand, and sophisticated enough for (say) a practitioner entering an unfamiliar area. And figuring how we can connect these kinds of contextual or explanatory documents with the primary materials they explain is a matter for ongoing exploration and engineering. The Web makes it very easy to build structures that say “show me more detail on this” or “let me drill down”, and a lot harder to automate the process of providing meaningful context for the user who is puzzled by something he sees on a particular page. That’s the business of the Semantic Web, of advanced text-categorization tools, and of essay-writers and information architects — all things you can find here at the LII, on a good day.

Ouch. I’ve just been looking over the last blog post on interoperability, which has all the charm of an underfed seagull on crystal meth. Squawk, squawk, squawk. Amid all the screeching in the last post, it’s a little hard to figure out what the point was. So I’ll just say it: folks, the future does not lie in putting up huge, centralized collections of caselaw. It lies in building services that can work across many individual collections put up by lots of different people in lots of different institutional settings. Let me say that again: the future does not lie in putting up huge, centralized collections of caselaw. It lies in building services that can work across many individual collections put up by lots of different people in lots of different institutional settings. Services like site-spanning searches, comprehensive current-awareness services, and a scad of interesting mashups in which we put caselaw, statutes and regulations alongside other stuff to make new stuff.

There are some services like that. AltLaw is one; so is the Public Library of Law; so are the Legal Research Engines that the Cornell Law Library runs; and I’m sure I’m omitting many more, including some we built here at the LII more than five years ago. Most are either “framers” (who put a wrapper around multiple sites operated by others) or “spiders” (who, like Google, crawl other sites and federate the content in interesting ways). These are fairly blunt instruments — they don’t show much in the way of law-specific metadata — and the spiders in particular are hard to maintain. And there are really very few of them. There has not been very much building of distributed services in the legal-information world.

Why is that? It’s partly because trying to build and maintain site-spanning applications in the absence of standards is insane. Source material moves around. Sites disappear and reappear. Firewalls suddenly block your robot, then just as surprisingly stop, after you’ve spent two weeks finding an e-mail address for a webmaster who is wisely concealing her identity. Robots.txt files suddenly sport new policies. The subdirectory on site X that holds the decisions for the month of December changes its name from dec2007 to december2007. The name of the judge writing the opinion gets moved from the third line after the second <H3> in the document to the fourth line. And so on. And on. And on. The whole thing is a house of cards, because there are few common practices among sites and very little consistent practice on any site. This makes it very difficult to automate things, and things that can’t be automated won’t scale. In such an unstable environment, building services that remain reliable over the long term is very difficult. And (I speak from experience here) it’s mind-numbingly annoying, because the things that (frequently) break such services are trivial and preventable and numberless and arbitrary. For a programmer, it’s like being nibbled to death by ducks.

These are not new problems. Librarians and others concerned with long-term information availability have been discussing these issues since about fourteen minutes after the first Web site appeared. Reporters of decisions and court clerks have long settled similar issues in print publication, and are beginning to do so on the Web. But much more is needed, and faster. The Web is not going away, and Web publication of legal information should not be thought of as a kind of unfunded mandate delivered as a sop to those who don’t buy the books.

Sorry, the seagull started screaming again. Persistent little bugger.

Difficulty and tedium aside, few Internet legal-information providers have been interested in building distributed services. How come? It’s partly because we’re brainwashed by centralized models that are the legacy of many years’ reliance on Westlaw and Lexis. It’s partly because law people deliberately confuse that kind of branded centralization with authority — easy to do when those who grant “official status” use it as a form of barter, chiefly with those who operate large, centralized systems. It’s partly because, up until now, pulling everything together in one big heap has been easier than creating interoperability. And it’s mostly because we haven’t been paying attention.

More than a decade ago, the digital-library community began solving this problem, systematically and effectively. They were mostly dealing with another kind of heavily cross-referenced essay: not the judicial opinion, but the scientific pre-print. Many approaches were tried; some (like Dienst, which Brian Hughes and I built into a law-journal repository system a decade ago) were glorious failures. But ultimately these folks were successful because they realized several things:

You can unbundle services from repositories. This is what Google does. It doesn’t hold everything — it just indexes it. The same thinking applies to things like current-awareness services that need input from multiple sources. You can do that without holding everything yourself. Indeed, services like large-scale search will only work if you unbundle them from repositories. Early on, there were many attempts at federating search services that failed because the whole system was held to the performance of the weakest participant. As a practical matter, scaling past 100 sites just would not work no matter what.

Services can be made a lot better if they have metadata available to them, particularly metadata about where to find the documents the service addresses. This is the basis of Google Sitemaps and the other related site-mapping standards. As an idea, it goes back to at least 1992 and Archie, the system for discovering anonymous-FTP sites. An important side effect is that participation in such schemes gives the repository operator greater exposure for her information; in a way it’s a form of marketing.

Issuing metadata in a standard format makes a lot of things easier — like developing harvesting tools, services, and anything that has to process the metadata. XML is a really good vehicle for this, because it can be validated and reliably processed. This makes new services much, much cheaper to build. And, if your metadata standard can be extended by well-understood technical means, so that communities can effectively customize it — well, you get a lot of leverage in the form of standardized toolsets and the like.

Most important, all of this can take place independently of administrative structures, institutional gaps, or any other incidental barrier. It doesn’t matter who the repository operators are, or where they are, or what sort of institution they’re affiliated with. No consortium or other administrative apparatus is needed. It is up to the service provider to decide what makes a useful aggregation. And that is a very scalable idea.

Let’s hope it’s salable as well as scalable, because of course it depends on network effects for most of its value. It’s working that way in the digital-libraries world, where OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) is now a basic standard used by hundreds if not thousands of sites. Creators of legal data can do likewise. The protocol is easy to implement, and at the LII we are making it even easier by building OAI implementations that can easily be bolted onto existing case-management systems and otherwise fed from existing repositories. If you’re interested, take a look at http://oai4courts.wikispaces.com, where we’re starting to put things together (including a reference implementation you can tour, and for which we will shortly release code that all can use).
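To give a feel for how little is involved on the harvesting side, here is a minimal sketch of the two things a harvester does: build a ListRecords request, and pull record identifiers and the paging token out of the XML that comes back. The ListRecords verb, the metadataPrefix and resumptionToken arguments, and the response namespace come from the OAI-PMH 2.0 specification. The repository address, the identifiers, and the sample response are invented for illustration, and a real harvester would fetch the response over HTTP rather than read it from a string.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's base URL."""
    if resumption_token:
        # In the protocol, a resumption token replaces the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Invented sample response; the identifiers are hypothetical.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header>
      <identifier>oai:court.example:2008-0001</identifier>
      <datestamp>2008-03-01</datestamp>
    </header></record>
    <record><header>
      <identifier>oai:court.example:2008-0002</identifier>
      <datestamp>2008-03-02</datestamp>
    </header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("http://repository.example/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester keeps requesting pages with the returned token until none comes back. Nothing in that loop depends on who runs the repository, which is the whole point.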

Oh, and interoperability? Well, it turns out that it takes the form of a lot of really geeky and scary-looking XML. But it could just be the best thing to happen to the free exchange of legal information since the death of Law French.

Comments, please.

Interoperability — it’s a big topic, and an important one in the evolving legal infosphere. I’ll try to make it a little more manageable by breaking it up into a series of posts that will appear over the next several weeks.

In general, the idea is that collections of online legal information should work together (and with the audience, and with other legal-information authors and providers) in a way that makes it easier to develop and use information services. Services that span collections offered by different providers, such as cross-site search and current-awareness services, are especially attractive. Interoperability is created at several levels of technical implementation, but the underlying ideas are simple: similar repositories should be transparent in the way they expose their information to the rest of the world, and if possible, they should do so in reliable, standardized ways that span a community of interest. Sometimes this involves standard-setting within the community; often, though, it’s just a series of small, sensible design decisions.

Here at the LII, our first stab at interoperability was the use of “head-compatible” URLs — the idea that a document’s address should be easily guessable by a would-be linker or author. For example, the Wild Horse Annie Act, 18 USC 47, becomes http://www.law.cornell.edu/uscode/18/47.html . Grutter v. Bollinger, 539 US 306, becomes http://www.law.cornell.edu/supct-cgi/get-us-cite?539+306 (though these days we’d probably change that “supct” to “scotus” just to be like everybody else). The idea is to make it easy for other people to link to your stuff, whether they are authoring manually or building something by automated means. Simple enough.
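The payoff of a guessable scheme is that a linker can get from citation to address with a few lines of code. The sketch below assumes only the two URL patterns quoted above; the function name and the regular expression are mine, it handles just these two citation forms, and it makes no promise that the addresses still resolve.

```python
import re

# Matches simple citations of the form "18 USC 47" or "539 US 306".
CITATION = re.compile(r"^\s*(\d+)\s+(USC|US)\s+(\w+)\s*$")


def url_for(citation: str) -> str:
    """Turn a US Code or US Reports citation into a head-compatible URL."""
    # Drop periods so "18 U.S.C. 47" is treated like "18 USC 47".
    match = CITATION.match(citation.replace(".", ""))
    if match is None:
        raise ValueError(f"unrecognized citation: {citation!r}")
    first, reporter, last = match.groups()
    if reporter == "USC":
        # Title and section map straight onto path segments.
        return f"http://www.law.cornell.edu/uscode/{first}/{last}.html"
    # Volume and page become the query string of the citation lookup.
    return f"http://www.law.cornell.edu/supct-cgi/get-us-cite?{first}+{last}"


if __name__ == "__main__":
    print(url_for("18 USC 47"))
    print(url_for("539 US 306"))
```

Notice that nothing here needs a lookup table or a call to our servers; the citation is the address.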

Interoperability is built on a series of seemingly-trivial decisions like that, decisions that favor common sense, transparency, and ease of use for those who want to build other things on top of information you’re providing. Good practice also involves a commitment to transparency and maintenance over time. We’ve changed the way we handle New York Court of Appeals decisions at least three times since 1996, when we first offered them. All our old systems of document addressing still work as well as they ever did (take a look at the two links to Wild Horse Annie in the last paragraph). The same is true of all of our old “captive-search” URLs as well (for example, the one that will return Supreme Court decisions on employment discrimination). There, interoperability rests on a 50-line Apache server module that translates old search-engine search strings into whatever we are using now. It replaces hundreds of lines of mod_rewrite code piled up over five generations of LII search engines.
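The translation idea is simple enough to sketch, though what follows is not our Apache module. It is a rough Python illustration of the same approach, and every parameter name and path in it is hypothetical: recognize the query parameter each retired search engine used, and re-issue the same words to whatever is current.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical parameter names, one per retired search engine.
LEGACY_QUERY_PARAMS = ("query", "words", "search_string")

# Hypothetical location and parameter name of the current search service.
CURRENT_SEARCH_PATH = "/search"
CURRENT_QUERY_PARAM = "q"


def translate_legacy_search(url: str) -> Optional[str]:
    """Map an old captive-search URL onto the current search URL.

    Returns None when no known legacy parameter is present, so the
    caller can fall through to its normal handling.
    """
    params = parse_qs(urlsplit(url).query)
    for name in LEGACY_QUERY_PARAMS:
        if name in params:
            # Carry the user's search terms over unchanged.
            terms = params[name][0]
            return f"{CURRENT_SEARCH_PATH}?{urlencode({CURRENT_QUERY_PARAM: terms})}"
    return None


if __name__ == "__main__":
    print(translate_legacy_search("/cgi-bin/old-search?words=employment+discrimination"))
```

The design choice worth copying is that the table of old parameter names is data. When the next generation of search engine arrives, you add one entry rather than another layer of rewrite rules.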

Interestingly, years of debate over “persistent document identifiers” — pURLs, DOIs, and all that stuff — have finally resulted in some recognition that the problem is not one of creating complicated, all-embracing alternative schemes for the naming of information resources. Rather, it’s a matter of consistent, transparent practices by people who run web sites. And there’s a lesson in that — no matter what technical schemes and formal standards are used to create interoperability, it ultimately rests on non-technical practices and concern for quality.

So interoperability — even the simplest kind — takes thinking and a fair bit of work. Even so, you’d think that the First Commandment of Legal Information Interoperability — “Thou shalt make thy URIs harmonious with well-known document identifiers like citations” — would be honored more often than it is. Good luck with that.

You can’t find opinions on the sites of the Circuit Courts of Appeal that way, for instance, nor can you link directly to sections of the Code of Federal Regulations in the new(ish) e-CFR collection at NARA. The reasons in each case are instructive. The Circuit Courts don’t do it at least in part because print citation information — unfortunately still the best-known (and usually the only) set of addresses for these documents — isn’t available at the time the decisions are put online. It’s not clear why they don’t return to the decision and add the print citations when they become available; probably it’s a matter of time and expense in courts that issue thousands of opinions every year. Vendor-neutral citation — which numbers according to the order in which decisions are issued rather than using page numbers from a bound volume — would do better, but very few courts use it in a way that is reflected in their URLs (the court system in Ohio is a notable exception). In NARA’s case, the reason given is that there is just too much churn in the numbering of sections to allow reliance on section-level URLs (we note that the GPO has no such misgivings, but then they are flighty types, and not archivists at all). You’ll see these same problems — accidental fallout from reliance on printed books, and a concern for stability at the expense of practicality — oddly spattered across the legal-information landscape. It’s a weird product of time and place. If reporters of decisions, and others who publish legal information, brought the same meticulous and conscientious pursuit of standards and interoperability to cyberspace that they have to print, we wouldn’t be talking about interoperability at all. We’d have it. But those folks are, for the most part, not yet as comfortable in the electronic world as they are in the world of print. I’ll say more about why that is in a minute.

But I’ve skipped over a couple of points on my way here, and they’re crucially important:

First, why would technical discussions about citation and URIs pass anybody’s who-cares test? Simply: document addresses always matter in retrieval systems, as does the granularity of the documents they address. Importantly, this is one place among many where legal-information systems have a very different character from other kinds of text databases, especially in common-law jurisdictions. Document addresses matter because it is crucially important that everyone look at the same authoritative information when resolving a dispute (that means “same” as in “same text”, not as in “same web site” — it is no more necessary to put all electronic legal text in a single repository than it is to put all the world’s books in a single building). So we need a common way of referring to things.

Second, why do we need any kind of interoperability between collections at all? Wouldn’t it be easier — and more authoritative — to just put everything in one place and give people access to it? Old-timers on the legal information scene will recognize this as the “why don’t we build a public-domain Westlaw?” question. That question has been asked many times, always in the interval before a newly-arrived group of legal-information enthusiasts discovers the dimensions of what it has taken on. It is a particularly thorny issue, and I’ve walked up to it before:

End users have been conditioned by training, experience, and careful marketing campaigns to value particular aspects of familiar systems like LEXIS and WESTLAW. Those systems are strongly branded, and have a deservedly high reputation among those who have been able to use them. It is not at all surprising (though it is at times dismaying) that experienced users find it difficult to recognize those same virtues when they are produced by new and unfamiliar implementations. At the core of this phenomenon is a bias induced by thirty years’ experience with older computer systems and older modes of industrial organization: centrality equals reliability. The Internet approach stands in sharp contrast as it argues the contrary: decentralization equals reliability, attainability, and scalability. On some profound but subliminal level this is news that shocks and bewilders. New, distributed models of computing that are reflected in distributed information systems and distributed models of business organization must seem inherently anarchic and therefore inherently suspect, no matter their virtues. That suspicion will subside in time, to a degree. But it will never vanish entirely until we become more discerning than we are about what was necessary about older ways of doing things and what was merely incidental.

I don’t want to re-fight that battle, particularly in something that is meant to be a quick run through the issues. But quickly: distributed systems provided by diverse actors allow for audience-oriented customization, encourage greater innovation, lower barriers for new entrants into the market, encourage greater care and authority (as when the publisher is also the law creator), conform better to our existing systems of organization for courts and legislatures, and can scale much larger over a more diverse collection of actors. In an environment as jurisdictionally and administratively complex as the United States, a distributed model may well be the only approach that works. Other, smaller jurisdictions provide counterexamples, notably Canada and Australia. Even there, there is no attempt for the systems to take in all the legal information that might be produced by agencies, municipal government, local commissions, and so on. There will always be a need for federation. That need becomes all the greater as multiple niche providers of legal information emerge to serve a diverse variety of audiences. We are on the brink of such a system, if we are not there already.

The price of such a federated, distributed system is the development of standards. In a system where there are only a few, supposedly comprehensive legal publishers, standards are whatever those publishers say they are. That is the system we had in the US until relatively recently, and it is one that commercial publishers fight very hard to maintain. Interestingly, it is also a system that has a deeply symbiotic relationship with the source publishers of legal information (reporters of decisions) I was talking about a moment ago.

That profession, whose leading light once referred to his staff as “double revolving peripatetic nit-picker[s]”, is accustomed to looking for nits that might accidentally find their way onto the pages of the bound, authoritative volumes produced by an official publisher. This was not neglect on their part; it was what professional practice demanded at the time, and still demands today. But the world of published legal information is a much bigger, more digital place now, and they are adapting to it, but slowly. The websites of many courts and legislatures are not crafted with the same attention to detail as their printed output. A new understanding of best practices is needed — one that brings to the electronic realm the same meticulous concern with consistent metadata, citation, and organization that reporters of decisions have always brought to print. Such practices are emerging, but slowly. And there are many costs associated with a lack of workable standards, though they may only be immediately apparent to those who try to build resources that span multiple court web sites without themselves housing the information.

Citation and, more generally, systems of legal document identifiers are just one example of legal metadata we should make transparent and interoperable. It is probably the most urgent case. Next time, I’ll move on to how broader interoperability might be done. Interestingly, many of the problems have already been solved.

You’ll remember that we were running a naming contest for this blog, with a winner to be announced on March 14th. It’s now the 18th, of course. We missed the deadline because some sorehead contestants demanded an independent audit and, unfortunately, our accounting firm was tied up on the Bear Stearns deal. Undaunted by fiscal meltdown, I am proud to announce that this space shall henceforth be known as “b-screeds”.

The lucky winner was Will Sadler, known to many of you as the pioneering creator of a legal web site at Indiana (among the first five or so on the web), former cruise-ship piano player, and hater of Jet-Skis. Will long since departed the legal information scene for the more hospitable arena of the insurance industry, but it’s nice to know he’s keeping an eye on us.