1. The Death and Life of Great Legal Data Standards

Thanks to the many efforts of the open government movement in the past decade, the benefits of machine-readable legal data (legal data that can be processed and easily interpreted by computers) are now widely understood. In the world of government statutes and reports, machine-readability would significantly enhance public transparency, help to increase efficiencies in providing services to the public, and make it possible for innovators to develop third-party services that enhance civic life.

In the universe of private legal data (contracts, briefs, and memos), machine-readability would open up vast potential efficiencies within the law firm context, allow the development of novel approaches to processing the law, and help to drive down the costs of providing legal services.

However, while the benefits are understood, by and large the vision of rendering the vast majority of legal documents into a machine-readable standard has not been realized. While projects do exist to acquire and release statutory language in a machine-readable format (and the government has backed similar initiatives), the vast body of contractual language and other private legal documents remains trapped in a closed universe of hard copies, PDFs, unstructured plaintext and Microsoft Word files.

Though this is a relatively technical point, it has broad policy implications for society at large. Perhaps the biggest upshot is that machine-readability promises to vastly improve access to the legal system, not only for those seeking legal services but also for those seeking to provide them.

It is not for lack of a standard specification that the status quo exists. Indeed, projects like LegalXML have developed specifications that describe a machine-readable markup for a vast range of different types of legal documents. As of writing, the project includes technical committees working on legislative documents, contracts, court filings, citations, and more.

However, by and large these efforts to develop machine-readable specifications for legal data have only solved part of the problem. Creating the standard is one thing, but actually driving adoption of a legal data standard is another (often more difficult) matter. There are a number of reasons why existing standards have failed to gain traction among the creators of legal data.

For one, the oft-cited aversion of lawyers to technology remains a relevant factor. Particularly in the case of the standardization of legal data, where the projected benefits lie in the future and their magnitude is speculative at the present moment, persuading lawyers and legislatures to adopt a new standard remains a challenge, at best.

Secondly, the financial incentives of some actors may actually be opposed to rendering the universe of legal documents into a machine-readable standard. A universe of largely machine-readable legal documents would also be one in which third parties could develop systems that automate and significantly streamline legal services. In the context of the ever-present billable hour, parties may resist the introduction of technological shifts that enable these efficiencies to emerge.

Third, the costs of converting existing legal data into a machine-readable standard may also pose a significant barrier to adoption. Marking up unstructured legal text can be highly costly, depending on the intended machine usage of the document and the type of document in question. It is difficult to persuade a legislature, firm, or other organization with a large existing repository of legal documents to take on large one-time costs to render those documents into a common standard.

These three reinforcing forces erect a significant cultural and economic barrier against the integration of machine-readable standards into the production of legal text. To the extent that one believes in the benefits from standardization for the legal industry and society at large, the issue is — in fact — not how to define a standard, but how to establish one.

2. Rough Consensus, Running Standards

So, how might one go about promulgating a standard? Particularly in a world in which lawyers, the very actors that produce the bulk of legal data, are resistant to change, mere attempts to mobilize the legal community to action are destined to fail in bringing about the fundamental shift necessary to render most if not all legal documents in a common machine-readable format.

In such a context, implementing a standard in a way that removes humans from the loop entirely may, in fact, be more effective. To do so, one might design code capable of automatically rendering legal text into a machine-readable format. This code could then be implemented by applications of all kinds, which would output legal documents in a standard format by default. This would include the word processors used by lawyers, as well as platforms like LegalZoom or RocketLawyer that routinely generate large quantities of legal data. Such a solution would remove the need for lawyers to take part in implementing a standard at all: any text created would be automatically parsed and output in a machine-readable format. Scripts might also be written to identify legal documents online and process them into a common format. As the body of documents rendered in a given format grew, it would be possible for others to write software leveraging the increased penetration of the standard.

There are, obviously, technical limitations in realizing this vision of a generalized legal data parser. For one, designing a truly comprehensive parser is a massively difficult computer science challenge. Legal documents come in a vast diversity of flavors, and no common textual conventions allow for perfectly accurate parsing of the semantic content of any given legal text. Quite simply, any parser will be an imperfect (perhaps highly imperfect) approximation of full machine-readability.

Despite the lack of a perfect solution, an open question exists as to whether or not an extremely rough parsing system, implemented at sufficient scale, would be enough to kickstart the creation of a true common standard for legal text. A popular solution, however imperfect, would encourage others to implement nuances to the code. It would also encourage the design of applications for documents rendered in the standard. Beginning from the roughest of parsers, a functional standard might become the platform for a much bigger change in the nature of legal documents. The key is to achieve the “minimal viable standard” that will begin the snowball rolling down the hill: the point at which the parser is rendering sufficient legal documents in a common format that additional value can be created by improving the parser and applying it to an ever broader scope of legal data.

But, what is the critical mass of documents one might need? How effective would the parser need to be in order to achieve the initial wave of adoption? Discovering this, and learning whether or not such a strategy would be effective, is at the heart of the Restatement project.

3. Introducing Project Restatement

Supported by a grant from the Knight Foundation Prototype Fund, Restatement is a simple, rough-and-ready system which automatically parses legal text into a basic machine-readable JSON format. It has also been released under the permissive terms of the MIT License, to encourage active experimentation and implementation.

The concept is to develop an easily-extensible system which parses through legal text and looks for some common features to render into a standard format. Our general design principle in developing the parser was to begin with only the most simple features common to nearly all legal documents. This includes the parsing of headers, section information, and “blanks” for inputs in legal documents like contracts. As a demonstration of the potential application of Restatement, we’re also designing a viewer that takes documents rendered in the Restatement format and displays them in a simple, beautiful, web-readable version.
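To make the idea concrete, here is a minimal sketch of what such a rough parser could look like. It is not the Restatement parser or its actual output format: the function names (`parseLegalText`, `splitBlanks`), the JSON field names, and the conventions assumed for the input (numbered headings like "1. Parties", blanks written as `[Name]` or a run of underscores) are all illustrative assumptions.

```javascript
// Hypothetical sketch: node types and field names are illustrative
// assumptions, not the actual Restatement format.
function parseLegalText(text) {
  const doc = { type: "document", title: null, sections: [] };
  let current = null;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;

    // A line such as "1. Definitions" opens a new numbered section.
    const heading = /^(\d+)\.\s+(.+)$/.exec(line);
    if (heading) {
      current = {
        type: "section",
        number: Number(heading[1]),
        heading: heading[2],
        body: [],
      };
      doc.sections.push(current);
      continue;
    }

    // The first non-empty line before any section heading is taken as the
    // title; any further preamble is ignored in this sketch.
    if (current === null) {
      if (doc.title === null) doc.title = line;
      continue;
    }

    current.body.push(splitBlanks(line));
  }
  return doc;
}

// Split a line into plain text and "blank" nodes. A blank is assumed to be
// written as [Name] or as a run of three or more underscores.
function splitBlanks(line) {
  const parts = [];
  const blank = /\[([^\]]+)\]|_{3,}/g;
  let last = 0;
  let m;
  while ((m = blank.exec(line)) !== null) {
    if (m.index > last) {
      parts.push({ type: "text", value: line.slice(last, m.index) });
    }
    parts.push({ type: "blank", name: m[1] || null });
    last = blank.lastIndex;
  }
  if (last < line.length) {
    parts.push({ type: "text", value: line.slice(last) });
  }
  return parts;
}

const sample = [
  "CONSULTING AGREEMENT",
  "",
  "1. Parties",
  "This agreement is between [Company] and [Consultant].",
  "2. Term",
  "The term begins on ______.",
].join("\n");

const parsed = parseLegalText(sample);
console.log(JSON.stringify(parsed, null, 2));
```

Even a parser this crude yields a document that a viewer could render section by section and that a form-filling tool could query for its blanks, which is the kind of "rough but useful" starting point described above.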

Underneath the hood, Restatement is all built upon web technology. This was a deliberate choice, as Restatement aims to provide a usable alternative to document formats like PDF and Microsoft Word. We want to make it easy for developers to write software that displays and modifies legal documents in the browser.

In particular, Restatement is built entirely in JavaScript. The past few years have been exciting for the JavaScript community. We've seen an incredible flourishing of not only new projects built on JavaScript, but also new tools for building cool new things with JavaScript. It seemed clear to us that it's the platform to build on right now, so we wrote the Restatement parser and viewer in JavaScript, and made the Restatement format itself a type of JSON (JavaScript Object Notation) document.

For those who are more technically inclined, we also knew that Restatement needed a parser formalism, that is, a precise way to define how plain text gets transformed into the Restatement format. We became interested in a recent advance in parsing technology called PEG (Parsing Expression Grammar).

PEG parsers differ from many other types of parsers in that their grammars are unambiguous: when several rules could match, the alternatives are tried in a fixed order and the first match wins. That means that plain text passing through a PEG parser has only one possible valid parsed output. We became excited about using this deterministic property of PEG to mix parsing rules and code, and that's when we found peg.js.

With peg.js, we can write a grammar whose generated parser executes JavaScript code as it parses your document. This hybrid approach is super powerful. It allows us to have all of the advantages of using a parser formalism (like speed and unambiguity) while also allowing us to run custom JavaScript code on each bit of your document as it parses. That way we can use an external library, like the Sunlight Foundation's fantastic citation library, from inside the parser.
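The two PEG properties discussed here, ordered choice and code attached to rules, can be illustrated without any library. The sketch below hand-rolls a few PEG-style combinators in plain JavaScript; it is not peg.js and does not use the peg.js grammar syntax or API, and every name in it (`lit`, `re`, `seq`, `choice`, `action`, `parseTokens`) is a hypothetical helper introduced for illustration.

```javascript
// Each parser is a function (input, pos) -> { value, pos } on success, or
// null on failure. All helper names are hypothetical.

// Match a literal string.
const lit = (s) => (input, pos) =>
  input.startsWith(s, pos) ? { value: s, pos: pos + s.length } : null;

// Match a regular expression anchored at the current position.
const re = (pattern) => {
  const sticky = new RegExp(pattern.source, "y");
  return (input, pos) => {
    sticky.lastIndex = pos;
    const m = sticky.exec(input);
    return m ? { value: m[0], pos: pos + m[0].length } : null;
  };
};

// Sequence: every part must match, one after another.
const seq = (...parsers) => (input, pos) => {
  const values = [];
  for (const p of parsers) {
    const r = p(input, pos);
    if (r === null) return null;
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// Ordered choice: try alternatives left to right; the first success wins.
// This is what makes the result deterministic.
const choice = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    const r = p(input, pos);
    if (r !== null) return r;
  }
  return null;
};

// Semantic action: run arbitrary JavaScript on the matched value.
const action = (p, fn) => (input, pos) => {
  const r = p(input, pos);
  return r === null ? null : { value: fn(r.value), pos: r.pos };
};

// A toy grammar: section references ("Section 5", "§ 5") versus plain words.
const number = action(re(/\d+/), (digits) => Number(digits));
const sectionRef = action(
  seq(choice(lit("Section "), lit("§ ")), number),
  ([, n]) => ({ type: "sectionRef", number: n })
);
const word = action(re(/\S+/), (w) => ({ type: "text", value: w }));

// "Section 5" could also be read as two plain words; listing sectionRef
// first resolves that, so there is exactly one parse.
const token = choice(sectionRef, word);
const space = re(/\s*/);

function parseTokens(input) {
  const out = [];
  let pos = 0;
  while (true) {
    pos = space(input, pos).pos; // skip whitespace (always succeeds)
    if (pos >= input.length) break;
    const r = token(input, pos);
    if (r === null) throw new Error("No rule matched at position " + pos);
    out.push(r.value);
    pos = r.pos;
  }
  return out;
}

console.log(JSON.stringify(parseTokens("See Section 5 above")));
```

The function passed to `action` is the point at which a real grammar could hand the matched text to an external library, for example a citation extractor, before storing the result in the output.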

Our next step is to prototype an "interactive parser," a tool for attorneys to define the structure of their documents and see how they parse. Behind the scenes, this interactive parser will generate peg.js programs and run them against plaintext without the user even being aware of how the underlying parser is written. We hope that this approach will provide users with the right balance of power and usability.

4. Moving Forwards

Restatement is going fully operational in June 2014. After launch, the two remaining challenges are to (a) continue expanding the range of legal document features the parser can successfully process, and (b) begin widely processing legal documents into the Restatement format.

For the first, we’re encouraging a community of legal technologists to play around with Restatement, break it as much as possible, and give us feedback. Running Restatement against a host of different legal documents and seeing where it fails will expose the areas where the parser needs bolstering, expanding its potential applicability as far as possible.

For the second, Restatement will be rendering popular legal documents in the format, and partnering with platforms to integrate Restatement into the legal content they produce. We’re excited to say that at launch Restatement will be releasing the standard form documents used by the startup accelerator Y Combinator, and Series Seed, an open source project around seed financing created by Fenwick & West.

It is worth adding that the Restatement team is always looking for collaborators. If what’s been described here interests you, please drop us a line! I’m available at tim@robotandhwang.org, and on Twitter @RobotandHwang.

 

Jason Boehmig is a corporate attorney at Fenwick & West LLP, a law firm specializing in technology and life science matters. His practice focuses on startups and venture capital, with a particular emphasis on early stage issues. He is an active maintainer of the Series Seed Documents, an open source set of equity financing documents. Prior to attending law school, Jason worked for Lehman Brothers, Inc. as an analyst and then as an associate in their Fixed Income Division.

Tim Hwang currently serves as the managing human partner at the offices of Robot, Robot & Hwang LLP. He is curator and chair for the Stanford Center on Legal Informatics FutureLaw 2014 Conference, and organized the New and Emerging Legal Infrastructures Conference (NELIC) at Berkeley Law in 2010. He is also the founder of the Awesome Foundation for the Arts and Sciences, a distributed, worldwide philanthropic organization founded to provide lightweight grants to projects that forward the interest of awesomeness in the universe. Previously, he has worked at the Berkman Center for Internet and Society at Harvard University, Creative Commons, Mozilla Foundation, and the Electronic Frontier Foundation. For his work, he has appeared in the New York Times, Forbes, Wired Magazine, the Washington Post, the Atlantic Monthly, Fast Company, and the Wall Street Journal, among others. He enjoys ice cream.

Paul Sawaya is a software developer currently working on Restatement, an open source toolkit to parse, manipulate, and publish legal documents on the web. He previously worked on identity at Mozilla, and studied computer science at Hampshire College.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

Within the field of legal informatics, discussions often focus on the technical and methodological questions of access to legal information. The topics can range from classification of legal documents to conceptual retrieval methods and Automatic Detection of Argumentation in Legal Cases. Researchers and businesses try to increase both precision and recall in order to improve search results for lawyers, while public administrations open up the process of legislating for the benefit of democracy and openness. Where, however, are the benefits for laypersons not familiar with retrieving legal information? Does clustering of legal documents, for example, make a legal text any more understandable for a citizen?

To answer these questions, I would like to go back to the beginning, the purpose of law. Unfortunately for us lawyers, law is not created for us, but to serve as the oil that keeps society running smoothly. One can imagine two scenarios for applying the oil. If the motor has not been taken care of sufficiently, some extra greasy oil might be necessary to get it running again (i.e., if all amicable solutions are exhausted, some sort of dispute resolution is required); this would be the retroactive approach. The other possible application is to add enough oil during driving, so the engine will continue running smoothly without any additional boost; in other words, trying to avoid disputes. This would be the proactive line of thinking.

How can proactive law work for the citizens? The basic assumption would be that in order to avoid disputes, one has to be aware of possible legal risks and how to prevent them. In line with the position of the European Union, we can further assume that the assessment and evaluation of risks requires relevant information about the legal facts at hand. It is only possible for a citizen to reach a decision regarding, for example, social benefits or certain rights as an employee, if she or he is aware of the various legal rights and obligations as well as possible legal outcomes.

Having stipulated that legal information is the core requirement for being able to exercise one’s rights as a citizen, the next questions would include which type of information is actually necessary, who should be responsible for communicating it, and how it should be provided. These questions I would like to discuss below. That is, we will talk about why, what, who and how.

Why?

Before we move on to the main theme at hand, access to legal information, I would like to highlight a few more things about the why. As already mentioned, and as many legal philosophers have noted, law is the clockwork that makes society tick. The principle Ignorantia juris neminem excusat (Ignorance of the law is no excuse) is commonly accepted as one of the foundations of modern civilization. But how would we define ignorance in today’s world? What if a citizen has trouble finding the necessary information despite endless efforts? What if she or he, after finding the relevant information, is not able to understand it? Does this mean she or he is still ignorant?

Public access to legal information is also a question of democracy, because citizens’ insight into politics, governmental work and the lawmaking process is a necessary prerequisite for public trust in the legislative body.

"In shifting from infrastructure to integration and then to transformation, a more holistic framework of connected governance is required. Such a framework recognizes the networking presence of e-government as both an internal driver of transformation within the public sector and an external driver of societal learning and collective adaptation for the jurisdiction as a whole." (UN e-Government Survey 2008)

In this spirit, governments should consider the management of knowledge to be of increasing importance. "The essence of knowledge management (KM) is to provide strategies to get the right knowledge to the right people at the right time and in the right format." (UN e-Government Survey 2008) What, then, is the right knowledge?

What?

The term legal information is as obvious as the word law. It is both apparent and imprecise, and yet we use it rather often. Several scholars have tried to define legal information and legal knowledge, inter alia, Peter Wahlgren in 1992, Erich Schweighofer in 1999, and Robert Richards in 2009.

If we consider the term from a layperson’s perspective, one could define it as the data, the facts and figures, that are necessary to solve an issue--one that cannot be handled amicably--between two persons (either legal or physical). In order for a layperson to be able to utilize legal information she or he has to be able to access, read, understand and apply the information.

The accessing element is one of the tasks that legal information institutes fulfill so elegantly. The term "reading" is here to be understood as information that can be grasped either with one’s eyes or ears. The complexity begins when it comes to understanding and applying the information. A layperson might have difficulties understanding and applying the Act on income tax even though the law is accessible and readable.

Is this information then still legal information if we assume that the word “information” means that somebody can receive certain signs and data and use this data meaningfully in order to increase her or his knowledge? "Knowledge and information […] influence in a reciprocal way. Information modifies knowledge and knowledge guides potential use of information." (Schweighofer)

If a layperson does not understand the information provided by official sources, she or he might refer to other information sources, for example by utilizing a Google search. In this case, the question arises how reliable the retrieved information is, however comprehensible. A high ranking in Google search does not automatically relate to high quality of the information even though this might be a common misconception, especially for laypersons not trained in source criticism. Here the importance of providing citizens with some basic and comprehensible information becomes apparent.

This comprehensible information might include more than plain text-based legislation and court decisions. Of interest for the layperson (both in business-consumer as well as government-citizen situations) can furthermore be, inter alia,

  • additional requirements according to terms and conditions or specific procedural rules in public administrations
  • possible legal outcomes and necessary facts that lead to them
  • estimated time of delivery of the product or the decision
  • credibility of the business, including the number of pending cases before the courts or complaints before the consumer protection authorities.

For a citizen it might also be very significant to know how she or he could behave differently in order to reach a desired result. Typically, citizens are only provided with the information as to how the legal situation is, but not what they could do to improve it, unless they contact a lawyer.

Commonly all these types of data already exist, though perhaps not in one location. The most technically accessible information is found in traditional legal sources, such as legislation and case law. Again, the question here mainly focuses on how to provide and utilize the existing information in a fashion understandable to the user.

"Like any other content transmitted through a communication system, primary legal sources can be rendered more or less understandable, locatable, and hence effective by structuring and presenting them differently for different audiences. And secondary sources must of course be constructed for a particular market, audience, or level of understanding." (Tom Bruce)

Who should then be responsible for structuring, presenting and rendering it understandable, especially in the light of source criticism and trust?

Who?

Ignorantia juris neminem excusat presupposes that the legal information provided is correct and of high quality. Who can guarantee such quality? The state, private entities, research facilities, non-governmental organizations or citizens? My answer would be that all could play their part.

One should, however, keep in mind that user-friendliness is not the same as trustworthiness, which leads to the question of how to ensure that citizens are supplied with the right answers. In a world where even governments do not always take responsibility for the correctness of the information they provide, such as in the case of online publications of law gazettes, the question remains who, or what entity, should be held liable for the accuracy of its services. But even if a public authority were to accept accountability, to what extent could that influence a legal decision that has already been reached?

The answer to the question of who should provide a given legal information service could also depend on the target group of the information.

"The legal information market is really no longer conceivable as bipolar – it can no longer be seen as a question of lawyers on the one hand versus a largely legally ignorant everyone else on the other. […] Internet-based legal information systems are used by many cases and conditions of people for many different reasons. […] Probably the most interesting group [are] non-lawyer professionals. These are people whose interest in law is vital, ongoing, and professional rather than either being casual and hobby-like or sporadic and trauma-driven. […] Such new and diverse audiences require new and diverse legal information architectures. They will want specialized collections of law of particular relevance to them. They will want those collections organized and presented in ways that reflect their profession or their situation, in ways that collections organized according to the legal abstractions and legal terms in use by lawyers do not. They are concerned with situations and fact-patterns rather than theories, doctrines, and concepts. They are, in short, a very intelligent and exciting type of lay users, and a potentially enormous audience." (Tom Bruce)

Non-lawyer professionals probably constitute a large market for businesses that can tailor their services to a specific group and therefore render them profitable, as the services are considered of value for these professionals.

Traditional laypersons, however, typically do not represent a large market power simply because they will not always be willing to pay for services of this kind. This leaves them in the hands of other stakeholders such as public administrations, research institutes, non-governmental organizations and private initiatives. As already mentioned, conventionally the raw data is supplied by public administrations. The question, then, is how to deliver it to the end-user.

How?

The Austrian civil code knows two concepts regarding fulfilling one’s part of a contract, Holschuld and Bringschuld. Holschuld means a debt to be collected from the debtor at his residence. Bringschuld constitutes an obligation to be performed at the creditor’s habitual residence. In today’s terminology, one could compare Holschuld with pull technology and Bringschuld with push technology. In other words, should the citizens pick up the relevant legal information or should the government actively deliver it at people’s doorsteps, so to speak?

In the offline paper world, the only way to reach a citizen was to send a letter to her or his house. Obviously, information technology offers many more possibilities when it comes to communicating with citizens, either via a computer or even a mobile phone, taking privacy concerns into consideration.

Several e-government initiatives (such as the video feed from European Parliament sessions and the EU’s channel on YouTube) increase public participation and insight into politics. While these programs are an important contribution to democracy, they typically do not help citizens handle everyday legal issues concerning employment, family, consumer matters, taxes or housing, nor do they provide the information needed to do so.

In this respect, technologies enabling interactivity and re-use of public information are of greater importance, the latter also being a strategic concern of the European Union.  In particular, semantic technology offers solutions for transforming raw data into comprehensible information for citizens. Here, practical examples that utilize at least part of this technology can also be found within e-government projects as well as in private initiatives.

The next step would be to build law into the code itself. Intelligent agents negotiate the most advantageous terms and conditions for their owner, cars refuse to start if the driver exceeds the permitted alcohol level (ignition interlock device), and songs do not play unless your device is authorized (iTunes).

So, from a technological point of view, anything from presenting legal information on a website to implementing law directly into the end device is possible. In practice, though, most governments are content with providing textual legal information, at best in a structured format so it can be re-used more easily. The technical implementation of more advanced functions is often left to other market players and businesses.

There are two initiatives in this respect that are worth mentioning, one being a purely private project in Sweden and the other being provided by the Austrian government.

Lagen.nu (law now) has been around for some time now as a private initiative offering free access to Swedish legislation and case law. Recently the site was extended by adding commentaries to specific statutes, which should enable laypersons to understand certain legislation. The site includes explanations for certain terminology and particular comments are also categorized and include links to other laws and cases.

The other example, HELP, a service provided by the Austrian Government Agency, structures and presents legal information depending on the factual situation, e.g. it contains categories such as employment, housing, education, finances, family and social services. The relevant legal requirements are then explained in plain text and the responsible authority is listed and linked to.  In some cases the necessary procedure can even be initiated through the web site.

Both projects are fine examples of the possible transformation of legal information from pull to push technology. They are not quite there yet, though.

The answer

The question we are faced with now is not so much how, or which technique would be the best, but rather in which situation a citizen might need certain legal information. Somebody trying to purchase a book via a web site might need information at that moment; prompted by a warning text, a checklist, or an intelligent agent, the purchaser might go to another web site that has better ratings, more favorable legal terms and conditions, and no pending lawsuits. In other cases, the citizen might need certain information in a specific situation right on the spot. For example, while filling out a form she or he might want to know what the most favorable choice would be, rather than simply the type of personal data required for the form. Depending on the situation, different approaches might be more valuable than others.

The larger issue at hand is where the information is retrieved and who is the provider of the information. In other words, trust is an important factor, particularly trust of the information provider. As previously stated, legal information is not usually provided by public bodies but instead is rerouted through various other entities, such as businesses, organizations and individual efforts. This increases the importance of source criticism even more.

In many cases citizens will use general portals such as Google or Wikipedia to search for information, rather than going directly to the source, most often because citizens are not aware of the services offered. This underlines the importance for legal information providers to co-operate with other communication channels in order to increase their visibility.

The necessary legal information is out there; it just remains to be seen if and how it reaches the citizens. Or to put it in other words: The prophet still has to come to the mountain, but in time, with the increasing use of technology, maybe the mountain will come a bit closer.

Christine Kirchberger has been a junior lecturer at the Swedish Law and Informatics Research Institute, Stockholm University, since 2001. Besides teaching law and IT, she is currently writing her PhD thesis on Legal information as a tool, in which she focuses on legal information retrieval and the concept of legal information within the framework of the doctrine of legal sources, and also takes a look at the information-seeking behavior of lawyers.

VoxPopuLII is edited by Judith Pratt.

The recent attention given to government information on the Internet, while laudable in itself, has been largely confined to the Executive Branch. While there is a technocratic appeal to cramming the entire federal bureaucracy into one vast spreadsheet with a wave of the president's Blackberry, one cannot help but feel that this recent push for transparency has ignored government's central function, to pass and enforce laws.

Advertisement on data.gov

Whether seen from the legislative or judicial point of view, law is a very prose-centric domain. This is a source of frustration to the mathematicians and computer scientists who hope to analyze it. For example, while the United States Code presents a neat hierarchy at first glance, closer inspection reveals a sprawling narrative, full of quirks and inconsistencies. Even our Constitution, admired worldwide for its brevity and simplicity, has been tortured with centuries of hair-splitting over every word.

Nowhere is this more apparent than in judicial opinions. Unlike most government employees, who must adhere to rigid style manuals, or the general public, who interact with their government almost exclusively through forms, judges are free to write almost anything. They may quote Charles Dickens, or cite Shakespeare. A judicial opinion is one part newspaper report, one part rhetorical argument, and one part short story. Analyzing it mathematically is like trying to understand a painting by measuring how much of each color the artist used. Law students spend three years learning, principally, how to tease meaning out of form, fact out of fiction.

Why does a society in which a President can be brought down by the definition of "is" tolerate such ambiguity at the heart of its legal system? (And why, though we obsessively test our children, our athletes, and our attorneys, is our testing of judges such a farce?)

Engineers such as myself cannot tolerate ambiguity, so we feel a natural desire to bring order out of this chaos. The approach du jour may be top-down (taxonomy, classification) or bottom-up (tagging, clustering) but the impulse is the same: we want to tidy up the law. If code is law, as Larry Lessig famously declared, why not transform law into code?

Visualization of the structure of the U.S. Code

This transformation would certainly have advantages (beyond putting law firms out of business). Imagine the economic value of knowing, with mathematical certainty, exactly what the law is. If organizations could calculate legal risk as efficiently as they can now calculate financial risk (recession notwithstanding), millions of dollars in legal fees could be rerouted toward economic growth. All those bright liberal arts graduates who suffer through law school, only to land in dismal careers, could apply themselves to more useful and rewarding occupations.

And yet, despite years of effort, years in which the World Wide Web itself has submitted to computerized organization, the law remains stubbornly resistant to tidying. Why?

There are two answers, depending on what goal we have in mind. If the goal is truly to make tenets of law provable by mechanical (i.e., algorithmic) means, just as the tenets of mathematics are, we fail before we begin. Contrary to lay perception, law is not an exact science. It's not a science at all (says a lawyer). Computers can answer scientific questions ("What is the diameter of Neptune?") or bibliographic ones ("What articles has Tim Wu written?") but cannot make value judgments. Law is all about value judgments, about rights and wrongs. Like many students of artificial intelligence, I believe that I will live to see computers that can make these kinds of judgments, but I do not know if I will live to see a world in which we let them.

The second answer speaks to the goal of information management, and the forms in which law is conveyed. The indexing of the World Wide Web succeeded for two reasons, form and scale. Form, in the case of the Web, means hypertext and universal identifiers. Together, they create a network of relationships among documents, a network which, critically, can be navigated by a computer without human aid. This fact, when realized at the scale of billions of pages containing trillions of hyperlinks, allows a computer to derive useful patterns from a seemingly chaotic mass of information.
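To make the point about form and scale concrete, here is a minimal sketch of how a program might derive a pattern from links alone. The corpus, the `doc:` identifiers, and the function names are all hypothetical, and counting inbound links is a deliberately crude stand-in for the far more elaborate ranking methods that real search engines use.

```python
from collections import Counter

# Hypothetical toy corpus: each document identifier maps to the
# identifiers of the documents it links to.
links = {
    "doc:a": ["doc:b", "doc:c"],
    "doc:b": ["doc:c"],
    "doc:c": [],
    "doc:d": ["doc:c", "doc:b"],
}


def inbound_counts(link_map):
    """Count how many distinct documents link to each document."""
    counts = Counter({doc: 0 for doc in link_map})
    for source, targets in link_map.items():
        # set() ignores repeated links; self-links are skipped.
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def rank_by_inbound(link_map):
    """Order documents from most-linked to least-linked (ties by identifier)."""
    counts = inbound_counts(link_map)
    return sorted(counts, key=lambda doc: (-counts[doc], doc))


if __name__ == "__main__":
    for doc in rank_by_inbound(links):
        print(doc, inbound_counts(links)[doc])
```

Nothing in this sketch requires a human to read the documents. That is only possible because every link names its target with an identifier a machine can follow.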

3-d visualization of hypertext documents in XanaduSpace™

Law suffers from inadequacies of both form and scale. For example, all federal case law, taken together, would comprise just a few million pages, only a fraction of which are currently available in free, electronic form. In spite of the ubiquity of technology in the nation's courts and legislatures, the dissemination of law itself, both statutory and common, remains a paper-centric, labor-intensive enterprise. The standard legal citation system is derived from the physical layout of text in bound volumes from a single publisher. Most courts now routinely publish their decisions on the Web, but almost exclusively in PDF form, essentially a photograph of a paper document, with all semantic information (such as paragraph breaks) lost. One almost suspects a conspiracy to keep legal information out of the hands of any entity that lacks the vast human resources needed to reformat, catalog, and cross-index all this paper (in essence, to transform it into hypertext). It's not such a far-fetched notion; if law were universally available in hypertext form, Google could put Wexis out of business in a week.

Social network of federal judges based on their clerks

But the legal establishment need not be quite so clannish with regard to Silicon Valley. For every intellectual predicting law's imminent sublimation into the Great Global Computer, there are a hundred more keen to develop useful tools for legal professionals. The application is obvious; lawyers are drowning in information. Not only are dozens of court decisions published every day, but given the speed of modern communications, discovery for a single trial may turn up hundreds of thousands of documents. Computers are superb tools for organizing and visualizing information, and we have barely scratched the surface of what we can do in this area. Law is created as text, but who ever said we have to read it that way? Imagine, for example, animating a section of the U.S. Code to show how it changes over time, or "walking" through a 3-d map of legal doctrines as they split and merge.
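As a rough illustration of the first idea, the raw material for showing how a section changes over time is simply a line-by-line comparison of two versions of its text. The sketch below uses Python's standard `difflib` module; the section text is invented for the example, and `section_changes` is a hypothetical helper, not part of any existing legal tool.

```python
import difflib


def section_changes(old_text, new_text, old_label="earlier", new_label="later"):
    """Return unified-diff lines describing how a section's text changed."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Invented text standing in for two versions of the same section.
    earlier = "(a) A filing is due in 30 days.\n(b) Late filings are void."
    later = "(a) A filing is due in 60 days.\n(b) Late filings are void."
    for line in section_changes(earlier, later):
        print(line)
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added. An animation would step through a sequence of such comparisons, one per amendment, which presupposes that every historical version is available as clean text in the first place.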

Of course, all this is dependent on programmers and designers who have the time, energy, and financial support to create these tools. But it is equally dependent on the legal establishment -- courts, legislatures, and attorneys -- adopting information-management practices that enable this kind of analysis in the first place. Any such system has three essential parts:

  1. Machine-readable documents, e.g., hypertext
  2. Global identifiers, e.g., URIs
  3. Free and universal access

These requirements are neither technically difficult to understand nor arduous to implement. A child could meet them, yet the establishment's (well-meaning) attempts have failed both technically and commercially. In the meantime, clever engineers, who might tackle more interesting problems, are preoccupied with issues of access, identification, and proofreading. (I have participated in long, unfruitful discussions about reverse-engineering page numbers. Page numbers!) With the extremely limited legal corpora available in hypertext form (at present, only the U.S. Code, Supreme Court opinions, and a subset of Circuit Court opinions), we lack sufficient data for truly innovative research and applications.
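The first two parts of the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the `LegalDocument` record, its field names, and the `example.org` URIs are hypothetical and do not follow any existing legal data standard. The third part, free and universal access, is a matter of policy rather than code.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class LegalDocument:
    """Hypothetical machine-readable record for one legal document."""
    uri: str                                         # global identifier
    title: str
    paragraphs: list = field(default_factory=list)   # text, with paragraph breaks kept
    cites: list = field(default_factory=list)        # URIs of cited documents


def to_json(doc):
    """Serialize a document to a JSON string."""
    return json.dumps(asdict(doc), sort_keys=True)


def from_json(text):
    """Rebuild a document from its JSON form."""
    return LegalDocument(**json.loads(text))


def unresolved_citations(corpus):
    """List cited URIs that do not match any document in the corpus."""
    known = {doc.uri for doc in corpus}
    return sorted({c for doc in corpus for c in doc.cites if c not in known})


# Invented example data; the URIs are placeholders.
statute = LegalDocument(
    uri="https://example.org/statute/1",
    title="Hypothetical statute",
    paragraphs=["Text of the statute."],
)
opinion = LegalDocument(
    uri="https://example.org/opinion/1",
    title="Hypothetical opinion",
    paragraphs=["First paragraph.", "Second paragraph."],
    cites=["https://example.org/statute/1", "https://example.org/opinion/0"],
)

if __name__ == "__main__":
    print(to_json(opinion))
    print(unresolved_citations([statute, opinion]))
```

Because each citation is a URI rather than a volume and page number, a program can tell which cited documents are missing from a collection without any reverse-engineering.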

This is really what we mean when we talk about "tidying" the law. We are not asking judges and lawyers to abandon their jobs to some vast, Orwellian legal calculator, but merely to work with engineers to make their profession more amenable to computerized assistance. Until that day of reconciliation, we will continue our efforts, however modest, to make the law more accessible and more comprehensible. Perhaps, along the way, we can make it just a bit tidier.

Stuart Sierra is the technical guy behind AltLaw. He says of himself, "I live in New York City. I have a degree in theatre from NYU/Tisch, and I’m a master’s student in computer science. I work for the Program on Law & Technology at Columbia Law School, where I spend my day hacking on AltLaw, a free legal research site. I’m interested in the intersection of computers and human experience, particularly artificial intelligence, the web, and user interfaces."

VoxPopuLII is edited by Judith Pratt.

Recently I, like many law librarians (including Dean Richard Danner, James Donovan, and the panelists at the University of South Carolina School of Law's colloquium on "The Law Librarian’s Role in the Scholarly Enterprise" [scroll down & click on "Part 9: Roundtable"]), began to devote more thought to disintermediation in legal information services. One way that law librarians can adapt to disintermediation is by learning more about the study of legal information systems, that is, legal informatics. When I began looking closely at legal informatics scholarship last fall, I was dismayed at not being able to locate any single resource that aggregated all of the major scholarly information resources in the field. As a result, I decided to build one; it’s called Legal Information Systems & Legal Informatics Resources. To provide current information, the site has an accompanying blog, the Legal Informatics Blog, and a Twitter feed. Building these sites has allowed me to cast a novice’s eye on the field of legal informatics.

Here is what I’ve glimpsed in the past few months:

I. Surveying the Sources

My exploration of legal informatics has focused initially on information resources. A relatively circumscribed set of scholarly journals, other article sources, preprint services, indexing & abstracting services, blogs, and listservs regularly report research results in legal informatics. A small set of subject headings will retrieve most monographs and dissertations in the field. Accordingly, aggregating access to these resources has been relatively easy, and automating discovery and delivery of many of these sources seems feasible sooner rather than later.

Conferences are trickier.   The number of conferences at which legal informatics issues are addressed is substantial, for several reasons: a large number of researchers from industry as well as academia (see, e.g., the lists of individuals compiled by Dr. Adam Wyner and the organizers of the DEON deontic logic conferences, and this list of departments & institutes), energetically engaged in applied as well as theoretical research, are producing a sizeable output; many of those researchers work in multiple fields; and the pace of technological change is accelerating the research and communication processes.  Several Websites, such as those of the International Association for Artificial Intelligence and Law (IAAIL) and the DEON deontic logic conferences, monitor these meetings, however. Access to proceedings is available from several sources, including ACM’s Portal service, the other major information science indexing services, OCLC WorldCat, and the Legal Information Systems site. As a result, access to most legal informatics conference information and proceedings can be streamlined and hopefully largely automated before too long.

Projects have proven even trickier. Much legal informatics research takes the form of grant-funded projects, of which a great number, particularly in Europe, have been undertaken during the past decade. Political integration in Europe and democratization in many regions encouraged certain governments during the past two decades to fund applied research on legal information systems. Identifying and linking to all of these legal informatics projects seems important for enabling access to legal informatics scholarship. Such a process is quite labor intensive, however, because of the great number of such projects, the lack of a comprehensive list of them, and the many languages in which project documentation is written. A long-term goal of the Legal Information Systems site is to build a database of as many of these projects as can be identified, with links to project Websites, deliverables, and publications.

Since standards and protocols, such as those respecting descriptive metadata and knowledge representation, and data sets constitute additional key resources for legal informatics research, links to many of them have been collected on the Legal Information Systems site. Because many researchers in the field focus on a particular research topic or category of legal information, aggregations of resources on major topics in the field, such as e-rulemaking, evidence, and information behavior, to which the Legal Information Systems site has dedicated pages, and argumentation, to which Dr. Adam Wyner’s blog devotes several pages, may yield efficiencies for researchers. In addition, collections of resources on applied topics such as citation standards, computer-assisted legal research (CALR) services, court technology, the Free Access to Law movement (discussed here by Ginevra Peruginelli & Enrico Francesconi of ITTIG-CNR, with links to resources here), institutional repositories, instructional technology, law practice technology, and open access may be of use to researchers and practitioners alike.

II. Detecting a Communications Gap

From a preliminary scan of the field of legal informatics I’ve learned that legal informaticists and law librarians do not appear to be communicating to any significant extent. For example, law librarians seem to play little or no role at legal informatics conferences and are rarely published in legal informatics journals. (Sarah Rhodes & Dana Neacsu’s recent paper seems an exception.) This seems particularly odd, given that law libraries are developing some of today's most innovative digital legal information systems, such as the Chesapeake Project Legal Information Archive (a project of the Georgetown University Law Library, the Maryland State Law Library, the Virginia State Law Library, and the Legal Information Preservation Alliance), the Law Library of Congress’s Global Legal Information Network (GLIN), the Harvard Law School Library’s Digital Collections, the digital law libraries created by the Rutgers Camden and Rutgers Newark law libraries, and the USC Law Library’s English Medieval Legal Documents Wiki. Law library scholarship -- although it often addresses legal informatics topics such as legal citation (as in studies that reveal information resources utilized by courts), legal information behavior (as in the work of Dean Joan Howland & Nancy Lewis, Dr. Yolanda Jones, and Judith Lihosit), and the functioning or design of legal information systems such as computer assisted legal research (CALR) services (as in recent studies by Julie Jones, John Doyle, and Dean Mason) -- rather infrequently refers to legal informatics scholarship. That is, two communities of experts respecting the same subject -- legal information systems -- seem for the most part to be talking past each other.

Yet information sharing between law librarians and legal informaticists would substantially benefit both groups.   Law librarians would gain valuable insights into the functioning of the legal information systems they use every day and the likely direction of the legal information industry, as may be gleaned from recent monographs collecting conference papers in the field as well as from the program of the 2009 International Conference on Artificial Intelligence and Law (ICAIL 2009).   Those works show that the primary topics of recent legal informatics scholarship include argumentation and deontic logic (as discussed, for example, in recent dissertations by Dr. Adam Wyner & Dr. Régis Riveret); agent/multi-agent systems; decision support systems; document modeling; several natural language processing issues including multi-language systems, text mining including automated classification and indexing, summarization, segmentation, and information retrieval, as, for example, discussed in proceedings of the TREC Legal Track, and notably in the context of electronic discovery; other applied research topics, particularly concerning e-rulemaking, online dispute resolution, negotiation systems, digital rights management, electronic commerce and contracts, and evidence; and the use of XML, ontologies, and the development of the Semantic Web respecting legal information.

By cooperating with law librarians, legal informaticists for their part would gain access to expert users of legal information systems, quality input respecting the contexts of legal information use (ranging from the information lifecycle to the information behavior of lawyers), and ideas for further research.

Here are some specific suggestions respecting how law librarians could make meaningful contributions to legal informatics research.   First, law librarians could continue to perform legal information behavior research, building on the important recent activity in this area. Second, law librarians who are developing innovative legal information systems could present papers on those systems at legal informatics conferences and write articles about those systems for legal informatics journals.

Third, as expert users of legal information systems and close observers of lawyers, judges, law students, and lay users of legal information, law librarians could generate legal informatics research questions based on their experience and observations. For example, law librarians could recommend research on such little-studied but important legal information systems as conflict of interest control systems and bankruptcy claims agents’ Websites, or on the application of information science and computer science concepts to legal information systems errors, such as those arising from faulty legal drafting practices and overly complex statutory and regulatory schemes.

Fourth, law librarians could provide legal informaticists with expert practitioner and policy perspectives on issues that law librarians have prioritized as a profession, such as authentication, digital preservation, metadata content and management, and user interface design.   Fifth, law librarians could furnish legal informatics researchers with input respecting system capabilities from the vantage of an “expert user,” as Dr. Stephann Makri recently did by including law librarians in his study of lawyers’ information behavior.

Sixth, law librarians engaged in developing innovative digital legal information systems could partner with legal informaticists to study those systems. Seventh, law librarians who are also lawyers could contribute their knowledge of substantive and procedural law to legal informatics research projects, particularly where not all of the legal informaticists involved have legal training.

Finally, law librarians could draw on their in-depth knowledge of legal information systems and users to partner with legal informaticists on the design of research studies.   In particular, those law librarians with training in social science research methods could encourage legal informaticists to employ those methods in their studies of legal information systems, which might benefit from increased use of multiple methodologies.

III. Bright Prospects

Greater cooperation between legal informaticists and law librarians would benefit both communities.  The Legal Information Systems site will be developed with an eye toward demonstrating and fostering that cooperation.

[NOTE: This post was updated on 22 August 2011 to reflect new URLs.]

Robert Richards edits Legal Information Systems & Legal Informatics Resources and its accompanying blog, the Legal Informatics Blog, and Twitter feed.

VoxPopuLII is edited by Judith Pratt.