digital law » VoxPopuLII

Standardizing the World’s Legislative Information—One hackathon at a time

Annotation of legal texts, Cross-language legal information retrieval, digital law, Electronic government, elegislation, Legal metadata, Legal XML, Open Government Data, Semantic annotation of legal texts, Standards

Sep 17, 2012

As guest bloggers to this site, we have been asked to write about big ideas. We’ll get to those. But first, a note about hackathons.

Could legal hackathons be like this one day?

Hackathons used to be the exclusive domain of soda-and-coffee-guzzling, pizza-eating, all-night hacking, highly competitive computer programmers. The result of such a hackathon is often supposed to be a cool app (like the forerunner of Twitter) that is even cooler because it was built in the compressed schedule of the event. More recently, hackathons have been popping up in a variety of places, with some unexpected contexts and sponsors including the U.S. House of Representatives, NASA, Brooklyn Law School, New York City government, and others. These events serve as a way to prove (or build) the sponsor’s tech credentials and to cross-fertilize policy and technology expertise. There has been some handwringing and thoughtful commentary about the expansion of “civic” hackathons and what sustainable outcomes they produce.

As co-organizers, with Karen Suhaka, Greg Wilson, Charles Belle and others, of two legislation-focused, hackathon-inspired events–the California Law Hackathon, and the International Legislation Un-hackathon–we can attest to their value in bringing engineers together with lawyers and policy folks. We can give some insights into the kinds of benefits these events have had in propelling efforts on legislative data standards, and some of the advances that have taken place in the development of these standards over the last year.

Big Idea: Legislative Data Standards

And now to the big idea: to represent all the world’s legislation in a standard structured data format. That’s actually two big ideas: (1) putting legislation into a structured data format, and (2) designing that format so that it is compatible with the wide variety of laws and legislative document types worldwide.

There are reasons for doing these things: First, introducing structured data to legislation can make it possible to search and analyze the law with greater precision and efficiency. And second, having a common standard can permit more comprehensive bill-tracking and comparison between jurisdictions.

California Bill with Metadata

It also can make it possible for legislatures with small (and shrinking) budgets to benefit from some of the same bill drafting software that is being developed for much larger jurisdictions. (Full disclosure: Xcential has developed such software for more than ten years, including the drafting platform used by the State of California.)

In the age of Google, these ideas may not seem so big; in fact, they are a subset of Google’s far-reaching mission. However, legislation is a corner of the world’s information that Google has not yet addressed in a systematic way. And as regular readers of this blog know, legislation presents its own hurdles, technical and bureaucratic (not necessarily in that order), that make this both an interesting and a challenging problem. One of the challenges is that the kind of people who generally work with data (we’ll call them engineers) and the kind of people who generally work with legislation (we’ll call them lawyers and policy folks) don’t often work on data and legislation together. One of us, a lawyer and policy type, has made this point graphically (and somewhat hyperbolically) in a Quora response to a question about whether version control software could be used for legislation. That question, and a subsequent discussion generated in response to a blogpost by software engineer Abe Voelker about version control for legislation, drew in many engineers and some lawyers and policy folks.

For software engineers who consider such things, it is very attractive to think about treating legal text as if it were software code; we could automatically highlight and cross-validate key terms, run test cases, automate redlining and version control, etc. It would be easy to see what the state of the law was at any particular point in time, and to trace the series of amendments that got us into the mess we’re in today. This desire is often expressed as “What if we had a Github for legislation?” On the other hand, people who work closely with legislation–researching it, drafting it or developing information systems to deal with it–tend to see the many places where the analogy between computer code and legal code breaks down. Legal texts have been shaped over hundreds of years by technologically conservative institutions, using print-based systems.
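To make the "automated redlining" idea concrete, here is a minimal sketch in Python that uses the standard library's difflib to compare two versions of a sentence word by word. The sample text and the function name `redline` are invented for illustration and are not drawn from any real bill or tool. A plain text diff like this ignores the structure of legislation (sections, cross-references, effective dates), which is one of the places the analogy with software tends to break down.

```python
import difflib


def redline(old: str, new: str) -> str:
    """Return a word-level redline: deletions as [-x-], insertions as {+y+}."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


# Hypothetical text, used only to exercise the function.
before = "The fee shall not exceed ten dollars per filing."
after = "The fee shall not exceed twenty dollars per electronic filing."
print(redline(before, after))
```

Running the function over each successive version of a bill would give a crude amendment history, but only once the text is available in a clean, consistently structured form.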

The full transformation of law to digital information is not going to happen overnight. While most law is already accessible in electronic format (often PDF), it is not encoded in a way that lets software engineers start using their favorite text-munching tools on it. One of us, an engineer, has described this as the difference between computerization and automation. The move toward better digital tools for automating legislative drafting and research tasks will require more dialogue and working exchanges between engineers and the lawyers and policy folks.

That brings us back to hackathons.

What is a Legislative Hackathon?

Recognizing the need to bring lawyers and engineers together in order to implement our big idea(s), and appreciating the valuable bandwagon that hackathons have become, we decided to jump onboard. The first event we organized, the California Law Hackathon, was hosted just over a year ago, in September 2011, in Berkeley at the offices of Maplight, and in Denver by Karen Suhaka’s team at BillTrack50. The event focused on building web-based visualization tools to track the timeline of amendments to California legislation, and to link particular amendments, through their legislative sponsors, to particular donors or interest groups. We were joined remotely by a number of international participants, including John Sheridan, head of e-services for legislation.gov.uk, and a fellow guest contributor to this blog. As one participant noted, we learned a great deal at the event, including the limits placed on us by the existing data. Neither the legislative record, nor the donations databases are detailed enough to trace influence in politics in the way we hoped. This helped spark an interest in a more in-depth exploration of legislative data formats, and in particular how more and better metadata could be added to legislation.

That led to the International Legislation Un-hackathon, held simultaneously at UC Hastings, Stanford and Denver, with participants from the University of Bologna (Ravenna campus) and around the world. So assuming you can get engineers together with lawyers and policy folks, what do you do with them? We decided that we’d need a user-friendly tool that could be used to explore and add metadata to legislation from around the world. This could highlight a developing legislative XML standard, Akoma Ntoso (more about this standard soon), and give lawyer and policy types hands-on experience with the kinds of text and analysis tools that engineers take for granted.

Hacking With A Legislative Editor

So one of us (the engineer, naturally) started building a web-based editor for legislation, while the other (the lawyer, naturally) started organizing the next hackathon. Of course, thought the lawyer, it would just be a matter of time before all governments worldwide use such editors to draft their laws and regulations in a standard data format.

Legislative Editor at legalhacks.org

Advances in Legislative Data Standards Efforts

Akoma Ntoso

Akoma Ntoso (AkN) is a strong contender to be that format. Developed under the auspices of the UN Department of Economic and Social Affairs, AkN is an XML data structure that is meant to capture high-level forms and semantic ideas that are common to a broad variety of legal texts. OASIS, the folks who brought us the DocBook standards, among others, have convened a standards committee to create an official legislative data standard based on AkN. (More disclosure: the engineer is a member of this committee.) There’s just one problem. Few governments are using AkN to draft or store their legislation.
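For readers who have not seen legislative XML, the following Python sketch shows the general idea of structured markup and why it makes legislation easier to query. The sample document is a simplified illustration loosely modeled on AkN-style element names; it omits the namespace and most of the metadata a real AkN document carries, so it should not be read as schema-valid AkN. The function name `list_sections` and the bill text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative markup loosely modeled on AkN-style elements.
# It has no namespace and very little metadata, so it is NOT schema-valid AkN.
SAMPLE = """
<akomaNtoso>
  <bill>
    <meta>
      <identification>
        <FRBRWork>
          <FRBRcountry value="us"/>
        </FRBRWork>
      </identification>
    </meta>
    <body>
      <section id="sec_1">
        <num>1.</num>
        <heading>Short title</heading>
        <content><p>This Act may be cited as the Example Act.</p></content>
      </section>
      <section id="sec_2">
        <num>2.</num>
        <heading>Definitions</heading>
        <content><p>In this Act, "agency" means a state agency.</p></content>
      </section>
    </body>
  </bill>
</akomaNtoso>
"""


def list_sections(xml_text):
    """Return one dict per <section>, with its id, number, heading and text."""
    root = ET.fromstring(xml_text)
    sections = []
    for sec in root.iter("section"):
        sections.append({
            "id": sec.get("id"),
            "num": sec.findtext("num"),
            "heading": sec.findtext("heading"),
            "text": "".join(sec.find("content").itertext()).strip(),
        })
    return sections


for s in list_sections(SAMPLE):
    print(s["num"], s["heading"])
```

Because every section carries the same kinds of labeled parts, the same few lines of code could in principle list the sections of a bill from any jurisdiction that used the same markup.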

AkN itself is fast evolving, and with more exposure to legal data structures from different jurisdictions, the OASIS committee will be able to adapt AkN to better model those structures.

We saw the International Legislation Un-hackathon as a venue to kick off this process. It was conceived with Charles Belle of UC Hastings, as part of the Legal Hacks initiative. The event was held simultaneously at UC Hastings, at Stanford, and in Denver. Jim Harper and Francis Avila of the Cato Institute came to the Hastings event. We also had many international participants. Key among them were Professors Monica Palmirani and Fabio Vitali of the University of Bologna, the architects and primary evangelists of AkN. Over the course of the day, participants learned about AkN and, importantly, got a chance to try it out, marking up documents of their choosing with the web editor. In the process, as expected, we found bugs in the software and bugs in the standard. We found structures in U.S. legislation that didn’t fit well with the existing AkN element set. We saw places where there was confusion in applying AkN’s data structures to documents. All of this information was collected to incorporate in the development of both the editor and AkN, underscoring again the importance of getting more practical exposure for both.

University of Bologna Summer School–Ravenna

And we are working to expand the venues for this kind of practical exposure to develop the AkN standard. Every September, the University of Bologna hosts the LEX Summer School in Ravenna, Italy. For them, it’s an opportunity to introduce Akoma Ntoso to new groups of students from around Europe and around the world. For the students, it’s an opportunity to learn about the application of XML to legislation, see the success various groups are having around the world, and meet interesting new people with a passion for legal informatics. One of us, the engineer, who was a student two years ago, was invited to return last year to present a success story, and this year is returning once more to deliver a class on how to build and use the HTML5-based editor for drafting legislation in XML. For us, this is an opportunity to expose the editor to the European legal traditions so that we can better understand how our editor must evolve to fulfill our vision of a unified standard around the world with common, highly adaptable tools.

Chile National Library of Congress Browser-based editor

Another step toward adoption of legislative data standards is a project by Chile’s National Library of Congress (BCN in Spanish) called the “History of the Law” (Historia de la Ley). This ambitious project aims to bring together machine learning, a legislative editor and other features to mark up Chile’s legislative record and other legislative documents. The BCN has chosen Xcential’s browser-based editor, working with the AkN standard, to conduct the mark-up and correction after documents are passed through an automated parser. As with the hackathon, but on a larger scale, we are learning from experience the modifications that are needed to AkN, to make it work with Chile’s live documents. Excitingly, each mismatch we find between AkN and actual legislation can be fed back into the OASIS committee process, to make AkN able to handle a wider variety of real-world use cases.

Other Efforts and the Future of Legislative Data Standards

We see these steps as just the beginning. European governments are also flirting with legislative standards, and Karen Suhaka’s group at BillTrack50 has converted all U.S. bills from all states into a single standard XML format, demonstrating both that the technical hurdles can be overcome and many of the practical benefits of doing so. In focusing on the projects (and hackathons) we are most closely involved with, we have certainly left out a lot of the initiatives that are advancing legislative data standards around the world. That’s what the comments are for. Let us know your experience with Akoma Ntoso as a legislative standard, and what you’re doing or interested in doing with AkN or other legislative data standards worldwide.

Grant Vergottini (the engineer) is a founder of Xcential. He is a leading authority on applications of XML data to legislation. Prior to founding Xcential, Grant was the Director of Applications at Chrystal Software, a company dedicated to XML design and reporting software. Before Chrystal, Grant led the redesign of Homestore.com, and founded Genedax Design Automation, which developed innovative team and data management applications for electronics design. Bringing data structures and automation tools to the legislative drafting process parallels the work that Grant did earlier in his career at Mentor Graphics and the Boeing Company, where he participated in the transformation from manual drafting to CAD software. Mr. Vergottini holds a Bachelor of Science in Electrical Engineering from Cleveland State University, where he graduated Summa Cum Laude.

Ari Hershowitz (the lawyer) is a consultant at Xcential, and founder of Tabulaw. Tabulaw develops software for lawyers, including a web-based legal research and writing platform. Prior to Tabulaw, Ari worked to protect wildlife and habitats from Chile to Mexico as Director of the BioGems project for Latin America at the Natural Resources Defense Council. Ari has a law degree from Georgetown University Law Center, a Master’s in Computation and Neural Systems from Caltech, and a Bachelor’s in Molecular Biophysics & Biochemistry from Yale College.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

[Editor’s Note: For topic-related VoxPopuLII posts please see: Núria Casellas, Semantic Enhancement of legal information … Are we up for the challenge?; João Lima, et al., LexML Brazil Project; and Rinke Hoekstra, The MetaLex Document Server.]

Protecting Access One Entry at a Time: An Update on the National Inventory of Legal Materials

Access to justice, authentication, digital law, free access to law, Law.gov, Legal citations, Public access to legal information

Feb 01, 2012

In the fall of 2009, the American Association of Law Libraries (AALL) put out a call for volunteers to participate in our new state working groups to support one of AALL’s top policy priorities: promoting the need for authentication and preservation of digital legal resources. It is AALL policy that the public have no-fee, permanent public access to authentic online legal information. In addition, AALL believes that government information, including the text of all primary legal materials, must be in the public domain and available without restriction.

The response to our call was overwhelming, with volunteers from all 50 states and the District of Columbia expressing interest in participating. To promote our public policy priorities, the initial goals of AALL’s working groups were to:

Take action to oppose any plan in their state to eliminate an official print legal resource in favor of online-only, unless the electronic version is digitally authenticated and will be preserved for permanent public access;
Oppose plans to charge fees to access legal information electronically; and
Ensure that any legal resources in a state’s raw-data portal include a disclaimer so that users know that the information is not an official or authentic resource (similar to what is included on the Code of Federal Regulations XML on Data.gov).

In late 2009, AALL’s then-Director of Government Relations Mary Alice Baish met twice with Law Librarian of Congress Roberta Shaffer and Carl Malamud of Public.Resource.org to discuss Law.gov and Malamud’s idea for a national inventory of legal materials. The inventory would include legal materials from all three branches of government. Mary Alice volunteered our working groups to lead the ambitious effort to contribute to the groundbreaking national inventory. AALL would use this data to update AALL’s 2003 “State-by-State Report on Permanent Public Access to Electronic Government Information” and the 2007 “State-by-State Report on Authentication of Online Legal Resources” and 2009-2010 updates, which revealed that a significant number of state online legal resources are considered to be “official” but that few are authenticated. It would also help the Law Library of Congress, which owns the Law.gov domain name, with its own ambitious projects.

Erika Wayne and Paul Lomio at Stanford University’s Robert Crown Law Library developed a prototype for the national inventory that included nearly 30 questions related to scope, copyright, cost to access, and other use restrictions. They worked with the California State Working Group and the Northern California Association of Law Libraries to populate the inventory with impressive speed, adding most titles in about two months.

AALL’s Government Relations Office staff then expanded the California prototype to include questions related to digital authentication, preservation, and permanent public access. Our volunteers used the following definition of “authentication” provided by the Government Printing Office:

An authentic text is one whose content has been verified by a government entity to be complete and unaltered when compared to the version approved or published by the content originator.

Typically, an authentic text will bear a certificate or mark that conveys information as to its certification, the process associated with ensuring that the text is complete and unaltered when compared with that of the content originator.

An authentic text is able to be authenticated, which means that the particular text in question can be validated, ensuring that it is what it claims to be.
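As a rough illustration of what "complete and unaltered" can mean in practice, the Python sketch below compares a copy of a text against a digest that the publisher is assumed to have posted. This is only one building block: authentication schemes for government documents typically rely on digital signatures and certificates, and nothing here is meant to describe the Government Printing Office's actual process. The function names and the sample text are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_unaltered(copy_text: str, published_digest: str) -> bool:
    """True if the copy hashes to the digest posted by the publisher."""
    return sha256_hex(copy_text) == published_digest


# Hypothetical official text and the digest its publisher would post.
official = "Section 1. This Act takes effect on January 1."
published_digest = sha256_hex(official)

# A copy in which a single word has been changed.
tampered = official.replace("January", "July")

print(is_unaltered(official, published_digest))
print(is_unaltered(tampered, published_digest))
```

A digest on its own shows only that a copy matches what was hashed; it says nothing about who produced the digest, which is why a certificate or signature from the government entity is part of the definition above.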

The “Principles and Core Values Concerning Public Information on Government Websites,” drafted by AALL’s Access to Electronic Legal Information Committee (now the Digital Access to Legal Information Committee) and adopted by the Executive Board in 2007, define AALL’s commitment to equitable, no-fee, permanent public access to authentic online legal information. The principle related to preservation states that:

Information on government Web sites must be preserved by the entity, such as a state library, an archives division, or other agency, within the issuing government that is charged with preservation of government information.

Government entities must ensure continued access to all their legal information.

Archives of government information must be comprehensive, including all supplements.

Snapshots of the complete underlying database content of dynamic Web sites should be taken regularly and archived in order to have a permanent record of all additions, changes, and deletions to the underlying data.

Governments must plan effective methods and procedures to migrate information to newer technologies.

In addition, AALL’s 2003 “State-By-State Report on Permanent Public Access to Electronic Government Information” defines permanent public access as, “the process by which applicable government information is preserved for current, continuous and future public access.”

Our volunteers used Google Docs to add to the inventory print and electronic legal titles at the state, county, and municipal levels and answer a series of questions about each title. AALL’s Government Relations Office set up a Google Group for volunteers to discuss issues and questions. Several of our state coordinators developed materials to help other working groups, such as Six Easy Steps to Populating Your State’s Inventory by Maine State Working Group coordinator Christine Hepler, How to Put on a Successful Work Day for Your Working Group by Florida State Working Group co-coordinators Jenny Wondracek and Jamie Keller, and Tips for AALL State Working Groups with contributions from many coordinators.

In October 2010, AALL held a very successful webinar on how to populate the inventories. More than 200 AALL and chapter members participated in the webinar, which included Kentucky State Working Group coordinator Emily Janoski-Haehlen, Maryland State Working Group coordinator Joan Bellistri, and Indiana State Working Group coordinator Sarah Glassmeyer as speakers. By early 2011, more than 350 volunteers were contributing to the state inventories.

Initial Findings

Our dedicated volunteers added more than 7,000 titles to the inventory in time for AALL’s June 30, 2011 deadline. AALL recognized our hard-working volunteers at our annual Advocacy Training during AALL’s Annual Meeting in Philadelphia, and celebrated their significant accomplishments. Timothy L. Coggins, 2010-11 Chair of the Digital Access to Legal Information Committee, presented these preliminary findings:

Authentication: No state reported new resources that have been authenticated since the 2009-2010 Digital Access to Legal Information Committee survey
Official status: Several states have designated at least one legal resource as official, including Arizona, Florida, and Maine
Copyright assertions in digital version: Twenty-five states assert copyright on at least one legal resource, including Oklahoma, Pennsylvania, and Rhode Island
Costs to access official version: Ten states charge fees to access the official version, including Kansas, Vermont, and Wyoming
Preservation and Permanent Public Access: Eighteen states require preservation and permanent public access of at least one legal resource, including Tennessee, Virginia, and Washington

Analyzing and Using the Data

In July 2011, AALL’s Digital Access to Legal Information Committee formed a subcommittee that is charged with reviewing the national inventory data collected by the state working groups. The subcommittee includes Elaine Apostola (Maine State Law and Legislative Reference Library), A. Hays Butler (Rutgers University Law School Library), Sarah Gotschall (University of Arizona Rogers College of Law Library), and Anita Postyn (Richmond Supreme Court Library). Subcommittee members have been reviewing the raw data as entered by the working group volunteers in their state inventories. They will soon focus their attention on developing a report that will also act as an updated version of AALL’s State-by-State Report on Authentication of Online Legal Resources.

The report, to be issued later this year, will once again support what law librarians have known for years: there are widespread issues with access to legal resources and there is an imminent need to prevent a trend of eliminating print resources in favor of electronic resources without the proper safeguards in place. It will also include information on: the official status of legal resources; whether states are providing for authentication, permanent public access, and/or preservation of online legal resources; any use restrictions or copyright claims by the state; and whether a universal (public domain) citation format has been adopted by any courts in the state.

In addition to providing valuable information to the Law Library of Congress and related Law.gov projects, this information has already been helpful to various groups as they proceed to advocate for no-fee, permanent public access to government information. The data has already been useful to advocates of the Uniform Electronic Legal Material Act and will continue to be valuable to those seeking introduction and enactment in their states. The inventory has been used as a starting point for organizations that are beginning digitization projects of their state legal materials. The universal citation data will be used to track the progress of courts recognizing the value of citing official online legal materials through adopting a public domain citation system. Many state working group coordinators have also shared data with their judiciaries and legislatures to help expose the need for taking steps to protect our state legal materials.

The Next Steps: Federal Inventory

In December 2010, we launched the second phase of this project, the Federal Inventory. The Federal Inventory will include:

Legal research materials
Information authored or created by agencies
Resources that are publicly accessible

Our goals are the same as with the state inventories: to identify and answer questions about print and electronic legal materials from all three branches of government. Volunteers from Federal agencies and the courts are already adding information such as decisions, reports and digests (Executive); court opinions, court rules, and Supreme Court briefs (Judicial); and bills and resolutions, the Constitution, and Statutes at Large (Legislative). Emily Carr, Senior Legal Research Specialist at the Law Library of Congress, and Judy Gaskell, retired Librarian of the Supreme Court, are coordinating this project.

Thanks to the contributions of an army of AALL and chapter volunteers, the national inventory of legal materials is nearly complete. Keep an eye on AALL’s website for more information as our volunteers complete the Federal Inventory, analyze the data, and promote the findings to Federal, state and local officials.

Tina S. Ching is the Electronic Services Librarian at Seattle University School of Law. She is the 2011-12 Chair of the AALL Digital Access to Legal Information Committee.

Emily Feltren is Director of Government Relations for the American Association of Law Libraries.

[Editor’s Note: For topic-related VoxPopuLII posts please see: Barbara Bintliff, The Uniform Electronic Legal Material Act Is Ready for Legislative Action; Jason Eiseman, Time to Turn the Page on Print Legal Information; John Joergensen, Authentication of Digital Repositories.]


Time to Turn the Page on Print Legal Information

authentication, digital law, Legal citation, Legal citations, Public access to legal information

Sep 15, 2010

Question: Is there a good reason why judges should not be blogging their opinions?

Follow my thinking here.

I, like many librarians, love books. By that I mean I love physical books. I love the feel of paper in my hand. I love the smell of books. When I attended library school, there was no doubt in my mind that I would work in a place surrounded by shelf after shelf of beautiful books. I was confident that I would be able to transfer that love of books to a new generation.

That’s not how things turned out. Without recounting exactly how I got here, I should say that I am a technology librarian, and have been since before I graduated from library school. Technology is where I found my calling, and where libraries seem to need the most help. As I delve deeper into the world of library technology, particularly in the academic setting, I am increasingly forced to confront an uncomfortable reality: Print formats are inferior to electronic. And in some of my darker moments, I may even go so far as to echo the comments of Jeff Jarvis in his book “What Would Google Do?” when he writes: “print sucks.”

On page 71, talking about the burden of physical “stuff,” Jarvis writes:

“It’s expensive to produce content for print, expensive to manufacture, and expensive to deliver. Print limits your space and your ability to give readers all they want. It restricts your timing and the ability to keep readers up-to-the-minute. Print is already stale when it’s fresh. It is one-size-fits-all and can’t be adapted to the needs of each customer. It comes with no ability to click for more. It can’t be searched or forwarded. It has no archive. It kills trees. It uses energy. And you really should recycle it, though that’s just a pain. Print sucks. Stuff sucks.”

In this paragraph, Jarvis may as well have been talking about the current state of online legal information. Although we may not have figured out the magic bullets of authenticity and preservation, the fact remains that print is a burden. In many cases, it is a burden to our governments, and our libraries.

There are good reasons to proceed cautiously towards online legal information. However, the most significant barriers to accepting new modes of publishing official legal information online, like judges’ blogging opinions, may be cultural and political. In the end, law librarians and other legal professionals can’t allow our own nostalgia and habit to stand in the way of changes that can, should, and must happen.

AALL Working Groups

As many readers may know, the American Association of Law Libraries (AALL) began forming state working groups earlier this year. The purpose of those working groups was to “help AALL ensure access to electronic legal information in your state.” This is certainly a worthwhile goal, and one I obviously support. But the PDF document online, calling for formation of these working groups, sends a mixed message.

The very first duty of each working group is to “take action to oppose any plan in your state to eliminate an official print legal resource in favor of online-only unless the electronic version is digitally authenticated and will be preserved for permanent public access, or to charge fees to access legal information electronically. This is an increasingly common problem as states respond to severe budget cuts.”

Perhaps it’s just the phrasing of the document that bothered me. Rather than even providing guidance to states planning to eliminate print legal resources, AALL has set as its default position the opposition to any such plan.

In fairness, I note that the document hints that online-only legal resources might be acceptable if states don’t charge for them, or if such resources meet the rather complex standards laid out in the Association of Reporters of Judicial Decisions’ Statement of Principles.

The Association of Reporters of Judicial Decisions (ARJD) published Statement of Principles: “Official” On-Line Documents in February 2007, revised in May 2008. Most tellingly, in Principle 3 of the Statement they write: “Print publication, because of its reliability, is the preferred medium for government documents at present.”

Later in the document we find out why print is so reliable. Talking about electronic versions, the ARJD says they should not be considered official unless they are “permanent in that they are impervious to corruption by natural disaster, technological obsolescence, and similar factors and their digitized form can be readily translated into each successive electronic medium used to publish them.”

Without question, electronic material must be able to survive a natural disaster. The practice of storing information on a single server or keeping all backups in the same facility could be problematic. But emerging trends and best practices could help safeguard against these problems. In addition, programs like LOCKSS (Lots of Copies Keep Stuff Safe) can help alleviate some of these concerns by making sure many copies of each digital item exist at multiple geographic locations.
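The "many copies" idea can be sketched in a few lines of Python. The example below is not the LOCKSS protocol, which has its own polling and repair mechanisms; it is only a simplified illustration, with hypothetical site names and contents, of how copies held at several locations could be checked against one another to spot a damaged one.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a stored copy."""
    return hashlib.sha256(data).hexdigest()


def audit(copies: dict) -> tuple:
    """Return (majority digest, sorted names of sites whose copy disagrees).

    If two digests are equally common, the one seen first is treated as the
    majority, so this sketch is only meaningful with a clear majority.
    """
    digests = {site: digest(data) for site, data in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(site for site, d in digests.items() if d != majority)
    return majority, damaged


# Hypothetical copies of one opinion held at three locations.
original = b"Opinion of the Court, filed March 3."
copies = {
    "library_a": original,
    "library_b": original,
    "library_c": b"Opinion of the Court, filed March 8.",  # corrupted copy
}

majority, damaged = audit(copies)
print(damaged)
```

A site flagged as damaged could then replace its copy with one from a site that agrees with the majority, which is why geographic spread matters more than any single backup.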

Also, the risk of digital format obsolescence has largely been overstated. PDF documents are not going anywhere anytime soon. Even conservative estimates establish PDF as a reliable format for the foreseeable future.

HTML may be no different. Consider that the very first Web document, Links and Anchors, is almost valid HTML5. Nearly 20 years later, that document is compatible with modern Web browsers.

On the other side of the equation, is print impervious to natural disaster, or even technological obsolescence? Of course not. At Yale, with our rare books library and large historical collection, I have witnessed first hand the damage time can do to a physical book. Even more importantly, books in the last hundred years have been published so cheaply they may fall apart even sooner than books published centuries ago.

Print and Electronic Costs

The reality is that moving to online-only legal information is a good thing for everyone involved in producing and consuming such information. The burden of print is not limited to the costs forced upon states that produce it; that burden is also borne by libraries and citizens who consume it.

As mentioned above respecting the AALL working group document, many states are already looking at going online-only to cut costs, and why shouldn’t they? With current budget situations across the country being what they are, printing costs being particularly high, and electronic publishing costs being so low, of course states are looking at saving money by ending needless printing.

But libraries would also benefit from the cost savings of governments’ moving to electronic formats. Not only do libraries currently have to subsidize printing costs by paying for the “official” print copies of legal materials; libraries also have to pay for the shelf space, as well as manpower to process incoming material and place it on the shelf, and may also have to pay additional costs for preserving the physical material. Not to mention the fact that we may pay for additional services that furnish access to the exact same material in an electronic format.

The costs involved in dealing with print legal resources are well known to most librarians. So why aren’t we clamoring for governments to publish online-only legal information?

Officialness, Authenticity, Preservation, and Citeability

Of course there are genuine concerns about online-only legal information. The big sticking points seem to be (in no particular order) officialness, authenticity, preservation, and citeability. Each issue is worthy of, and has been the subject of, much discussion.

Officialness may be in some ways both the easiest and the most difficult hurdle for online-only legal information to leap. To make an online version of legal material official, an appropriate authoritative body need only declare that version “official.” The task seems simple enough.

The more difficult part may be political. With organizations like AALL and ARJD currently opposing online-only options, that action may be politically difficult. Persuading lawyers, judges, and legislatures to approve such a declaration could be even more difficult. Can you imagine a bill, regulation, or some other action making a blog the “official” outlet for a particular court’s opinions?

The question of authenticity is more difficult to deal with from a technological perspective, although there has been interesting work done with respect to PDFs, electronic signatures, and public and private keys. The Government Printing Office (GPO) has done a great job leading the way in the area of authenticity: http://www.gpoaccess.gov/authentication/. The new Legislation.gov.uk site unveiled recently has taken a different approach from the GPO’s. As John Sheridan has written in an earlier post, at the moment The U.K. National Archives are not taking any steps towards authenticating the information on the Legislation.gov.uk site, but they recognize the need to address the issue at some point. John Joergensen at Rutgers-Camden has taken yet another approach. And Claire Germain, in a recent paper about authentication practices respecting international legal information (pdf), states that those practices vary throughout the world. Thus the prickly question of authenticating online legal information is an issue that’s not going away any time soon.

AALL and ARJD have made a big deal about preservation of online legal information, an issue that’s important for librarians, too. Unfortunately, this is another area where no good answer exists to guide us. As Sarah Rhodes wrote earlier this year, “our current digital preservation strategies and systems are imperfect – and they most likely will never be perfected.”

The Library of Congress National Digital Information Infrastructure & Preservation Program (NDIIPP) has some helpful resources. The Legal Information Preservation Alliance (LIPA) also provides some good guidance in this area. However, many librarians are still reluctant to accept that digital preservation practices may enable us to end our reliance on print.

A similar reluctance can be seen in resistance to the Durham Statement, which — though directed at law reviews — also says something about other kinds of online legal information. Most notably, Margaret Leary of the University of Michigan chose not to sign the Durham Statement, and discussed her decision to continue to rely on print at a recent AALL program. In a listserv posting quoted in Richard Danner’s recent paper, Ms. Leary asserted: “I do not agree with the call to stop publishing in print, nor do I think we have now or will have in the foreseeable future the requisite ‘stable, open, digital formats’.” Similarly, Richard Leiter explains that he signed the Durham Statement with an asterisk because of the statement’s call for an end to the printing of law reviews.

What constitutes ‘stable, open, digital formats’ for the purposes of satisfying some librarians is unclear. As I mentioned earlier, a number of digital formats currently fit this description. This makes me think that there’s something else going on here, a resistance to abandoning print for other reasons.

Citeability also becomes an issue as print legal information disappears. If there is no print reporter volume in which an opinion is issued, then how would one cite to an opinion (setting aside for a moment Lexis and Westlaw citations)?

However, efforts towards implementing “medium-neutral legal citation formats” have already been made. According to Ivan Mokanov’s recent VoxPopuLII post, most citations in Canada are of a neutral format. In the United States, LegisLink.org has made an effort to improve online citations, as Joe Carmel describes in his recent post. Work on URN:LEX and other standards has resulted in some progress towards dealing with the citeability issue. Organizations like the AALL Electronic Legal Information Access & Citation Committee also deserve credit for taking this on. [Editor’s Note: Those organizations have produced universal citation standards — such as the AALL Universal Citation Guide — which have been adopted by a number of U.S. jurisdictions.] Even The Bluebook supports alternative citation formats. For example, rule 10.3.3, “Public Domain Format,” specifies how to cite to a public domain or “medium-neutral format.” The Bluebook even goes so far as to allow citation in a jurisdiction’s specified format.
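To illustrate what “medium-neutral” means in practice: such citations typically identify a decision by year, court identifier, and sequential decision number, with an optional paragraph pinpoint, rather than by volume and page of a print reporter. The sketch below parses and formats citations of that general shape. The pattern is a simplification of my own, not the text of any jurisdiction’s rule, and the class and function names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Year, court identifier, sequence number, optional paragraph pinpoint,
# e.g. "2009 WI 12, ¶ 15" (simplified; real rules vary by jurisdiction).
NEUTRAL_CITATION = re.compile(
    r"^(?P<year>\d{4})\s+(?P<court>[A-Z]+(?: [A-Z]+)*)\s+(?P<number>\d+)"
    r"(?:,\s*(?:¶|para\.?)\s*(?P<paragraph>\d+))?$"
)


@dataclass(frozen=True)
class NeutralCitation:
    year: int
    court: str
    number: int
    paragraph: Optional[int] = None

    def __str__(self) -> str:
        base = f"{self.year} {self.court} {self.number}"
        return base if self.paragraph is None else f"{base}, ¶ {self.paragraph}"


def parse_citation(text: str) -> Optional[NeutralCitation]:
    """Parse a neutral citation, or return None if the text has another shape."""
    match = NEUTRAL_CITATION.match(text.strip())
    if match is None:
        return None
    paragraph = match.group("paragraph")
    return NeutralCitation(
        year=int(match.group("year")),
        court=match.group("court"),
        number=int(match.group("number")),
        paragraph=int(paragraph) if paragraph else None,
    )
```

Because nothing in the citation depends on a printed volume or page, the same string identifies the opinion whether it is read in a reporter, on a court website, or in a database.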

But despite all this work, nothing has yet stuck.

The Next Step

One thing you’ll notice respecting all of these issues is that they are currently unsettled. While AALL and ARJD have both suggested that they would look favorably on online-only legal information if it were official, authenticated, and preserved (they do not mention citeability), there is no indication of when we will reach a level of achievement on these issues that would be satisfactory to these organizations. Can governments, libraries, and citizens afford to wait?

Asking states to continue to bear the burden of publishing material in print as they run out of funding, and libraries to bear the expense of preserving that print, is irresponsible. While we might not have all of the answers now, we certainly have enough to move forward in an intelligent manner.

The National Conference of Commissioners on Uniform State Laws (NCCUSL) has been working on an Authentication and Preservation of State Electronic Legal Materials Act. [Editor’s Note: The Chair of the Act’s Drafting Committee is Michele L. Timmons, the Revisor of Statutes for the State of Minnesota, and its Reporter is Professor Barbara Bintliff of the University of Texas School of Law.] According to the Study Committee’s Report and Recommendations for the Act’s Drafting Committee, the goal of the draft should be to “describ[e] minimum standards for the authentication and preservation of online state legal materials.” This seems like an appropriate place to start.

Rather than setting unrealistic or vague expectations, the minimum standards provided by the draft act seem to allow some flexibility for how states could address some of these issues. As opposed to working towards a “stable and open digital format,” which seems more a moving target than an attainable goal, the draft act sets forth an outline for how states can get started with publishing official and authentic online-only legal information. While far from finished, the draft act appears to be a step in the right direction.

What Is the Real Issue?

I think the real sticking point on this matter is mental or emotional. It comes from an uneasiness about how to deal with new methods of publishing legal information. For hundreds of years, legal information has been based in print. Even information available on the Lexis and Westlaw online services has its roots in print, if not full print versions of the same material. It’s as if the lack of a print or print-like version will cause librarians to lose the compass that helps us navigate the complex legal information landscape.

Of course, publishing legal information electronically brings its own challenges and costs for libraries. Electronic memory and space are not free, and setting up the IT infrastructure to consume, make available, and preserve digital materials can be costly. But in the long run, dealing with electronic material can and will be much easier and less costly for all involved, as well as giving greater access to legal information to the citizens who need it.

So Judges Blogging?

Question: Is there a good reason why judges should not be blogging their opinions?

Although he was the co-chair of the ARJD committee that produced the Statement of Principles, even Frank Wagner, the outgoing U.S. Supreme Court reporter of decisions, acknowledges that “budgetary constraints may eventually force most governmental units to abandon the printed word in favor of publishing their official materials exclusively online.” He also recognizes that the GPO’s work in this area may put an end to the printed U.S. Reports sooner than other “official publications.”

So if an appropriate authority were to make them official, some form of authentication were decided on, and methods of preservation and citation were taken into account, would you feel comfortable with judges’ blogging their opinions?

We have to get over our unease with new formats for publishing online legal information. We have to stop handcuffing governments and libraries by placing unrealistic and unattainable expectations on them for publishing online legal information. We have to prepare ourselves for a world where online is the only outlet for official legal information.

I still enjoy taking a book off the shelf and reading. I enjoy flipping through and browsing the pages. But nostalgia and habit are not valid strategies for libraries of the future.

Jason Eiseman is the Librarian for Emerging Technologies at Yale Law School. He has experience in academic and law firm libraries working with intranets, websites, and technology training.

VoxPopuLII is edited by Judith Pratt. Editor in chief is Robert Richards.

Confessions of a Legal Info-holic

digital law, Digital law libraries, digital libraries, india, information retrieval, liis, open source software 2 Responses »

Feb 01 2010

In an extraordinary story, Jorge Luis Borges writes of a “Total Library”, organized into ‘hexagons’ that supposedly contained all books:

When it was proclaimed that the Library contained all books, the first impression was one of extravagant happiness. All men felt themselves to be the masters of an intact and secret treasure. . . . At that time a great deal was said about the Vindications: books of apology and prophecy which . . . [contained] prodigious arcana for [the] future. Thousands of the greedy abandoned their sweet native hexagons and rushed up the stairways, urged on by the vain intention of finding their Vindication. These pilgrims disputed in the narrow corridors . . . strangled each other on the divine stairways . . . . Others went mad. . . . The Vindications exist . . . but the searchers did not remember that the possibility of a man’s finding his Vindication, or some treacherous variation thereof, can be computed as zero. As was natural, this inordinate hope was followed by an excessive depression. The certitude that some shelf in some hexagon held precious books and that these precious books were inaccessible, seemed almost intolerable.

About three years ago I spent almost an entire sleepless month coding OpenJudis – my rather cool, “first-of-its-kind” free online database of Indian Supreme Court cases. The database hosts the full texts of about 25,000 cases decided since 1950. In this post I embark on a somewhat personal reflection on the process of creating OpenJudis – what I learnt about access to law (in India), and about “legal informatics,” along with some meditations on future pathways.

Having, by now, attended my share of FLOSS events, I know it is the invariable tendency of anyone who’s written two lines of free code to consider themselves qualified to pronounce on lofty themes – the nature of freedom and liberty, the commodity, scarcity, etc. With OpenJudis, likewise, I feel like I’ve acquired the necessary license to inflict my theory of the world on hapless readers – such as those at VoxPopuLII!

I begin this post by describing the circumstances under which I began coding OpenJudis. This is followed by some of my reflections on how “legal informatics” relates to and could relate to law.

Online Access to Law in India
India is privileged to have quite a robust ICT architecture. Internet access is relatively inexpensive, and the ubiquity of “cyber cafes” has resulted in extensive Internet penetration, even in the absence of individual subscriptions.

Government bodies at all levels are statutorily obliged to publish, on the Internet, vital information regarding their structure and functioning. The National Informatics Centre (NIC), a public sector corporation, is responsible for hosting, maintaining and updating the websites of government bodies across the country. These include, inter alia, the websites of the Union (federal) Government, the various state governments, union and state ministries, constitutional bodies such as the Election Commission and the Planning Commission, and regulatory bodies such as the Securities and Exchange Board of India (SEBI). These websites typically host a wealth of useful information including, illustratively, the full texts of applicable legislation, subordinate legislation, administrative rulings, reports, census data, application forms, etc.

The NIC has also been commissioned by the judiciary to develop websites for courts at various levels and publish decisions online. As a result, beginning in around the year 2000, the Supreme Court and various high courts have been publishing their decisions on their websites. The full texts of all Supreme Court decisions rendered since 1950 have been made available, which is an invaluable free resource for the public. Most High Court websites, however, have not yet made archival material available online, so at present, access remains limited to decisions from the year 2000 onwards. More recently the NIC has begun setting up websites for subordinate courts, although this process is still at a very embryonic stage.

Apart from free government websites, a handful of commercial enterprises have been providing online access to legal materials. Among them, two deserve special mention. SCCOnline – a product of one of the leading law report publishers in India – provides access to the full texts of decisions of the Indian Supreme Court. The CD version of SCCOnline sells for about INR 70,000 (about US$1,500), which is around the same price the company charges for a full set of print volumes of its reporter. For an additional charge, the company offers updates to the database. The other major commercial venture in the field is Manupatra, which offers access to the full text of decisions of various courts and tribunals as well as the texts of legislation. Access is provided for a basic charge of about US$100, plus a charge of about US$1 per document downloaded. While seemingly modest by international standards, these charges are unaffordable for large sections of the legal profession and the lay public.

OpenJudis
In December 2006, I began coding OpenJudis. My reasons were purely selfish. While the full texts of the decisions of the Supreme Court were already available online for free, the search engine on the government website was unreliable and inadequate for (my) advanced research needs. The formatting of the text of cases themselves was untidy, and it was cumbersome to extract passages from them. Frequently, the website appeared overloaded with users, and alternate free sources were unavailable. I couldn’t afford any of the commercial databases. My own private dissatisfaction with the quality of service, coupled with (in retrospect) my completely naive optimism, led me to attempt OpenJudis. A third crucial factor on the input side was time, and a “room of my own,” which I could afford only because of a generous fellowship I had from the Open Society Institute.

I began rashly, by serially downloading the full texts of the 25,000 decisions on the Supreme Court website. Once that was done (it took about a week), I really had no notion of how to proceed. I remember being quite exhilarated by the sheer fact of being in possession of twenty five thousand Supreme Court decisions. I don’t think I can articulate the feeling very well. (I have some hope, however, that readers of this blog and my fellow LII-ers will intuitively understand this feeling.) Here I was, an average Joe poking around on the Internet, and just-like-that I now had an archive of 25,000 key documents of our republic, cumulatively representing the articulations of some of the finest (and some not-so-fine) legal minds of the previous half-century, sitting on my laptop. And I could do anything with them.

The word “archive,” incidentally, as Derrida informs us, derives from the Greek arkheion, the residence of the superior magistrates, the archons – those who commanded. The archons both “held and signified political power,” and were considered to possess the right to both “make and represent the law.” “Entrusted to such archons, these documents in effect speak the law: they recall the law and call on or impose the law.” Surely, or I am much mistaken, a very significant transformation has occurred when ordinary citizens become capable of housing archives – when citizens can assume the role of archons at will.

Giddy with power, I had an immediate impulse to find a way to transmit this feeling, to make it portable, to dissipate it – an impulse that will forever mystify economists wedded to “rational” incentive-based models of human behavior. I wasn’t a computer engineer, I didn’t have the foggiest idea how I’d go about it, but I was somehow going to host my own online free database of Indian Supreme Court cases. The audacity of this optimism bears out one of Yochai Benkler’s insights about the changes wrought by the new “networked information economy” we inhabit. According to Benkler,

The belief that it is possible to make something valuable happen in the world, and the practice of actually acting on that belief, represent a qualitative improvement in the condition of individual freedom [because of NIE]. They mark the emergence of new practices of self-directed agency as a lived experience, going beyond mere formal permissibility and theoretical possibility.

Without my intending it, the archive itself suggested my next task. I had to clean up the text and extract metadata. This process occupied me for the longest time during the development of OpenJudis. I was very new to programming and had only just discovered the joys of Regular Expressions. More than my inexperience with programming techniques, however, it was the utter heterogeneity of reporting styles that took me a while to accustom myself to. Both opinion-writing and reporting styles had changed dramatically in the course of the fifty years my database covered, and this made it difficult to find patterns when extracting, say, the names of judges involved. Eventually, I had cleaned up the texts of the decisions and extracted an impressive (I thought) set of metadata, including the names of parties, the names of the judges, and the date the case was decided. To compensate for the absence of headnotes, I extracted names of statutes cited in the cases as a rough indicator of what each case might relate to. I did all this programming in PHP with the data housed in a MySQL database.
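For readers curious about what this kind of extraction looks like, here is a minimal sketch in Python rather than the PHP I actually used. It is not the OpenJudis code. The labelled header layout, the sample judgment, and all the names in it are invented for illustration, and real judgments were far less regular than this.

```python
from __future__ import annotations

import datetime
import re
from typing import Optional

# Invented sample; the field labels and layout are an assumption.
SAMPLE = """\
PETITIONER: RAM KUMAR
RESPONDENT: STATE OF EXAMPLE
DATE OF JUDGMENT: 25/05/1951
BENCH: A. B. SHARMA; C. D. RAO

The appeal turns on the Indian Evidence Act, 1872 and the
Indian Contract Act, 1872. Section 3 of the Indian Evidence Act, 1872
was also cited.
"""

# One or more capitalised words followed by "Act, <year>".
STATUTE = re.compile(r"(?:[A-Z][A-Za-z]+\s+)+Act,\s+\d{4}")


def extract_field(label: str, text: str) -> Optional[str]:
    """Return the value after 'LABEL:' on its own line, if present."""
    match = re.search(rf"^{re.escape(label)}\s*:\s*(.+)$", text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def extract_metadata(text: str) -> dict:
    """Pull parties, judges, decision date and cited statutes from one judgment."""
    raw_date = extract_field("DATE OF JUDGMENT", text)
    bench = extract_field("BENCH", text)
    # Collapse internal whitespace, then de-duplicate while keeping first-seen order.
    statutes = [" ".join(s.split()) for s in STATUTE.findall(text)]
    return {
        "petitioner": extract_field("PETITIONER", text),
        "respondent": extract_field("RESPONDENT", text),
        "decided": (
            datetime.datetime.strptime(raw_date, "%d/%m/%Y").date() if raw_date else None
        ),
        "judges": [name.strip() for name in bench.split(";")] if bench else [],
        "statutes": list(dict.fromkeys(statutes)),
    }
```

The statute pattern is deliberately crude: it misses names containing lowercase words (“Government of India Act, 1935”) and would swallow a leading “The”. That is exactly the kind of heterogeneity that made the real task slow.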

And then I encountered my first major roadblock that threatened to jeopardize the whole operation: I ran my first full-text Boolean search on the MySQL database and the results took a staggering 20 minutes to display. I was devastated! More elaborate searches took longer. Clearly, this was not a model I could host online. Or do anything useful with. Nobody in their right mind would want to wait 20 minutes for the results of their search. I had to look for a quicker database, or, as I eventually discovered, a super fast, lightweight indexing search engine. After a number of failed attempts with numerous free search engine software programs, none of which offered either the desired speed or the search capability I wanted, I was getting quite desperate. Fortunately, I discovered Swish-e, a lightweight, Perl-based Boolean search engine which was extremely fast and, most importantly, free – exactly what I needed. The final stage of creating the interface, uploading the database, and activating the search engine happened very quickly, and sometime in the early hours of December 22nd, 2006, OpenJudis went live. I sent announcement emails out to several e-groups and waited for the millions to show up at my doorstep.
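Why is an indexing engine so much faster than scanning the text of every case for every query? The sketch below shows the general idea of an inverted index, which maps each word to the set of documents containing it, so that a Boolean query becomes a handful of set operations. This is a toy illustration of the technique in Python, not a description of how Swish-e is implemented; the function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build the index once: each term maps to the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_and(index: dict[str, set[int]], *terms: str) -> set[int]:
    """Documents containing every term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index: dict[str, set[int]], *terms: str) -> set[int]:
    """Documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index: dict[str, set[int]], all_ids: set[int], term: str) -> set[int]:
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())
```

The expensive work of reading every document happens once, when the index is built. Each query then touches only the entries for the words it names, which is why results can come back in a fraction of a second rather than minutes.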

They never did. After a week, I had maybe a hundred users. In a month, a few hundred. I received some very complimentary emails, which was nice, but it didn’t compensate for the failure of “millions” to show up. Over the next year, I added some improvements:
1) First, I built an automatic update feature that would periodically check the Supreme Court website for new cases and update the database on its own.
2) In October 2007, I coded a standalone MS Windows application of the database that could be installed on any system running Windows XP. This made sense in a country where PC penetration is higher than Internet penetration. The Windows application became quite popular and I received numerous requests for CDs from different corners of the country.
3) Around the same time, I also coded a similar application for decisions of the Central Information Commission – the apex statutory tribunal for adjudicating disputes under the Right to Information Act.
4) In February 2008, both applications were included in the DVD of Digit Magazine – a popular IT magazine in India.
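The automatic update feature in item 1 boils down to a simple comparison: list what the source currently offers, subtract what is already stored, and fetch the difference. A minimal sketch follows, in Python rather than the PHP I used. The two callables stand in for whatever code talks to the court website; they are hypothetical placeholders, not a real API.

```python
from __future__ import annotations

from typing import Callable, Iterable


def sync_new_cases(
    store: dict[str, str],
    list_remote_ids: Callable[[], Iterable[str]],
    fetch_case: Callable[[str], str],
) -> list[str]:
    """Fetch only the cases not yet in the local store.

    `list_remote_ids` and `fetch_case` are placeholders for site-specific
    code. Returns the ids that were added, in the order the source listed them.
    """
    new_ids = [case_id for case_id in list_remote_ids() if case_id not in store]
    for case_id in new_ids:
        store[case_id] = fetch_case(case_id)
    return new_ids
```

Run on a schedule, a routine like this keeps the local database current without downloading everything again. It also shows why a redesign of the source website can break such a feature: everything depends on being able to list and fetch cases in a predictable way.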

Unfortunately, in August 2008, the Supreme Court website changed its design so that decisions could no longer be downloaded serially in the manner I had been accustomed to. One can only speculate about what prompted this change, since no improvements were made to the actual presentation of the cases. The new format was far more difficult for me to “hack,” and my work left me with no time to attempt to circumvent it, so I abandoned the attempt.

Fortunately, at the same time, an exciting new project called IndianKanoon was started by Sushant Sinha, an Indian computer science graduate at Michigan. In addition to decisions of the Supreme Court, his site covers several high courts and links up to the text of legislation of various kinds. Although I have not abandoned plans to develop OpenJudis, the presence of IndianKanoon has allowed me to step back entirely from this domain – secure in the knowledge that it is being taken forward by abler hands than mine.

Predictions, Observations, Conclusions
I’d like to end this already-too-long post with some reflections, randomly ordered, about legal information online.
1) I think one crucial area commonly neglected by most LIIs is client-side software that enables users to store local copies of entire databases. The urgency of this need is highlighted in the following hypothetical about digital libraries by Siva Vaidhyanathan (from The Anarchist in the Library):

So imagine this: An electronic journal is streamed into a library. A library never has it on its shelf, never owns a paper copy, can’t archive it for posterity. Its patrons can access the material and maybe print it, maybe not. But if the subscription runs out, if the library loses funding and has to cancel that subscription, or if the company itself goes out of business, all the material is gone. The library has no trace of what it bought: no record, no archive. It’s lost entirely.

It may be true that the Internet will be around for some time, but it might be worthwhile for LIIs to stop emulating the commercial database models of restricting control while enabling access. Only then can we begin to take seriously the task of empowering users into archons.

2) My second observation pertains to interface and usability. I have long been planning to incorporate a set of features including tagging, highlighting, annotating, and bookmarking that I myself would most like to use. Additionally, I have been musing about using Web 2.0 to enable user-participation in maintenance and value-add operations – allowing users to proofread the text of judgments and to compose headnotes. At its most ambitious, in these “visions” of mine, OpenJudis looks like a combination of LII + social networking + Wikipedia.

A common objection to this model is that it would upset the authority of legal texts. In his brilliant essay A Brief History of the Internet from the 15th to the 18th century, the philosopher Lawrence Liang reminds us that the authority of knowledge that we today ascribe to printed text was contested for the longest period in modern history.

Far from ensuring fixity or authority, this early history of Printing was marked by uncertainty, and the constant refrain for a long time was that you could not rely on the book; a French scholar Adrien Baillet warned in 1685 that “the multitude of books which grows every day” would cast Europe into “a state as barbarous as that of the centuries that followed the fall of the Roman Empire.”

Europe’s non-descent into barbarism offers us a degree of comfort in dealing with Adrien Baillet-type arguments made in the context of legal information. The stability that we ascribe to law reports today is a relatively recent historical innovation that began in the mid-19th century. “Modern” law has longer roots than that.

3) While OpenJudis may look like quite a mammoth endeavor for one person, I was at all times intensely aware that this was by no means a solitary undertaking, and that I was “standing on the shoulders of giants.” They included the nameless thousands at the NIC who continue to design websites, scan and upload cases on the court websites – a Sisyphean task – and the thousands whose labor collectively produced the free software I used: Fedora Core 4, PHP, MySQL, Swish-e. And lastly, the nameless millions who toil to make the physical infrastructure of the Internet itself possible. Like the ground beneath our feet, we take it for granted, even as the tragic events in Haiti in recent weeks remind us to be more attentive. (For a truly Herculean endeavor, however, see Sushant Sinha’s IndianKanoon website, about which many ballads may be composed in the decades to come.)

It might be worthwhile for the custodians of LIIs to enable users to become derivative producers themselves, to engage in “practices of self-directed agency” as Benkler suggests. Without sounding immodest, I think the real story of OpenJudis is how the Internet makes it plausible and thinkable for average Joes like me (and better-than-average people like Sushant Sinha) to think of waging unilateral wars against publishing empires.

4) So, what is the impact that all this ubiquitous, instant, free electronic access to legal information is likely to have on the world of law? In a series of lectures titled “Archive Fever,” the philosopher Derrida posed a similar question in a somewhat different context: What would the discipline of psychoanalysis have looked like, he asked, if Sigmund Freud and his contemporaries had had access to computers, televisions, and email? In brief, his answer was that the discipline of psychoanalysis itself would not have been the same – it would have been transformed “from the bottom up” and its very events would have been altered. This is because, in Derrida’s view:

The archive . . . in general is not only the place for stocking and for conserving an archivable content of the past. . . . No, the technical structure of the archiving archive also determines the structure of the archivable content even in its coming into existence and in its relationship to the future. The archivization produces as much as it records the event.

The implication, following Derrida, is that in the past, law would not have been what it currently is if electronic archives had been possible. And the obverse is true as well: in the future, because of the Internet, “rule of law” will no longer observe the logic of the stable trajectories suggested by its classical “analog” commentators. New trajectories will have to be charted.

5) In the same book, Derrida describes a condition he calls “Archive fever”:

It is to burn with a passion. It is never to rest, interminably, from searching for the archive right where it slips away. It is to run after the archive even if there’s too much of it. It is to have a compulsive, repetitive and nostalgic desire for the archive, an irrepressible desire to return to the origin, a homesickness, a nostalgia for the return to the most archaic place of absolute commencement.

I don’t know about other readers of VoxPopuLII (if indeed you’ve managed to continue reading this far!), but for the longest time during and after OpenJudis, I suffered distinctly from this malady. I downloaded indiscriminately whole sets of data that still sit unused on my computer, not having made it into OpenJudis. For those in a similar predicament, I offer Borges’s quote with which I began this text, as a reminder of the foolishness of the notion of “Total Libraries.”

Prashant Iyengar is a lawyer affiliated with the Alternative Law Forum, Bangalore, India. He is currently pursuing his graduate studies at Columbia University in New York. He runs OpenJudis, a free database of Indian Supreme Court cases.

VoxPopuLII is edited by Judith Pratt. Editor in Chief is Rob Richards.

Duopolies, web usability, and legal research instruction

digital law, information retrieval, Law librarians, legal research 8 Responses »

Nov 19 2009

It’s been a rocky year for West’s relationship with law librarians.

First, the company declined to participate in this year’s American Association of Law Libraries Price Index for Legal Publications. This led AALL to return West’s sponsorship check for the 2009 AALL Annual Meeting. For attendees, this decision was somewhat academic, as West still occupied a large space in the Exhibitor Hall and once again hosted a well-attended Customer Appreciation Party.

Shortly after the conference, West issued an email promotion to customers that asked:

Are you on a first name basis with the librarian? If so, chances are, you’re spending too much time at the library. What you need is fast, reliable research you can access right in your office.

Many law librarians felt publicly insulted by West, expressing their outrage on listservs, blogs, Twitter, Facebook and anywhere legal information professionals could be found that week.

Most recently, West released a video of University of California, Berkeley professor and law librarian Bob Berring explaining the advantages of “free market” premium legal databases over free legal information websites run by “volunteers:”

It’s not like legal information is going to the Safeway or to buy food. You’re not buying a packaged thing. If you say I need to find statutes about this, or what’s the administrative regulations on that, or have the courts spoken about this, you have to go find it. And just saying it’s all out there — I mean, the ocean is all out there, but you need a map, and you need a compass, and… you need a GPS system now. You need someone to tell you how to get there. That’s why librarians are even more important now, because they’ve got the GPS system. But you have to be working with organized information. The value added by folks like West, where the information is edited as it goes in, and it’s classified, and the hooks are put in — easy hooks for the people who I think are sloppy researchers just playing around on the tops, really sophisticated hooks for the people who take the time to learn how to really use the system and understand it. You just can’t say enough about those kind of things, because to say to the average person, “Well, it’s all out there, the law is all out there,” well, it’s a big bunch of goo.

Adding value to the goo

Unfortunately, the West/Lexis duopoly doesn’t provide consumers with the expected advantages of a free market economy. Neither vendor uses price as a marketing strategy, and both negotiate electronic database contracts with customers rather than charge a flat rate. Considering that West has increased its own annual profit margin to 30% or higher in recent years, while raising the cost of supplements at a rate far exceeding inflation, prices are hardly being driven by free market trends, making a price war seem unlikely. (This doesn’t mean consumers aren’t hopping mad about the price of legal information. They are.)

Instead, at least in the database market, both companies rely on content and features to market their products. Each July at the AALL Annual Meeting, both Lexis and Westlaw use their exhibitor space to educate attendees about whatever new databases and customer conveniences will be rolled out in the coming months.

I often compare these annual feature introductions to the evolution of automobile engines, thanks to a childhood spent watching my father work on the family cars. At first Dad knew every nook and cranny of our vehicles, and there was little he couldn’t repair himself over the course of a few nights. As we traded in cars for newer models, his job became more difficult as engines became more complex. None of the automakers seemed to consider ease of access when adding new parts to an automobile engine. They were simply slapped on top of the existing ones, making it harder to perform simple tasks, like replacing belts or spark plugs.

Lexis and Westlaw also add new components on top of the old ones. To generalize, Lexis tends to add new features in the form of tabs (think “Total Litigator”) while Westlaw adds them in sidebars (think “Results Plus”), to the point where once clean interfaces are now littered with disparate elements sharing adjacent screen real estate.

Finding fault with filters

In a talk at last year’s Web 2.0 Expo in New York, author Clay Shirky stated that the fundamental information problem is not “information overload,” but “filter failure.” Shirky summarized this position in a recent interview with Yale Law School’s Jason Eiseman:

As I’ve often said, there’s no such thing as information overload. It’s filter failure, right? From the minute we had more books to read than the average literate person could read in a lifetime, which depending on the region you’re talking about happened someplace between the 16th and 19th century, from that moment on we’ve always had information overload. That’s the modern condition. What’s happening, I think, to our sense that we’re suffering acutely from information overload now is that the old professional filters have broken. They’re simply not adequate to contain a world in which anyone can put material out in the public.

Whether or not you agree with Shirky’s assessment, it provides an interesting framework with which to view the Lexis/Westlaw information problem. If the primary legal information within these systems is “a big bunch of goo,” then secondary resources, headnotes, subject-specific organization, and other finding aids are the filters necessary to cope with information overload.

For West’s “Are you on a first name basis with the librarian?” promotion to work, Westlaw has to provide the “fast, reliable research you can access right in your office” that it advertises. Assuming for purposes of this essay that the presence of relevant content isn’t an issue (an assumption with which many will quibble), this means the system’s filters need to provide reliable information quickly.

There’s no question that both West and Lexis provide an abundance of subject-specific organization, particularly for case law. Headnotes, topics, digests, tables of authority, citators and cross-references to secondary resources all go above and beyond what researchers find in most freely available resources. But these add-ons, or filters, are only effective if presented in a usable manner.

For an assignment in one of my legal research classes this semester, I provided a fact pattern and asked students to perform a Natural Language search in Westlaw of American Law Reports to find a relevant annotation. In a class of only 19 students, six of them answered with citations to resources other than ALR, including articles from American Jurisprudence, Am.Jur. Proof of Facts, and Shepard’s Causes of Action. The problem, it turned out, wasn’t that they had searched the wrong database. Every one of them searched ALR correctly, but those six students mistook Westlaw’s Results Plus, placed at the top of a sidebar on the results page, for their actual search results. Filter failure, indeed.

On another assignment, students were expected to find a particular statutory code section using a secondary resource, view the code section, then navigate to the code’s table of contents to browse related sections codified nearby. This proved nearly impossible for most of them, as the code section they accessed loaded in a pop-up window with no sidebar, thus providing no visible link to the table of contents. The problems didn’t stop there. Even once I told them to click the “Maximize” button at the bottom of the pop-up window, which reloads the code section into the main window with a sidebar, upon clicking the TOC link, anyone using Firefox for Windows loaded a blank page. (To resolve this error, you have to right-click on the frame where the TOC should’ve loaded and select “This Frame -> Reload This Frame.”)

While completing another portion of the statutory code assignment in Lexis, nearly half the students in the class became confused because numerous clickable links throughout the system display as plain black text which only appear as links when the user hovers over them. Also, within statutory code sections, the navigation links provided within the case annotation index routinely loaded an error page rather than navigating to the proper section further down the page.

This doesn’t even address basic usability issues such as broken back button functionality, heavy usage of frames, lack of permanent document URLs (Lexis and Westlaw each have external workarounds for this), and reliance on pop-up windows (something blocked by default on most browsers). In addition, Lexis still doesn’t support users accessing the system with Firefox for Mac.

The wide availability of secondary resources, annotated codes, and numerous other value-added content provides a clear advantage for Lexis and Westlaw over free and mid-level legal information services, and that’s why everyone continues to pay their steep prices. But so long as the systems themselves don’t provide usable access, each still suffers from filter failure.

Is there an incentive to improve?

There is evidence that the companies have the expertise to provide a better user experience. West has two electronic versions (one for desktop computers and one for the iPhone) of Black’s Law Dictionary available that offer more intuitive functionality than what’s provided for the same text in Westlaw. Don’t expect a price break, however. The desktop version of Black’s has a list price of $99, while the iPhone version costs $49.99. By comparison, the print version of Black’s Standard Ninth Edition, which likely has substantially higher production costs than the electronic equivalents, carries a list price of $75, meaning iPhone users receive a slightly lower price while desktop users pay even more. Worse still, both electronic versions as well as the content in Westlaw contain the text of the outdated 8th Edition.

Lexis also has an iPhone app, and it’s a free download that requires an existing Lexis password to function. Substantially simplified from its traditional web interface, the user experience is clean and easy to understand. Yet while one can retrieve both primary and secondary documents, as well as Shepardize documents, none of the documents in this interface contain links, only plain citations that must be copied and pasted into the search form to be retrieved.

Of course, the bigger problem with these progressive moves is that they don’t address any of the existing problems in the web interfaces for either product. No one is redesigning the engine, so to speak. These are simply variations of the now traditional roll-out of new features and functionality on top of existing ones that still have the same significant issues.

This is the problem with a duopoly. There aren’t enough producers in the economy to exert significant pressure on either to improve usability. Consumer power is also limited because multi-year contracts prevent easy product substitution, and there’s only one true product substitute available. The producers dictate the competition, and thus far they have dictated a content competition (“The Tabs and Sidebars War”), rather than a usability one, or even a price one.

There are events on the horizon that could impact this stalemate. Bloomberg continues to develop its own legal research product, allegedly designed to be a Westlaw/Lexis competitor. Perhaps this third producer will see value in using price or usability to gain market share. Lewis & Clark law student (and VoxPopuLII author) Robb Shecter recently introduced OregonLaws.org, a free repository of Oregon law that currently features the entire Oregon Revised Statutes and a legal glossary. The site’s simple, logical navigation reflects current web usability norms more accurately than either Lexis or Westlaw, and for a “micro-fee” users can bookmark code sections for quick access and save unlimited “human readable” research trails. And, of course, Google Scholar just added “Legal opinions and journals.” It’s far too early to know if it will become a true player in legal information, but Google always has the potential to be a game changer with anything it does.

What can legal research instructors DO?

Despite the presence of these interesting new projects, consumers can’t expect a quick usability turnaround from Lexis and Westlaw, nor the sudden presence of a competitor with the same depth and breadth of content. History doesn’t support such an expectation, leaving legal research instructors in a precarious position.

Many schools leave Lexis/Westlaw training solely in the hands of the companies’ representatives. While a company rep will be knowledgeable about the system, he will also paint the product in the best possible light for the company, glossing over usability issues and emphasizing new features. After all, law students are future customers, so this instruction is part of a long-term sales pitch.

In order to provide a balanced picture of these systems, legal research instructors need to provide their own Lexis and Westlaw training. This can either be in place of or in addition to what’s provided by company reps, but students need to hear the voice of an experienced researcher who doesn’t rely on either company for a paycheck. Some may see this as an implied institutional endorsement of the high-priced systems, but the reality is most students will end up working with one or both of these systems on a daily basis after graduation. Ignoring this would be an educational disservice. Any sense of endorsement can be addressed through thorough coverage of the usability limitations and a short education on the price realities. Instructors can also discuss the availability of lower priced databases for lawyers who simply want access to primary legal materials.

If the market is going to change, it won’t be because Lexis and Westlaw spontaneously decide to improve products that generate significant profits already. Until then, legal researchers need to be better educated on the limitations of these systems so that their work product isn’t compromised by over-reliance on a duopoly disguised as a free market.

Tom Boone is a reference librarian and adjunct professor at Loyola Law School in Los Angeles. He’s also webmaster and a contributing editor for Henderson Valley Eggs, a “themed information collective” website covering law library issues.

VoxPopuLII is edited by Judith Pratt.

The Recipe for Better Legal Information Services

comparative, digital law, legal research

Aug 12, 2009

A new style of legal research

An attorney/author in Baltimore is writing an article about state bans of teachers’ religious clothing. She finds one of the tersely written statutes online. The website then does a query of its own and tells her about a useful statute she wasn’t aware of: one setting out the permitted disciplinary actions. When she views it, the site makes the connection clear by showing her where the second statute references the original. This new information makes her article’s thesis stronger.

Meanwhile, 2800 miles away in Oregon, a law student is researching the relationship between the civil and criminal state codes. Browsing a research site, he notices a pattern of civil laws making use of the criminal code, often to enact civil punishments or enable adverse actions. He then engages the website in an interactive text-based dialog, modifying his queries as he considers the previous results. He finally arrives at an interesting discovery: the offenses with the least additional civil burdens are white collar crimes.

A new kind of research system

A new field of computer-assisted legal research is emerging: one that encompasses research in both the academic and the practical “legal research” senses. The two scenarios above both took place earlier this year, enabled by the OregonLaws.org research system that I created and which typifies these new developments.

Interestingly, this kind of work is very recent; it’s distinct from previous uses of computers for researching the law and assisting with legal work. In the past, techniques drawn from computer science have been most often applied to areas such as document management, court administration, and inter-jurisdiction communication. Working to improve administrative systems’ efficiency, people have approached these problem domains through the development of common document formats and methods of data interchange.

The new trend, in contrast, looks in the opposite direction: divergently tackling new problems as opposed to convergently working towards a few focused goals. This organic type of development is occurring because programming and computer science research is vastly cheaper—and much more fun—than it has ever been in the past. Here are a couple of examples of this new trend:

“Computer Programming and the Law”

Law professor Paul Ohm recently wrote a proposal for a new “interdisciplinary research agenda” which he calls “Computer Programming and the Law.” (The law review article is itself also a functioning computer program, written in the literate programming style.) He envisions “researcher-programmers,” enabled by the steadily declining cost of computer-aided research, using computers in revolutionary ways for empirical legal scholarship. He illustrates four new methods for this kind of research: developing computer programs to “gather, create, visualize, and mine data” that can be found in diverse and far-flung sources.

“Computational Legal Studies”

Grad students Daniel Katz and Michael Bommarito (researcher-programmers, as Paul Ohm would call them) created the Computational Legal Studies Blog in March 2009. The web site is a growing collection of visualizations applied to diverse legal and policy issues. The site is part showcase for the authors’ own work and part catalog of the current work of others.

OregonLaws.org

I started the OregonLaws.org project because I wanted faster and easier access to the 2007 Oregon Revised Statutes (ORS) and other primary and secondary sources. I had a couple of very statute-heavy courses (Wills & Trusts, and Criminal Law) and I frequently needed to quickly find an ORS section. But as I got further into the development, I realized that it could become a platform for experimenting with computational analysis of legal information, similar to the work being done on the Computational Legal Studies Blog.

I developed the system using pretty much the steps that Paul Ohm discussed:

Gathering data: I downloaded and cleaned up the ORS source documents, converting them from MS Word/HTML to plain text;
Creating: I parsed the texts, creating a database model reflecting the taxonomy of the ORS: Volumes, Titles, Chapters, etc.;
Creating: I created higher-level database entities based on insights into the documents, for example by modeling textual references between sections explicitly as reference objects that capture a relationship between a referrer and a referent; and
Mining and Visualizing: Finally, I’ve begun making web-based views of these newly found objects and relationships.
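
The gather/create steps above can be sketched in a few lines of Python. This is only an illustration, not the code behind OregonLaws.org: the class names, the regular expression, and the sample section texts are all assumptions made for the example, and it assumes citations appear in the form "ORS 163.095". It shows how a textual reference can become an explicit object with a referrer and a referent, which in turn makes it easy to ask "which sections cite this one?"

```python
import re
from dataclasses import dataclass

# Assumed citation shape: "ORS" followed by chapter, dot, section suffix.
ORS_CITATION = re.compile(r"\bORS\s+(\d+[A-Z]?\.\d+)")


@dataclass(frozen=True)
class Section:
    number: str  # e.g. "163.095"
    text: str


@dataclass(frozen=True)
class Reference:
    referrer: str  # number of the section that contains the citation
    referent: str  # number of the section being cited


def extract_references(sections):
    """Return one Reference per distinct citation found in each section."""
    refs = []
    for section in sections:
        seen = set()
        for cited in ORS_CITATION.findall(section.text):
            # Skip self-citations and repeats within the same section.
            if cited != section.number and cited not in seen:
                seen.add(cited)
                refs.append(Reference(referrer=section.number, referent=cited))
    return refs


def referrers_of(number, references):
    """List the sections that cite the given section (inbound references)."""
    return sorted(r.referrer for r in references if r.referent == number)


# Invented sample text and section numbers, not actual statutory language.
SAMPLE_SECTIONS = [
    Section("100.010", "A teacher may not do X while teaching."),
    Section("100.020", "A violation of ORS 100.010 is grounds for suspension."),
    Section("200.300", "As defined in ORS 100.010 and ORS 100.020, ..."),
]

REFERENCES = extract_references(SAMPLE_SECTIONS)
print(referrers_of("100.010", REFERENCES))  # sections that cite 100.010
```

An inbound-reference lookup like `referrers_of` is the kind of query that would let a site tell a reader about a second statute that points back at the one being viewed.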

The object database is the key to intelligent research

By taking the time to go through the steps listed above, powerful new features can be created. Below are some additions to the features described in the introductory scenarios:

We can search smarter. In a previous VoxPopuLII post, Julie Jones advocates dropping our usual search methods, and applying techniques like subject-based indexing (a la Factiva’s) to legal content.

This is straightforward to implement with an object model. The Oregon Legislature created the ORS with a conceptual structure similar to most states: The actual content is found in Sections. These are grouped into Chapters, which are in turn grouped into Titles. I was impressed by the organization and the architecture that I was discovering—insights that are obscured by the ways statutes are traditionally presented.

And so I sought out ways to make use of the legislature’s efforts whenever it made sense. In the case of search results, the Title organization and naming were extremely useful. Each Section returned by the search engine “knows” what Chapter and Title it belongs to. A small piece of code can then calculate what Titles are represented in the results, and how frequently. The resulting bar graph doubles as an easy way for users to specify filtering by “subject area”. The screenshot above shows a search for forest.
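
The "small piece of code" for counting Titles might look roughly like the following. It is a minimal sketch under stated assumptions: `SectionHit`, the function names, and the sample hits are hypothetical, and the Title labels are placeholders rather than real ORS Titles. The only idea taken from the text is that each hit knows its Title, so the Titles can be tallied and reused as a filter.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionHit:
    number: str   # hypothetical section number
    chapter: str
    title: str    # the Title this section belongs to


def title_facets(hits):
    """Count how many hits fall under each Title, most frequent first."""
    return Counter(hit.title for hit in hits).most_common()


def filter_by_title(hits, title):
    """Keep only the hits that belong to the chosen Title."""
    return [hit for hit in hits if hit.title == title]


def render_bars(facets, width=20):
    """Render facets as a plain-text bar graph scaled to the largest count."""
    if not facets:
        return ""
    largest = facets[0][1]
    lines = []
    for title, count in facets:
        bar = "#" * max(1, round(width * count / largest))
        lines.append(f"{title:<10} {bar} {count}")
    return "\n".join(lines)


# Invented search results; numbers and Title labels are placeholders.
HITS = [
    SectionHit("1.010", "1", "Title A"),
    SectionHit("1.020", "1", "Title A"),
    SectionHit("2.030", "2", "Title A"),
    SectionHit("7.100", "7", "Title B"),
    SectionHit("7.110", "7", "Title B"),
    SectionHit("9.500", "9", "Title C"),
]

print(render_bars(title_facets(HITS)))
```

Because the facet labels come straight from the hits, clicking a bar only needs to call something like `filter_by_title` with that label; no separate subject index has to be maintained.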

The ORS’s framework of Volumes, Titles, and Chapters was essentially a tag cloud waiting to be discovered.

We can get better authentication. In another VoxPopuLII post, John Joergensen discussed the need for authentication of digital resources. One aspect of this is showing the user the chain of custody from the original source to the current presentation. His ideas about using digital signatures are excellent: a scenario of being able to verify an electronic document’s legitimacy with complete assurance.

We can get a good start towards this goal by explicitly modeling content sources. A source is given attributes for everything we’d want to know to create a citation: date last accessed, URL available at, etc. Every content object in the database is linked to one of these source objects. Now, every time we display a document, we can create properly formatted citations to the original sources.
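
As a rough illustration of this source modeling, the sketch below links each content object to a source record and builds a citation string from it. Everything here is assumed for the example: the class and field names, the placeholder URL, and the citation layout, which is not any official citation format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    name: str            # who published the original document
    url: str             # where it was available
    last_accessed: date  # when it was last retrieved


@dataclass(frozen=True)
class ContentObject:
    label: str      # e.g. a section number
    text: str
    source: Source  # every content object points at its source


def cite(content):
    """Build a citation string from the linked source (illustrative layout)."""
    src = content.source
    accessed = src.last_accessed.isoformat()
    return (
        f"{content.label}, {src.name}, available at {src.url} "
        f"(last accessed {accessed})."
    )


# Placeholder source and content; the URL and names are invented.
EXAMPLE_SOURCE = Source(
    name="Example Statute Publisher",
    url="https://example.org/statutes/100",
    last_accessed=date(2009, 8, 12),
)
DOC = ContentObject("Section 100.010", "Invented text.", EXAMPLE_SOURCE)

print(cite(DOC))
```

Keeping the source as its own object means the access date or URL is recorded once and every document drawn from that source picks up the change.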

The gather/create/mine/visualize and object-based approaches open up so many new possibilities that they can’t all be discussed in one short article. It sometimes seems that each new step taken enables previously unforeseen features. A few of these are new documents created by re-sorting and aggregating content, web service APIs, and extra annotations that enhance clarity. I believe that in the end, the biggest accomplishment of projects like this will be to raise our expectations for electronic legal research services, increase their quality, and lower their cost.

Robb Shecter is a software engineer and third year law student at Lewis & Clark Law School in Portland, Oregon. He is Managing Editor for the Animal Law Review, plays jazz bass, and has published articles in Linux Journal, Dr. Dobbs Journal, and Java Report.

VoxPopuLII is edited by Judith Pratt.
