Davidson Stephanie

Constitutions as Summer Reading

comparative, Digital law libraries, Electronic legal publishing, free law data, Legal text processing, Linked Data

Oct 01, 2015

Two years ago my collaborators and I introduced a new resource for understanding constitutions. We call it Constitute. It’s a web application that allows users to extract excerpts of constitutional text, by topic, for nearly every constitution in the world currently in force. One of our goals is to shed some of the drudgery associated with reading legal text. Unlike credit card contracts, constitutions were meant for reading (and by non-lawyers). We have updated the site again, just in time for summer (see below). Curl up in your favorite retreat with Constitute this summer and tell us what you think.

Some background: Constitute is built primarily for those engaged in the challenge of drafting constitutions, which occurs more frequently than some think (4-5 constitutions are replaced each year and many more are revised in smaller ways). Drafters often want to view examples of text from a representative set of countries – mostly so that they can understand the multiple dimensions of a particular area of law. Of course, scholars and educators will also find many uses for the data. After all, the resource grew out of an effort to study constitutions, not write them.

How does Constitute differ from other constitutional repositories? The core advantage of Constitute is the ability to view constitutional excerpts by topic. These topics are derived from the conceptual inventory of constitutions that my collaborators and I have been developing and refining over the last ten years as part of the Comparative Constitutions Project (CCP). The intent of that project is to record the content of the world’s constitutions in order to answer questions about the origins and effects of various constitutional provisions. In order to build that dataset (CCP), we invested quite a bit of time in (1) identifying when constitutions in each country had been enacted, revised, or replaced, (2) tracking down the texts associated with each of these changes, (3) digitizing and archiving the texts, (4) building the conceptual apparatus to extract information about their content, and finally, (5) reading and interpreting the texts. We leveraged all of this information in building Constitute.

We are committed to refining and elaborating Constitute. Our recent release includes some exciting developments, some of which I describe here.

Now in Arabic! Until now, Constitute’s texts have been in English. However, we believe (with some evidence) that readers strongly prefer to read constitutions in their native language. Thus, with a nod to the constitutional activity born of the Arab Spring, we have introduced a fully functioning Arabic version of the site, which includes a subset of Constitute’s texts. Thanks here to our partners at International IDEA, who provided valuable intellectual and material resources.

Form and function. One distinction of Constitute is the clarity and beauty of its reading environment. Constitutional interpretation is hard enough as it is. Constitute’s texts are presented in a clean typeset environment that facilitates and invites reading, not sleep and irritability. In the latest release, we introduce a new view of the data — a side-by-side comparison of two constitutions. While in our usual “list view,” you can designate up to eight constitutions for inclusion in the comparison set, once in “compare view,” you can choose any two from that set for side-by-side viewing. In compare view, you’ll find our familiar search bar and topic menu in the left panel to drive and refine the comparison. By default, compare view displays full constitutions with search results highlighted and navigable (if there are multiple results). Alternatively, you can strip away the content and view selected excerpts in isolation by clicking the button at the right of the texts. It is an altogether new, and perhaps better, way to compare texts.

Sharing and analyzing. Many users will want to carve off slices of data for digestion elsewhere. In that sense, scholars and drafting committees alike will appreciate that the site was built by and for researchers. Exporting is painless. Once you pin the results, you can export to a .pdf file or to Google Docs to collaborate with your colleagues. You can also export pinned results to a tabulated .csv file, which will be convenient for those of you who want to manage and analyze the excerpts using your favorite data applications. Not only that, but our “pin search” and “pin comparison” functions allow analysts to carve large slices of data and deposit them in the Pinned page for scaled-up analysis.
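For readers who want to analyze a pinned-results export outside the site, here is a minimal Python sketch that tallies excerpts per country. The column names `country`, `year`, and `excerpt` are assumptions for illustration, not the documented layout of Constitute's .csv export. Check the header row of a real export and adjust accordingly.

```python
import csv
import io
from collections import Counter

# Hypothetical export: the column names below are assumptions, not
# Constitute's documented CSV layout. Excerpt text is a placeholder.
SAMPLE_EXPORT = """country,year,excerpt
Spain,1978,(excerpt text)
Canada,1867,(excerpt text)
Spain,1978,(excerpt text)
"""

def excerpts_per_country(csv_text: str) -> Counter:
    """Count pinned excerpts per country in a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["country"] for row in reader)

if __name__ == "__main__":
    for country, n in excerpts_per_country(SAMPLE_EXPORT).most_common():
        print(country, n)
```

With a real export you would read the file with `open(path, newline="", encoding="utf-8")` instead of the inline sample.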

Raw data downloads. For those of you who build web applications or are interested in harnessing the power of Linked Data, we have exposed our linked data as a set of downloads and as a SPARQL endpoint, for people and machines to consume. Just follow the Data link on “More Info” in the left panel of the site.
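As a rough sketch of how a machine might consume the SPARQL endpoint, the Python below builds a standard SPARQL 1.1 Protocol GET request, in which the query travels in a `query` URL parameter, and asks for JSON results. The endpoint URL is a placeholder, not the site's real address. The query is a generic triple pattern rather than anything specific to Constitute's vocabulary.

```python
import json
import urllib.parse
import urllib.request

# Placeholder address: the real endpoint is linked from "More Info" > Data on the site.
ENDPOINT = "https://example.org/sparql"

# A generic query that makes no assumptions about the site's vocabulary.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol 'query via GET': the query goes in the ?query= parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

def run_query(endpoint: str, query: str) -> list:
    """Send the request (requires network access) and return the result rows."""
    with urllib.request.urlopen(build_request(endpoint, query)) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]
```

Separating request construction from sending makes it easy to inspect the exact URL before pointing it at a live endpoint.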

And then there is “deep linking.” Beyond exporting your pinned results and sharing them as documents and data files, you can also share excerpts, searches, comparisons, and full constitutions very easily in your direct communications. The most direct way is to copy the URL. All URLs on the site are now deep links, which means that anything you surface on the site is preserved in that URL forever (well, “forever” by internet standards). Suppose you are interested in those constitutions that provide for secession (Scotland and Catalunya have many thinking along those lines). Here are those results to share in your blog post, email, Wikipedia entry, or publication. By the way, do you know which constitutions mention the word “internet”? Chances are you’ll be surprised.
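Deep links work by serializing whatever is on screen into the URL itself. The sketch below shows the general idea with Python's standard `urllib.parse`. The base URL and the parameter names `q` and `compare` are hypothetical and do not describe Constitute's actual URL scheme.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE = "https://example.org/constitute/search"  # hypothetical base URL

def to_deep_link(state: dict) -> str:
    """Serialize a view state (search terms, selected texts) into a URL."""
    # doseq=True turns list values into repeated parameters (?compare=a&compare=b).
    return BASE + "?" + urlencode(state, doseq=True)

def from_deep_link(url: str) -> dict:
    """Recover the view state from a shared URL (every value comes back as a list)."""
    return parse_qs(urlparse(url).query)

if __name__ == "__main__":
    link = to_deep_link({"q": "internet", "compare": ["text_a", "text_b"]})
    print(link)
    print(from_deep_link(link))
```

In this sketch, the state round-trips entirely through the URL, so anyone who opens the link can rebuild the same view.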

So, please take Constitute with you to the beach this summer and tell us what you think. Any comments or suggestions about the site should be directed to our project address, constitute.project@gmail.com.

Zachary Elkins is Associate Professor at the University of Texas at Austin. His research interests include constitutional design, democracy, and Latin American politics. He co-directs the Comparative Constitutions Project.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

Legal Research Ontology, Part II

Legal ontologies, legal research

Aug 20, 2015

My blog post last year about developing a legal research ontology was such an optimistic (i.e., naive), linear narrative. This was one of my final notes:

At this point, I am in the beginning stages of taking advantage of all the semantic web has to offer. The ontology’s classes now have subclasses. I am building the relationships between the classes and subclasses and using Protege to bring them all together.

I should have known better.

What I didn’t realize then was that I really didn’t understand anything about the semantic web. While I could use the term in a sentence and reference RDF and OWL and Protege, once you scratched the surface I was lost. Based on Sara Frug’s recommendation during a presentation at CALI Con 2014, I started reading Semantic Web for Dummies.

It has been, and continues to be, slow going. I don’t have a computer science or coding background, and so much of my project feels like trying to teach myself a new language without immersion or much of a guide. But the process of this project has become just as interesting to me as the end product. How are we equipped to teach ourselves anything? At a certain point, you just have to jump in and do something, anything, to get the project moving.

I had already identified the classes:
* Type of research material;
* Type of research problem;
* Source of law;
* Area of law;
* Legal action; and
* Final product.
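To make the class list concrete, here is a small Python sketch that writes the six classes, plus a few of the subclasses mentioned in this post, as OWL class declarations in Turtle syntax. The `lr:` namespace and the CamelCase class names are assumptions made for illustration, not the author's actual ontology.

```python
# Hypothetical namespace; class names mirror the list above.
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix lr: <http://example.org/legal-research#> .\n"
)

# Top-level classes -> a few example subclasses mentioned in the post.
CLASSES = {
    "TypeOfResearchMaterial": ["PrimarySource"],
    "TypeOfResearchProblem": [],
    "SourceOfLaw": ["CaseLaw"],
    "AreaOfLaw": ["Contracts", "Torts", "Property"],
    "LegalAction": [],
    "FinalProduct": [],
}

def to_turtle(classes: dict) -> str:
    """Emit each class as an owl:Class, and each subclass with rdfs:subClassOf."""
    lines = [PREFIXES]
    for parent, children in classes.items():
        lines.append(f"lr:{parent} a owl:Class .")
        for child in children:
            lines.append(f"lr:{child} a owl:Class ; rdfs:subClassOf lr:{parent} .")
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_turtle(CLASSES))
```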

I knew that each class has subclasses. Yet in my reading, as I learned how ontologies are used to construct relationships between entities, I missed the part where I had to build those relationships myself. They didn’t just magically appear when I entered the terms into Protege.

I’m using Web Protege, an open-source product developed by the Stanford Center for Biomedical Informatics Research, using the OWL ontology language.

Ontology engineering is a hot topic these days, and there is a growing body of papers, tutorials, and presentations on OWL and ontology engineering. That’s also part of the problem: There’s a little too much out there. I knew that anything I would do with my ontology would happen in Protege, so I decided to start there with the extensive user documentation and user support. Their user guide takes you through setting up your first ontology with step-by-step illustrations and a few short videos. I also discovered a tutorial on the web titled Pizzas in 10 minutes.

Following the tutorial, you construct a basic ontology of pizza using different toppings and sauces. While it took me longer than 10 minutes to complete, it did give me enough familiarity with constructing relationships to take a stab at it with my ontology and its classes. Here’s what I came up with:

This representation doesn’t list every subclass; e.g., in Type of research material, I only listed primary source, and in Area of law, I only listed contracts, torts, and property. But it gives you an idea of how the classes relate to each other. Something I learned in building the sample pizza ontology in Protege is the importance of creating two kinds of properties: relational properties and modifier properties. The recommendation is to use “has” or “is” as prefixes for property names.1 You can see how classes relate to each other in the above diagram, as well as how classes are modified by subclasses and individuals.
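The has/is naming convention pairs naturally with OWL's `owl:inverseOf`. In the classic pizza ontology, for example, `hasTopping` has the inverse `isToppingOf`. Below is a hypothetical Python helper that emits such a pair in Turtle. The `lr:` namespace and the example property, domain, and range are illustrative assumptions, not taken from the author's diagram.

```python
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix lr: <http://example.org/legal-research#> .\n"
)

def property_pair(name: str, domain: str, range_: str) -> str:
    """Declare lr:has<Name> and its inverse lr:is<Name>Of (hypothetical names)."""
    has_p, is_p = f"lr:has{name}", f"lr:is{name}Of"
    return "\n".join([
        f"{has_p} a owl:ObjectProperty ; rdfs:domain lr:{domain} ; rdfs:range lr:{range_} .",
        f"{is_p} a owl:ObjectProperty ; owl:inverseOf {has_p} .",
    ])

if __name__ == "__main__":
    # A research problem "has a source" of law; the source "is source of" the problem.
    print(PREFIXES + property_pair("Source", "TypeOfResearchProblem", "SourceOfLaw"))
```

Declaring the inverse once lets a reasoner fill in the "is" direction automatically, so only the "has" direction needs to be asserted.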

I’m continuing to read Semantic Web for Dummies, and I’m currently focusing on Chapter 8: Speaking the Web Ontology Language. It has all kinds of nifty Venn diagrams and lines of computer code, and I’m working on understanding it all. This line keeps me going: “However, if you’re looking for a system to draw inferences or to interpret the implications of your assertions (for example, to supply a dynamic view of your data), OWL is for you.”2

One of my concerns is that a few of my subclasses belong to more than one class. But the beauty of the Semantic Web and OWL is that classes and subclasses are dynamic sets, and when you run the ontology through a reasoner, individual members can move from one set to another. This means that Case Law can be both a subclass of Source of Law and an instance of Primary Source in the class Type of Research Material.
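The multiple-membership point can be illustrated with a toy version of one inference an OWL reasoner performs: following `rdfs:subClassOf` links upward to find every class something belongs to. This Python sketch starts from something asserted to be both Case Law and a Primary Source. It is a simplification, not how Protege's reasoners are implemented.

```python
from collections import deque

# Toy class hierarchy (class -> set of direct superclasses). Illustrative only.
SUBCLASS_OF = {
    "CaseLaw": {"SourceOfLaw"},
    "PrimarySource": {"TypeOfResearchMaterial"},
}

def all_types(asserted: set, subclass_of: dict) -> set:
    """Return the asserted classes plus every superclass reachable from them."""
    seen = set(asserted)
    queue = deque(asserted)
    while queue:
        cls = queue.popleft()
        for parent in subclass_of.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

if __name__ == "__main__":
    print(sorted(all_types({"CaseLaw", "PrimarySource"}, SUBCLASS_OF)))
```

Nothing stops membership in two branches of the hierarchy unless the classes are declared disjoint. Treating Case Law as both a class and an individual, as the paragraph describes, relies on OWL 2's punning feature, which this sketch does not model.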

The classes, subclasses, and relationships I have set up so far are simple assertions.3 Two equivalent classes would look like a Venn diagram with the two sets completely overlapping. This helps in dealing with synonyms. You can assert equivalence between individuals as well as classes, but it is better to set up each individual’s relationships with its classes and then let the OWL reasoning system decide whether the individuals are truly interchangeable. This is very helpful when you are combining ontologies. There are more complicated assertions (equivalence, disjointness, and subsumption), and I am working on applying them and building out the ontology.
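The Venn-diagram intuition for equivalence, disjointness, and subsumption can be sketched with Python sets standing in for class extensions. The member names are hypothetical. A real OWL reasoner works from axioms under an open-world assumption rather than by comparing complete lists of members, so treat this only as an analogy.

```python
# Toy extensions: which (hypothetical) individuals fall in each class.
case_law = {"opinion_1", "opinion_2"}
judicial_opinions = {"opinion_1", "opinion_2"}   # a synonym for case law
statutes = {"statute_1"}
primary_sources = case_law | statutes

def relation(a: set, b: set) -> str:
    """Describe how class a relates to class b, Venn-diagram style."""
    if a == b:
        return "equivalent"   # the two circles overlap completely
    if not a & b:
        return "disjoint"     # no shared members
    if a < b:
        return "subclass"     # a lies entirely inside b (b subsumes a)
    if b < a:
        return "superclass"   # b lies entirely inside a
    return "overlapping"

if __name__ == "__main__":
    print(relation(case_law, judicial_opinions))
    print(relation(case_law, primary_sources))
```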

Next I need to figure out the characteristics of the properties relating the classes, subclasses, and individuals in my ontology (inverse, symmetric, transitive), as well as how to use class constructors such as intersection, union, complement, and restriction. As I continue to read (and reread) Semantic Web for Dummies, I am gaining a new appreciation for set theory and description logic. Math always seems to have a way of finding you! I am also continuing to fill in the ontology with terms (using simple assertions), and I also need to figure out SPARQL so I can query the ontology. It still feels like one of those one-step-forward, two-steps-back endeavors, but it is interesting.
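To see what the property characteristics buy you, here is a naive forward-chaining sketch in Python. It applies inverse, symmetric, and transitive rules to a set of triples until no new facts appear. The property names (`hasSource`, `relatedTo`, `narrowerThan`) and the triples are hypothetical. Production OWL reasoners use far more efficient algorithms.

```python
# Hypothetical property characteristics for a toy legal-research ontology.
INVERSES = {"hasSource": "isSourceOf"}
SYMMETRIC = {"relatedTo"}
TRANSITIVE = {"narrowerThan"}

def infer(triples: set) -> set:
    """Naively apply inverse, symmetric, and transitive rules until a fixpoint."""
    # Inverse pairs work in both directions.
    inverse = {**INVERSES, **{v: k for k, v in INVERSES.items()}}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in SYMMETRIC:
                new.add((o, p, s))
            if p in TRANSITIVE:
                # (s p o) and (o p o2) entail (s p o2)
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new
```

For example, asserting that Negligence is narrower than Torts and Torts is narrower than Civil Law lets the transitive rule conclude that Negligence is narrower than Civil Law, without anyone entering that fact by hand.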

I hope to keep you posted, and I am grateful to the VoxPopuLII blog for having me back to write an update.

Amy Taylor is the Access Services Librarian and Adjunct Professor at American University Washington College of Law. Her main research interests are legal ontologies, organization of legal information and the influence of online legal research on the development of precedent. You can reach her on Twitter @taylor_amy or email: amytaylor@wcl.american.edu.


—
1 Matthew Horridge, A Practical Guide to Building OWL Ontologies, 20, http://phd.jabenitez.com/wp-content/uploads/2014/03/A-Practical-Guide-To-Building-OWL-Ontologies-Using-Protege-4.pdf (last visited May 19, 2015).

2 Jeffrey Pollock, Semantic Web for Dummies 195 (Wiley 2009).

3 Id. at 200.

Rough Consensus, Running Standards: The Restatement Project

Annotation of legal texts, Electronic legal publishing, Legal argument, Legal informatics, legal language, Legal text processing, Legal XML

Jun 03, 2014

1. The Death and Life of Great Legal Data Standards

Thanks to the many efforts of the open government movement in the past decade, the benefits of machine-readable legal data — legal data which can be processed and easily interpreted by computers — are now widely understood. In the world of government statutes and reports, machine-readability would significantly enhance public transparency, help to increase efficiencies in providing services to the public, and make it possible for innovators to develop third-party services that enhance civic life.

In the universe of private legal data (contracts, briefs, and memos), machine-readability would open up vast potential efficiencies within law firms, allow the development of novel approaches to processing the law, and help drive down the costs of providing legal services.

However, while the benefits are understood, by and large the vision of rendering the vast majority of legal documents into a machine-readable standard has not been realized. While projects do exist to acquire and release statutory language in a machine-readable format (and the government has backed similar initiatives), the vast body of contractual language and other private legal documents remains trapped in a closed universe of hard copies, PDFs, unstructured plaintext and Microsoft Word files.

Though this is a relatively technical point, it has broad policy implications for society at large. Perhaps the biggest upshot is that machine-readability promises to vastly improve access to the legal system, not only for those seeking legal services, but also for those seeking to provide them.

It is not for lack of a standard specification that the status quo persists. Indeed, projects like LegalXML have developed specifications that describe a machine-readable markup for a vast range of different types of legal documents. As of this writing, the project includes technical committees working on legislative documents, contracts, court filings, citations, and more.

However, by and large these efforts to develop machine-readable specifications for legal data have only solved part of the problem. Creating the standard is one thing, but actually driving adoption of a legal data standard is another (often more difficult) matter. There are a number of reasons why existing standards have failed to gain traction among the creators of legal data.

For one, the oft-cited aversion of lawyers to technology remains a relevant factor. Particularly in the case of the standardization of legal data, where the projected benefits lie in the future and their magnitude is speculative at present, persuading lawyers and legislatures to adopt a new standard remains a challenge, at best.

Second, the financial incentives of some actors may actually run counter to rendering the universe of legal documents into a machine-readable standard. A universe of largely machine-readable legal documents would also be one in which third parties could develop systems that automate and significantly streamline legal services. In the context of the ever-present billable hour, parties may resist the introduction of technological shifts that enable these efficiencies to emerge.

Third, the costs of converting existing legal data into a machine-readable standard may also pose a significant barrier to adoption. Marking up unstructured legal text can be highly costly, depending on the intended machine usage of the document and the type of document in question. A legislature, firm, or other organization with a large existing repository of legal documents will be reluctant to take on the large one-time costs of rendering those documents into a common standard.

These three reinforcing forces erect a significant cultural and economic barrier against the integration of machine-readable standards into the production of legal text. To the extent that one believes in the benefits from standardization for the legal industry and society at large, the issue is — in fact — not how to define a standard, but how to establish one.

2. Rough Consensus, Running Standards

So, how might one go about promulgating a standard? Particularly in a world in which lawyers, the very actors that produce the bulk of legal data, are resistant to change, mere attempts to mobilize the legal community to action are destined to fail in bringing about the fundamental shift necessary to render most if not all legal documents in a common machine-readable format.

In such a context, implementing a standard in a way that removes humans from the loop entirely may, in fact, be more effective. To do so, one might design code capable of automatically rendering legal text into a machine-readable format. This code could then be implemented by applications of all kinds, which would output legal documents in a standard format by default. This would include the word processors used by lawyers, but also platforms like LegalZoom or RocketLawyer that routinely generate large quantities of legal data. Such a solution would eliminate the need for lawyer involvement in implementing a standard entirely: any text created would be automatically parsed and output in a machine-readable format. Scripts might also be written to identify legal documents online and process them into a common format. As the body of documents rendered in a given format grew, it would become possible for others to write software leveraging the increased penetration of the standard.

There are, obviously, technical limitations in realizing this vision of a generalized legal data parser. For one, designing a truly comprehensive parser is a massively difficult computer science challenge. Legal documents come in a vast diversity of flavors, and no common textual conventions allow for perfectly accurate parsing of the semantic content of any given legal text. Quite simply, any parser will be an imperfect (perhaps highly imperfect) approximation of full machine-readability.

Despite the lack of a perfect solution, an open question exists as to whether an extremely rough parsing system, implemented at sufficient scale, would be enough to kickstart the creation of a true common standard for legal text. A popular solution, however imperfect, would encourage others to contribute refinements to the code. It would also encourage the design of applications for documents rendered in the standard. Beginning from the roughest of parsers, a functional standard might become the platform for a much bigger change in the nature of legal documents. The key is to achieve the “minimal viable standard” that will start the snowball rolling down the hill: the point at which the parser is rendering enough legal documents in a common format that additional value can be created by improving the parser and applying it to an ever broader scope of legal data.

But, what is the critical mass of documents one might need? How effective would the parser need to be in order to achieve the initial wave of adoption? Discovering this, and learning whether or not such a strategy would be effective, is at the heart of the Restatement project.

3. Introducing Project Restatement

Supported by a grant from the Knight Foundation Prototype Fund, Restatement is a simple, rough-and-ready system which automatically parses legal text into a basic machine-readable JSON format. It has also been released under the permissive terms of the MIT License, to encourage active experimentation and implementation.

The concept is to develop an easily-extensible system which parses through legal text and looks for some common features to render into a standard format. Our general design principle in developing the parser was to begin with only the most simple features common to nearly all legal documents. This includes the parsing of headers, section information, and “blanks” for inputs in legal documents like contracts. As a demonstration of the potential application of Restatement, we’re also designing a viewer that takes documents rendered in the Restatement format and displays them in a simple, beautiful, web-readable version.
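The design principle of starting with headers, sections, and blanks can be illustrated with a deliberately rough Python sketch that turns plain contract text into a JSON-ready structure. The output schema (`header`, `sections`, `number`, `text`, `blanks`) is invented for this example and is not the Restatement format. The real project is written in JavaScript and uses peg.js.

```python
import json
import re

SECTION_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.*)$")  # "1. Term", "2.1 Payment"
BLANK_RE = re.compile(r"_{3,}")                          # "______" input blanks

def parse(text: str) -> dict:
    """Very rough parse: first non-section line is the header, numbered lines are sections."""
    doc = {"header": None, "sections": []}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = SECTION_RE.match(line)
        if m:
            body = m.group(2)
            doc["sections"].append(
                {"number": m.group(1), "text": body, "blanks": len(BLANK_RE.findall(body))}
            )
        elif doc["header"] is None:
            doc["header"] = line
        elif doc["sections"]:
            # Unnumbered line after a section: treat it as a continuation.
            sec = doc["sections"][-1]
            sec["text"] += " " + line
            sec["blanks"] += len(BLANK_RE.findall(line))
        # Other preamble lines are ignored in this sketch.
    return doc

SAMPLE = """MUTUAL NONDISCLOSURE AGREEMENT
1. Parties. This agreement is between ________ and ________.
2. Term. It lasts ____ years
from the Effective Date.
"""

if __name__ == "__main__":
    print(json.dumps(parse(SAMPLE), indent=2))
```

Rules this crude will misfire on many real documents (a line beginning with a year looks like a section number, for instance). That is exactly the kind of failure the community testing described below is meant to surface.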

Underneath the hood, Restatement is all built upon web technology. This was a deliberate choice, as Restatement aims to provide a usable alternative to document formats like PDF and Microsoft Word. We want to make it easy for developers to write software that displays and modifies legal documents in the browser.

In particular, Restatement is built entirely in JavaScript. The past few years have been exciting for the JavaScript community. We’ve seen an incredible flourishing of not only new projects built on JavaScript, but also new tools for building cool new things with JavaScript. It seemed clear to us that it’s the platform to build on right now, so we wrote the Restatement parser and viewer in JavaScript, and made the Restatement format itself a type of JSON (JavaScript Object Notation) document.

For those who are more technically inclined: we also knew that Restatement needed a parser formalism, that is, a precise way to define how plain text gets transformed into the Restatement format. We became interested in a relatively recent advance in parsing technology called PEG (Parsing Expression Grammar).

PEG parsers differ from many other types of parsers in that they’re unambiguous: their choice operator is ordered, so the parser always commits to the first alternative that matches. That means that plain text passing through a PEG parser has only one possible valid parsed output. We became excited about using this deterministic property of PEG to mix parsing rules and code, and that’s when we found peg.js.

With peg.js, we can generate a parser from a grammar that executes JavaScript code as it parses your document. This hybrid approach is super powerful. It gives us all of the advantages of a parser formalism (like speed and unambiguity) while also letting us run custom JavaScript code on each bit of your document as it parses. That way we can use an external library, like the Sunlight Foundation’s fantastic citation library, from inside the parser.
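The two PEG properties described above can be sketched with a few Python parser combinators. Ordered choice makes the parse unambiguous, and actions run code as rules match. This is not peg.js and not Restatement's grammar. The `sec_ref` rule and its output dictionary are hypothetical, and the action simply stands in for the kind of call one might make to an external citation library.

```python
import re
from typing import Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[object, int]]]

def regex(pattern: str, action=lambda s: s) -> Parser:
    """Terminal: match a regular expression at the current position."""
    rx = re.compile(pattern)
    def parse(text: str, pos: int):
        m = rx.match(text, pos)
        return (action(m.group(0)), m.end()) if m else None
    return parse

def seq(*parsers: Parser, action=lambda *vals: vals) -> Parser:
    """Sequence: every sub-parser must match in order; the action sees their values."""
    def parse(text: str, pos: int):
        vals = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            v, pos = r
            vals.append(v)
        return action(*vals), pos
    return parse

def choice(*parsers: Parser) -> Parser:
    """PEG ordered choice: return the first alternative that succeeds, never the others."""
    def parse(text: str, pos: int):
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse

# Hypothetical rule: "Section 4.2" or "§ 4.2" -> a small structured record.
# The action runs during parsing, where a real system might call a citation library.
number = regex(r"\d+(?:\.\d+)*")
sec_ref = seq(
    choice(regex(r"Section\s+"), regex(r"§\s*")),
    number,
    action=lambda _prefix, num: {"type": "section_ref", "number": num},
)
```

For example, `choice(regex("a"), regex("ab"))` applied to `"ab"` commits to the first alternative and returns `("a", 1)`. The second alternative is never tried, which is why there is only ever one parse.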

Our next step is to prototype an “interactive parser,” a tool for attorneys to define the structure of their documents and see how they parse. Behind the scenes, this interactive parser will generate peg.js programs and run them against plaintext without the user even being aware of how the underlying parser is written. We hope that this approach will provide users with the right balance of power and usability.

4. Moving Forwards

Restatement is going fully operational in June 2014. After launch, the two remaining challenges are to (a) continue expanding the range of legal document features the parser can successfully process, and (b) begin widely processing legal documents into the Restatement format.

For the first, we’re encouraging a community of legal technologists to play around with Restatement, break it as much as possible, and give us feedback. Running Restatement against a host of different legal documents and seeing where it fails will expose the areas where the parser needs to be bolstered to expand its applicability as far as possible.

For the second, Restatement will be rendering popular legal documents in the format and partnering with platforms to integrate Restatement into the legal content they produce. We’re excited to say that at launch Restatement will be releasing the standard form documents used by the startup accelerator Y Combinator, as well as Series Seed, an open source project around seed financing created by Fenwick & West.

It is worth adding that the Restatement team is always looking for collaborators. If what’s been described here interests you, please drop us a line! I’m available at tim@robotandhwang.org, and on Twitter @RobotandHwang.

Jason Boehmig is a corporate attorney at Fenwick & West LLP, a law firm specializing in technology and life science matters. His practice focuses on startups and venture capital, with a particular emphasis on early stage issues. He is an active maintainer of the Series Seed Documents, an open source set of equity financing documents. Prior to attending law school, Jason worked for Lehman Brothers, Inc. as an analyst and then as an associate in their Fixed Income Division.

Tim Hwang currently serves as the managing human partner at the offices of Robot, Robot & Hwang LLP. He is curator and chair for the Stanford Center on Legal Informatics FutureLaw 2014 Conference, and organized the New and Emerging Legal Infrastructures Conference (NELIC) at Berkeley Law in 2010. He is also the founder of the Awesome Foundation for the Arts and Sciences, a distributed, worldwide philanthropic organization founded to provide lightweight grants to projects that forward the interest of awesomeness in the universe. Previously, he has worked at the Berkman Center for Internet and Society at Harvard University, Creative Commons, Mozilla Foundation, and the Electronic Frontier Foundation. For his work, he has appeared in the New York Times, Forbes, Wired Magazine, the Washington Post, the Atlantic Monthly, Fast Company, and the Wall Street Journal, among others. He enjoys ice cream.

Paul Sawaya is a software developer currently working on Restatement, an open source toolkit to parse, manipulate, and publish legal documents on the web. He previously worked on identity at Mozilla, and studied computer science at Hampshire College.


Late Summer Reading: Book Edition

Reading Recommendations

Aug 02, 2013

Maybe it’s a bit late for a summer reading list, or maybe you’re just now starting to pack for your vacation, deep in a Goodreads list that you don’t ever expect to dig your way out of. Well, let us add to your troubles with a handful of books your editors are currently enjoying.

Non-Fiction
A Clearing in the Forest: Law, Life, and Mind, by Steven L. Winter. A 2001 cognitive science argument for studying and developing law. Perhaps a little heavy for poolside reading, but one of your editors finds it perfect for multi-day midwestern summer rainstorms, along with a pot of tea. Review by Lawrence Solan in the Brooklyn Law Review, as part of a symposium.

Digital Disconnect: How Capitalism is Turning the Internet Against Democracy, by Robert W. McChesney.

“In Digital Disconnect, Robert McChesney offers a groundbreaking critique of the Internet, urging us to reclaim the democratizing potential of the digital revolution while we still can.”

This is currently playing on my work commute.

The Cognitive Style of PowerPoint: Pitching Out Corrupts Within, by Edward Tufte. Worth re-reading every so often, especially heading into conference/teaching seasons.

Delete: The Virtue of Forgetting in the Digital Age, by VoxPopuLII contributor Viktor Mayer-Schönberger. Winner of the 2010 Marshall McLuhan Award for Outstanding Book in Media Ecology, Media Ecology Association; Winner of the 2010 Don K. Price Award for Best Book in Science and Technology Politics, Section on Science, Technology, and Environmental Politics (STEP) of the American Political Science Association. Review at the Times Higher Education.

Piracy: The Intellectual Property Wars from Gutenberg to Gates, by Adrian Johns (2010). A historian’s view of Intellectual Property — or, this has all happened before. Reviews at the Washington Post and the Electronic Frontier Foundation. From the latter, “Radio arose in the shadow of a patent thicket, became the province of tinkers, and posed a puzzle for a government worried that “experimenters” would ruin things by mis-adjusting their sets and flooding the ether with howling oscillation. Many will immediately recognize the parallels to modern controversies about iPhone “jailbreaking,” user innovation, and the future of the Internet.”

The Master Switch: The Rise and Fall of Information Empires, by Tim Wu (2010). A history of communications technologies, the cyclical (or not) trends in their openness, and a theory on the fate of the Internet. Nice reviews at Ars Technica and The Guardian.

Too Big to Know: Rethinking Knowledge Now That the Facts Aren’t the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room, by David Weinberger (author of the Cluetrain Manifesto). For more, check out this excerpt by Weinberger in The Atlantic.

You Are Not So Smart, by David McRaney. Examines the myth of being intelligent, and makes a very refreshing read for the summer. A review of the book can be found at Brain Pickings, which, by the way, is an excellent blog and definitely worth a look.

On a rainy day you can always check out the BBC series “QI,” which offers a new take on what we think we know but don’t. Hosted by Stephen Fry, it features comedians sharing their intelligence with witty humour, and you will learn a thing or two along the way. The TV show has also led to a few books, e.g., QI: The Book of General Ignorance, by John Lloyd and John Mitchinson.

Fiction

Sparing the cheesy beach reads, here’s a fiction set that you may find interesting.

The Ware Tetralogy: Ware #1-4, by Rudy Rucker (currently $6.99 for the four-pack)

Rucker’s four Ware novels–Software (1982), Wetware (1988), Freeware (1997), and Realware (2000)–form an extraordinary cyberweird future history with the heft of an epic fantasy novel and the speed of a quantum processor. Still exuberantly fresh despite their age, they primarily follow two characters (and their descendants): Cobb Anderson, who instigated the first robot revolution and is offered immortality by his grateful “children,” and stoner Sta-Hi Mooney, who (against his impaired better judgment) becomes an important figure in robot-human relations. Over several generations, humans, robots, and society evolve, but even weird drugs and the wisdom gathered from interstellar signals won’t stop them from making the same old mistakes in new ways. Rucker is both witty and serious as he combines hard science and sociology with unrelentingly sharp observations of all self-replicating beings. — Publisher’s Weekly

Happy reading! We’ll return mid-August with a feature on AT4AM.

–editors


CourtListener: Where We Are and Where We'd Like to Go

Digital legal publishing

Mar 04, 2013

At CourtListener, we are making a free database of court opinions with the ultimate goal of providing the entire U.S. case-law corpus to the world for free and combining it with cutting-edge search and research tools. We–like most readers of this blog–believe that for justice to truly prevail, the law must be open and equally accessible to everybody.

It is astonishing to think that the entire U.S. case-law corpus is not currently available to the world at no cost. Many have started down this path and stopped, so we know we’ve set a high goal for a humble open source project. From time to time it’s worth taking a moment to reflect on where we are and where we’d like to go in the coming years.

The current state of affairs

We’ve created a good search engine that can provide results based on a number of characteristics of legal cases. Our users can search for opinions by the case name, date, or any text that’s in the opinion, and can refine by court, by precedential status or by citation. The results are pretty good, but are limited based on the data we have and the “relevance signals” that we have in place.

A good legal search engine uses a number of factors (a.k.a. “relevance signals”) to promote documents to the top of its results. For example:

How recent is the opinion?
How many other opinions have cited it?
How many journals have cited it?
How long is it?
How important is the court that heard the case?
Is the case in the jurisdiction of the user?
Is the opinion one that the user has looked at before?
What was the subsequent treatment of the opinion?

And so forth. All of these signals help make search results better, and we’ve seen paid legal search tools make great strides in their products by integrating these and other factors. At CourtListener, we’re using a number of the above, but we need to go further. We need to use as many factors as possible, and we need to learn how the factors interact with each other, which ones are the most important, and which lead to the best results.
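To make the idea of combining relevance signals concrete, here is a minimal Python sketch of one common approach: a weighted boost applied on top of a full-text match score. Everything in it is an assumption for illustration. The `Opinion` fields, the normalization formulas, the weights, and the case name "Obscure v. Minor" are hypothetical and do not describe CourtListener's actual ranking.

```python
from dataclasses import dataclass


@dataclass
class Opinion:
    name: str
    year: int                   # year the opinion was filed
    cited_by: int               # number of other opinions citing it
    court_weight: float         # hypothetical importance of the court, 0.0 to 1.0
    in_user_jurisdiction: bool


# Hypothetical weights; a real system would tune or learn these from data.
WEIGHTS = {"recency": 0.2, "citations": 0.5, "court": 0.2, "jurisdiction": 0.1}


def score(op: Opinion, text_relevance: float, current_year: int = 2013) -> float:
    """Boost a full-text relevance score by a weighted sum of other signals."""
    recency = 1.0 / (1 + max(0, current_year - op.year))  # 1.0 for this year's opinions
    citations = op.cited_by / (op.cited_by + 100)         # grows toward 1.0, never past it
    signals = (
        WEIGHTS["recency"] * recency
        + WEIGHTS["citations"] * citations
        + WEIGHTS["court"] * op.court_weight
        + WEIGHTS["jurisdiction"] * (1.0 if op.in_user_jurisdiction else 0.0)
    )
    return text_relevance * (1.0 + signals)


def rank(results: list[tuple[Opinion, float]]) -> list[str]:
    """Order (opinion, text_relevance) pairs by combined score, best first."""
    ordered = sorted(results, key=lambda r: score(r[0], r[1]), reverse=True)
    return [op.name for op, _ in ordered]


results = [
    (Opinion("Obscure v. Minor", 2012, 2, 0.3, False), 1.0),
    (Opinion("Strickland v. Washington", 1984, 10000, 1.0, True), 1.0),
]
print(rank(results))  # ['Strickland v. Washington', 'Obscure v. Minor']
```

A hand-weighted sum like this is easy to reason about, but it cannot capture how signals interact; learning the weights, or a non-linear model, from user behavior is the kind of next step described above.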

A different problem we’re working to solve at CourtListener is getting primary legal materials freely onto the Web. What good is a search engine if the opinion you need isn’t there in the first place? We currently have about 800,000 federal opinions, including West’s second and third Federal Reporters, F.2d and F.3d, and the entire Supreme Court corpus. This is good and we’re very proud of the quality of our database–we think it’s the best free resource there is. Every day we add the opinions from the Circuit Courts in the federal system and the U.S. Supreme Court, nearly in real-time. But we need to go further: we need to add state opinions, and we need to add not just the latest opinions but all the historical ones as well.

This sounds daunting, but it’s a problem that we hope will be solved in the next few years. Although it’s taking longer than we would like, we are confident that in time all of the important historical legal data will make its way to the open Internet. Primary legal sources are already in the public domain, so now it’s just a matter of getting them into good electronic formats so that anyone can access and re-use them. If an opinion exists only as an unsearchable scan, in bound books, or behind a pricey paywall, then it’s closed to many people who should have access to it. As part of our citation identification project, which I’ll talk about next, we’re working to get the most important documents properly digitized.

Our citation identification project was developed last year by U.C. Berkeley School of Information students Rowyn McDonald and Karen Rustad to identify and cross-link any citations found in our database. This is a great feature that makes all the citations in our corpus link to the correct opinions, if we have them. For example, if you’re reading an opinion that has a reference to Roe v. Wade, you can click on the citation, and you’ll be off and reading Roe v. Wade. By the way, if you’re wondering how many Federal Appeals opinions cite Roe v. Wade, the number in our system is 801 opinions (and counting). If you’re wondering what the most-cited opinion in our system is, you may be bemused: With about 10,000 citations, it’s an opinion about ineffective assistance of legal counsel in death penalty cases, Strickland v. Washington, 466 U.S. 668 (1984).
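The core of citation identification can be sketched with a regular expression that finds "volume reporter page" citations and links the ones already in the database. This is a simplified illustration, not the Berkeley students' implementation: the pattern covers only the U.S., F.2d and F.3d reporters, and the `known` mapping and its URL are hypothetical.

```python
import re
from collections import Counter

# Simplified "volume reporter page" pattern covering only three reporters.
CITATION_RE = re.compile(r"\b(\d{1,4})\s+(U\.S\.|F\.2d|F\.3d)\s+(\d{1,5})\b")


def find_citations(text: str) -> list[str]:
    """Return citations found in text, normalized to 'volume reporter page'."""
    return [f"{vol} {rep} {page}" for vol, rep, page in CITATION_RE.findall(text)]


def link_citations(text: str, known: dict[str, str]) -> str:
    """Wrap each citation we hold an opinion for in a hyperlink; leave others alone."""
    def repl(m):
        url = known.get(f"{m.group(1)} {m.group(2)} {m.group(3)}")
        return f'<a href="{url}">{m.group(0)}</a>' if url else m.group(0)
    return CITATION_RE.sub(repl, text)


def most_cited(opinions: dict[str, str]) -> list[tuple[str, int]]:
    """Count citing opinions per citation (each opinion counted once per citation)."""
    counts = Counter()
    for text in opinions.values():
        counts.update(set(find_citations(text)))
    return counts.most_common()


opinions = {
    "op-1": "See Strickland v. Washington, 466 U.S. 668 (1984); cf. 410 U.S. 113.",
    "op-2": "Under 466 U.S. 668, counsel's performance was not deficient.",
}
print(most_cited(opinions)[0])  # ('466 U.S. 668', 2)
print(link_citations("Cf. 410 U.S. 113.", {"410 U.S. 113": "/opinion/roe-v-wade/"}))
# Cf. <a href="/opinion/roe-v-wade/">410 U.S. 113</a>.
```

Counting each citation once per citing opinion, as `most_cited` does, is one plausible way to produce a figure like the Strickland count above; a real citator must also handle parallel citations, short forms such as "id.", and OCR noise.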

A feature we’ll be working on soon will tie into our citation system to help us close any gaps in our corpus. Once the feature is done, whenever an opinion is cited that we don’t yet have, our users will be able to pay a small amount–one or two dollars–to sponsor the digitization of that opinion. We’ll do the work of digitizing it, and after that point the opinion will be available to the public for free.

This brings us to the next big feature we added last year: bulk data. Because we want to assist academic researchers and others who might have a use for a large database of court opinions, we provide free bulk downloads of everything we have. Like Carl Malamud’s Resource.org (to whom we owe a great debt for his efforts to collect opinions and provide them to others for free, and for his direct support of our efforts), we have giant files you can download that provide thousands of opinions in computer-readable format. These downloads are available by court and date, and include thousands of fixes to the Resource.org corpus. They also include something you can’t find anywhere else: the citation network. As part of the metadata associated with each opinion in our bulk download files, you can see which opinions it cites as well as which opinions cite it. This provides a valuable new source of data that we are very eager for others to work with. Of course, as new opinions are added to our system, we update our downloads with the new citations and the new information.
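The citation network in the bulk files amounts to storing both directions of each citation edge. Below is a minimal sketch of deriving the "cited by" side from the "cites" side; the record layout (`id`, `cites`, `cited_by`) and the opinion IDs are hypothetical illustrations, not CourtListener's actual bulk-data schema.

```python
import json
from collections import defaultdict


def build_cited_by(cites: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert 'opinion -> opinions it cites' into 'opinion -> opinions citing it'."""
    cited_by = defaultdict(list)
    for citing, targets in cites.items():
        for target in targets:
            cited_by[target].append(citing)
    return {k: sorted(v) for k, v in cited_by.items()}


def bulk_records(cites: dict[str, list[str]]) -> list[dict]:
    """One metadata record per opinion listing both directions of the network."""
    cited_by = build_cited_by(cites)
    ids = sorted(set(cites) | set(cited_by))
    return [
        {"id": i, "cites": sorted(cites.get(i, [])), "cited_by": cited_by.get(i, [])}
        for i in ids
    ]


cites = {"B": ["A"], "C": ["A", "B"]}
for record in bulk_records(cites):
    print(json.dumps(record))
# {"id": "A", "cites": [], "cited_by": ["B", "C"]}
# {"id": "B", "cites": ["A"], "cited_by": ["C"]}
# {"id": "C", "cites": ["A", "B"], "cited_by": []}
```

Because the reverse index is derived mechanically, it can simply be rebuilt whenever new opinions and citations arrive.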

Finally, we would be remiss if we didn’t mention our hallmark feature: daily, weekly and monthly email alerts. For any query you put into CourtListener, you can request that we email you whenever there are new results. This was the first feature we created, and it is one we continue to be excited about. This year we haven’t made any big innovations to our email alerts system, but its popularity has continued to grow, with more than 500 alerts run each day. Next year, we hope to add a couple of small enhancements to this feature so it’s smoother and easier to use.
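The alert mechanism can be sketched as re-running a saved query and keeping only results filed since the previous run. In this minimal sketch, the `Alert` class, the all-terms substring matching, and the opinion dictionaries are hypothetical simplifications. Real alerts would run through the search engine, and scheduling (daily, weekly, monthly) and mail delivery are left out. The message is built with Python's standard `email.message.EmailMessage` but never sent.

```python
from dataclasses import dataclass
from datetime import date
from email.message import EmailMessage
from typing import Optional


@dataclass
class Alert:
    query: str
    email: str
    last_run: date


def new_hits(alert: Alert, opinions: list[dict], today: date) -> list[dict]:
    """Opinions filed since the alert last ran that contain every query term."""
    terms = alert.query.lower().split()
    hits = [
        op for op in opinions
        if op["filed"] > alert.last_run
        and all(term in op["text"].lower() for term in terms)
    ]
    alert.last_run = today  # the next run only considers newer opinions
    return hits


def compose(alert: Alert, hits: list[dict]) -> Optional[EmailMessage]:
    """Build, but do not send, an alert email; return None when there is nothing new."""
    if not hits:
        return None
    msg = EmailMessage()
    msg["To"] = alert.email
    msg["Subject"] = f"{len(hits)} new result(s) for '{alert.query}'"
    msg.set_content("\n".join(op["name"] for op in hits))
    return msg


alert = Alert("ineffective assistance", "user@example.com", date(2013, 3, 1))
opinions = [
    {"name": "Old v. Case", "filed": date(2013, 2, 1), "text": "Ineffective assistance of counsel."},
    {"name": "New v. Case", "filed": date(2013, 3, 3), "text": "The claim of ineffective assistance fails."},
    {"name": "Other v. Case", "filed": date(2013, 3, 3), "text": "A contract dispute."},
]
message = compose(alert, new_hits(alert, opinions, date(2013, 3, 4)))
print(message["Subject"])  # 1 new result(s) for 'ineffective assistance'
```

Returning no message when there are no hits keeps users from receiving empty alerts.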

The future

I’ve hinted at a lot of our upcoming work in the sections above, but what are the big-picture features that we think we need to achieve our goals?

We do all of our planning in the open, but we have a few things cooking in the background that we hope to eventually build. Among them are ideas for adding oral argument audio, case briefs, and data from PACER. Adding these new types of information to CourtListener is a must if we want to be more useful for research purposes, but doing so is a long-term goal, given the complexity of doing them well.

We also plan to build an opinion classifier that could automatically, and without human intervention, determine the subsequent treatment of opinions. Done right, this would allow our users to know at a glance if the opinion they’re reading was subsequently followed, criticized, or overruled, making our system even more valuable to our users.

In the next few years we’ll continue building out these features, and as an open-source, open-data project, we do everything in the open. You can see our plans on our feature tracker and our bugs in our bug tracker, and you can get in touch in our forum. The next few years look to be very exciting as we continue building our collection and our platform for legal research. Let’s see what the new year brings!

Michael Lissner is the co-founder and lead developer of CourtListener, a project that works to make the law more accessible to all. He graduated from U.C. Berkeley’s School of Information, and when he’s not working on CourtListener he develops search and eDiscovery solutions for law firms. Michael is passionate about bringing greater access to our primary legal materials, about how technology can replace old legal models, and about open source, community-driven approaches to legal research.

Brian W. Carver is Assistant Professor at the U.C. Berkeley School of Information, where he does research on and teaches about intellectual property law and cyberlaw. He is also passionate about the public’s access to the law. In 2009 and 2010 he advised an I School Masters student, Michael Lissner, on the creation of CourtListener.com, an alert service covering the U.S. federal appellate courts. After Michael’s graduation, he and Brian continued working on the site and have grown the database of opinions to include over 750,000 documents. In 2011 and 2012, Brian advised I School Masters students Rowyn McDonald and Karen Rustad on the creation of a legal citator built on the CourtListener database.

The information above should not be considered legal advice. If you require legal representation, please consult a lawyer.

Opening Up State Legal Data

Demand for public access to legal information, Digital legal publishing, Electronic legal publishing, elegislation, elegislation systems, free law data, Legislative information systems

Dec 08, 2012

There has been a series of efforts to create a national legislative data standard: one master XML format to which all states would adhere for bills, laws, and regulations. Those efforts have gone poorly.

Few states provide bulk downloads of their laws. None provide APIs. Although nearly all states provide websites for people to read state laws, they are all objectively terrible, in ways that demonstrate that they were probably pretty impressive in 1995. Despite the clear need for improved online display of laws, the lack of a standard data format and the general lack of bulk data have enabled precious few efforts in the private sector. (Notably, there is Robb Schecter’s WebLaws.org, which provides vastly improved experiences for the laws of California, Oregon, and New York. There was also a site built experimentally by Ari Hershowitz that was used as a platform for last year’s California Laws Hackathon.)

A significant obstacle to prior efforts has been the perceived need to create a single standard, one that will accommodate the various textual legal structures that are employed throughout government. That is a substantial practical hurdle on its own; add the need to persuade major stakeholders and governments to agree on a standard that will enjoy wide support and adoption, and failure is all but guaranteed.

What if we could stop letting the perfect be the enemy of the good? What if we ignore the needs of the outliers, and establish a “good enough” system, one that will at first simply work for most governments? And what if we completely skip the step of establishing a standard XML format? Wouldn’t that get us something, a thing superior to the nothing that we currently have?

The State Decoded
This is the philosophy behind The State Decoded. Funded by the John S. and James L. Knight Foundation, The State Decoded is a free, open source program to put legal codes online, and it does so by simply skipping over the problems that have hampered prior efforts. The project does not aspire to create any state law websites on its own but, instead, to provide the software to enable others to do so.

Still in development (it’s at version 0.4), The State Decoded leaves it to each implementer to gather up the contents of the legal code in question and interface it with the program’s internal API. This could be done by screen-scraping an existing state code website, modifying the parser to deal with a bulk XML file, converting input data into the program’s simple XML import format, or by a few other methods. Although this is a non-trivial task, it’s something that can be knocked out in an afternoon, thus avoiding the need to create a universal data format and to persuade Wexis to provide their data in that format.
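For the import step, an implementer's scraper might end by serializing each section into a small XML document. The sketch below uses Python's standard `xml.etree.ElementTree`. The element and attribute names (`law`, `structure`, `unit`, `section_number`, `catch_line`, `text`) and the sample section are assumptions chosen for illustration, so check The State Decoded's own documentation for its actual import format before relying on them.

```python
import xml.etree.ElementTree as ET


def section_to_xml(structure: list[tuple[str, str, str]], number: str,
                   catch_line: str, text: str) -> str:
    """Serialize one scraped section into a simple (hypothetical) import XML document."""
    law = ET.Element("law")
    struct = ET.SubElement(law, "structure")
    # structure lists (label, identifier, name) from the broadest unit downward
    for level, (label, identifier, name) in enumerate(structure, start=1):
        unit = ET.SubElement(struct, "unit", label=label,
                             identifier=identifier, level=str(level))
        unit.text = name
    ET.SubElement(law, "section_number").text = number
    ET.SubElement(law, "catch_line").text = catch_line
    ET.SubElement(law, "text").text = text
    return ET.tostring(law, encoding="unicode")


xml_doc = section_to_xml(
    structure=[("title", "1", "General Provisions"), ("chapter", "2", "Definitions")],
    number="1-2.1",
    catch_line="Definitions.",
    text="As used in this title, the following words have the meanings given below.",
)
print(xml_doc)
```

Keeping the hierarchy as an ordered list of units lets the same function serve codes that are broad and shallow as well as narrow and deep.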

The magic happens after the initial data import. The State Decoded takes that raw legal text and uses it to populate a complete, fully functional website for end-users to search and browse those laws. By packaging the Solr search engine and employing some basic textual analysis, every law is cross-referenced with other laws that cite it and laws that are textually similar. If there exists a repository of legal decisions for the jurisdiction in question, that can be incorporated, too, displaying a list of the court cases that cite each section. Definitions are detected and loaded into a dictionary, making the laws self-documenting. End users can post comments to each law. Bulk downloads are created, letting people get a copy of the entire legal code, its structural elements, or the automatically assembled dictionary. And there’s a RESTful, JSON-based API, ready to be used by third parties. All of this is done automatically, quickly, and seamlessly. The time elapsed varies, depending on server power and the length of the legal code, but it generally takes about twenty minutes from start to finish.
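Definition detection can be approximated with a pattern that looks for the common drafting formula `"term" means ...`. The following is a rough sketch under that assumption, not The State Decoded's actual algorithm. It ignores the scope of definitions (e.g., "As used in this chapter") and the many other ways codes phrase them, and the sample sections are invented.

```python
import re

# Matches: "term" means definition   (straight or curly quotes; stops at . or ;)
DEFINITION_RE = re.compile(r'["“]([^"”]+)["”]\s+means\s+([^.;]+)', re.IGNORECASE)


def extract_definitions(sections: dict[str, str]) -> dict[str, dict[str, str]]:
    """Build a dictionary of defined terms, remembering which section defines each."""
    dictionary = {}
    for number, text in sections.items():
        for term, definition in DEFINITION_RE.findall(text):
            dictionary[term.lower()] = {"definition": definition.strip(), "source": number}
    return dictionary


def terms_used(text: str, dictionary: dict[str, dict[str, str]]) -> list[str]:
    """Defined terms appearing in a section, so a site could annotate them inline."""
    lowered = text.lower()
    return sorted(t for t in dictionary if re.search(r"\b" + re.escape(t) + r"\b", lowered))


sections = {
    "1-2.1": 'As used in this title, "dealer" means any person who sells motor vehicles; '
             '"vehicle" means a motor vehicle.',
    "46-3": "No dealer shall sell a vehicle without a license.",
}
glossary = extract_definitions(sections)
print(terms_used(sections["46-3"], glossary))  # ['dealer', 'vehicle']
```

Linking each detected term back to its defining section is what lets a reader of section 46-3 see what "dealer" means without leaving the page.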

The State Decoded is a free program, released under the GNU General Public License. Anybody can use it to make legal codes more accessible online. There are no strings attached.

It has already been deployed in two states, Virginia and Florida, despite not actually being a finished project yet.

State Variations
The striking variations in the structures of legal codes within the U.S. required the establishment of an appropriately flexible system to store and render those codes. Some legal codes are broad and shallow (e.g., Louisiana, Oklahoma), while others are narrow and deep (e.g., Connecticut, Delaware). Some list their sections by natural sort order, some in decimal, a few arbitrarily switch between the two. Many have quirks that will require further work to accommodate.
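The ordering problem is easy to see in code: a plain string sort puts 1-10 before 1-2. Below are two hypothetical sort keys, one for natural order and one for decimal order. The interpretation of "decimal" here (digits after the point compared digit by digit, so 1.10 falls between 1.1 and 1.2) is an assumption, as is the restriction of `decimal_key` to identifiers of the form whole.fraction.

```python
import re


def natural_key(section: str) -> list[tuple[int, int, str]]:
    """Natural order: runs of digits compare as whole numbers, so 1-2 < 1-10."""
    return [
        (0, int(tok), "") if tok.isdigit() else (1, 0, tok)
        for tok in re.findall(r"\d+|\D+", section)
    ]


def decimal_key(section: str) -> tuple[int, str]:
    """Decimal order: digits after the point compare digit by digit, so 1.10 < 1.2."""
    whole, _, fraction = section.partition(".")
    return int(whole), fraction


print(sorted(["1-10", "1-2", "1-1"]))                   # ['1-1', '1-10', '1-2']
print(sorted(["1-10", "1-2", "1-1"], key=natural_key))  # ['1-1', '1-2', '1-10']
print(sorted(["1.2", "1.10", "1.1"], key=decimal_key))  # ['1.1', '1.10', '1.2']
```

Making the sort key a per-state setting, rather than hard-coding one order, is one way to accommodate codes that follow different conventions.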

For example, California does not provide a catch line for its laws, just a section number. One must read through a law to know what it actually does, rather than being able to glance at the title and get the general idea. Because this is a wildly impractical approach for a state code, the private sector has picked up the slack – Westlaw and LexisNexis each write their own titles for those laws, neatly solving the problem for those with the financial resources to pay for those companies’ offerings. To handle a problem like this, The State Decoded either needs to be able to display legal codes that lack section titles, or pointedly not support this inferior approach, and instead support the incorporation of third-party sources of title. In California, this might mean mining the section titles used internally by the California Law Revision Commission, and populating the section titles with those. (And then providing a bulk download of that data, allowing it to become a common standard for California’s section titles.)

Many state codes have oddities like this. The State Decoded combines flexibility with open source code to make it possible to deal with these quirks on a case-by-case basis. The alternative, a single standard that tries to anticipate every state’s quirks in advance, is too convoluted and quixotic to consider.

Regulations
There is strong interest in seeing this software adapted to handle regulations, especially from cash-strapped state governments looking to modernize their regulatory delivery process. Although this work is still at an early stage, it looks like relatively few modifications will be required to support the storage and display of regulations within The State Decoded.

More significant modifications would be needed to integrate registers of regulations, but the substantial public benefits that would provide make it an obvious and necessary enhancement. The present process required to identify the latest version of a regulation is the stuff of parody. To select a state at random, here are the instructions provided on Kansas’s website:

To find the latest version of a regulation online, a person should first check the table of contents in the most current Kansas Register, then the Index to Regulations in the most current Kansas Register, then the current K.A.R. Supplement, then the Kansas Administrative Regulations. If the regulation is found at any of these sequential steps, stop and consider that version the most recent.

If Kansas has electronic versions of all this data, it seems almost punitive not to put it all in one place, rather than forcing people to look in four places. It seems self-evident that the current Kansas Register, the Index to Regulations, the K.A.R. Supplement, and the Kansas Administrative Regulations should have APIs, with a common API atop all four, which would make it trivial to present somebody with the current version of a regulation with a single request. By indexing registers of regulations in the manner that The State Decoded indexes court opinions, it would at least be possible to show people all activity around a given regulation, if not simply show them the present version of it, since surely that is all that most people want.
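The "common API atop all four" idea reduces to checking sources in the order Kansas prescribes and stopping at the first hit. In this sketch each source is modeled, as a simplification, as a lookup function that returns full regulation text; the regulation numbers and texts are invented, and no such Kansas APIs are implied to exist.

```python
from typing import Callable, Optional

# Each source maps a regulation number to its text, or None if it is absent there.
Lookup = Callable[[str], Optional[str]]


def current_regulation(number: str, sources: list[tuple[str, Lookup]]) -> Optional[tuple[str, str]]:
    """Check sources in the prescribed order; the first hit is treated as current."""
    for name, lookup in sources:
        text = lookup(number)
        if text is not None:
            return name, text
    return None


# Invented sample data standing in for the four Kansas sources.
register_toc = {}
register_index = {"28-1-2": "text as amended in the latest Register"}
kar_supplement = {"28-1-2": "older supplement text", "28-1-5": "supplement text"}
kar = {"28-1-2": "original text", "28-1-5": "original text", "28-1-9": "original text"}

sources = [
    ("Kansas Register, table of contents", register_toc.get),
    ("Kansas Register, Index to Regulations", register_index.get),
    ("K.A.R. Supplement", kar_supplement.get),
    ("Kansas Administrative Regulations", kar.get),
]
print(current_regulation("28-1-2", sources))
print(current_regulation("28-1-9", sources))
```

The caller makes a single request and gets back both the current text and the source it came from, instead of consulting four publications by hand.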

A Tapestry of Data
In a way, what makes The State Decoded interesting is not anything that it actually does, but instead what others might do with the data that it emits. By capitalizing on the program’s API and healthy collection of bulk downloads, clever individuals will surely devise uses for state legal data that cannot presently be envisioned.

The structural value of state laws is evident when considered within the context of other open government data.

Major open government efforts are confined largely to the upper-right quadrant of this diagram – those matters concerned with elections and legislation. There is also some excellent work being done in opening up access to court rulings, indexing scholarly publications, and nascent work in indexing the official opinions of attorneys general. But the latter group cannot be connected to the former group without opening up access to state laws. Courts do not make rulings about bills, of course – it is laws with which they concern themselves. Law journals cite far more laws than they do bills. To weave a seamless tapestry of data that connects court decisions to state laws to legislation to election results to campaign contributions, it is necessary to have a source of rich data about state laws. The State Decoded aims to provide that data.

Next Steps
The most important next step for The State Decoded is to complete it, releasing a version 1.0 of the software. It has dozens of outstanding issues – both bug fixes and new features – so this process will require some months. In that period, the project will continue to work with individuals and organizations in states throughout the nation who are interested in deploying The State Decoded to help them get started.

Ideally, The State Decoded will be obviated by states providing both bulk data and better websites for their codes and regulations. But in the current economic climate, neither is likely to be prioritized within state budgets, so unfortunately there’s liable to remain a need for the data provided by The State Decoded for some years to come. The day when it is rendered useless will be a good day.

Waldo Jaquith is a website developer with the Miller Center at the University of Virginia in Charlottesville, Virginia. He is a News Challenge Fellow with the John S. and James L. Knight Foundation and runs Richmond Sunlight, an open legislative service for Virginia. Jaquith previously worked for the White House Office of Science and Technology Policy, for which he developed Ethics.gov, and is now a member of the White House Open Data Working Group.
[Editor’s Note: For topic-related VoxPopuLII posts please see: Ari Hershowitz & Grant Vergottini, Standardizing the World’s Legal Information – One Hackathon At a Time; Courtney Minick, Universal Citation for State Codes; John Sheridan, Legislation.gov.uk; and Robb Schecter, The Recipe for Better Legal Information Services. ]


Digital Law: What Lawyers Need to Learn from Accountants

authentication, Legislative information systems

Nov 16, 2012

In 1494, Luca Pacioli published “Particularis de Computis et Scripturis,” which is widely regarded as the first written treatise on bookkeeping. In the more than 500 years since that event, we have become completely accustomed to the concepts of ledgers, journals and double-entry bookkeeping. Like all profound ideas, the concept of a transaction ledger nowadays seems to be completely natural, as if it had always existed as some sort of natural law.

Whenever there is a need for detailed, defensible records of how a financial state of affairs (such as a company balance sheet or a profit and loss statement) came to be, we employ Pacioli’s concepts without even thinking about them any more. Of course you need ledgers of transactions as the raw material from which to derive the financial state of affairs at whatever chosen point in time is of interest. How else could you possibly do it?

Back in Pacioli’s day, there was nothing convenient about ledgers. After all, back then, all ledger entries had to be painstakingly made by hand into paper volumes. Care was needed to pair up the debits and credits. Ledger page totals and sub-totals had to be checked and re-checked. Very labor-intensive stuff. But then computers came along and lo! all the benefits of ledgers in terms of the rigorous audit trail could be enjoyed without all the hard labor.

Doubtless, somewhere along the line in the early days of the computerization of financial ledgers, it occurred to somebody that ledger entries need not be immutable. That is to say, there is no technical reason to carry forward the “limitation” that pen and ink imposes on ledger writers, that an entry – once made – cannot be changed without leaving marks on the page that evidence the change. Indeed, bookkeeping has long had the concept of a “contra-entry” to handle the immutability of pen and ink. For example, if a debit of a dollar is made to a ledger by mistake, then another ledger entry is made – this time a credit – for a dollar to counter-balance the mistake while preserving the completeness of the audit-trail.

Far from being a limitation of the paper-centric world, the concept of an “append-only” ledger turns out, in my opinion, to be the key to the trustworthiness and transparency of financial statements. Accountants and auditors can take different approaches to how information from the ledgers is grouped/treated, but the ledgers are the ledgers are the ledgers. Any doubt that the various summations accurately reflect the ledgers can readily be checked.
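An append-only ledger with contra-entries can be sketched in a few lines. This is a simplified single-entry illustration under stated assumptions (amounts in cents, positive for a debit and negative for a credit), not a full double-entry system. The class offers no way to edit or delete an entry, so a mistake is corrected by posting its reverse, and every balance is derived from the complete history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    seq: int
    account: str
    amount: int  # cents; positive = debit, negative = credit (simplified convention)
    memo: str


class Ledger:
    """Append-only ledger: entries are added, never edited or deleted."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def post(self, account: str, amount: int, memo: str) -> Entry:
        entry = Entry(len(self._entries), account, amount, memo)
        self._entries.append(entry)
        return entry

    def reverse(self, entry: Entry, memo: str) -> Entry:
        """Cancel a mistaken entry with a contra-entry, preserving the audit trail."""
        return self.post(entry.account, -entry.amount, memo)

    @property
    def entries(self) -> tuple[Entry, ...]:
        return tuple(self._entries)  # read-only view of the full history

    def balance(self, account: str) -> int:
        """Balances are always derived from the entries, never stored separately."""
        return sum(e.amount for e in self._entries if e.account == account)


ledger = Ledger()
mistake = ledger.post("office supplies", 100, "one dollar debited by mistake")
ledger.reverse(mistake, f"contra-entry for entry {mistake.seq}")
print(ledger.balance("office supplies"), len(ledger.entries))  # 0 2
```

The balance returns to zero, yet both the mistake and its correction remain visible to an auditor.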

Now let us turn to the world of law. Well, law is so much more complicated! Laws are not simple little numerical values that fit nicely into transaction rows either in paper ledgers or in database tables. True, but does it follow that the many benefits of the ledger-centric approach cannot be enjoyed in our modern day digital world where we do not have the paper-centric ledger limitations of fixed size lines to fit our information into? Is the world of legal corpus management really so different from the world of financial accounting?

What happens if we look at, say, legal corpus management in a prototypical U.S. legislature from the perspective of an accountant? What would an accountant see? Well, there is this asset called the statute. That is the “opening balance” inventory of the business, in accounting parlance. There is a time concept called a Biennium, which is an “accounting period”. All changes to the statute that happen in the accounting period are recorded in the form of bills. Bills are basically accounting transactions. The bills are accumulated into a form of ledger typically known as Session Laws. At the end of the accounting period (the Biennium), changes to the statute are rolled forward from the Session Laws into the statute. In accounting parlance, this is the period-end accounting culminating in a new set of opening balances (statute) for the start of the next Biennium. At the start of the new Biennium, all the ledger transactions are archived off and a fresh set of ledgers is created; that is, bill numbers/session law numbers are reset, the active Biennium name changes, etc.
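The mapping described above can be expressed as a small data model: the statute is the opening balance, enacted bills are appended to a session-laws ledger, and closing the Biennium rolls those changes forward into the next period's opening statute. The class and field names, the section numbers, and the convention that `None` repeals a section are hypothetical, and real bills amend text far more finely than whole sections.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bill:
    number: str                        # e.g. "HB 1"; numbering restarts each Biennium
    changes: dict[str, Optional[str]]  # section -> new text, or None to repeal


@dataclass
class Biennium:
    name: str
    opening_statute: dict[str, str]                         # the "opening balance"
    session_laws: list[Bill] = field(default_factory=list)  # this period's ledger

    def enact(self, bill: Bill) -> None:
        self.session_laws.append(bill)  # transactions are only ever appended

    def close(self, next_name: str) -> "Biennium":
        """Period-end: roll the session laws forward into the next opening statute."""
        statute = dict(self.opening_statute)
        for bill in self.session_laws:
            for section, text in bill.changes.items():
                if text is None:
                    statute.pop(section, None)
                else:
                    statute[section] = text
        return Biennium(next_name, statute)  # fresh, empty session-law ledger


b1 = Biennium("2011-2012", {"1-1": "Original text", "1-2": "Text to be repealed"})
b1.enact(Bill("HB 1", {"1-1": "Amended text"}))
b1.enact(Bill("SB 5", {"1-2": None, "1-3": "New section"}))
b2 = b1.close("2013-2014")
print(b2.opening_statute)  # {'1-1': 'Amended text', '1-3': 'New section'}
```

Note that closing a Biennium leaves the old period's opening statute and session laws untouched, so the prior ledgers remain available for archiving and inspection.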

I could go on and on extending the analogy (chamber journals are analogous to board of directors meeting minutes; bill status reporting is analogous to management accounting, etc.) but you get the idea. Legal corpus management in a legislature can be conceptualized in accounting terms. Is it useful to do so? I would argue that it is incredibly useful to do so. Thanks to computerization, we do not have to limit the application of Luca Pacioli’s brilliant insight to things that fit neatly into little rows of boxes in paper ledgers. We can treat bills as transactions and record them architecturally as 21st-century digital ledger transactions. We can manage statute as a “balance” to be carried forward to the next Biennium. We can treat engrossments of bills and statute alike as forms of trial balance generation, and so on.

Now I am not for a moment suggesting that a digital legislative architecture be based on any existing accounting system. What I am saying is that the concepts that make up an accounting system can, and I would argue should, be used. A range of compelling benefits accrue from this. A tremendous amount of the back-office work that goes on in many legislatures can be traced back to work-in-progress (WIP) reporting and period-end accounting of what is happening with the legal corpus. Everything from tracking bill status to the engrossment of committee reports becomes significantly easier once all the transactions are recorded in legislative ledgers. The ledgers then become the master repository from which all reports are generated. The reduction in overall IT moving parts, reduction in human effort, reduction in latency and the increase in information consistency that can be achieved by doing this is striking.

For many hundreds of years we have had ledger-based accounting. For hundreds of years the courts have taken the view that, for example, a company cannot simply announce a Gross Revenue figure to tax officials or to investors without having the transaction ledgers to back it up. Isn’t it interesting that we do not do the same for the legal corpus? We have all sorts of publishers in the legal world, from public bodies to the private sector, who produce legislative outputs that we have to trust because we do not have any convenient form of access to the transaction ledgers. Somewhere along the line, we seem to have convinced ourselves that the level of rigorous audit trail routinely applied in financial accounting cannot be applied to law. This is simply not true.

We can and should fix that. The prize is great, the need is great and the time is now. The reason the time is now is that all around us, I see institutions that are ceasing to produce paper copies of critical legal materials in the interests of saving costs and streamlining workflows. I am all in favour of both of these goals, but I am concerned that many of the legal institutions going fully paperless today are doing so without implementing a ledger-based approach to legal corpus management. Without that, the paper versions of everything from registers to regulations to session laws to chamber journals to statute books – for all their flaws – are the nearest thing to an immutable set of ledgers that exist. Take away what little audit trail we have and replace it with a rolling corpus of born-digital documents without a comprehensive audit trail of who changed what and when?…Not good.

Once an enterprise-level ledger-based approach is utilised, another great prize can be readily won; namely, the creation of a fully digital yet fully authenticated and authoritative corpus of law. To see why, let us step back into the shoes of the accountant for a moment. When computers came along and the financial paper ledgers were replaced with digital ledgers, the world of accounting did not find itself in a crisis concerning authenticity in the way the legal world has. Why so?

I would argue that the reason for this is that ledgers – Luca Pacioli’s great gift to the world – are the true source of authenticity for any artifact derived from the ledgers. Digital authenticity of balance sheets or Statute sections does not come from digital signatures or thumb-print readers or any of the modern high-tech gadgetry of the IT security landscape. Authenticity comes from knowing that what you are looking at was mechanically and deterministically derived from a set of ledgers and that those ledgers are available for inspection. What do financial auditors do for a living? They check the authenticity of financial statements. How do they do it? They do it by inspecting the ledgers. Why is authenticity of legal materials such a tough nut to crack? Because there are typically no ledgers!
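The claim that authenticity comes from deterministic derivation can be illustrated by an "auditor" that replays the ledger itself and compares the result with what was published. In this sketch the ledger is assumed to be a list of (section, new text or None) entries, and the sample data is invented. SHA-256 over canonical JSON is only a convenient way to compare large texts; the authority comes from the replay, in line with the author's point that gadgetry alone does not confer authenticity.

```python
import hashlib
import json
from typing import Optional


def derive_statute(opening: dict[str, str], ledger: list[tuple[str, Optional[str]]]) -> dict[str, str]:
    """Deterministically replay ledger entries over the opening statute."""
    statute = dict(opening)
    for section, text in ledger:
        if text is None:
            statute.pop(section, None)  # repeal
        else:
            statute[section] = text     # enact or amend
    return statute


def digest(statute: dict[str, str]) -> str:
    """Hash a canonical serialization so identical content always yields the same digest."""
    canonical = json.dumps(statute, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit(published: dict[str, str], opening: dict[str, str],
          ledger: list[tuple[str, Optional[str]]]) -> bool:
    """Re-derive the statute from the ledgers and compare it with the published version."""
    return digest(published) == digest(derive_statute(opening, ledger))


opening = {"1-1": "Original text"}
ledger = [("1-1", "Amended text"), ("1-2", "New section")]
published = {"1-1": "Amended text", "1-2": "New section"}
tampered = {"1-1": "Amended text", "1-2": "Quietly altered text"}
print(audit(published, opening, ledger), audit(tampered, opening, ledger))  # True False
```

Anyone with access to the opening statute and the ledgers can run the same check, which is the property the paragraph above attributes to financial auditing.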

From time to time we hear an outburst of emotion about the state of the legal corpus. From time to time we hear how some off-the-shelf widget will fix the problem. Technology absolutely holds the solutions, but it can only work, in my opinion, when the problem of legal corpus management is conceptualized as a ledger-centric problem where we put manic focus on the audit trail. Then, and only then, can we put the legal corpus on a rigorous digital footing and move forward to a fully paperless world with confidence.

From time to time, we hear an outburst of enthusiasm to create standards for legal materials and solve our problems that way. I am all in favour of standards but we need to be smart about what we standardize. Finding common ground in the industry for representing legislative ledgers would be an excellent place to start, in my opinion.

Is this something that some standards body such as OASIS or NIEM might take on? I would hope so, and I am hopeful that it will happen at some point. Part of why I am hopeful is that I see an increasing recognition of the value of ledger-based approaches in the broader world of GRC (Governance, Risk and Compliance). For too long now, the world of law has existed on the periphery of the information sciences. It can, and should, be an exemplar of how a critical piece of societal infrastructure has fully embraced what it means to be “born digital”. We have known conceptually how to do it since 1494. The technology all exists today to make it happen. A number of examples already exist in production use in legislatures in Europe and in the USA. What is needed now is for the idea to spread like wildfire, the same way that Pacioli’s ideas spread like wildfire into the world of finance all those years ago.

Perhaps some day, when the ledger-centric approach to legal corpus management has removed doubts about authenticity/reliability, we will look back and think digital law was always done with ledgers, just as today we think that accounting was always done that way.

Sean McGrath is co-founder and CTO of Propylon, based in Lawrence, Kansas. He has 30 years of experience in the IT industry, most of it in the legal and regulatory publishing space. He holds a first class honors degree in Computer Science from Trinity College Dublin and served as an invited expert to the W3C special interest group that created the XML standard in 1996. He is the author of three books on markup languages published by Prentice Hall in the Dr Charles F. Goldfarb Series on Open Information Management. He is a regular speaker at industry conferences and runs a technology-oriented blog at http://seanmcgrath.blogspot.com.


Open Sesame

Uncategorized

May 23, 2012

For some time, Open Access has been a sort of gnat in my office, bugging me periodically, but always just on the edge of getting my full attention. Perhaps due in large part to the fact that our journals simply cost much less than those in other disciplines, law librarians have been able to stay mostly on the outside of this discussion. The marketing benefits of building institutional repositories are just as strong for law schools as other disciplines, however, and many law schools are now boarding the train — with librarians conducting. If you’re new to the discussion of Open Access in general, I suggest Peter Suber’s Open Access Overview for an excellent introduction. This piece is meant to briefly summarize the goals, progress, and future of OA as it applies (mostly) to legal scholarship.

Background and History
Open Access is not merely the buzzword of the moment: Open Access, or OA, describes work that is free to read, by anyone. Though usually tied to discussions of Institutional or Scholarly Repositories, the two do not necessarily have to be connected. Publications can be made “open” via download from an author’s institutional or personal home page, a disciplinary archive such as SSRN or BePress, or through nearly any other type of digital collection, so long as it is provided for free. For readers, free should mean free of cost and free of restrictions. These are sometimes described as gratis OA and libre OA, respectively. As Peter Suber notes, “Gratis OA is free as in beer. Libre OA is free as in beer and free as in speech.”

In addition to the immediate benefits of OA for researchers and for libraries (who would save a great deal of money spent on collections), strong ethical arguments can be made for OA as a necessary public service, given the enormous public support of research (tax dollars). The argument sharpens when research is explicitly supported by Federal or other grant funds. Paying to access grant-funded work amounts to a second charge to the taxpayer, while private publishers profit.

Of course, OA wasn’t an option with print resources; while anyone is “free” to go to a library that subscribes to a journal and read it, physical location itself is a barrier to access. In the networked digital environment, physical location need not be a barrier anymore. For members of the scholarly community who wish to share and discuss work with each other, that might be the end of the story. But while the technology is mature, policies and politics are still developing, and fraught with challenges posed largely by rights holders with significant financial interests in the current publishing system. One vocal segment of that market raises economic objections based on their financial support of the peer review process and other overhead costs related to production and dissemination of scholarly research. Since publishers control the permissions necessary to make OA work most fully, their opposition frustrates the efforts of many OA advocates. Not all publishers are invested in erecting barriers to OA, though; see, e.g., the ROMEO directory of publisher copyright policies and self-archiving. Though some impose embargo periods before posting, many publishers across disciplines allow deposit of the final published version of work.

In the midst of this conflict, many OA proponents acknowledge that production of scholarship is not without costs; Old Faithful didn’t start spouting Arrogant Bastard Ale one bright morning. Separate from the mechanism for sharing the Open Access version of an article, there are costs associated with its production that must be covered. The OA movement seeks a new model for recouping these costs, rather than eliminating them altogether.

Interoperability
So, the “open” part of Open Access is roughly equivalent to “free” (for the reader), which presents economic challenges that remain to be solved. What about the “access” part?

Access to physical literature was largely a matter of indexing and physical copies; inclusion in the leading index(es) of a field was an honor (and potential economic advantage) to journals. Collection development decisions used to be made based in part on whether a journal was indexed. Serving the community online, though, requires more than a digital equivalent of the print index: researchers need both the ability to download articles and the ability to search across the literature in order to manage its volume effectively.

As a foundational matter, openness in scholarly communication requires a certain amount of interoperability between the archives that serve up scholarship. The Open Archives Initiative (OAI) develops standards to promote interoperability between archives. Such standards support harvesting and assembling the metadata from multiple OAI-compliant archives to facilitate searching and browsing across collections in an institution, field, or discipline.
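Harvesting of this kind is usually done over the OAI Protocol for Metadata Harvesting (OAI-PMH), which answers plain HTTP requests such as `?verb=ListRecords&metadataPrefix=oai_dc` with XML and pages through long result sets using a `resumptionToken`. The Python sketch below is a minimal illustration using only the standard library. The repository URL, the sample response, and the helper names `list_records_url` and `parse_list_records` are hypothetical, and a real harvester would also need to handle datestamps, deleted records, error responses, and rate limits.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    The first request names the metadata format; follow-up requests
    carry only the verb and the resumptionToken from the previous page.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) from one page of a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"id": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An empty or absent resumptionToken means this was the last page.
    return records, (token or None)


# Hypothetical one-record response from an imaginary repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:article/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Hypothetical Law Review Article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
print(records)
print(list_records_url("https://repository.example.edu/oai", token))
```

The token loop is what makes cross-archive search practical: a harvester simply keeps requesting pages until the token comes back empty, so it can aggregate metadata from many compliant archives without knowing their sizes in advance.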

Paths to OA
One repeated practical question around Open Access is logistical: Who will build the archive, and how will it be populated on a regular basis? There are several models for implementing Open Access. Disciplinary Archives, Institutional or Unit/Departmental Repositories, and Self Archiving are all paths that can be taken, somewhat separate from publishers’ progress towards OA.

Disciplinary repositories are somewhat common around the academic community: PLoS and PubMed Central, for example, provide access to a large collection of works in science and medicine. Like SSRN/LSN, they provide a persistent, accessible host for scholarship and a searchable collection of new papers in the field. One difference in the legal community is in the primary publishing outlets: for most law faculty, the most prestigious placement is in a law journal published by a top-20 law school. These journals vary in their OA friendliness, but many faculty read their agreements in such a way as to allow this sort of archiving. SSRN has thus provided a low barrier for legal scholars to make their work openly available. SSRN also provides a relatively simple, if not entirely useful, metric for scholarly impact in appointments and in promotion and tenure discussions. As of this writing, SSRN’s abstract database held more than 395,000 abstracts, and its full-text collection about 324,000 papers.

Institutional or Unit/Departmental Repositories (IRs) are also becoming a popular choice for institutions seeking to promote their brand and to raise the profile of their faculty. A variety of options are available for creating an IR, from open-source hosting to turnkey or hosted systems like BePress’ Digital Commons. Both avenues tend to offer flexibility in creating communities within the IR for subjects or other series, and in handling embargoes and other specialized needs. BePress’ Digital Commons, for example, can serve as an IR and/or a publishing system for the peer-review and editing process. As a path to Open Access, the only barriers to IRs are securing institutional support for the annual licensing or hosting fee and committing some staff time to populating the IR with publications (or to facilitating deposit, if authors will self-archive).

Self-archiving represents an appeal directly to authors, who are not the tough sell that publishers tend to be. As Suber notes, scholarly publishing lacks the economic disincentives to OA that authors face in royalty-producing forms of publishing. Scholarly law journal articles, the bread and butter of the legal academy, do not produce royalties, so authors have nothing to lose from making their work available on OA platforms. One route to OA, therefore, is self-archiving by researchers. But while they might support OA in principle, researchers’ own best interests may push them to publish in “barrier-based” journals to protect their tenure and grant prospects, despite the interests of both the public and their own scientific community in no-cost, barrier-free access.

What about mandates as part of the path to OA? Recently, some academic institutions and grant agencies have begun instituting open access mandates of some form. The NIH mandate, for example, implemented in 2008, requires that peer-reviewed articles resulting from NIH-funded research be deposited in PubMed Central within twelve months of publication. Others have followed, including Harvard Law School. As paths to OA, both kinds of mandate are useful, though funder mandates alone wouldn’t cover enough of the literature to make a difference in terms of access for researchers. Institutional mandates, however, just might:

“When complemented by funding agency and foundation public-access mandates that capture the work originating with industry and government researchers who may not have faculty status, university mandates will, in time, produce nearly universal access to all the scientific literature.”

— David Shulenberger

ROARMAP tracks these mandates and the repository to which each one directs deposits. Though other universities and departments have instituted mandates, the 2008 Harvard Law mandate is notable for having originated with the faculty:

“The Harvard Law School is committed to disseminating the fruits of its research and scholarship as widely as possible. In keeping with that commitment, the Faculty adopts the following policy: Each Faculty member grants to the President and Fellows of Harvard College permission to make available his or her scholarly articles and to exercise the copyright in those articles. More specifically, each Faculty member grants to the President and Fellows a nonexclusive, irrevocable, worldwide license to exercise any and all rights under copyright relating to each of his or her scholarly articles, in any medium, and to authorize others to do the same, provided that the articles are not sold for a profit. The policy will apply to all scholarly articles authored or co-authored while the person is a member of the Faculty except for any articles completed before the adoption of this policy and any articles for which the Faculty member entered into an incompatible licensing or assignment agreement before the adoption of this policy. The Dean or the Dean’s designate will waive application of the policy to a particular article upon written request by a Faculty member explaining the need.”

Federal Input
Two recent bills dealt with open access: the Federal Research Public Access Act (FRPAA), which would mandate OA for federally funded research; and the Research Works Act (RWA), which would have prohibited such mandates. RWA (HR 3699) was withdrawn in late February of 2012, following Elsevier’s withdrawal of support. Its sponsors issued this statement:

“As the costs of publishing continue to be driven down by new technology, we will continue to see a growth in open-access publishers. This new and innovative model appears to be the wave of the future. … The American people deserve to have access to research for which they have paid. This conversation needs to continue, and we have come to the conclusion that the Research Works Act has exhausted the useful role it can play in the debate.”

FRPAA (HR 4004 and S 2096), on the other hand, is intended “to provide for Federal agencies to develop public access policies relating to research conducted by employees of that agency or from funds administered by that agency.” FRPAA would require any agency with annual expenditures over $100 million to make public, within six months of publication, the manuscripts of articles resulting from research it funds. Each agency bears both the burden and the freedom of maintaining its own archive or drawing on an existing one (e.g., PMC), and each is free to develop a policy that fits its needs (and perhaps its researchers’ needs). The bill gives the agency a nonexclusive license to disseminate the work, with no other impact on copyright or patent rights, and it requires that the agency have a long-term preservation plan for such publications.

Copyright Tangles
How does copyright limit the effectiveness of mandates and other archiving? Less than the average law librarian might imagine. Except where an author’s publishing agreement specifies otherwise, the scholarly community generally agrees that an author holds copyright in his or her submitted manuscript. That copy, referred to as the pre-refereeing preprint, may generally be deposited in an institutional repository such as the University of Illinois’ IDEALS, posted to an author’s or institution’s SSRN or BePress account, or posted to the author’s own web page.

Ongoing Work
ARL/SPARC encourages universities to voice their approval and support of FRPAA. Researchers around the academy are beginning to show support as well: studies have indicated that researchers would self-archive if they were 1) informed about the option, and 2) permitted by their copyright/licensing agreements with publishers to do so. With greater education about the benefits of Open Access for the institution as well as the scholarly community, authors could be encouraged to make better use of institutional and other archives.

In the legal academy, scholarly publishing is somewhat unusual. The preprint distribution culture is strong, and the main publishing outlets are run by the law schools – not by large, publicly-traded U.S. and foreign media corporations. Reprint permission requests are often handled by a member of the law school’s staff – or by a law student – and it’s unclear how much the journals know or care about republication or OA issues in general. But authors and their home institutions aren’t necessarily waiting around for answers; they’re archiving now, and taking down works later if asked. Carol Watson and James Donovan have written extensively about their experience with building and implementing an institutional repository at the University of Georgia, using the Berkeley Electronic Press Digital Commons software. See, e.g., Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age, Carol A. Watson, James M. Donovan, and Pamela Bluh; White Paper: Behind a Law School’s Decision to Implement an Institutional Repository, James M. Donovan and Carol A. Watson; and Implementing BePress’ Digital Commons Institutional Repository Solution: Two Views from the Trenches, James M. Donovan and Carol A. Watson.

Conclusion
The bottom line is that whether you’re an author or a librarian (or some other type of information/knowledge professional), you should be thinking about current and future access to the results of research, and about the logistical, economic, and political challenges involved, whether that research is happening in law or elsewhere in the academy.

Stephanie Davidson is Head of Public Services at the University of Illinois in Champaign. Her research addresses public services in the academic law library, and understanding scholarly research methods and behavior.

Surveying is Hard

Law librarians, law library assessment, Legal information behavior

Nov 9, 2009

Where the culture of assessment meets actual learning about users.

These days, anyone with a pulse can sign up for a free SurveyMonkey account, ask users a set of questions and call it a survey. The software will tally the results and create attractive charts to send upstairs, reporting on anything you want to know about your users. The technology of running surveys is certainly easy, but thinking about what the survey should accomplish and ensuring that it meets your needs is not. Finding accurate measures (of the effectiveness of instructional programs, of the library’s overall service quality or efficiency, or of how well we’re serving the law school’s mission) is still very, very hard. But librarians like to know that programs are effective, and Deans, ranking bodies, and prospective students all want to be able to compare libraries, so the draw of survey tools is strong. If the logistics are easy, where are the problems with assessment?

Between user surveys and various external questionnaires, we gather a lot of data about law libraries. Do they provide us with satisfactory methods of evaluating the quality of our libraries? Do they offer satisfactory methods for comparing and ranking libraries? The data we gather is rooted in an old model of the law library, in which collections could be measured in volumes and that number was accepted as the basis for comparing library collections. We’ve now rejected that method of assessment, but still struggle to find a more suitable yardstick. The culture of assessment from the broader library community has also entered law librarianship, bringing standardized service quality assessment tools. But despite these tools, and a lot of work on finding the right measurement of library quality, are we actually moving forward, or is some of this work holding us back from improvement? There are two types of measurement widely used to evaluate law libraries: assessments and surveys, which tend to be inward-looking, and the use of data such as budget figures and square footage, which can be used to compare and rank libraries. These are compared below, followed by an introduction to qualitative techniques for studying libraries and users.

(Self)Assessment

There are many tools available for conducting surveys of users, but the tool most familiar to law librarians is probably LibQUAL+®. Distributed as a package by ARL, LibQUAL+® is a “suite of services that libraries use to solicit, track, understand, and act upon users’ opinions of service quality.” The instrument itself is well-vetted, making it possible for libraries to run it without any pre-testing.

The goal is straightforward: to help librarians assess the quality of library services by asking patrons what they think. So, in “22 items and a box,” users can report on whether the library is doing things they expect, and whether the librarians are helpful. LibQUAL+® aligns with the popular “culture of assessment” in libraries, helping administrators to support regular assessment of the quality of their services. Though LibQUAL+® can help libraries assess user satisfaction with what they’re currently doing, it’s important to note that the survey results don’t tell a library what it is not doing (and/or should be doing). The survey doesn’t identify gaps in service, or capture opinions on the library’s relevance to users’ work. And as others have noted, such surveys focus entirely on patron satisfaction, which is contextual and constantly shifting. Users with low expectations will be satisfied under very different conditions than users with higher expectations, and the standard instrument can’t fully account for that.
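For readers unfamiliar with how the instrument records expectations: each LibQUAL+® core item asks for three ratings on a 1 to 9 scale (minimum acceptable service, desired service, and perceived service), and results are commonly summarized as an adequacy gap (perceived minus minimum) and a superiority gap (perceived minus desired). The minimal Python sketch below uses invented ratings for two hypothetical respondents; the class and method names are illustrative only and are not part of any ARL tool. It shows how the same perceived score reads as adequate for one user and inadequate for another.

```python
from dataclasses import dataclass


@dataclass
class ItemRating:
    """One respondent's three ratings for one survey item (1 = low, 9 = high)."""
    minimum: int    # lowest level of service the respondent would accept
    desired: int    # level of service the respondent personally wants
    perceived: int  # level of service the respondent thinks the library provides

    def adequacy_gap(self) -> int:
        # Negative when perceived service falls below the respondent's minimum.
        return self.perceived - self.minimum

    def superiority_gap(self) -> int:
        # Zero or positive only when service meets or beats the desired level.
        return self.perceived - self.desired


# Two hypothetical respondents who perceive the same service identically.
low_expectations = ItemRating(minimum=4, desired=6, perceived=6)
high_expectations = ItemRating(minimum=7, desired=9, perceived=6)

for label, rating in [("low", low_expectations), ("high", high_expectations)]:
    print(label, rating.adequacy_gap(), rating.superiority_gap())
# low 2 0
# high -1 -3
```

Even with expectations recorded this way, both respondents are rating only the services the library already offers; neither score reveals what either user needed and never found.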

Ranking Statistics

The more visible or external data gathering for law libraries occurs annually, when libraries answer questionnaires from their accrediting bodies. The focus of these instruments is on numbers: quantitative data that can be used to rate and rank law libraries. The ABA’s annual questionnaire counts both space and money. Site visits every seven years add detail and richness to the picture of the institution and provide additional criteria for assessment against the ABA’s standards, but the annually reported data is primarily quantitative. The ABA also asks which methods libraries use “to survey student and faculty satisfaction of library services”, but they don’t gather the results of those surveys.

The ALL-SIS Statistics Committee has been working on developing better measures for the quality of libraries, leading discussions on the AALLNet list and inviting input from the wider law librarian community, but this is difficult work, and so far few Big Ideas have emerged. One proposal suggested reporting, via the ALL-SIS supplemental form, responses from students, faculty, and staff regarding how the library’s services, collections and databases contribute to scholarship and teaching/learning, and how the library’s space contributes to their work. This is promising, but it would require more work to build rich qualitative data.

Another major external data-gathering initiative is coordinated by ARL itself, which collects data on law libraries as part of its general data collection for ARL-member (university) libraries. ARL statistics are similarly heavy on numbers: the questionnaire counts volumes (dropped just this year from the ABA questionnaire) and current serials, as well as money spent.

Surveys ≠ Innovation

When assessing the quality of libraries, two options for measurement dominate: user satisfaction, and collection size (using dollars spent, volumes, space allocated, or a combination of those). Both present problems. The former is simply insufficient as the sole measure of library quality and is not useful for comparing libraries. The latter ignores fundamental differences in the collection development and access issues that different libraries face, making the supposedly comparable figures nearly meaningless. A library that is part of a larger university campus will likely have a long list of resources paid for by the main library, and a stand-alone law school won’t. Trying to use the budget figures for these two libraries to compare the size of the collection or the quality of the library would be like comparing apples and apple-shaped things. There’s also something limiting about rating libraries primarily based on their size: is the size of the collection, or the money spent on it, really the strongest indicator of quality? The Yankees don’t win the World Series every year, after all, despite monetary advantages.

The field of qualitative research (a.k.a. naturalistic or ethnographic research) could offer some hope. The techniques of naturalistic inquiry have deep roots in the social sciences, but have not yet gained a foothold in library and information science. The use of naturalistic techniques could be particularly useful for understanding the diverse community of law library users. While not necessarily applicable as a means for rating or ranking libraries, the techniques could lead to a greater understanding of users of law libraries and their needs, and help libraries to develop measures that directly address the match between library and users’ needs.

How many of us have learned things about a library simply by having lunch with students, or chatting with faculty at a college event, or visiting another library? Participants in ABA Site Visits, for instance, get to know an institution in a way that numbers and reports can’t convey. Naturalistic techniques formalize the process of getting to know users, their culture and work, and the way that they use the library. Qualitative research could help librarians to see past habits and assumptions, teaching us about what our users do and what they need. Could those discoveries also shape our definition of service quality, and lead to better measures of quality?

In 2007, librarians at the University of Rochester River Campus conducted an ethnographic study with the help of their resident Lead Anthropologist (!). (The Danes did something similar a few years ago, coordinated through DEFF, the Danish libraries’ group.) The Rochester researchers asked: what do students really do when they write papers? The librarians had set goals to do more, reach more students, and better support the University’s educational mission. Through a variety of techniques, including short surveys, photo diaries, and charrette-style workshops, the librarians learned a great deal about how students work, how their work is integrated into their other life activities, and how students view the library. Some results led to immediate pilot programs: a late-night librarian program during crunch times, for instance. But equally important to the researchers was understanding the students’ perspective on space design and layout in preparation for a reading room renovation.

Concerns about how libraries will manage the increased responsibilities that may accrue from such studies are premature. Our service planning should take into account the priorities of our users. Perhaps some longstanding library services just aren’t that important to our users, after all. Carl Yirka recently challenged librarians on the assumption that everything we currently do is still necessary — and so far, few have risen to the challenge. Some of the things that librarians place value on are not ours to value; our patrons decide whether Saturday reference, instructional sessions on using the wireless internet, and routing of print journals are valuable services. Many services provided by librarians are valuable because they’re part of our responsibility as professionals: to select high-quality information, to organize and maintain it, and to help users find what they need. But the specific ways we do that may always be shifting. Having the Federal Reporter in your on-site print collection is not, in and of itself, a valuable thing, or an indicator of the strength of your collection.

“Measuring more is easy; measuring better is hard.”
Charles Handy (from Joseph R. Matthews, Strategic Planning and Management for Library Managers (2005))

Thinking is Hard


Where does this leave us? The possibilities for survey research may be great, and the tools facile, but the discussion is still very difficult. At the ALL-SIS-sponsored “Academic Law Library of 2015” workshop this past July, one small group addressed the question of what users would miss if the library didn’t do what we currently do. If functions like purchasing and management of space were absorbed by other units on campus or in the college, what would be lost? Despite the experience of the group, it was a very challenging question. There were a few concrete ideas that the group could agree were unique values contributed by law librarians, including the following:

Assessment of the integrity of legal information
Evaluation of technologies and resources
Maintaining an eye on the big picture/long term life of information

The exercise was troubling, particularly in light of statements throughout the day by the many attendees who insisted on the necessity of existing services, while unable to articulate the unique value of librarians and libraries to the institution. The Yirka question (and follow-up) was a suggestion to release some tasks in order to absorb new ones, but we ought to be open to the possibility that we need a shift in the kind of services we provide, in addition to balancing the workload. As a professional community, we’re still short on wild fantasies of the library of the future, and our users may be more than happy to help supply some of their own.

Doing Qualitative work

Could good qualitative research move the ball forward? Though good research is time-consuming, it could help us to answer fundamental questions about how patrons use legal information services, how they use the library, and why they do or don’t use the library for their work. Qualitative research could also explore patron expectations in greater detail than quantitative instruments like LibQUAL+®, following up on how the library compares to other physical spaces and other sources of legal information that patrons use.

It’s important that librarians tap into resources on campus to support survey research, though, whether qualitative or quantitative. When possible, librarians should use previously vetted instruments, pretested for validity and reliability. This may be a great opportunity for AALL to work with researchers in library and information science to build a survey instrument that could be used by academic law libraries.


