[Image: US Law Code. CC BY-SA 3.0, wikipedia.org]

If you think that law isn't written for lawyers, try reading some.  It can even start looking normal after a while (say about the length of time it takes to get through a law degree).  But research on the main street impact of legal language suggests that for most people, the law is likely to be either incomprehensible or very hard to read.

This problem is the focus of a research project that a team of us at ANU and Cornell LII have been working on over the past months: Eric McCreath (Australian National University, Research School of Computer Science), Wayne Weibel (Cornell University Law School, Legal Information Institute), Nic Ceynowa (LII), Sara Frug (LII), Tom Bruce (LII) and myself (ANU).  With the generous help of thousands of LII users, as part of a citizen science project, we've been collecting data on the readability of law as well as demographic data about the users of law.

If you are concerned about access to law, and many are, the current situation is not really good enough.  Whether you tend to 'human rights', 'democratic values', 'economic efficiency', 'rule of law' or are just wanting to make sure your hapless minions follow your every command, you'll be able to think of a good reason why the law should be more accessible (readable) than it is.

Of course the problem has been around for a very long time, and plain language is a standing goal of many legislative drafting offices.  Reform efforts have been underway since the Middle Ages.  Certainly legal language has improved considerably, particularly as a result of 19th and 20th century reforms with that goal in mind.  Still, the law can't be said to be readily readable by the general public.

What has changed that makes the problem more urgent today is that the general public can now at least get to the law.  That's the revolution that's been achieved by online publishers of the law, including the Free Access to Law Movement and official and commercial law publishers.  As the UK's First Parliamentary Counsel observed last year:

Legislation affects us all. And increasingly, legislation is being searched for, read and used by a broad range of people. It is no longer confined to professional libraries; websites like legislation.gov.uk have made it accessible to everyone. So the digital age has made it easier for people to find the law of the land; but once they have found it, they may be baffled. The law is regarded by its users as intricate and intimidating.

In 2010, the Plain Writing Act was adopted by the US Congress with the aim of improving government writing. Sad to say, the Act itself is no model of plain language. Section, sub-section and paragraph roll on, line after line, provision after convoluted provision. In substance they say not much more than: write clearly so that the public can understand and use what you write.  Didn't anyone see the irony?  Then again, reality check, most legislation is never read by the people who vote to make it law. Just to make sure the drawbridge was well and truly up, if you read through to the fine print at the end there is an important rider.  What happens if no one can understand what the law is supposed to mean? Well, nothing a judge can do about it.  Great aspiration, but ...

A sea change could be on the way, though. The Good Law initiative is one great example of efforts to address the complexity and readability of legislation. What is significant is that how we are thinking about legal rules is changing.  Official publishers of the law are beginning to talk about the law as if it's data.  The UK National Archives Office has even published an API -- Application Programming Interface (basically a 'how to' for developers who want to use the "data"). So now we're thinking of law as data.  And we're going to unleash computer scientists on it, to do whatever their imaginations can come up with. Bommarito and Katz's work on the legal code as a mathematical network is a great example of the virtually infinite possibilities.

Our own research uses the potential of computational technologies in another way. Online legal sites are not just 'documents'.  They are places where people are actively interacting with the law. We used crowd-sourcing to engage with this audience, asking them to rate law on readability characteristics as well as exploring the demographics of who uses the law. Our aim was to develop a labelled dataset that could be used as input to machine learning. "Labelled data" is machine learning gold -- hard to get, but if you can get it, you can use it to make predictions about what human judges would say. In our case we are trying to predict whether a legal sentence will be readable or not.

In the process we learned quite a bit about the audience using the law, and about which law they use. Scouring the Google Analytics data, it became obvious that the law is not equally read. We may all be equal before the law, but the law is not equal before us. Just 37 sections of the US Code account for almost 10% of the page visits to US Code pages (there are about 65,000). So a tiny fraction of the Code is being read all the time.  On the other hand there are huge swathes of the Code that hardly ever see the light of a back-illuminated screen. This is not trivial news. Computer scientists love lists. Prioritised lists get their own special lectures for first year CS students -- and here we have a prioritised list. You want to know what law is at the top of your priority list -- the users will tell you. If you're concerned with cleaning up the law code or making it easier to understand, there's useful stuff here.
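To make the "prioritised list" idea concrete, here is a minimal sketch of how one might work out which sections account for a given share of visits. The section names and view counts are invented for illustration (they are not our Google Analytics data), and `sections_for_share` is a hypothetical helper, not part of our research code.

```python
def sections_for_share(page_views, target_share):
    """Return the most-visited sections that together reach target_share
    of all visits, plus the share they actually cover."""
    total = sum(page_views.values())
    if total == 0:
        return [], 0.0
    # Rank sections from most to least visited.
    ranked = sorted(page_views.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for section, views in ranked:
        if running >= target_share * total:
            break
        selected.append(section)
        running += views
    return selected, running / total


# Hypothetical visit counts for five sections (illustrative only).
views = {"sec A": 500, "sec B": 300, "sec C": 100, "sec D": 60, "sec E": 40}
top, share = sections_for_share(views, 0.75)
print(top, share)
```

Run over real analytics data with a target of 0.10, a function like this would return the short list of heavily read sections, which is where a clean-up effort might start.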

Ranking of sections by frequency of readership (on a logarithmic scale)


It will be no surprise that we found that law is harder to read for just about every part of the community than it is for legal professionals.  What was surprising was that legal professionals (including law students) turn out to be a minority of those interested enough to respond, on the LII site at least.

These were just a few of the demographic insights we were able to draw.

On the machine learning front, we were able to show that machine learning can improve on traditional readability metrics in predicting language difficulty (those metrics have long been regarded as suspect in application to legal texts anyway). That said, it's early days and we would like to extend the research we have done so far. There is a lot of potential for future research applying computational techniques to the readability of law.  A co-authored publication further describing the research introduced in this article will be presented at this year's Law Via the Internet Conference being held at the end of September.
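For readers who haven't met a traditional readability metric, here is a rough sketch of one well-known example, the Flesch Reading Ease score (higher means easier). It is offered only as an illustration of the kind of formula involved: this article does not say which metrics were tested, and the syllable counter below is a crude vowel-group heuristic I am assuming for brevity, not a dictionary lookup.

```python
import re


def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


plain = "The cat sat on the mat."
legal = ("The person who for his own purpose brings on his lands and collects "
         "and keeps there anything likely to do mischief must keep it in at "
         "his peril.")
plain_score = flesch_reading_ease(plain)
legal_score = flesch_reading_ease(legal)
print(plain_score, legal_score)
```

A formula like this sees only sentence length and syllable counts, which is one reason such metrics are treated with suspicion for legal text.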

But while we're thinking about it, there are other ways to think about 'access' to law.  What if instead of writing the law, it was visualized?  You know -- like in pictures.  Before you storm off in contempt, note this: research is validating that pictures can improve user experience -- for example in the contract space, where what your clients think of your contract can impact on your bottom line.

It's radical enough unleashing computer scientists on legal rules. What might the law look like if we try thinking like designers?   'User experience' of legal rules? That one didn't come up in law school.  We're in some surreally different world at this point. Designers create artefacts for people to use which are optimised for functionality, beauty and other characteristics -- not things that are meant to tell people what to do. 'User experience' is their kind of thinking.

As readers of Vox Pop will know, the idea of legal design is starting to get traction. Helena Haapio and Stefania Passera's great article on legal design covers some of the field. An article they jointly published last year points out some of the benefits of visualization. Earlier this year, we worked on a joint paper exploring the feasibility of automating legal visualization. We were able to demonstrate the automation of visualization of clauses, such as a contract term clause, a liquidated damages clause or a payment clause. Visit our proof of concept site, where you can play with visualizing different options.

OK. So perhaps some of the above reads like we're on the up-slope of the hype curve. But that of course is the fun. For those of us who've spent many years in the law, looking at the law from a different professional paradigm can help us see things that didn't stand out before. It is certainly enjoyable and brings a breath of fresh air to the law.

Michael Curtotti is undertaking a PhD in the Research School of Computer Science at the Australian National University.  His co-authored publications on legal informatics include: A Right to Access Implies a Right to Know:  An Open Online Platform for Readability Research, Enhancing the Visualization of Law and A corpus of Australian contract language: description, profiling and analysis.  He holds a Bachelor of Laws and a Bachelor of Commerce from the University of New South Wales, and a Masters of International Law from the Australian National University.  He works part-time as a legal adviser to the ANU Students Association and the ANU Post-graduate & research students Association, providing free legal services to ANU students.

---------------------------

Other related posts on VoxPopuLII on this topic include Law in the Last-Mile: The Potential of Mobile Integration into Legal Services by Sean Martin McDonald, Incomprehension Compounded by Mistranslation – The Imperatives of Access to Legal Information in South Africa by Eve Gray and Accessible Law by Nick Holmes

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

Take a look at your bundle of tags on Delicious. Would you ever believe you're going to change the law with a handful of them?

You're going to change the way you research the law. The way you apply it. The way you teach it and, in doing so, shape the minds of future lawyers.

Do you think I'm going too far? Maybe.

But don't overlook the way taxonomies have changed the law and shaped lawyers’ minds so far. Taxonomies? Yeah, taxonomies.

We lawyers have used taxonomies extensively through the years; civil lawyers in particular have proved especially prone to them. We’ve used taxonomies for three reasons: to help legal research, to help memorization and teaching, and to apply the law.

 

Taxonomies help legal research.

First, taxonomies help us retrieve what we’ve stored (rules and case law).

Are you looking for a rule about a sales contract? Dive deep into the "Obligations" category and the corresponding book (Recht der Schuldverhältnisse, Obbligazioni, Des contrats ou des obligations conventionnelles en général, you name it).

If you are a Common Lawyer, and ignore the perverse pleasure of browsing through Civil Code taxonomy, you'll probably know Westlaw's classification and its key numbering system. It has much more concrete categories and therefore much longer lists than the Civilians' classification.
Legal taxonomies are there to help users find the content they're looking for.

However, taxonomies sometimes don't reflect the way the users reason; when this happens, you just won't find what you're looking for.

The problem with legal taxonomies.

If you are a German lawyer, you'll probably be searching the “Obligations” book for rules concerning marriage; indeed in the German lawyer’s frame of mind, marriage is a peculiar form of contract. But if you are Italian, like I am, then you will most probably start looking in the "Persons" book; marriage rules are simply there, and we have been taught that marriage is not a contract but an agreement with no economic content (we have been trained to overlook the patrimonial shade in deference to the sentimental one).

So if I, the Italian, look for rules about marriage in the German civil code, I won't find anything in the “Persons” book.
In other words, taxonomies work when they're used by someone who reasons like the creator or -- and this happens with lawyers inside a certain legal system -- when users are trained to use the same taxonomy, and lawyers are trained at length.

But let's take my friend Tim; he doesn't have a legal education. He's navigating Westlaw's key number system looking for some relevant case law on car crashes. By chance he knows he should look below “torts,” but where? Is this injury and damage from act (k439)? Is this injury to a person in general (k425)? Is this injury to property or right of property in general (k429)? Wait, should he look below “crimes” (he is unclear on the distinction between torts and crimes)? And so on. Do these questions sound silly to you, the lawyers? Consider this: the titles we mentioned give no hint of the content, unless you already know what's in there.

Because Law, complex as it is, needs a map. Lawyers have been trained to use the map. But what about non-lawyers?

In other words, the problems with legal taxonomies occur when the creators and the users don't share the same frame of mind. And this is most likely to happen when the creators of the taxonomy are lawyers and the users are not lawyers.
Daniel Dabney wrote something similar some time ago. Let's imagine that I buy a dog, take the little pooch home and find out that it's mangy. Let's imagine I'm that kind of aggressively unsatisfied customer and want to sue the seller, but know nothing about law. I go to the library and what will I look for? Rules on dog sales? A book on dog law? I'm lucky, there's one, actually: “Dog law”, a book that gathers all laws regarding dogs and dog owners.
But of course, that's just luck, and if I had to browse through the legal categories in Westlaw's index, I would never find anything regarding “dogs”. I would never find the word “dog”, which is nonetheless the first word a person without legal training would think of. A savvy lawyer would look for rules regarding sales and warranties: general categories I may not know of (or think of) if I'm not a lawyer. If I'm not a lawyer I may not know that "the sale of arguably defective dogs are to be governed by the same rules that apply to other arguably defective items, like leaky fountain pens”. Dogs are like pens for a lawyer, but they are just dogs for a dog owner: so a dog owner will look for rules about dogs, not rules about sales and warranties (or at least he would look for the sale of dogs). And dog law, a user-oriented category built around the object itself, would probably fit his needs.

Observation #1: To make legal content available to everyone we must change the information architecture through which legal information is presented.

Will folksonomies do a better job?
Let's come to folksonomies now. Here, the mismatch between the creators' (lawyers') and the users' ways of reasoning is less likely to occur. The very same users decide which category to create and what to put into it. Moreover, tags can overlap; that is, the same object can be tagged more than once. This allows the user to consider the same object from different perspectives. Take Delicious. If you search for "Intellectual property" on the Delicious search engine, you find the Wikipedia page on the definition of copyright. It was tagged mainly with "copyright." But many users also tagged it with "wikipedia," "law" and "intellectual-property" and even "art". Maybe it was the non-lawyers out there who found it more useful to tag it with the "law" tag (a lawyer’s tag would have been more specific); maybe it was the lawyers who massively tagged it with "art" (there are a few "art" tags in their libraries). Or was it the other way around? The thing is, it's up to users to decide where to classify it.

People also tag laws on Delicious using different labels that may or may not be related to law, because Delicious is a general-use website. But instead, let’s take a crowdsourced legal content website like Docracy. Here, people upload and tag their contracts, so it’s only legal content, and they tag them using only legal categories.

On Docracy, I found a whole category of documents dedicated to Terms of Service. Terms of Service is not a traditional legal category -- like torts, property, and contracts -- but it was a particularly useful category for Docracy users.

Docracy: WordPress Terms of Service are tagged with "TOS" but also with "Website".


If I browse some more, I see that the WordPress TOS are also tagged with "website." Right, it makes sense; that is, if I'm a web designer looking for the legal stuff I need to know before deploying my website. If I start looking just from "website," I'll find TOS, but also "contract of works for web design" or "standard agreements for design services" from AIGA.

You got it? What legal folksonomies bring us is:

  1. User-centered categories
  2. Flexible categorization systems. Many items can be tagged more than once and so be put into different categories. Legal stuff can be retrieved through different routes but also considered under different lights.
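If you like to see things in code, the two points above can be sketched as a tiny tag index in which one document sits under several user-chosen tags at once. This is only an illustration under my own assumptions: `TagIndex` is a hypothetical class, not how Docracy or Delicious actually store tags, and the documents are just the examples mentioned above.

```python
from collections import defaultdict


class TagIndex:
    """Minimal folksonomy: users attach free-form tags to documents."""

    def __init__(self):
        self._docs_by_tag = defaultdict(set)
        self._tags_by_doc = defaultdict(set)

    def tag(self, doc, label):
        # Normalise case so "TOS" and "tos" count as the same tag.
        label = label.strip().lower()
        self._docs_by_tag[label].add(doc)
        self._tags_by_doc[doc].add(label)

    def documents(self, label):
        """All documents reachable through one tag."""
        return sorted(self._docs_by_tag.get(label.strip().lower(), set()))

    def tags(self, doc):
        """All the 'lights' under which users have seen one document."""
        return sorted(self._tags_by_doc.get(doc, set()))


index = TagIndex()
index.tag("WordPress Terms of Service", "TOS")
index.tag("WordPress Terms of Service", "website")
index.tag("Web design contract", "website")
print(index.documents("website"))
print(index.tags("WordPress Terms of Service"))
```

The same Terms of Service can be reached from "TOS" or from "website", which is exactly the "different routes" point.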

Will this enhance findability? I think it will, especially if the users are non-lawyers. And services that target the low end of the legal market usually target non-lawyers.

Alright, I know what you're thinking. You're thinking, oh no, yet another naive folksonomy supporter! And then you say: "Folksonomy structures are too flat to constitute something useful for legal research!" and "Law is too specific a sector, with highly technical vocabulary and structure. Users without legal training would just tag wrongly".

Let me quickly address these issues.

Objection 1: Folksonomies are too flat to constitute something useful for legal research

Let's start from a premise: we have no studies on legal folksonomies yet. Docracy is not a full folksonomy yet (users can tag but tags are pre-determined by administrators). But we do have examples of folksonomies tout court, so my argument moves analogically from them. Folksonomies do work. Take the Library of Congress Flickr project. Like an old grandmother, the Library gathered thousands of pictures that no-one ever had the time to review and categorize.  So pictures were uploaded on Flickr and left for the users to tag and comment. They did it en masse, mostly by using descriptive or topical tags (non-subjective) that were useful for retrieval. If folksonomies work for pictures (Flickr), books (Goodreads), questions and answers (Quora), basically everything else (Delicious), why shouldn't they work for law? Given that premise, let's move to the first objection: folksonomies are flat. Wrong. As folksonomies evolve, we find out that they can have two, three and even more levels of categories. Take a look at the Quora hierarchy.

That's not flat. Look, there are at least four levels in the screenshot: Classical Musicians & Composers > Pianists > Jazz Pianists > Ray Charles > What'd I Say. Right, jazz pianists are not classical musicians: but mistakes do occur and the good thing about folksonomies is that users can freely correct them.

Second point: findability doesn't depend only on hierarchies. You can browse the folksonomy's categories but you can also use free text search to dig into it.  In this case, users' tags are metadata and so findability is enhanced because the search engine retrieves what users have tagged--not what admins have tagged.

 

Objection 2: Non-legal people will use the wrong tags

Uhm, yes, you're right. They will tag a criminal law document with “tort” and a tort case involving a car accident with “car crash”. And so? Who cares? What if the majority of users find it useful? We forget too often that law is a social phenomenon, not a tool for technicians. And language is a social phenomenon too. If users consistently tag a legal document with the "wrong" tag X instead of the "right" tag Y, it means that they usually name that legal document with X. So most of them, when looking for that document, will look for X. And they'll retrieve it, and be happy with that.

Of course, legal-savvy people would like to search by typical legal words (like, maybe, “chattel”?) or by using the legal categories they know so well.  Do we want to compromise? The fact is, in a system where there is only user-generated content, it goes without saying that a traditional top-down taxonomy would not work. But if we imagine a system where content is not user-generated, like a legal or case law database, that could happen. There could be, for instance, a mixed taxonomy-folksonomy system where the taxonomy is built with traditional legal terms and schemes, whereas the folksonomy is built by the users, who are free to tag. Search, in the end, can be done by browsing the taxonomy, by browsing the folksonomy, or by means of a search engine that retrieves content by relying both on metadata chosen by system administrators and on metadata chosen by the users who tagged the content.
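A mixed system of this kind can be sketched in a few lines. What follows is a toy under stated assumptions, not a description of any real legal database: the `Document` class, the `search` function and the two sample cases are all hypothetical, and matching is by exact term only.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    category: str  # taxonomy: assigned by administrators
    user_tags: set = field(default_factory=set)  # folksonomy: added by users


def search(documents, query):
    """Return titles whose admin category OR user tags match the query."""
    q = query.strip().lower()
    hits = []
    for doc in documents:
        in_taxonomy = q == doc.category.lower()
        in_folksonomy = q in {tag.lower() for tag in doc.user_tags}
        if in_taxonomy or in_folksonomy:
            hits.append(doc.title)
    return hits


# Hypothetical case law entries, echoing the examples used in this article.
cases = [
    Document("Case A", "Torts", {"car crash", "injury"}),
    Document("Case B", "Sales", {"dog", "warranty"}),
]
print(search(cases, "torts"))      # the lawyer's route
print(search(cases, "car crash"))  # the non-lawyer's route
```

The lawyer who types "torts" and my friend Tim who types "car crash" land on the same case, each through the vocabulary they already have.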

This may seem like an imaginary system--but it's happening already. Amazon uses traditional categories and leaves the users free to tag. The BBC website followed a similar pattern, moving from a full taxonomy system to a hybrid taxonomy-folksonomy one. Resilience, resilience, as Andrea Resmini and Luca Rosati put it in their seminal book on information architecture. Folksonomies and taxonomies can coexist. But this is not what this article is about, so sorry for the digression and let's move to the first prediction.

Prediction #1: Folksonomies will provide the right information architecture for non-legal users.

Taxonomies and folksonomies help legal teaching.

Secondly, taxonomies help us memorize rules and case law. Put all the things in a box and group them on the basis of a common feature, and you'll easily remember where they are. For this reason, taxonomies have played a major role in legal teaching. I’ll tell you a little story. Civil lawyers know very well the story of Gaius, the ancient Roman jurist who created a successful taxonomy for his law handbook, the Institutiones. His taxonomy was threefold: all law can be divided into persons, things, and actions. Five centuries later (five centuries!) Emperor Justinian transferred the very same taxonomy into his own Institutiones, a handbook aimed at youth "craving for legal knowledge" (cupidae legum iuventuti). Why? Because it worked! How powerful, both the slogan and the taxonomy! Indeed, more than 1000 years later, we find it again, with a few changes, in the German, French, Italian, and Spanish Civil Codes, and in a whole bunch of nutshells that explain private law following the taxonomy of the Codes.

And now, consider what the taxonomies have done to lawyers’ minds.

Taxonomies have shaped their way of considering facts. Think. Put something into a category and you will lose all the other points of view on the same thing. The category shapes and limits our way to look at that particular thing.

Have you ever noticed how civil lawyers and common lawyers have a totally different way of looking at facts? Common lawyers see and take into account the details. Civil lawyers overlook them because the taxonomy they use has told them to do so.

In Rylands vs Fletcher (a UK tort case) some water escapes from a reservoir and floods a mine nearby. The owner of the reservoir could not possibly foresee the event and prevent it. However, the House of Lords states that the owner of the mine has the right to recover damages, even if there is no negligence. ("The person who for his own purpose brings on his lands and collects and keeps there anything likely to do mischief, if it escapes, must keep it in at his peril, and if he does not do so, is prima facie answerable for all the damage which is the natural consequence of its escape.")

In Read vs Lyons, however, an employee gets injured during an explosion occurring in the ammunition factory where she is employed. The rule set in Rylands couldn't be applied, as, according to the House of Lords, the case was very different; there is no escape.

On the contrary, for a Civil lawyer the decision would have been the same in both cases. For instance, under the Italian Civil Code (but the French and German Codes are not substantially different on this point), one would apply the general rule that grants compensation for damages caused by "dangerous activities" and requires no proof of negligence from the plaintiff (art. 2050 of the Civil Code), no matter what causes the danger (a big reservoir of water, an ammunition factory, whatever else).

Observation #2: Taxonomies are useful for legal teaching and they shape lawyers' minds.

Folksonomies for legal teaching?

Okay, and what about folksonomies? What if the way people tag legal concepts makes its way into legal teaching?

Take Docracy's TOS category -- have you ever thought about a course on TOS?

Another website, another example: Rocket Lawyer. Its categorization is not based on folksonomy, however; it's purposely built around users’ needs, which have been tested over the years, so in a way the taxonomy of the website comes from its users. One category is "identity theft", which must be quite popular if it is promoted on the first page. What about teaching a course on identity theft? That would merge some material traditionally taught in privacy law, criminal law, and torts courses. Some course areas would overlap, which is good for memorization. Think again of the example of “Dog Law” by Dabney. What about a course about Dog Law, collecting material that refers to dogs across traditional legal categories?

Also, the same topic would be considered from different points of view.

What if students were trained in the flexibility of categories described above? They wouldn’t get trapped into a single way of seeing things. If folksonomies account for different levels of abstraction, students would be trained to consider details. Not only that, they would develop a very flexible frame of mind.

Prediction #2: Legal folksonomies in legal teaching would keep lawyers’ minds flexible.

 

Taxonomies and folksonomies SHAPE the law.

Third, taxonomies make the law apply differently. Think about it. They are the very highways that allow the law to travel down to us. And here it comes, the real revolutionary potential of legal folksonomies, if we were to make them work.

Let's start from taxonomies, with a couple of examples.

Civil lawyers are taught that Public and Private Law are two distinctive areas of law, to which different rules apply. In common law, the distinction is not that clear-cut. In Rigby vs Chief Constable of Northamptonshire  (a tort case from UK case law) the police—in an attempt to catch a criminal—damage a private shop by accidentally firing a canister of gas and setting the shop ablaze. The Queen's Bench Division establishes that the police are liable under the tort of negligence only because the plaintiff manages to prove the police’s fault; they apply a private law category to a public body.
How would the same case have been decided under, say, French law? As the division between public and private law is stricter, the category of liability without fault, which is traditionally used when damages are caused by public bodies, would apply. The State would have to indemnify the damage, no matter if there was negligence.

Remember Rylands vs Fletcher and Read vs Lyons? The presence of escape/no escape was decisive, because the English taxonomy is very concrete. Civil lawyers work with taxonomies that have fewer, larger, and more abstract categories. If you cause damages by performing a risky activity, even if conducted without fault, you have to repay them. Period. Abstract taxonomy sweeps out any concrete detail. I think that Robert Berring had something like this in mind--although he referred to legal research--when he said that “classification defines the world of thinkable thoughts”. Or, as Dabney puts it, “thoughts that aren't represented in the system had become unthinkable”.
So taxonomies make the law apply differently. In the former case, by setting a boundary between the public-private spheres; in the latter by creating a different framework for the application of more abstract or more detailed rules.

 

You don't get it? All right, it's tough, but do you have two minutes more? Let's take this example by Dabney. The key number system's taxonomy distinguishes between Navigable and Non-navigable waters (in the screenshot: waters and water courses). There's a reason for that: lands under navigable waters presumptively belong to the state, because “private ownership of the land under navigable waters would (…) compromise the use of those waters for navigation and commerce”. So there are two categories because different laws apply to each. But now look at this screenshot.

Find anything strange? Yes: avulsion rules are “doubled”: they are contained in both categories. But they are the very same: rules concerning avulsion don't change if the water is navigable or not (check the definition of avulsion if you, like me, don't remember what it is). Dabney: “In this context,(...) there is no difference in the legal rules that are applied that depend on whether or not the water is navigable. Navigability has an effect on a wide range of issues concerning waters, but not on the accretion/avulsion issue. Here, the organization of the system needlessly separates cases from each other on the basis of an irrelevant criterion”. And you think, ok, but as long as we are aware of this error and know the rules concerning avulsion are the same, it's no biggie. Right, but in the future?

“If searchers, over time, find cases involving navigable waters in one place and non-navigable waters in another, there might develop two distinct bodies of law.” Got it? Dabney foresees it. The way we categorize the law would shape the way we apply it.

Observation #3: Different taxonomies entail different ways to apply the law.

So, what if we substitute taxonomies with folksonomies?

And what if they had the power to shape the way judges, legal scholars, lawmakers and legal operators think?

Legal folksonomies are just starting out, and what I envisage is still yet to come. Which makes this article kind of a visionary one, I admit.

However, what Docracy is teaching us is that users—I didn't say lawyers, but users—are generating decent legal content. Would you have bet your two cents on this, say, five years ago?
What if users started generating new legal categories (legal folksonomies)?

Berring wrote something really visionary more than ten years ago in his beautiful "Legal Research and the World of Thinkable Thoughts". He couldn't have folksonomies in mind, and still, wouldn't you think he referred to them when writing: "There is simply too much stuff to sort through. No one can write a comprehensive treatise any more, and no one can read all of the new cases. Machines are sorting for us. We need a new set of thinkable thoughts.  We need a new Blackstone. We need someone, or more likely a group of someones, who can reconceptualize the structure of legal information."?

Prediction #3: Legal folksonomies will make the law apply differently.

Let's wait and see. Let the users tag. Where this tagging is going to take us is unpredictable, yes, but if you look at where taxonomies have taken us for all these years, you may find a clue.

I have a gut feeling that folksonomies are going to change the way we search, teach, and apply the law.


 

 

Serena Manzoli is a legal architect and the founder of Wildcat, legal search for curious humans. She has been a Euro bureaucrat, a cadet, an in-house counsel, a bored lawyer. She holds an LLM from the University of Bologna. She blogs at Lawyers are boring.  Twitter: SquareLaw

Prosumption: shifting the barriers between information producers and consumers

One of the major revolutions of the Internet era has been the shifting of the frontiers between producers and consumers [1]. Prosumption refers to the emergence of a new category of actors who not only consume but also contribute to content creation and sharing. Under the umbrella of Web 2.0, many sites indeed enable users to share multimedia content, data, experiences [2], views and opinions on different issues, and even to act cooperatively to solve global problems [3]. Web 2.0 has become a fertile terrain for the proliferation of valuable user data enabling user profiling, opinion mining, trend and crisis detection, and collective problem solving [4].

The private sector has long understood the potential of user data and has used them for analysing customer preferences and satisfaction, for finding sales opportunities, for developing marketing strategies, and as a driver for innovation. Recently, corporations have relied on Web platforms for gathering new ideas from clients on the improvement or the development of new products and services (see for instance Dell’s IdeaStorm; Salesforce’s IdeaExchange; and My Starbucks Idea). Similarly, Lego’s Mindstorms encourages users to share their robot-building projects online, whereby the design becomes public knowledge and can be freely reused by Lego (and anyone else), as indicated by the Terms of Service. Furthermore, companies have recently been mining social network data to foresee future actions of the Occupy Wall Street movement.

Even scientists have caught up and adopted collaborative methods that enable the participation of laymen in scientific projects [5].

Now, how far has government gone in taking up this opportunity?

Some recent initiatives indicate that the public sector is aware of the potential of the “wisdom of crowds.” In the domain of public health, MedWatcher is a mobile application that allows the general public to submit information about any experienced drug side effects directly to the US Food and Drug Administration. In other cases, governments have asked for general input and ideas from citizens, such as the brainstorming session organized by the Obama administration, the wiki launched by the New Zealand Police to get suggestions from citizens for the drafting of a new policing act to be presented to the parliament, or the Website of the Department of Transport and Main Roads of the State of Queensland, which encourages citizens to share their stories related to road tragedies.

Even in so crucial a task as the drafting of a constitution, government has relied on citizens’ input through crowdsourcing [6]. More recently, several other initiatives have fostered crowdsourcing for constitutional reform in Morocco and in Egypt.

It is thus undeniable that we are witnessing an accelerated redefinition of the frontiers between experts and non-experts, scientists and non-scientists, doctors and patients, public officers and citizens, professional journalists and street reporters. The 'Net has provided the infrastructure and the platforms for enabling collaborative work. Network connection is hardly a problem anymore. The problem is data analysis.

In other words: how to make sense of the flood of data produced and distributed by heterogeneous users? And more importantly, how to make sense of user-generated data in the light of more institutional sets of data (e.g., scientific, medical, legal)? The efficient use of crowdsourced data in public decision making requires building an informational flow between user experiences and institutional datasets.

Similarly, enhancing user access to public data has to do with matching user case descriptions with institutional data repositories (“What are my rights and obligations in this case?”; “Which public office can help me?”; “What is the delay in the resolution of my case?”; “How many cases like mine have there been in this area in the last month?”).

From the point of view of data processing, we are clearly facing a problem of semantic mapping and data structuring. The challenge is thus to overcome the flood of isolated information while avoiding excessive management costs. There is still a long way to go before tools for content aggregation and semantic mapping are generally available. This is why private firms and governments still mostly rely on the manual processing of user input.

The new producers of legally relevant content: a taxonomy

Before digging deeper into the challenges of efficiently managing crowdsourced data, let us take a closer look at the types of user-generated data flowing through the Internet that have some kind of legal or institutional flavour.

One type of user data emerges spontaneously from citizens' online activity, and can take the form of:

  • citizens' forums
  • platforms gathering open public data and comments on them (see for instance data-publica)
  • legal expert blogs (blawgs)
  • or the journalistic coverage of the legal system.

User data can as well be prompted by institutions as a result of participatory governance initiatives, such as:

  • crowdsourcing (targeting a specific issue or proposal by government as an open brainstorming session)
  • comments and questions addressed by citizens to institutions through institutional Websites or through e-mail contact.

This variety of media supports and knowledge producers gives rise to a plurality of textual genres, semantically rich but difficult to manage given their heterogeneity and quick evolution.

Managing crowdsourcing

The goal of crowdsourcing in an institutional context is to extract and aggregate content relevant for the management of public issues and for public decision making. Knowledge management strategies vary considerably depending on the ways in which user data have been generated. We can think of three possible strategies for managing the flood of user data:

Pre-structuring: prompting the citizen narrative in a strategic way

A possible solution is to elicit user input in a structured way; that is to say, to impose some constraints on user input. This is the solution adopted by IdeaScale, a software application that was used by the Open Government Dialogue initiative of the Obama Administration. In IdeaScale, users are asked to check whether their idea has already been covered by other users, and alternatively to add a new idea. They are also invited to vote for the best ideas, so that it is the community itself that rates and thus indirectly filters the users’ input.
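The duplicate check and community voting described above can be sketched in a few lines of Python. This is only an illustration and not IdeaScale's actual implementation: the `IdeaBoard` class, its method names, the word-overlap (Jaccard) similarity measure, and the 0.6 threshold are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


def tokens(text: str) -> set:
    """Lowercase word set used for a crude similarity comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Share of words two texts have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Idea:
    text: str
    votes: int = 0


@dataclass
class IdeaBoard:
    """Hypothetical idea board (not a real IdeaScale API)."""
    threshold: float = 0.6  # assumed cut-off for "already covered"
    ideas: list = field(default_factory=list)

    def find_similar(self, text: str) -> Optional[Idea]:
        """Return an existing idea that largely overlaps with `text`, if any."""
        new = tokens(text)
        for idea in self.ideas:
            if jaccard(new, tokens(idea.text)) >= self.threshold:
                return idea
        return None

    def submit(self, text: str) -> Idea:
        """Vote for a matching idea if one exists; otherwise add a new one."""
        existing = self.find_similar(text)
        if existing is not None:
            existing.votes += 1
            return existing
        idea = Idea(text, votes=1)
        self.ideas.append(idea)
        return idea

    def ranked(self) -> list:
        """Ideas ordered by community votes, highest first."""
        return sorted(self.ideas, key=lambda i: i.votes, reverse=True)


board = IdeaBoard()
board.submit("Publish all agency spending data online")
board.submit("Publish agency spending data online")  # near-duplicate: counted as a vote
board.submit("Hold town hall meetings by video")
top = board.ranked()[0]
print(top.text, top.votes)
```

In this sketch the structure imposed on input does the filtering: a near-duplicate becomes a vote rather than a new entry, so the ranked list stays short without anyone reading every submission.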

The MIT Deliberatorium, a technology aimed at supporting large-scale online deliberation, follows a similar strategy. Users are expected to follow a series of rules to enable the correct creation of a knowledge map of the discussion. Each post should be limited to a single idea, it should not be redundant, and it should be linked to a suitable part of the knowledge map. Furthermore, posts are validated by moderators, who should ensure that new posts follow the rules of the system. Other systems that implement the same idea are featurelist and Debategraph [7].

While these systems enhance the creation and visualization of structured argument maps and promote community engagement through rating systems, they present a series of limitations. The most important of these is the fact that human intervention is needed to manually check the correct structure of the posts. Semantic technologies can play an important role in bridging this gap.

Semantic analysis through ontologies and terminologies

Ontology-driven analysis of user-generated text implies finding a way to bridge Semantic Web data structures, such as formal ontologies expressed in RDF or OWL, with unstructured implicit ontologies emerging from user-generated content. Sometimes these emergent lightweight ontologies take the form of unstructured lists of terms used for tagging online content by users. Accordingly, some works have dealt with this issue, especially in the field of social tagging of Web resources in online communities. More concretely, different works have proposed models for making compatible the so-called top-down metadata structures (ontologies) with bottom-up tagging mechanisms (folksonomies).

The possibilities range from transforming folksonomies into lightly formalized semantic resources (Lux and Dösinger, 2007; Mika, 2005) to mapping folksonomy tags to the concepts and the instances of available formal ontologies (Specia and Motta, 2007; Passant, 2007). As the basis of these works we find the notion of emergent semantics (Mika, 2005), which questions the autonomy of engineered ontologies and emphasizes the value of meaning emerging from distributed communities working collaboratively through the Web.

We have recently worked on several case studies in which we have proposed a mapping between legal and lay terminologies. We followed the approach proposed by Passant (2007) and enriched the available ontologies with the terminology appearing in lay corpora. For this purpose, OWL classes were complemented with a has_lexicalization property linking them to lay terms.

The first case study that we conducted belongs to the domain of consumer justice, and was framed in the ONTOMEDIA project. We proposed to reuse the available Mediation-Core Ontology (MCO) and Consumer Mediation Ontology (COM) as anchors to legal, institutional, and expert knowledge, and therefore as entry points for the queries posed by consumers in common-sense language.

The user corpus contained around 10,000 consumer questions and 20,000 complaints addressed from 2007 to 2010 to the Catalan Consumer Agency. We applied a traditional terminology extraction methodology to identify candidate terms, which were subsequently validated by legal experts. We then manually mapped the lay terms to the ontological classes. The relations used for mapping lay terms with ontological classes are mostly has_lexicalisation and has_instance.

A second case study in the domain of consumer law was carried out with Italian corpora. In this case domain terminology was extracted from a normative corpus (the Code of Italian Consumer law) and from a lay corpus (around 4000 consumers’ questions).

In order to further explore the particularities of each corpus respecting the semantic coverage of the domain, terms were gathered together into a common taxonomic structure [8]. This task was performed with the aid of domain experts. When confronted with the two lists of terms, both laypersons and technical experts would link most of the validated lay terms to the technical list of terms through one of the following relations:

  • Subclass: the lay term denotes a particular type of legal concept. This is the most frequent case. For instance, in the class objects, telefono cellulare (cell phone) and linea telefonica (phone line) are subclasses of the legal terms prodotto (product) and servizio (service), respectively. Similarly, in the class actors agente immobiliare (estate agent) can be seen as subclass of venditore (seller). In other cases, the linguistic structures extracted from the consumers’ corpus denote conflictual situations in which the obligations have not been fulfilled by the seller and therefore the consumer is entitled to certain rights, such as diritto alla sostituzione (entitlement to a replacement). These types of phrases are subclasses of more general legal concepts such as consumer right.
  • Instance: the lay term denotes a concrete instance of a legal concept. In some cases, terms extracted from the consumer corpus are named entities that denote particular individuals, such as Vodafone, an instance of a domain actor, a seller.
  • Equivalent: a legal term is used in lay discourse. For instance, contratto (contract) or diritto di recessione (withdrawal right).
  • Lexicalisation: the lay term is a lexical variant of the legal concept. This is the case for instance of negoziante, used instead of the legal term venditore (seller) or professionista (professional).
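The four relations above can be represented as a small lookup table linking lay terms to legal concepts. The Python sketch below uses only the examples given in the list. The `Relation` enum, the `legal_concepts` function, and the plain substring matching are assumptions made for illustration, not the tooling used in the case study.

```python
from enum import Enum


class Relation(Enum):
    SUBCLASS = "subclass"
    INSTANCE = "instance"
    EQUIVALENT = "equivalent"
    LEXICALISATION = "lexicalisation"


# (lay term, relation, legal concept), taken from the examples in the list above.
MAPPINGS = [
    ("telefono cellulare", Relation.SUBCLASS, "prodotto"),
    ("linea telefonica", Relation.SUBCLASS, "servizio"),
    ("agente immobiliare", Relation.SUBCLASS, "venditore"),
    ("diritto alla sostituzione", Relation.SUBCLASS, "consumer right"),
    ("Vodafone", Relation.INSTANCE, "venditore"),
    ("contratto", Relation.EQUIVALENT, "contratto"),
    ("negoziante", Relation.LEXICALISATION, "venditore"),
]

# Index by lowercased lay term so that lookups ignore case.
INDEX = {lay.lower(): (rel, legal) for lay, rel, legal in MAPPINGS}


def legal_concepts(question: str) -> dict:
    """Find known lay terms in a question and return their legal anchors.

    Matching is naive substring search; a real system would need
    tokenisation and lemmatisation.
    """
    text = question.lower()
    return {lay: INDEX[lay] for lay in INDEX if lay in text}


hits = legal_concepts("Il negoziante non vuole sostituire il mio telefono cellulare")
for lay, (rel, legal) in hits.items():
    print(f"{lay} --{rel.value}--> {legal}")
```

Used this way, a consumer question phrased in everyday language can be anchored to the legal vocabulary, which is the entry-point role the ontologies are meant to play.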

The distribution of normative and lay terms per taxonomic level shows that, whereas normative terms populate mostly the upper levels of the taxonomy [9], deeper levels in the hierarchy are almost exclusively represented by lay terms.

Term distribution per taxonomic level
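To make the idea of a per-level distribution concrete, the Python sketch below counts normative and lay terms at each depth of a toy taxonomy. The tree is built only from the example terms quoted earlier and is an assumption for illustration; it does not reproduce the study's data or its figure.

```python
from collections import Counter

# Toy taxonomy: child -> parent (None marks a root semantic class).
PARENT = {
    "objects": None,
    "actors": None,
    "prodotto": "objects",
    "servizio": "objects",
    "venditore": "actors",
    "telefono cellulare": "prodotto",
    "linea telefonica": "servizio",
    "agente immobiliare": "venditore",
}

# Corpus each term was extracted from (root classes are not counted).
ORIGIN = {
    "prodotto": "normative",
    "servizio": "normative",
    "venditore": "normative",
    "telefono cellulare": "lay",
    "linea telefonica": "lay",
    "agente immobiliare": "lay",
}


def level(term: str) -> int:
    """Depth of a term in the tree; root classes are level 1."""
    depth = 1
    while PARENT[term] is not None:
        term = PARENT[term]
        depth += 1
    return depth


def distribution() -> Counter:
    """Count terms per (taxonomic level, corpus of origin)."""
    return Counter((level(term), origin) for term, origin in ORIGIN.items())


dist = distribution()
for (lvl, origin), n in sorted(dist.items()):
    print(f"level {lvl}: {n} {origin} term(s)")
```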

The result of this type of approach is a set of terminological-ontological resources that provide some insights on the nature of laypersons' cognition of the law, such as the fact that citizens’ domain knowledge is mainly factual and therefore populates deeper levels of the taxonomy. Moreover, such resources can be used for the further processing of user input. However, this strategy presents some limitations as well. First, it is mainly driven by domain conceptual systems, which, in a way, might limit the potential of user-generated corpora. Second, it is not necessarily scalable. In other words, these terminological-ontological resources have to be rebuilt for each legal subdomain (such as consumer law, private law, or criminal law), and it is thus difficult to foresee mechanisms for performing an automated mapping between lay terms and legal terms.

Beyond domain ontologies: information extraction approaches

One of the most important limitations of ontology-driven approaches is the lack of scalability. In order to overcome this problem, a possible strategy is to rely on informational structures that occur generally in user-generated content. These informational structures go beyond domain conceptual models and identify mostly discursive, emotional, or event structures.

Discursive structures formalise the way users typically describe a legal case. It is possible to identify stereotypical situations appearing in the description of legal cases by citizens (e.g., the nature of the problem or the conflict resolution strategies). The core of those situations is usually a predicate, so it is possible to formalize them as frame structures containing different frame elements. We followed such an approach for the mapping of the Spanish corpus of consumers’ questions to the classes of the domain ontology (Fernández-Barrera and Casanovas, 2011). The same technique was applied for mapping a set of citizens’ complaints in the domain of acoustic nuisances to a legal domain ontology (Bourcier and Fernández-Barrera, 2011). By describing the general structures that citizens use to recount legal cases, rather than domain-specific concepts, we ensure scalability.
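A frame built around a predicate, with named slots for its elements, might look like the following Python sketch. The two frames (`Purchase`, `Refusal`), their slot names, the regular expressions, and the English sample sentence are hypothetical. The studies cited above worked on Spanish and French corpora and are not reproduced here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A stereotypical situation with named frame elements (slots)."""
    name: str
    elements: dict = field(default_factory=dict)


# Hypothetical frames: each is triggered by a predicate and fills named slots.
PATTERNS = [
    ("Purchase", re.compile(r"\bI bought (?P<goods>.+?) from (?P<seller>[^.,;]+)", re.I)),
    ("Refusal", re.compile(r"\b(?P<agent>the \w+) refuses? to (?P<action>[^.,;]+)", re.I)),
]


def extract_frames(text: str) -> list:
    """Return every frame whose trigger predicate occurs in the text."""
    frames = []
    for name, pattern in PATTERNS:
        for match in pattern.finditer(text):
            slots = {k: v.strip() for k, v in match.groupdict().items()}
            frames.append(Frame(name, slots))
    return frames


complaint = "I bought a phone from a local shop. Now the seller refuses to replace it."
frames = extract_frames(complaint)
for frame in frames:
    print(frame.name, frame.elements)
```

Nothing in these patterns is specific to consumer law: the same purchase and refusal frames would fire on a complaint about a landlord or a public office, which is the sense in which such structures generalise across domains.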

Emotional structures are extracted by current algorithms for opinion and sentiment mining. User data in the legal domain often contain a large number of subjective elements (especially in the case of complaints and feedback on public services) that could be effectively mined and used in public decision making.
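As a rough illustration of what sentiment mining does with such subjective elements, the Python sketch below scores a piece of feedback against a small polarity lexicon and flips the sign after a negation. The lexicon, the negation list, and the sample sentences are toy assumptions. Production systems rely on much larger resources or trained models.

```python
import re

# Toy polarity lexicon: an assumption for illustration, not a real resource.
LEXICON = {
    "helpful": 1, "quick": 1, "satisfied": 1,
    "rude": -1, "slow": -1, "unacceptable": -1, "broken": -1,
}
NEGATORS = {"not", "never", "no"}


def polarity(text: str) -> int:
    """Sum word polarities, inverting a word that directly follows a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        if word in LEXICON:
            value = LEXICON[word]
            if i > 0 and words[i - 1] in NEGATORS:
                value = -value
            score += value
    return score


feedback = [
    "The staff were rude and the service was slow",
    "The clerk was not helpful",
    "Quick and helpful answer, I am satisfied",
]
scores = [polarity(f) for f in feedback]
print(scores)
```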

Finally, event structures, which have been deeply explored so far, could be useful for information extraction from user complaints and feedback, or for automatic classification into specific types of queries according to the described situation.

Crowdsourcing in e-government: next steps (and precautions?)

Legal prosumers' input currently outstrips the capacity of government for extracting meaningful content in a cost-efficient way. Some developments are under way, among which are argument-mapping technologies and semantic matching between legal and lay corpora. The scalability of these methodologies is the main obstacle to overcome, in order to enable the matching of user data with open public data in several domains.

However, as technologies for the extraction of meaningful content from user-generated data develop and are used in public decision making, a series of issues will have to be dealt with. For instance, should the system developer bear responsibility for the erroneous or biased analysis of data? Ethical questions arise as well: May governments legitimately analyse any type of user-generated content? Content-analysis systems might be used for trend and crisis detection; but what if they are also used for restricting freedoms?

The “wisdom of crowds” can certainly be valuable in public decision making, but the fact that citizens’ online behaviour can be observed and analysed by governments without citizens' acknowledgement poses serious ethical issues.

Thus, technical development in this domain will have to be coupled with the definition of ethical guidelines and standards, maybe in the form of a system of quality labels for content-analysis systems.

[Editor's Note: For earlier VoxPopuLII commentary on the creation of legal ontologies, see Núria Casellas, Semantic Enhancement of Legal Information… Are We Up for the Challenge? For earlier VoxPopuLII commentary on Natural Language Processing and legal Semantic Web technology, see Adam Wyner, Weaving the Legal Semantic Web with Natural Language Processing. For earlier VoxPopuLII posts on user-generated content, crowdsourcing, and legal information, see Matt Baca and Olin Parker, Collaborative, Open Democracy with LexPop; Olivier Charbonneau, Collaboration and Open Access to Law; Nick Holmes, Accessible Law; and Staffan Malmgren, Crowdsourcing Legal Commentary.]


[1] The idea of prosumption existed actually long before the Internet, as highlighted by Ritzer and Jurgenson (2010): the consumer of a fast food restaurant is to some extent as well the producer of the meal since he is expected to be his own waiter, and so is the driver who pumps his own gasoline at the filling station.

[2] The experience project enables registered users to share life experiences, and it contained around 7 million stories as of January 2011: http://www.experienceproject.com/index.php.

[3] For instance, the United Nations Volunteers Online platform (http://www.onlinevolunteering.org/en/vol/index.html) helps volunteers to cooperate virtually with non-governmental organizations and other volunteers around the world.

[4] See for instance the experiment run by mathematician Gowers on his blog: he posted a problem and asked a large number of mathematicians to work collaboratively to solve it. They eventually succeeded faster than if they had worked in isolation: http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/.

[5] The Galaxy Zoo project asks volunteers to classify images of galaxies according to their shapes: http://www.galaxyzoo.org/how_to_take_part. See as well Cornell's projects Nestwatch (http://watch.birds.cornell.edu/nest/home/index) and FeederWatch (http://www.birds.cornell.edu/pfw/Overview/whatispfw.htm), which invite people to introduce their observation data into a Website platform.

[6] http://www.participedia.net/wiki/Icelandic_Constitutional_Council_2011.

[7] See the description of Debategraph in Marta Poblet's post, Argument mapping: visualizing large-scale deliberations (http://serendipolis.wordpress.com/2011/10/01/argument-mapping-visualizing-large-scale-deliberations-3/).

[8] Terms have been organised in the form of a tree having as root nodes nine semantic classes previously identified. Terms have been added as branches and sub-branches, depending on their degree of abstraction.

[9] It should be noted that legal terms are mostly situated at the second level of the hierarchy rather than the first one. This is natural if we take into account the nature of the normative corpus (the Italian consumer code), which contains mostly domain specific concepts (for instance, withdrawal right) instead of general legal abstract categories (such as right and obligation).

REFERENCES

Bourcier, D., and Fernández-Barrera, M. (2011). A frame-based representation of citizen's queries for the Web 2.0. A case study on noise nuisances. E-challenges conference, Florence 2011.

Fernández-Barrera, M., and Casanovas, P. (2011). From user needs to expert knowledge: Mapping laymen queries with ontologies in the domain of consumer mediation. AICOL Workshop, Frankfurt 2011.

Lux, M., and Dösinger, G. (2007). From folksonomies to ontologies: Employing wisdom of the crowds to serve learning purposes. International Journal of Knowledge and Learning (IJKL), 3(4/5): 515-528.

Mika, P. (2005). Ontologies are us: A unified model of social networks and semantics. In Proc. of Int. Semantic Web Conf., volume 3729 of LNCS, pp. 522-536. Springer.

Passant, A. (2007). Using ontologies to strengthen folksonomies and enrich information retrieval in Weblogs. In Int. Conf. on Weblogs and Social Media, 2007.

Poblet, M., Casellas, N., Torralba, S., and Casanovas, P. (2009). Modeling expert knowledge in the mediation domain: A Mediation Core Ontology, in N. Casellas et al. (Eds.), LOAIT- 2009. 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with 2nd Workshop on Semantic Processing of Legal Texts. Barcelona, IDT Series n. 2.

Ritzer, G., and Jurgenson, N. (2010). Production, consumption, prosumption: The nature of capitalism in the age of the digital "prosumer." In Journal of Consumer Culture 10: 13-36.

Specia, L., and Motta, E. (2007). Integrating folksonomies with the Semantic Web. Proc. Euro. Semantic Web Conf., 2007.

Meritxell Fernández-Barrera is a researcher at the Cersa (Centre d'Études et de Recherches de Sciences Administratives et Politiques), CNRS, Université Paris 2. She works on the application of natural language processing (NLP) to legal discourse and legal communication, and on the potentialities of Web 2.0 for participatory democracy.

VoxPopuLII is edited by Judith Pratt. Editor-in-Chief is Robert Richards, to whom queries should be directed. The statements above are not legal advice or legal representation. If you require legal advice, consult a lawyer. Find a lawyer in the Cornell LII Lawyer Directory.

Raise your hand if you’ve heard (or said) a variation of one of these tired truisms: "Politics is dominated by lobbyists and spending." "Policy making has degenerated into a glorified yelling match." "Our country has never been more polarized." "Today’s online communities foster echo chambers of the like-minded rather than fora for discussion."

Is your hand raised? Because ours certainly are.

The only thing anyone can seem to agree on today is that the current U.S. political system is broken. We’re mired in a confluence of corporate spending, ugly discourse, and voter voicelessness.

LexPop provides an open public platform for tackling these problems.

Meet LexPop

LexPop allows participants to collaborate in the creation of legislative bills -- bills that are later introduced by actual legislators. At its most basic, LexPop is a Wikipedia for creating public policy. (There’s a lot more to it than that, as we’ll explain below.) In our first project, Massachusetts Representative Tom Sannicandro (D-Ashland) -- one of those actual legislators we’re talking about -- has agreed to introduce a net neutrality bill created on LexPop.

LexPop has two primary goals. Our first goal is to give the public a voice. We hope to provide a space for ordinary people (i.e., people who can’t afford to hire lobbyists) to contribute substantively to public policy -- to give their best ideas a fair hearing.

As you know, lobbyists write the bulk of the legislation coming out of our various legislatures. LexPop provides a counterpoint to the current model -- a way for the public to provide legislators with voter-created model legislation. A legitimate, 21st-century democracy will invite the public into meaningful collaboration, and LexPop is part of the march in that direction.

Our second goal is to determine the best way to achieve the first. That is, a compelling movement is attempting to take governance into the 21st century, and organizations like PopVox and OpenCongress are doing great work. Several organizations and initiatives, including a government-sponsored effort in Brazil, are trying to make it possible for citizens to help write legislation. But at this point, nobody knows the best way to make the co-creation of laws a reality. Our work will contribute to figuring out what’s possible, what works, and what doesn’t.

How LexPop works

There are two ways to use LexPop. Our primary focus is on Policy Drives -- where legislators pledge to introduce bills written on the site. Policy Drives are somewhat analogous to what goes on at Wikipedia, but LexPop provides more structure through the use of three specific phases:

  • Phase 1: Initial discussion, debate, argument, and research;
  • Phase 2: Outlining the bill in plain English (for those who aren’t regular readers of Vox PopuLII); and
  • Phase 3: Transforming the ‘plain English’ outline into legislative text.

We’re currently in the discussion phase of our first Policy Drive, devoted to the net neutrality bill Rep. Sannicandro has agreed to introduce.

A second option on LexPop is working on a “WikiBill.” WikiBills are written via the familiar, wide-open wiki model, and they offer a spot for the public to create model legislation on their own, without the three-phase structure of Policy Drives, and without a legislator-sponsor. WikiBill creators collaborate through a free-for-all process, very similar to Wikipedia -- start from scratch and cobble the bill together. There’s no end to the WikiBill process, so participants can create a bill, submit it to their representatives, modify it, and submit it again.

Yeah, sounds great. But can this really work?

It’s usually at this point in the conversation that questions start coming up. LexPop, and similar projects, are largely operating in uncharted waters, and so there’s good reason to think the project sounds ambitious, perhaps even crazy. Below are a few of the questions we’re asked most often, along with our preliminary answers.

Will anyone contribute to this sort of effort?
We think so. (Obviously.)

Here’s why: Ordinary people collaborate on difficult projects online -- especially online -- often with great success. Take Linux, the open source operating system. The vast majority of people who work on Linux aren’t paid; they’ve incrementally created it in their spare time.

Are you reading this blog on Firefox? Well, guess what? Your browser was built almost entirely by volunteers.

At LexPop, we’re asking people who are passionate about certain issues to give some of their free time to developing better policy, in the same way engineers have asked them to help develop software. Sure, it will be complicated, but people are smart, and given the right opportunity and tools, they’ll be able to (once again) create something extraordinary.

Politics is too controversial -- How can you expect people to come to consensus on one answer?
To answer this question, we like to look to Jesus -- the "Jesus" page on Wikipedia, that is.

There are plenty of controversial topics addressed on Wikipedia, but it’s the pages for these topics that are often the most accurate. Wikipedians who edit the Jesus page know the topic is controversial, so they back up what they say with facts -- otherwise, the crowd of users won’t allow it. Over time, the Jesus page has turned into something that most users are pretty happy about. And this is the similarity between LexPop and Wikipedia: They’re both about collaboratively writing something that isn’t perfect in the eyes of any one participant, but is better than the alternative.

Fine, but isn’t there a better model than a wiki?
This is one of the things we’re trying to figure out, and one of the things with which we need your help. We’re starting with a modified wiki (the three phases), but as we learn, we’ll adapt. A wiki allows a certain type of collaboration (the kind found on Wikipedia), but it may not be the best way to collaborate. Is the three-step process we’re using the right model, or should the phases be combined? With your help, we’ll find out -- and we promise to share our findings.

Will legislation created on LexPop be representative?
We don’t claim that bills made on LexPop will be perfectly representative, and we’re not trying to make representative democracy obsolete. After a bill is written on the site, it will still have to go through the same bill-into-law process as every other piece of legislation.

But LexPop will certainly be more representative than the system we have now. With LexPop, non-profit organizations with valuable knowledge of an issue, passionate experts well-versed on a topic, and regular voters (Joes the Plumber, if you will) will no longer be shut out of the process. Right now, we live in a world where participation too often means a voter pours out her heart in a letter and receives a form response that the intended recipient didn’t write, read, or even sign. Our system for adding more voices to lawmaking may not be perfect, but it will be less imperfect than the current political system.

LexPop provides a first draft of legislation that’s written by people, not by lobbyists. This is our value-add; we’re opening a new channel for public participation, and taking a step toward a more legitimate and deliberative democracy.

But we need your help

And we need it big time. For a project like this to work, we need participants.

If you’re interested in collaborative democracy, please get involved in the conversation. You’ll be helping even if you post only one comment. Even if you aren’t particularly interested in net neutrality, we encourage you to learn more about it on the site, and then make sure you come back when we have a Policy Drive on your favorite issue.

Also, we’d be grateful if you spread the word about our site. Like us on Facebook, tweet about LexPop (@LexPopOrg), blog about us, or, even better, let us write a guest blog post on your site (Thanks, VoxPopuLII!).

We’d also love for you to tell us what we’re doing wrong. LexPop is perfect in neither theory nor practice. So please help us make LexPop and, ultimately, deliberative democracy better with your feedback. We have a Google Group for discussion about LexPop, or you can contact us through the website.

Coda

LexPop is a platform for public engagement and empowerment. LexPop provides a space for discussion-driven public policy and a stronger, more agile democracy. LexPop is about more voices. Add yours.

Matt Baca is a joint J.D./M.P.A. student at New York University School of Law and the Harvard Kennedy School. He's interested in law, public policy, government 2.0, and the Rockies (team and mountains).

Olin Parker is a Master's in Public Policy student at the Harvard Kennedy School. His interests include disability policy, education reform, the states of Kansas and Louisiana, and his 17-month-old daughter.


Readers of this blog are probably already familiar with the U.S. Federal Courts' system for electronic access called PACER (Public Access to Court Electronic Records).  PACER is unlike any other country's electronic public access system that I am aware of, because it provides complete access to docket text, opinions, and all documents filed (except sealed records, of course).  It is a tremendously useful tool, and (at least at the time of its Web launch in the late 1990s) was tremendously ahead of its time.

However, PACER is unique in another important way: it imposes usage charges on citizens for downloading, viewing, and even searching for case materials. This limitation unfortunately forecloses a great deal of democracy-enhancing activity.

The PACER Liberation Front

In 2008, I happened upon PACER in the course of trying to research a First Amendment issue.  I am not a lawyer, but I was trying to get a sense of the federal First Amendment case law across all federal jurisdictions, because that case law had a direct effect on some activists at the time.  I was at first excited that so much case law was apparently available online, but then disappointed when I discovered that the courts were charging for it.  After turning over my credit card number to PACER, I was shocked that the system was charging for every single search I performed.  With the type of research I was trying to do, it was inevitable that I would have to do countless searches to find what I was looking for.  What's more, the search functionality provided by PACER turned out to be nearly useless for the task at hand -- there was no way to search for keywords, or within documents at all.  The best I could do was pay for all the documents in particular cases that I suspected were relevant, and then try to sort through them on my own hard drive. Even this would be far from comprehensive.

This led to the inevitable conclusion that there is simply no way to know federal case law without going through a lawyer, doing laborious research using print legal resources, or paying for a high-priced database service.  My only hope for getting use out of PACER was to find some way to affordably get a ton of documents.  This is when I ran across a nascent project led by open government prophet Carl Malamud. He called it PACER Recycling.  Carl offered to host any PACER documents that anybody happened to have, so that other people could download them.  At that time, he had only a few thousand documents, but an ingenious plan: The federal courts were conducting a trial of free access at about sixteen libraries across the country. Anyone who walked in to one of those libraries and asked for PACER could browse and download documents for free. Carl was encouraging a "thumb drive corps" to bring USB sticks into those libraries and download caches of PACER documents.

The main bottleneck with this approach was volume. PACER contains hundreds of millions of documents, and manually downloading them all was just not going to happen. I had a weekend to kill, and an idea for building on his plan. I wrote up a Perl script that could run off of a USB drive and that would automatically start going through PACER cases and downloading all of the documents in an organized fashion. I didn't live near one of the "free PACER" libraries, so I had to test the script using my own non-free PACER account... which got expensive. I began to contemplate the legal ramifications -- if any -- of downloading public records in bulk via this method. The following weekend I ran into Aaron Swartz.

Aaron is one of my favorite civic hackers. He's a great coder and has a tendency to be bold. I told him about my little project, and he asked to see the code. He made some improvements and, given his higher tolerance for risk, proceeded to use the modified code to download about 2,700,000 files from PACER. The U.S. Courts freaked out, cancelled the free access trial, and said that "[t]he F.B.I. is conducting an investigation." We had a hard time believing that the F.B.I. would care about the liberation of public records in a seemingly legal fashion, and told The New York Times as much. (Media relations pro tip: If you don't want to be quoted, always, repeatedly emphasize that your comments are "on background" only. Even though I said this when I talked to The Times, they still put my name in the corresponding blog post. That was the first time I had to warn my fiancée that if the feds came to the door, she should demand a warrant.)

A few months later, Aaron got curious about whether the FBI was really taking this seriously. In a brilliantly ironic move, he filed a FOIA for his own FBI record, which was delivered in due course and included such gems as:

Between September 4, 2008 and September 22, 2008, PACER was accessed by computers from outside the library utilizing login information from two libraries participating in the pilot project. The Administrative Office of the U.S. Courts reported that the PACER system was being inundated with requests. One request was being made every three seconds.

[…] The two accounts were responsible for downloading more than eighteen million pages with an approximate value of $1.5 million.

The full thing is worth a read, and it includes details about the feds looking through Aaron's Facebook and LinkedIn profiles. However, the feds were apparently unable to determine Aaron's current residence and ended up staking out his parents' house in Illinois. The feds had to call off the surveillance because, in their words: "This is a heavily wooded, dead-end street, with no other cars parked on the road making continued surveillance difficult to conduct without severely increasing the risk of discovery." The feds eventually figured out Aaron wasn't in Illinois when he posted to Facebook: "Want to meet the man behind the headlines? Want to have the F.B.I. open up a file on you as well? Interested in some kind of bizarre celebrity product endorsement? I’m available in Boston and New York all this month." They closed the case.

Turning PACER Around

Carl published Aaron's trove of documents (after conducting a very informative privacy audit), but the question was: what to do next? I had long given up on my initial attempt to merely understand a narrow aspect of First Amendment jurisprudence, and had taken up the PACER liberation cause wholeheartedly. At the time, this consisted of writing about the issue and giving talks. I ran across a draft article by some folks at Princeton called "Government Data and the Invisible Hand." It argued:

Rather than struggling, as it currently does, to design sites that meet each end-user need, we argue that the executive branch should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data.

I couldn't have agreed more, and their prescription for the executive branch made sense for the brain-dead PACER interface too. I called up one of the authors, Ed Felten, and he told me to come down to Princeton to give a talk about PACER. Afterwards, two graduate students, Harlan Yu and Tim Lee, came up to me and made an interesting suggestion. They proposed a Firefox extension that anyone using PACER could install. As users paid for documents, those documents would automatically be uploaded to a public archive. As users browsed dockets, if any documents were available for free, the system would notify them of that, so that the users could avoid charges. It was a beautiful quid-pro-quo, and a way to crowdsource the PACER liberation effort in a way that would build on the existing document set.
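
The quid-pro-quo described above can be sketched in a few lines. This is only an illustration of the idea, not the extension's actual code: the `PublicArchive` class, the function names, and the case identifier are all hypothetical, and an in-memory dictionary stands in for the real public archive.

```python
class PublicArchive:
    """Hypothetical stand-in for the shared public archive (in memory only)."""

    def __init__(self):
        self._docs = {}

    def upload(self, case_id, doc_number, pdf):
        # Keep the first copy contributed; repeat uploads of the same document are ignored.
        self._docs.setdefault((case_id, doc_number), pdf)

    def has(self, case_id, doc_number):
        return (case_id, doc_number) in self._docs


def on_document_purchased(archive, case_id, doc_number, pdf):
    """When a user pays for a document, share a copy with the archive."""
    archive.upload(case_id, doc_number, pdf)


def annotate_docket(archive, case_id, doc_numbers):
    """While a user browses a docket, mark which documents are already free."""
    return {n: ("free" if archive.has(case_id, n) else "paid") for n in doc_numbers}


archive = PublicArchive()
on_document_purchased(archive, "example-case-1", 3, b"%PDF-example")
print(annotate_docket(archive, "example-case-1", [1, 2, 3]))
```

Every purchase makes one more document free for the next user, which is why the archive grows simply through people using PACER as they normally would.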

So Harlan and Tim built the extension and called it RECAP (tagline: "Turning PACER around." Get it? Eh?). It was well received, and you can read the great endorsements from The Washington Post, The L.A. Times, The Guardian, and many like-minded public interest organizations. The courts freaked out again, but ultimately realized they couldn't go after people for republishing the public record.

I helped with a few of the details, and eventually ended up coming down to work at their research center, the Center for Information Technology Policy. Last year, a group of undergrads built a fantastic web interface to the RECAP database that allows better browsing and searching than PACER. Their project is just one example of the principle laid out in the "Government Data and the Invisible Hand" paper: when presented with the raw data, civic hackers can build better interfaces to that data than the government.

From Fee to Free

Despite all of our efforts, the database of free PACER materials still contains only a fraction of the documents stored in the for-fee database. The real end-game is for the courts to change their minds about the PACER paywall approach in the first place. We have made this case in many venues. Influential senators have sent the courts letters. I have even pointed out that the courts are arguably violating the E-Government Act of 2002. As it happens, PACER brings in over $100 million annually through user fees. These fees are spent partially on supporting PACER's highly inefficient infrastructure, but are also partially spent on various other things that the courts deem somehow related to public access. This includes what one judge described as expenditures on his courtroom:

"Every juror has their own flatscreen monitors. We just went through a big upgrade in my courthouse, my courtroom, and one of the things we've done is large flatscreen monitors which will now -- and this is a very historic courtroom so it has to be done in accommodating the historic nature of the courthouse and the courtroom -- we have flatscreen monitors now which will enable the people sitting in the gallery to see these animations that are displayed so they're not leaning over trying to watch it on the counsel table monitor. As well as audio enhancements. In these big courtrooms with 30, 40 foot ceilings where audio gets lost we spent a lot of money on audio so the people could hear what's going on. We just put in new audio so that people -- I'd never heard of this before -- but it actually embeds the speakers inside of the benches in the back of the courtroom and inside counsel tables so that the wood benches actually perform as amplifiers."

I am not against helping courtroom visitors hear and see trial testimony, but we must ask whether it is good policy to restrict public access to electronic materials on the Internet in the name of arbitrary courtroom enhancements (even assuming that allocating PACER funds to such enhancements is legal, which is questionable). The real hurdle to liberating PACER is that it serves as a cross-subsidy to other parts of our underfunded courts. I parsed a bunch of appropriations data and committee reports in order to write up a report on actual PACER costs and expenditures. What is just as shocking as the PACER income's being used for non-PACER expenses is the actual claimed cost of running PACER, which is orders of magnitude higher than any competent Web geek would tell you it should be (especially for a system whose administrators once worried that "one request was being made every three seconds."). The rest of the federal government has been moving toward cloud-based "Infrastructure as a Service", while the U.S. Courts continue to maintain about 100 different servers in each jurisdiction, each with its own privately leased internet connection. (Incidentally, if you enjoy conspiracy theories, try to ID the pseudonymous "Schlomo McGill" in the comments of this post and this post.)
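
To put the figures quoted from the FBI file in perspective, here is a back-of-the-envelope calculation. It uses only the numbers quoted above (one request every three seconds, and more than eighteen million pages valued at approximately $1.5 million); the function names are made up for illustration, and the results are rough because the source figures are themselves approximate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(seconds_between_requests):
    """Daily request volume at a steady rate of one request per interval."""
    return SECONDS_PER_DAY / seconds_between_requests


def implied_price_per_page(total_value_usd, pages):
    """Average dollar value per page implied by a total and a page count."""
    return total_value_usd / pages


print(requests_per_day(3))  # 28800.0
print(round(implied_price_per_page(1_500_000, 18_000_000), 3))  # 0.083
```

One request every three seconds works out to fewer than 30,000 requests a day, and the valuation implies roughly eight cents per page.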

The ultimate solution to the PACER fee problem unfortunately lies not in exciting spy-vs-spy antics (although those can be helpful and fun), but in bureaucratic details of authorization subcommittees and technical details of network architecture. This is the next front of PACER liberation. We now have friends in Washington, and we understand the process better every day. We also have very smart geeks, and I think that the ultimate finger on the scale may be our ability to explain how the U.S. Courts could run a tremendously more efficient system that would simultaneously generate a diversity of new democratic benefits. We also need smart librarians and archivists making good policy arguments. That is one reason why the Law.gov movement is so exciting to me. It has the potential not only to unify open-law advocates, but to go well beyond the U.S. Federal Case Law fiefdom of PACER.

Perhaps then I can finally get the answer to that narrow legal question I tried to ask in 2008. I'm sure that the answer will inevitably be: "It's complicated."

Steve Schultze is Associate Director of The Center for Information Technology Policy at Princeton. His work includes Internet privacy, security, government transparency, and telecommunications policy. He holds degrees in Computer Science, Philosophy, and Media Studies from Calvin College and MIT. He has also been a Fellow at The Berkman Center for Internet & Society at Harvard, and helped start the Public Radio Exchange.

In May of this year, HeinOnline began taking a new approach to legal research, offering researchers the ability to search or browse varying types of legal research material all related to a specialized area of law in one database. We introduced this concept as a new legal research platform with the release of World Constitutions Illustrated: Contemporary & Historical Documents & Resources, which we’ll discuss in further detail later in this post. First, we take a brief look at how HeinOnline started and where it is going. Then we look at the scope of the new platform and how it is being implemented across HeinOnline’s newest library modules.

This is how we started...
Traditionally, HeinOnline libraries featured one title or a single type of legal research material. For example, the Law Journal Library, HeinOnline’s largest and most used database, contains law and law-related periodicals. The Federal Register Library contains the Federal Register dating back to inception, with select supporting resources. The U.S. Statutes at Large Library contains the U.S. Statutes at Large volumes dating back to inception, with select supporting resources.

This is where we are going...
The new subject-specific legal research platform, introduced earlier this year, has shifted from that traditional approach to a more dynamic one: offering research libraries focused on a subject area rather than on a single title or resource. This platform combines primary and secondary resources, books, law review articles, periodicals, government documents, historical documents, bibliographic references and other supporting resources, all related to the same area of law, into one database, thus providing researchers one central place to find what they need.

How is this platform being implemented?
In May, HeinOnline introduced the platform with the release of a new library called World Constitutions Illustrated: Contemporary & Historical Documents & Resources. The platform has since been implemented in every new library that HeinOnline has released including History of Bankruptcy: Taxation & Economic Reform in America, Part III and Intellectual Property Law Collection.

Pilot project: World Constitutions Illustrated
First, let’s look at the pilot project, World Constitutions Illustrated. Our goal when releasing this new library was to present legal researchers with a different scope than what is currently available for those studying constitutional law and political science. To achieve this, the library was built upon the new legal research platform, which brings together: constitutional documents, both current and historical; secondary sources such as the CIA’s World Fact Book, Modern Legal Systems Cyclopedia, the Library of Congress’s Country Studies and British and Foreign State Papers; books; law review articles; bibliographies; and links to external resources on the Web that directly relate to the political and historical development of each country. By presenting the information in this format, researchers no longer have to visit multiple Web sites or pull multiple sources to obtain the documentary history of the development of a country’s constitution and government.

Inside the interface, every country has a dedicated resource page that includes the Constitutions and Fundamental Laws, Commentaries & Other Relevant Sources, Scholarly Articles Chosen by Our Editors, a Bibliography of Select Constitutional Books, External Links, and a news feed. Let’s take a look at France.

Constitutions & Fundamental Laws
France has a significant hierarchy of constitutional documents from the current constitution as amended to 2008 all the way back to the Declaration of the Rights of Man and of the Citizen promulgated in 1789. Within the hierarchy of documents, one can find consolidated texts, amending laws, and the original text in multiple languages when translations are available.

Commentaries & Other Relevant Sources
Researchers will find more than 100 commentaries and other relevant sources of information related to the development of the government of France and the French Constitution. These sources include secondary source books and classic constitutional law books. To further connect these sources to the French Constitution, our Editors have reviewed each source book and classic constitutional book and linked researchers to the specific chapters or sections of the works that directly relate to the study of the French Constitution. For example, the work titled American Nation: A History, by Albert Bushnell Hart, has direct links to chapters from within volumes 11 and 13, each of which discusses and relates to the development of the French government.

Scholarly Articles Chosen by Our Editors
This section features more than 40 links to scholarly articles from HeinOnline’s Law Journal Library that are directly related to the study of the French Constitution and the development of the government of France. The Editors hand-selected and included these articles from the thousands of articles in the Law Journal Library due to their significance and relation to the constitutional and political development of the nation. When browsing the list of articles, one will also find Hein’s ScholarCheck integrated, which allows a researcher to view other law review articles that cite that specific article. In order for researchers to access the law review articles, they must be subscribed to the Law Journal Library.

Bibliography of Select Constitutional Books
There are thousands of books related to constitutional law. Our Editors have gone through an extensive list of these resources and hand-selected books relevant to the constitutional development of each country. The selections are presented as a bibliography within each country. France has nearly 100 bibliographic references. Many bibliographic references also contain the ISBN which links to WorldCat, allowing researchers to find the work in a nearby library.

External Links
External links are also selected by the Editors as they are developing the constitutional hierarchies for each country. If there are significant online resources available that support the study of the constitution or the country’s political development, the links are included on the country page.

News Feeds
The last component on each country’s page is a news feed featuring recent articles about the country’s constitution. The news feed is powered by a Google RSS news feed and researchers can easily use the RSS icon to add it to their own RSS readers.

In addition to the significant and comprehensive coverage of every country in World Constitutions Illustrated, the collection also features an abundance of material related to the study of constitutional law at a higher level. This makes it useful for those researching more general or regional constitutional topics.

Searching capabilities on the new platform
To further enhance the capabilities of this platform, researchers are presented with a comprehensive search tool that allows one to search the documents and books by a number of metadata points including the document date, promulgated date, document source, title, and author. For researchers studying the current constitution, the search can be narrowed to include just the current documents that make up the constitution for a country. Furthermore, a search can be generated across all the documents, classic books, or reference books for a specific country, or it can be narrower in scope to include a specific type of resource. After a search is generated, researchers will receive faceted search results, allowing them to quickly and easily drill down their results set by using facets including document type, date, country, and title.
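
For readers unfamiliar with faceted search, the drill-down idea can be sketched as follows. This is a toy illustration and not HeinOnline's implementation: the records, field names, and function names are all made up, and a real system would compute facets inside its search engine rather than over a Python list.

```python
from collections import Counter

# Made-up records for illustration only; the field names are assumptions.
DOCUMENTS = [
    {"title": "Example constitution text", "country": "France", "doc_type": "constitution"},
    {"title": "Example amending law", "country": "France", "doc_type": "amending law"},
    {"title": "Example commentary A", "country": "France", "doc_type": "commentary"},
    {"title": "Example commentary B", "country": "Germany", "doc_type": "commentary"},
]


def facet_counts(results, facet):
    """Count how many results fall under each value of one facet."""
    return Counter(doc[facet] for doc in results)


def drill_down(results, **selected):
    """Keep only the results that match every selected facet value."""
    return [doc for doc in results
            if all(doc[name] == value for name, value in selected.items())]


print(facet_counts(DOCUMENTS, "country"))
narrowed = drill_down(DOCUMENTS, country="France", doc_type="commentary")
print([doc["title"] for doc in narrowed])
```

The counts tell the researcher how many results sit behind each facet value before clicking, and each click applies one more filter to the current result set.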

Contributing to the project
An underlying concept behind the new legal research platform is encouraging legal scholars, law libraries, subject area experts, and other professionals to contribute to the project. HeinOnline wants to work with scholars and libraries from all around the world to continue to build upon the collection and to continue developing the constitutional timelines for every country. Several libraries and scholars from around the world have already contributed constitutional works from their libraries to World Constitutions Illustrated.

Extending the platform beyond the pilot project
As mentioned earlier, this platform has been implemented in every new library that HeinOnline has released, including History of Bankruptcy: Taxation & Economic Reform in America, Part III and Intellectual Property Law Collection. It is worth taking a brief look at these two libraries.

History of Bankruptcy: Taxation & Economic Reform in America, Part III
The History of Bankruptcy library includes more than 172,000 pages of legislative histories, treatises, documents and more, all related to bankruptcy law in America. The primary resources in this library are the legislative histories, which can be browsed by title, public law number, or popular name. Also included are classic books dating back to the late 1800s and links to scholarly articles that were selected by our editors due to their significance to the study of bankruptcy law in America.

As with the searching capabilities presented in the World Constitutions Illustrated library, researchers can narrow a search by the type of resource, or search across everything in the library. After a search is generated, researchers will receive faceted search results, allowing them to quickly and easily drill down their results set by document type, date, or title.

Intellectual Property Law Collection
The Intellectual Property Law Collection, released just over a month ago, features nearly 2 million pages of legal research material related to patents, trademarks, and copyrights in America. It includes more than 270 books, more than 100 legislative histories, links to more than 50 legal periodicals, federal agency documents, the Manual of Patent Examining Procedure, CFR Title 37, U.S. Code Titles 17 & 35, and links to scholarly articles chosen by our Editors, all related to intellectual property law in America.

Furthermore, this library features a Google Patent Search widget that allows researchers to search across more than 7 million patents made available to the public through an arrangement with Google and the United States Patent and Trademark Office.

Searching in the Intellectual Property Law Collection allows researchers to search across all types of documents, or narrow a search to just books, legislative histories, or federal agency decisions, for example. After a search is generated, researchers will receive faceted search results, allowing them to quickly and easily drill down their results set by using facets including document type, date, country, or title.

HeinOnline is the modern link to legal history, and the new legal research platform bolsters this primary objective. The platform brings together the primary and secondary sources, other supporting documents, books, links to articles, periodicals, and links to other online sources, making it a central stop for researchers to begin their search for legal research material. The Editors have selected the books, articles, and sources that they deem significant to that area of the law. This is then presented in one database, making it easier for researchers to find what they need. With the tremendous growth of digital media and online sources, it can prove difficult for a researcher to quickly navigate to the most significant sources of information. HeinOnline’s goal is to make this navigation easier with the implementation of this new legal research platform.

Marcie Baranich is the Marketing Manager at William S. Hein & Co., Inc. and is responsible for the strategic marketing processes for both Hein's traditional products and its growing online legal research database, HeinOnline. In addition to her Marketing role, she is also active in the product development, training and support areas for HeinOnline. She is an author of the HeinOnline Blog, Wiki, YouTube channel, Facebook, and Twitter pages, and manages the strategic use of these resources to communicate and assist users with their legal research needs.

VoxPopuLII is edited by Judith Pratt.

Editor-in-Chief is Robert Richards, to whom queries should be directed.