
There has been a series of efforts to create a national legislative data standard – one master XML format to which all states will adhere for bills, laws, and regulations. Those efforts have gone poorly.

Few states provide bulk downloads of their laws. None provide APIs. Although nearly all states provide websites for people to read state laws, they are all objectively terrible, in ways that demonstrate that they were probably pretty impressive in 1995. Despite the clear need for improved online display of laws, the lack of a standard data format and the general lack of bulk data has enabled precious few efforts in the private sector. (Notably, there is Robb Schecter’s WebLaws.org, which provides vastly improved experiences for the laws of California, Oregon, and New York. There was also a site built experimentally by Ari Hershowitz that was used as a platform for last year’s California Laws Hackathon.)

A significant obstacle to prior efforts has been the perceived need to create a single standard, one that will accommodate the various textual legal structures that are employed throughout government. This is a significant practical hurdle on its own, but failure is all but guaranteed by also engaging major stakeholders and governments to establish a standard that will enjoy wide support and adoption.

What if we could stop letting the perfect be the enemy of the good? What if we ignore the needs of the outliers, and establish a “good enough” system, one that will at first simply work for most governments? And what if we completely skip the step of establishing a standard XML format? Wouldn’t that get us something, a thing superior to the nothing that we currently have?

The State Decoded
This is the philosophy behind The State Decoded. Funded by the John S. and James L. Knight Foundation, The State Decoded is a free, open source program to put legal codes online, and it does so by simply skipping over the problems that have hampered prior efforts. The project does not aspire to create any state law websites on its own but, instead, to provide the software to enable others to do so.

Still in its development (it’s at version 0.4), The State Decoded leaves it to each implementer to gather up the contents of the legal code in question and interface it with the program’s internal API. This could be done via screen-scraping off of an existing state code website, modifying the parser to deal with a bulk XML file, converting input data into the program’s simple XML import format, or by a few other methods. While a non-trivial task, it’s something that can be knocked out in an afternoon, thus avoiding the need to create a universal data format and to persuade Wexis to provide their data in that format.

The magic happens after the initial data import. The State Decoded takes that raw legal text and uses it to populate a complete, fully functional website for end-users to search and browse those laws. By packaging the Solr search engine and employing some basic textual analysis, the program cross-references every law with other laws that cite it and laws that are textually similar. If there exists a repository of legal decisions for the jurisdiction in question, that can be incorporated, too, displaying a list of the court cases that cite each section. Definitions are detected and loaded into a dictionary, making the laws self-documenting. End users can post comments to each law. Bulk downloads are created, letting people get a copy of the entire legal code, its structural elements, or the automatically assembled dictionary. And there’s a RESTful, JSON-based API, ready to be used by third parties. All of this is done automatically, quickly, and seamlessly. The time elapsed varies, depending on server power and the length of the legal code, but it generally takes about twenty minutes from start to finish.
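To make the cross-referencing idea concrete, here is a minimal sketch in Python. It is not The State Decoded's own code: the citation pattern, the function name `build_cited_by`, and the sample sections are all made up for illustration, and real legal codes cite one another in far messier ways.

```python
import re
from collections import defaultdict

# Assumed citation style for this sketch: "§ 18.2-57". Real codes vary widely.
SECTION_REF = re.compile(r"§\s*(\d+(?:\.\d+)?-\d+(?:\.\d+)?)")


def build_cited_by(laws):
    """Map each section number to the sorted list of sections that cite it."""
    cited_by = defaultdict(set)
    for number, text in laws.items():
        for ref in SECTION_REF.findall(text):
            # Ignore self-references and citations to sections we do not hold.
            if ref != number and ref in laws:
                cited_by[ref].add(number)
    return {ref: sorted(citers) for ref, citers in cited_by.items()}


# Invented sample text, not real statutory language.
sample_laws = {
    "18.2-10": "Punishments for felonies are as follows.",
    "18.2-57": "Assault is punishable as provided in § 18.2-10.",
    "18.2-58": "See § 18.2-57 and § 18.2-10.",
}

print(build_cited_by(sample_laws))
```

The result is a reverse index: for any section, the list of sections that mention it, which is the information a "cited by" sidebar on a law's page would need.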

The State Decoded is a free program, released under the GNU General Public License. Anybody can use it to make legal codes more accessible online. There are no strings attached.

It has already been deployed in two states, Virginia and Florida, despite not actually being a finished project yet.

State Variations
The striking variations in the structures of legal codes within the U.S. required the establishment of an appropriately flexible system to store and render those codes. Some legal codes are broad and shallow (e.g., Louisiana, Oklahoma), while others are narrow and deep (e.g., Connecticut, Delaware). Some list their sections by natural sort order, some in decimal, a few arbitrarily switch between the two. Many have quirks that will require further work to accommodate.
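The difference between the two sort orders is easy to underestimate, so here is a small Python sketch of it. The two key functions are hypothetical helpers written for this illustration, not part of The State Decoded, and the decimal version assumes section numbers of the simple form "1.10".

```python
import re


def natural_key(section):
    """Natural order: digit runs compare as whole numbers, so 1.2 < 1.10."""
    parts = re.findall(r"\d+|\D+", section)
    # Tag each part so numbers and text are never compared with each other.
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts]


def decimal_key(section):
    """Decimal order: digits after the point are a fraction, so 1.10 < 1.2."""
    head, _, tail = section.partition(".")
    # Comparing the fractional digits as a string mimics decimal comparison.
    return (int(head), tail)


sections = ["1.10", "1.2", "1.1", "2.1"]

print(sorted(sections, key=natural_key))  # ['1.1', '1.2', '1.10', '2.1']
print(sorted(sections, key=decimal_key))  # ['1.1', '1.10', '1.2', '2.1']
```

A code that switches between the two conventions partway through would need the sort key chosen per title or chapter, which is the kind of case-by-case handling these quirks demand.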

For example, California does not provide a catch line for their laws, but just a section number. One must read through a law to know what it actually does, rather than being able to glance at the title and get the general idea. Because this is a wildly impractical approach for a state code, the private sector has picked up the slack – Westlaw and LexisNexis each write their own titles for those laws, neatly solving the problem for those with the financial resources to pay for those companies’ offerings. To handle a problem like this, The State Decoded either needs to be able to display legal codes that lack section titles, or pointedly not support this inferior approach, and instead support the incorporation of third-party sources of titles. In California, this might mean mining the section titles used internally by the California Law Revision Commission, and populating the section titles with those. (And then providing a bulk download of that data, allowing it to become a common standard for California’s section titles.)

Many state codes have oddities like this. The State Decoded combines flexibility with open source code to make it possible to deal with these quirks on a case-by-case basis. The alternative approach is too convoluted and quixotic to consider.

Regulations
There is strong interest in seeing this software adapted to handle regulations, especially from cash-strapped state governments looking to modernize their regulatory delivery process. Although this process is still in an early stage, it looks like rather few modifications will be required to support the storage and display of regulations within The State Decoded.

More significant modifications would be needed to integrate registers of regulations, but the substantial public benefits that would provide make it an obvious and necessary enhancement. The present process required to identify the latest version of a regulation is the stuff of parody. To select a state at random, here are the instructions provided on Kansas’s website:

To find the latest version of a regulation online, a person should first check the table of contents in the most current Kansas Register, then the Index to Regulations in the most current Kansas Register, then the current K.A.R. Supplement, then the Kansas Administrative Regulations. If the regulation is found at any of these sequential steps, stop and consider that version the most recent.

If Kansas has electronic versions of all this data, it seems almost punitive not to put it all in one place, rather than forcing people to look in four places. It seems self-evident that the current Kansas Register, the Index to Regulations, the K.A.R. Supplement, and the Kansas Administrative Regulations should have APIs, with a common API atop all four, which would make it trivial to present somebody with the current version of a regulation with a single request. By indexing registers of regulations in the manner that The State Decoded indexes court opinions, it would at least be possible to show people all activity around a given regulation, if not simply show them the present version of it, since surely that is all that most people want.
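Kansas's four-step instructions amount to a "first match wins" lookup over an ordered list of sources, which is only a few lines of code once the data is in one place. The Python sketch below assumes each source has already been loaded into a dictionary keyed by regulation number; the function name, the regulation numbers, and the texts are invented for illustration, and no such Kansas API exists.

```python
def latest_version(regulation, sources):
    """Return (source name, text) from the first source holding the regulation.

    `sources` must be ordered from most to least current, mirroring the
    sequence in the Kansas instructions. Returns None if nothing matches.
    """
    for name, index in sources:
        if regulation in index:
            return name, index[regulation]
    return None


# Made-up contents; only the ordering of the four sources comes from Kansas.
kansas_sources = [
    ("Kansas Register table of contents", {}),
    ("Index to Regulations", {"1-2-3": "amended text"}),
    ("K.A.R. Supplement", {"1-2-3": "older text", "4-5-6": "supplement text"}),
    ("Kansas Administrative Regulations", {"7-8-9": "base text"}),
]

print(latest_version("1-2-3", kansas_sources))
```

With the sources behind a common interface, the person asking never needs to know that four publications exist, only that one answer came back and where it came from.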

A Tapestry of Data
In a way, what makes The State Decoded interesting is not anything that it actually does, but instead what others might do with the data that it emits. By capitalizing on the program’s API and healthy collection of bulk downloads, clever individuals will surely devise uses for state legal data that cannot presently be envisioned.

The structural value of state laws is evident when considered within the context of other open government data.

Major open government efforts are confined largely to the upper-right quadrant of this diagram – those matters concerned with elections and legislation. There is also some excellent work being done in opening up access to court rulings, indexing scholarly publications, and nascent work in indexing the official opinions of attorneys general. But the latter group cannot be connected to the former group without opening up access to state laws. Courts do not make rulings about bills, of course – it is laws with which they concern themselves. Law journals cite far more laws than they do bills. To weave a seamless tapestry of data that connects court decisions to state laws to legislation to election results to campaign contributions, it is necessary to have a source of rich data about state laws. The State Decoded aims to provide that data.

Next Steps
The most important next step for The State Decoded is to complete it, releasing a version 1.0 of the software. It has dozens of outstanding issues – both bug fixes and new features – so this process will require some months. In that period, the project will continue to work with individuals and organizations in states throughout the nation who are interested in deploying The State Decoded to help them get started.

Ideally, The State Decoded will be obviated by states providing both bulk data and better websites for their codes and regulations. But in the current economic climate, neither is likely to be prioritized within state budgets, so unfortunately there’s liable to remain a need for the data provided by The State Decoded for some years to come. The day when it is rendered useless will be a good day.

Waldo Jaquith is a website developer with the Miller Center at the University of Virginia in Charlottesville, Virginia. He is a News Challenge Fellow with the John S. and James L. Knight Foundation and runs Richmond Sunlight, an open legislative service for Virginia. Jaquith previously worked for the White House Office of Science and Technology Policy, for which he developed Ethics.gov, and is now a member of the White House Open Data Working Group.
[Editor's Note: For topic-related VoxPopuLII posts please see: Ari Hershowitz & Grant Vergottini, Standardizing the World's Legal Information - One Hackathon At a Time; Courtney Minick, Universal Citation for State Codes; John Sheridan, Legislation.gov.uk; and Robb Schecter, The Recipe for Better Legal Information Services.]

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

Indian Kanoon is a free search engine for Indian law, providing access to more than 1.4 million central laws, and judgments from The Supreme Court of India, 24 High Courts, 17 law tribunals, constituent assembly debates, law commission reports, and a few law journals.

The development of Indian Kanoon began in the summer of 2007 and was publicly announced on 4 January 2008. Developing this service was a part-time project when I was working towards my doctoral degree in Computer Science at the University of Michigan under the guidance of Professor Farnam Jahanian of Arbor Networks fame. My work on Indian Kanoon continues to be a part-time affair because of my full-time job at Yahoo! India (Bangalore). Keep in mind, however, that I don’t have a law background, nor am I an expert on information retrieval. My PhD thesis is entitled Context-Aware Network Security.

The Genesis

Indian Kanoon was started as a result of my curiosity about publicly available law data. In a blog article, Indian Kanoon – The road so far and the road ahead, written a year after the launch of Indian Kanoon, I explained how the project was started, how it ran during the first year, and the promises for the next year.

When I was considering starting Indian Kanoon, the idea of free Indian law search was not new. Prashant Iyengar, a law student from NALSAR Hyderabad, faced the same problem. The law data was available but the search tools were far from satisfactory. So he started OpenJudis to provide search tools for Indian law data that were publicly available. He traces the availability of government data and the development of OpenJudis in detail in his VoxPopuLII post, Confessions of a Legal Info-holic.

Prashant Iyengar traces the genesis, successes, and impacts of Indian Kanoon in a more detailed fashion in his 2010 report, Free Access to Law in India – Is it Here to Stay?

The Goal

I have to make it clear that Indian Kanoon was started in a very informal fashion; the goals of Indian Kanoon were not well established at the outset. The broadest goal for the project came to me while I was writing the “About” page of Indian Kanoon. From this point on, the goals for Indian Kanoon started to crystallize. The second paragraph of this page summed it up as follows:

“Even when laws empower citizens in a large number of ways, a significant fraction of the population is completely ignorant of their rights and privileges. As a result, common people are afraid of going to police and rarely go to court to seek justice. People continue to live under fear of unknown laws and a corrupt police.”

The Legal Thirst

During the first year after the launch of Indian Kanoon, one constant doubt that lingered in the minds of everyone familiar with the project (including me) concerned just how many people really needed a tool like Indian Kanoon. After all, this was a very specialized tool, which quite possibly would be useful only to lawyers or law students. But what constantly surprises me is the increasing number of users of the Website. Indian Kanoon now has roughly half a million users per month, and the number keeps growing.

The obvious question is: Why is this legal thirst, this desire for access to the full text of the law, arising in India now? I can think of umpteen reasons, such as an increase in the number of Indian citizens getting on the Internet, which is proving to be a better access medium than libraries; or that the general media awareness of law, or the spread of blogging culture, is fueling this desire.

On further reflection, I think there are two main drivers of this thirst for legal information. The first one is the resources now available for free and open access to law. Until very recently, most law resources in India were provided by libraries or Websites that charged a significant amount of money. In effect, they prohibited access to a significant portion of the population that wanted to look into legal issues. The average time spent per page on the Indian Kanoon Website is six minutes; this shows that most users actually read the legal text, and apparently find it easier to understand than they had previously expected. (This is precisely what I discovered when I began to read legal texts on a regular basis.)

The spread of the Internet, considered by itself, is not an important reason for the current thirst for law in India, in my view. Subscription-based legal Websites have been around for a while in India, but because of the pay-walls that they erected, none of them has been able to generate a strong user base. While the open nature of the Internet made it easy to compete against these providers, the availability of legal information free of charge — not just availability of the Internet — has removed huge barriers, both to start ups, and to access by the public.

The second major reason for this thirst for legal information (and for the traffic growth to Indian Kanoon) lies in technological advancement. Government websites and even private legal information providers in India are, generally, quite technologically deficient. To provide access to law documents, these providers typically have offered interfaces that are mere replicas of the library world. For example, our Supreme Court website allows searching for judgments by petitioner, respondent, case number, etc. While lawyers are often accustomed to using these interfaces, and of course understand these technical legal terms, requiring prior knowledge of this kind of technical legal information as a prerequisite for performing a search raises a big barrier to access by common people. Further, the free-text search engines provided by these Websites have no notion of relevance. So while the technology world has significantly advanced in the areas of text search and relevance, government-based (and, to some extent, private, fee-based) legal resources in India have remained tied to stone-age technology.

Better Technology Improves Access

Allowing users to try and test any search terms that they have in mind, and providing a relevant set of links in response to their queries, significantly reduces the need for users to understand technical legal information as a prerequisite for reading and comprehending the law of the land. So, overall, I think advances in technology, some of which have been introduced by Indian Kanoon, are responsible for fostering a desire to read the law, and for affording more people access to the legal resources of India.
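For readers unfamiliar with what "relevance" means in a search engine, here is a toy sketch in Python. It uses plain TF-IDF scoring, a textbook technique chosen only for illustration; it says nothing about how Indian Kanoon actually ranks results, and the function names and sample documents are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def rank(query, docs):
    """Return names of matching documents, most relevant first (TF-IDF)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    total = len(docs)
    scores = {}
    for name, words in counts.items():
        score = 0.0
        for term in set(tokenize(query)):
            doc_freq = sum(1 for c in counts.values() if term in c)
            if doc_freq == 0:
                continue
            # Smoothed inverse document frequency: rarer terms weigh more.
            idf = math.log((total + 1) / (doc_freq + 1)) + 1
            score += words[term] * idf
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken alphabetically by name.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Invented snippets standing in for judgments.
sample_docs = {
    "judgment_a": "tenant eviction notice and tenant rights",
    "judgment_b": "eviction of cattle from public land",
    "judgment_c": "income tax assessment",
}

print(rank("tenant eviction", sample_docs))
```

A user who types "tenant eviction" needs no petitioner name or case number; the document that uses both words, and uses the rarer one more often, simply rises to the top.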

The Road Ahead

Considering, however, that fear of unknown laws remains in the minds of large numbers of the Indian people, now is not the time to gloat over the initial success of Indian Kanoon. The task of Indian Kanoon is far from complete, and certainly more needs to be done to make searching for legal information by ordinary people easy and effective.

Sushant Sinha runs the search engine Indian Kanoon and currently works on the document processing team for Yahoo! India. Earlier he earned his PhD in Computer Science from the University of Michigan under the guidance of Professor Farnam Jahanian. He received his bachelor's and master's degrees in computer science from IIT Madras, Chennai and was born and brought up in Jamshedpur, India. He was recently named one of “18 Young Innovators under 35 in India” by MIT’s Technology Review India.

VoxPopuLII is edited by Judith Pratt. Editor in chief is Robert Richards.