When I started to write a post about THOMAS and its place in open government about three months ago, I was feeling apologetic. I was going to make a heavy-handed, literal comparison of the opening hours of the Law Library of Congress (one of the maintainers of THOMAS, and my former employer, whose views are not at all represented here) with open government. I planned to wax sympathetic on the history of THOMAS, and how little has changed since it was first built. But that post would not have added anything new to the #opengovdata conversation, or really mentioned data at all.

Just over one month ago, #freeTHOMAS reached a fever pitch surrounding the passage of H.R. 5882, the Legislative Branch Appropriations Act for FY2013. Just before passage, H.Rpt.511 directed official conversation on open legislative data for the coming fiscal year by saying, "let's talk." In a section about the Government Printing Office, the House Appropriations Committee expressed concerns about authentication and open legislative data, but called for a task force "composed of staff representatives of the Library of Congress, the Congressional Research Service, the Clerk of the House, the Government Printing Office, and such other congressional offices as may be necessary," to look into the matter.

Opengovdata was disappointed in government. The tone of the House Report suggested that government had been dismissive of opengovdata--and in all fairness, others were beginning to be dismissive of opengovdata as well.

But, a clear classification problem was emerging. Inspired by Lawrence Lessig's Freedom To Connect speech at the AFI in late May, I had a very librarian moment on organizational hierarchies.

#OpenGov is a really big tent.

People who want a more open environment for government to communicate with the governed want data, increased transparency, plain language legislation, open court filings, access to government funded research, silly walks, etc. Accountability through transparency often dominates the conversation, thanks to well-funded non-profits and high profile projects. However, there's more to the movement. In that spirit...

Let's be transparent about transparency.

When the goal of an open government project is legislative transparency through freely accessible data, let's focus on that. When the goal is something else, let's focus on that too. We hear about government accountability through data because the voices calling for it are loud. But data can do much more than bring about a more transparent lawmaking process.

In the words of a wise man, make transparency your Number Two.

If you haven't had the chance to watch this aforementioned Freedom To Connect talk, and you've got half an hour to spare, I highly recommend it. The subject is community broadband, but it's hard not to be inspired to frame other issues smarter, with transparency ever-present in the background.

Let's focus on Number One.

If the THOMAS data, for example, were open right now, this instant, you couldn't watch it on TV. You couldn't read it on your Kindle. Its mere presence would not increase transparency. Someone would have to do something with it. Number One is the thing you do with the data to reach your own goal--and that goal might not be legislative transparency.

As a public law librarian to a broad constituency, my goals are different than those of a non-profit think tank, or a law firm, or a law school, or even a non-law public library. In a climate of doing more with less, of needing to show much return for little investment, we each have to frame specific, measurable, achievable Number Ones tailored to the needs of our institutions. Without these Number Ones (goals, mission statements, benchmarks, or whatever management word your organization uses), we flounder off-mission, lose focus, and potentially lose funding.

Librarians are foot soldiers for the First Amendment--we like open information, we place a high value on the freedom to know. However, we're among the first to be cut in tight budget situations, and we're all too familiar with the perils of asking for something that's overly broad, or asking for something that you can't show narrowly tailored value for later on.

With respect to open gov data: government accountability is not unimportant to me as a voter. However, as a law librarian, I need to focus on Number Ones with more specific, smaller-scale goals than transparency, that will create measurable outcomes, allowing me to show concrete value to my institution. The big picture of how information is available, and the relationship between the government and the governed is important, but it doesn't always get you funding, and it can't always answer the question of the patron in front of you.

What's your Number One?

There's plenty of data out there. What are you doing with it? How can you manipulate raw free resources into something good for your institution? There is much to be said for information for the sake of information. I can't imagine needing to convince most library-types of that. That said, we library-types, we information professionals, we decision makers, and perhaps we citizens need to narrow open gov to make it work for us. Data is good, but a real-time interactive civics education program based on THOMAS data for K-12 students is better. Let transparency folks fight the good fight, and don't forget their work. But while you've got your librarian hat on, focus on a Number One that works for you.

Meg Lulofs is an information professional at large, blogging at librarylulu.com, editing Pimsleur's Checklists of Basic American Legal Publications, and making mischief. She earned a J.D. from the University of Baltimore, and a M.L.I.S. from Catholic University. She welcomes feedback at meglulofs@gmail.com. You can follow her on Twitter @librarylulu, or on Facebook at facebook.com/librarylulu.

 

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed. The information above should not be considered legal advice. If you require legal representation, please consult a lawyer.

[Editor's Note] For topic-related VoxPopuLII posts please see: Nick Holmes, Accessible Law; John Sheridan, Legislation.gov.uk; David Moore, OpenGovernment.org: Researching U.S. State Legislation.

In the fall of 2009, the American Association of Law Libraries (AALL) put out a call for volunteers to participate in our new state working groups to support one of AALL’s top policy priorities: promoting the need for authentication and preservation of digital legal resources. It is AALL policy that the public have no-fee, permanent public access to authentic online legal information. In addition, AALL believes that government information, including the text of all primary legal materials, must be in the public domain and available without restriction.

The response to our call was overwhelming, with volunteers from all 50 states and the District of Columbia expressing interest in participating. To promote our public policy priorities, the initial goals of AALL’s working groups were to:

  • Take action to oppose any plan in their state to eliminate an official print legal resource in favor of online-only, unless the electronic version is digitally authenticated and will be preserved for permanent public access;
  • Oppose plans to charge fees to access legal information electronically; and
  • Ensure that any legal resources in a state’s raw-data portal include a disclaimer so that users know that the information is not an official or authentic resource (similar to what is included on the Code of Federal Regulations XML on Data.gov).

In late 2009, AALL’s then-Director of Government Relations Mary Alice Baish met twice with Law Librarian of Congress Roberta Shaffer and Carl Malamud of Public.Resource.org to discuss Law.gov and Malamud’s idea for a national inventory of legal materials. The inventory would include legal materials from all three branches of government. Mary Alice volunteered our working groups to lead the ambitious effort to contribute to the groundbreaking national inventory. AALL would use this data to update AALL’s 2003 “State-by-State Report on Permanent Public Access to Electronic Government Information” and the 2007 “State-by-State Report on Authentication of Online Legal Resources” and 2009-2010 updates, which revealed that a significant number of state online legal resources are considered to be “official” but that few are authenticating. It would also help the Law Library of Congress, which owns the Law.gov domain name, with their own ambitious projects.

Erika Wayne and Paul Lomio at Stanford University’s Robert Crown Law Library developed a prototype for the national inventory that included nearly 30 questions related to scope, copyright, cost to access, and other use restrictions. They worked with the California State Working Group and the Northern California Association of Law Libraries to populate the inventory with impressive speed, adding most titles in about two months.

AALL’s Government Relations Office staff then expanded the California prototype to include questions related to digital authentication, preservation, and permanent public access. Our volunteers used the following definition of “authentication” provided by the Government Printing Office:

An authentic text is one whose content has been verified by a government entity to be complete and unaltered when compared to the version approved or published by the content originator.

Typically, an authentic text will bear a certificate or mark that conveys information as to its certification, the process associated with ensuring that the text is complete and unaltered when compared with that of the content originator.

An authentic text is able to be authenticated, which means that the particular text in question can be validated, ensuring that it is what it claims to be.

The “Principles and Core Values Concerning Public Information on Government Websites,” drafted by AALL’s Access to Electronic Legal Information Committee (now the Digital Access to Legal Information Committee) and adopted by the Executive Board in 2007, define AALL’s commitment to equitable, no-fee, permanent public access to authentic online legal information. The principle related to preservation states that:

Information on government Web sites must be preserved by the entity, such as a state library, an archives division, or other agency, within the issuing government that is charged with preservation of government information.

  • Government entities must ensure continued access to all their legal information.
  • Archives of government information must be comprehensive, including all supplements.
  • Snapshots of the complete underlying database content of dynamic Web sites should be taken regularly and archived in order to have a permanent record of all additions, changes, and deletions to the underlying data.
  • Governments must plan effective methods and procedures to migrate information to newer technologies.

In addition, AALL’s 2003 “State-By-State Report on Permanent Public Access to Electronic Government Information” defines permanent public access as, “the process by which applicable government information is preserved for current, continuous and future public access.”

Our volunteers used Google Docs to add to the inventory print and electronic legal titles at the state, county, and municipal levels and answer a series of questions about each title. AALL’s Government Relations Office set up a Google Group for volunteers to discuss issues and questions. Several of our state coordinators developed materials to help other working groups, such as Six Easy Steps to Populating Your State’s Inventory by Maine State Working Group coordinator Christine Hepler, How to Put on a Successful Work Day for Your Working Group by Florida State Working Group co-coordinators Jenny Wondracek and Jamie Keller, and Tips for AALL State Working Groups with contributions from many coordinators.

In October 2010, AALL held a very successful webinar on how to populate the inventories. More than 200 AALL and chapter members participated in the webinar, which included Kentucky State Working Group coordinator Emily Janoski-Haehlen, Maryland State Working Group coordinator Joan Bellistri, and Indiana State Working Group coordinator Sarah Glassmeyer as speakers. By early 2011, more than 350 volunteers were contributing to the state inventories.

Initial Findings

Our dedicated volunteers added more than 7,000 titles to the inventory in time for AALL’s June 30, 2011 deadline. AALL recognized our hard-working volunteers at our annual Advocacy Training during AALL’s Annual Meeting in Philadelphia, and celebrated their significant accomplishments. Timothy L. Coggins, 2010-11 Chair of the Digital Access to Legal Information Committee, presented these preliminary findings:

  • Authentication: No state reported new resources that have been authenticated since the 2009-2010 Digital Access to Legal Information Committee survey
  • Official status: Several states have designated at least one legal resource as official, including Arizona, Florida, and Maine
  • Copyright assertions in digital version: Twenty-five states assert copyright on at least one legal resource, including Oklahoma, Pennsylvania, and Rhode Island
  • Costs to access official version: Ten states charge fees to access the official version, including Kansas, Vermont, and Wyoming
  • Preservation and Permanent Public Access: Eighteen states require preservation and permanent public access of at least one legal resource, including Tennessee, Virginia, and Washington
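Findings like these are counts of states in which at least one inventoried title has a given property. As a minimal sketch of how such a tally could be derived from inventory rows, assuming a hypothetical record layout (the `Title` fields and the sample rows below are illustrative placeholders, not the actual inventory questions or data):

```python
from dataclasses import dataclass


@dataclass
class Title:
    # Hypothetical fields, loosely modeled on the topics the inventory covered.
    state: str
    name: str
    official: bool
    authenticated: bool
    copyright_asserted: bool
    fee_for_official: bool
    preservation_required: bool


def states_with(titles, field):
    """Return the sorted list of states having at least one title where `field` is true."""
    found = set()
    for title in titles:
        if getattr(title, field):
            found.add(title.state)
    return sorted(found)


# Placeholder rows for illustration only; these are not real findings.
sample = [
    Title("State A", "Session laws", True, False, True, False, True),
    Title("State A", "Administrative code", False, False, False, False, False),
    Title("State B", "Court rules", True, False, False, True, False),
]

copyright_states = states_with(sample, "copyright_asserted")
fee_states = states_with(sample, "fee_for_official")
authenticated_states = states_with(sample, "authenticated")

print(len(copyright_states), "state(s) assert copyright:", copyright_states)
print(len(fee_states), "state(s) charge fees:", fee_states)
print(len(authenticated_states), "state(s) authenticate:", authenticated_states)
```

Counting by state rather than by title matches the way the findings are phrased ("at least one legal resource"), so a state with many titles is not weighted more heavily than a state with few.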

Analyzing and Using the Data

In July 2011, AALL’s Digital Access to Legal Information Committee formed a subcommittee that is charged with reviewing the national inventory data collected by the state working groups. The subcommittee includes Elaine Apostola (Maine State Law and Legislative Reference Library), A. Hays Butler (Rutgers University Law School Library), Sarah Gotschall (University of Arizona Rogers College of Law Library), and Anita Postyn (Richmond Supreme Court Library). Subcommittee members have been reviewing the raw data as entered by the working group volunteers in their state inventories. They will soon focus their attention on developing a report that will also act as an updated version of AALL’s State-by-State Report on Authentication of Online Legal Resources.

The report, to be issued later this year, will once again support what law librarians have known for years: there are widespread issues with access to legal resources and there is an imminent need to prevent a trend of eliminating print resources in favor of electronic resources without the proper safeguards in place. It will also include information on: the official status of legal resources; whether states are providing for authentication, permanent public access, and/or preservation of online legal resources; any use restrictions or copyright claims by the state; and whether a universal (public domain) citation format has been adopted by any courts in the state.

In addition to providing valuable information to the Law Library of Congress and related Law.gov projects, this information has already been helpful to various groups as they proceed to advocate for no-fee, permanent public access to government information. The data has already been useful to advocates of the Uniform Electronic Legal Material Act and will continue to be valuable to those seeking introduction and enactment in their states. The inventory has been used as a starting point for organizations that are beginning digitization projects of their state legal materials. The universal citation data will be used to track the progress of courts recognizing the value of citing official online legal materials through adopting a public domain citation system. Many state working group coordinators have also shared data with their judiciaries and legislatures to help expose the need for taking steps to protect our state legal materials.

The Next Steps: Federal Inventory

In December 2010, we launched the second phase of this project, the Federal Inventory. The Federal Inventory will include:

  • Legal research materials
  • Information authored or created by agencies
  • Resources that are publicly accessible

Our goals are the same as with the state inventories: to identify and answer questions about print and electronic legal materials from all three branches of government. Volunteers from Federal agencies and the courts are already adding information such as decisions, reports and digests (Executive); court opinions, court rules, and Supreme Court briefs (Judicial); and bills and resolutions, the Constitution, and Statutes at Large (Legislative). Emily Carr, Senior Legal Research Specialist at the Law Library of Congress, and Judy Gaskell, retired Librarian of the Supreme Court, are coordinating this project.

Thanks to the contributions of an army of AALL and chapter volunteers, the national inventory of legal materials is nearly complete. Keep an eye on AALL's website for more information as our volunteers complete the Federal Inventory, analyze the data, and promote the findings to Federal, state and local officials.

Tina S. Ching is the Electronic Services Librarian at Seattle University School of Law. She is the 2011-12 Chair of the AALL Digital Access to Legal Information Committee.

Emily Feltren is Director of Government Relations for the American Association of Law Libraries.

[Editor's Note: For topic-related VoxPopuLII posts please see: Barbara Bintliff, The Uniform Electronic Legal Material Act Is Ready for Legislative Action; Jason Eiseman, Time to Turn the Page on Print Legal Information; John Joergensen, Authentication of Digital Repositories.]

Readers of this blog are probably already familiar with the U.S. Federal Courts' system for electronic access called PACER (Public Access to Court Electronic Records).  PACER is unlike any other country's electronic public access system that I am aware of, because it provides complete access to docket text, opinions, and all documents filed (except sealed records, of course).  It is a tremendously useful tool, and (at least at the time of its Web launch in the late 1990s) was tremendously ahead of its time.

However, PACER is unique in another important way: it imposes usage charges on citizens for downloading, viewing, and even searching for case materials. This limitation unfortunately forecloses a great deal of democracy-enhancing activity.

The PACER Liberation Front

In 2008, I happened upon PACER in the course of trying to research a First Amendment issue.  I am not a lawyer, but I was trying to get a sense of the federal First Amendment case law across all federal jurisdictions, because that case law had a direct effect on some activists at the time.  I was at first excited that so much case law was apparently available online, but then disappointed when I discovered that the courts were charging for it.  After turning over my credit card number to PACER, I was shocked that the system was charging for every single search I performed.  With the type of research I was trying to do, it was inevitable that I would have to do countless searches to find what I was looking for.  What's more, the search functionality provided by PACER turned out to be nearly useless for the task at hand -- there was no way to search for keywords, or within documents at all.  The best I could do was pay for all the documents in particular cases that I suspected were relevant, and then try to sort through them on my own hard drive. Even this would be far from comprehensive.

This led to the inevitable conclusion that there is simply no way to know federal case law without going through a lawyer, doing laborious research using print legal resources, or paying for a high-priced database service.  My only hope for getting use out of PACER was to find some way to affordably get a ton of documents.  This is when I ran across a nascent project led by open government prophet Carl Malamud. He called it PACER Recycling.  Carl offered to host any PACER documents that anybody happened to have, so that other people could download them.  At that time, he had only a few thousand documents, but an ingenious plan: The federal courts were conducting a trial of free access at about sixteen libraries across the country. Anyone who walked in to one of those libraries and asked for PACER could browse and download documents for free. Carl was encouraging a "thumb drive corps" to bring USB sticks into those libraries and download caches of PACER documents.

The main bottleneck with this approach was volume. PACER contains hundreds of millions of documents, and manually downloading them all was just not going to happen. I had a weekend to kill, and an idea for building on his plan. I wrote up a Perl script that could run off of a USB drive and that would automatically start going through PACER cases and downloading all of the documents in an organized fashion. I didn't live near one of the "free PACER" libraries, so I had to test the script using my own non-free PACER account... which got expensive. I began to contemplate the legal ramifications -- if any -- of downloading public records in bulk via this method. The following weekend I ran into Aaron Swartz.

Aaron is one of my favorite civic hackers. He's a great coder and has a tendency to be bold. I told him about my little project, and he asked to see the code. He made some improvements and, given his higher tolerance for risk, proceeded to use the modified code to download about 2,700,000 files from PACER. The U.S. Courts freaked out, cancelled the free access trial, and said that "[t]he F.B.I. is conducting an investigation." We had a hard time believing that the F.B.I. would care about the liberation of public records in a seemingly legal fashion, and told The New York Times as much. (Media relations pro tip: If you don't want to be quoted, always, repeatedly emphasize that your comments are "on background" only. Even though I said this when I talked to The Times, they still put my name in the corresponding blog post. That was the first time I had to warn my fiancée that if the feds came to the door, she should demand a warrant.)

A few months later, Aaron got curious about whether the FBI was really taking this seriously. In a brilliantly ironic move, he filed a FOIA for his own FBI record, which was delivered in due course and included such gems as:

Between September 4, 2008 and September 22, 2008, PACER was accessed by computers from outside the library utilizing login information from two libraries participating in the pilot project. The Administrative Office of the U.S. Courts reported that the PACER system was being inundated with requests. One request was being made every three seconds.

[…] The two accounts were responsible for downloading more than eighteen million pages with an approximate value of $1.5 million.

The full thing is worth a read, and it includes details about the feds looking through Aaron's Facebook and LinkedIn profiles. However, the feds were apparently unable to determine Aaron's current residence and ended up staking out his parents' house in Illinois. The feds had to call off the surveillance because, in their words: "This is a heavily wooded, dead-end street, with no other cars parked on the road making continued surveillance difficult to conduct without severely increasing the risk of discovery." The feds eventually figured out Aaron wasn't in Illinois when he posted to Facebook: "Want to meet the man behind the headlines? Want to have the F.B.I. open up a file on you as well? Interested in some kind of bizarre celebrity product endorsement? I’m available in Boston and New York all this month." They closed the case.

Turning PACER Around

Carl published Aaron's trove of documents (after conducting a very informative privacy audit), but the question was: what to do next? I had long given up on my initial attempt to merely understand a narrow aspect of First Amendment jurisprudence, and had taken up the PACER liberation cause wholeheartedly. At the time, this consisted of writing about the issue and giving talks. I ran across a draft article by some folks at Princeton called "Government Data and the Invisible Hand." It argued:

Rather than struggling, as it currently does, to design sites that meet each end-user need, we argue that the executive branch should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data.

I couldn't have agreed more, and their prescription for the executive branch made sense for the brain-dead PACER interface too. I called up one of the authors, Ed Felten, and he told me to come down to Princeton to give a talk about PACER. Afterwards, two graduate students, Harlan Yu and Tim Lee, came up to me and made an interesting suggestion. They proposed a Firefox extension that anyone using PACER could install. As users paid for documents, those documents would automatically be uploaded to a public archive. As users browsed dockets, if any documents were available for free, the system would notify them of that, so that the users could avoid charges. It was a beautiful quid-pro-quo, and a way to crowdsource the PACER liberation effort in a way that would build on the existing document set.
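The exchange Harlan and Tim proposed can be sketched in a few lines. This is not RECAP's actual code; it is a minimal in-memory model of the idea, and every name in it (`PublicArchive`, `annotate_docket`, `fetch`, the `buy_from_pacer` callback, and the document IDs) is hypothetical:

```python
class PublicArchive:
    """In-memory stand-in for the shared public archive (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def has(self, doc_id):
        return doc_id in self._docs

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, content):
        # Keep the first uploaded copy; later uploads of the same document are ignored.
        self._docs.setdefault(doc_id, content)


def annotate_docket(archive, doc_ids):
    """For each document listed on a docket, report whether a free copy already exists."""
    return {doc_id: archive.has(doc_id) for doc_id in doc_ids}


def fetch(archive, doc_id, buy_from_pacer):
    """Return (content, paid): use the free copy if one exists, otherwise pay and share."""
    if archive.has(doc_id):
        return archive.get(doc_id), False
    content = buy_from_pacer(doc_id)
    archive.put(doc_id, content)
    return content, True


# Usage: the first reader pays, and every later reader gets the shared copy.
archive = PublicArchive()
purchases = []


def fake_purchase(doc_id):
    # Stand-in for a paid download; records that a charge happened.
    purchases.append(doc_id)
    return f"contents of {doc_id}"


first = fetch(archive, "case-1/doc-3", fake_purchase)
second = fetch(archive, "case-1/doc-3", fake_purchase)
docket_view = annotate_docket(archive, ["case-1/doc-3", "case-1/doc-4"])
```

In this model each document is paid for at most once across all users, which is the sense in which the archive grows as a side effect of ordinary PACER use.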

So Harlan and Tim built the extension and called it RECAP (tagline: "Turning PACER around." Get it? Eh?). It was well received, and you can read the great endorsements from The Washington Post, The L.A. Times, The Guardian, and many like-minded public interest organizations. The courts freaked out again, but ultimately realized they couldn't go after people for republishing the public record.

I helped with a few of the details, and eventually ended up coming down to work at their research center, the Center for Information Technology Policy. Last year, a group of undergrads built a fantastic web interface to the RECAP database that allows better browsing and searching than PACER. Their project is just one example of the principle laid out in the "Government Data and the Invisible Hand" paper: when presented with the raw data, civic hackers can build better interfaces to that data than the government.

From Fee to Free

Despite all of our efforts, the database of free PACER materials still contains only a fraction of the documents stored in the for-fee database. The real end-game is for the courts to change their mind about the PACER paywall approach in the first place. We have made this case in many venues. Influential senators have sent them letters. I have even pointed out that the courts are arguably violating The 2002 E-Government Act. As it happens, PACER brings in over $100 million annually through user fees. These fees are spent partially on supporting PACER's highly inefficient infrastructure, but are also partially spent on various other things that the courts deem somehow related to public access. This includes what one judge described as expenditures on his courtroom:

"Every juror has their own flatscreen monitors. We just went through a big upgrade in my courthouse, my courtroom, and one of the things we've done is large flatscreen monitors which will now -- and this is a very historic courtroom so it has to be done in accommodating the historic nature of the courthouse and the courtroom -- we have flatscreen monitors now which will enable the people sitting in the gallery to see these animations that are displayed so they're not leaning over trying to watch it on the counsel table monitor. As well as audio enhancements. In these big courtrooms with 30, 40 foot ceilings where audio gets lost we spent a lot of money on audio so the people could hear what's going on. We just put in new audio so that people -- I'd never heard of this before -- but it actually embeds the speakers inside of the benches in the back of the courtroom and inside counsel tables so that the wood benches actually perform as amplifiers."

I am not against helping courtroom visitors hear and see trial testimony, but we must ask whether it is good policy to restrict public access to electronic materials on the Internet in the name of arbitrary courtroom enhancements (even assuming that allocating PACER funds to such enhancements is legal, which is questionable). The real hurdle to liberating PACER is that it serves as a cross-subsidy to other parts of our underfunded courts. I parsed a bunch of appropriations data and committee reports in order to write up a report on actual PACER costs and expenditures. Just as shocking as the use of PACER income for non-PACER expenses is the actual claimed cost of running PACER, which is orders of magnitude higher than any competent Web geek would tell you it should be (especially for a system whose administrators once worried that "one request was being made every three seconds"). The rest of the federal government has been moving toward cloud-based "Infrastructure as a Service", while the U.S. Courts continue to maintain about 100 different servers in each jurisdiction, each with its own privately leased internet connection. (Incidentally, if you enjoy conspiracy theories, try to ID the pseudonymous "Schlomo McGill" in the comments of this post and this post.)
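To put the "one request every three seconds" figure in perspective, here is a quick back-of-the-envelope conversion. The only input is the rate quoted in the FBI file; the variable names are mine:

```python
SECONDS_PER_REQUEST = 3  # from the quoted FBI file: one request every three seconds

requests_per_second = 1 / SECONDS_PER_REQUEST
requests_per_minute = 60 // SECONDS_PER_REQUEST
requests_per_hour = 3_600 // SECONDS_PER_REQUEST
requests_per_day = 86_400 // SECONDS_PER_REQUEST

print(f"{requests_per_second:.2f} requests/second")
print(f"{requests_per_minute} requests/minute")
print(f"{requests_per_hour} requests/hour")
print(f"{requests_per_day} requests/day")
```

That works out to a third of a request per second, or 28,800 requests over a full day, which is the load the administrators described as being "inundated."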

The ultimate solution to the PACER fee problem unfortunately lies not in exciting spy-vs-spy antics (although those can be helpful and fun), but in bureaucratic details of authorization subcommittees and technical details of network architecture. This is the next front of PACER liberation. We now have friends in Washington, and we understand the process better every day. We also have very smart geeks, and I think that the ultimate finger on the scale may be our ability to explain how the U.S. Courts could run a tremendously more efficient system that would simultaneously generate a diversity of new democratic benefits. We also need smart librarians and archivists making good policy arguments. That is one reason why the Law.gov movement is so exciting to me. It has the potential not only to unify open-law advocates, but to go well beyond the U.S. Federal Case Law fiefdom of PACER.

Perhaps then I can finally get the answer to that narrow legal question I tried to ask in 2008. I'm sure that the answer will inevitably be: "It's complicated."

Steve Schultze is Associate Director of The Center for Information Technology Policy at Princeton. His work includes Internet privacy, security, government transparency, and telecommunications policy. He holds degrees in Computer Science, Philosophy, and Media Studies from Calvin College and MIT. He has also been a Fellow at The Berkman Center for Internet & Society at Harvard, and helped start the Public Radio Exchange.