Free (as in Law)
We’d like to share a bit about how we approach the heavy traffic that’s been keeping us on our toes. We’ve been talking a lot recently about the ways in which changes to Google’s traffic reports have been an occasion for us to revisit our thinking about who uses the site and how they do so. Here we’re relying not just on data from people whose use of the website shows up in Google Analytics, but also on data from the ever-growing assemblage of automated user agents, ranging from long-standing search engine crawlers to newer corporate large language model crawlers to custom crawlers that seem to represent some very curious researchers on home internet connections.
A little over a month ago, LII’s leadership attended the Law Via the Internet Conference, meeting with other members of the worldwide Free Access to Law Movement. As you might guess, everyone was talking about the use and refinement of artificial intelligence tools, particularly large language models. The atmosphere encompassed a mix of excitement, caution, curiosity, and realism. But one element of the discussion stood out as a point of strong divergence: what are the limits of the “free” and “access” in “free access to law”? Does free mean “free as in beer” or “free as in speech”?
It turns out that we have a different intuition from some of our sibling LIIs who run services designed to be used primarily by members of the legal profession. Although (like us) they would never stand in the way of others gaining free access to information published by their respective governments, (also like us) they have invested significant resources over many years to standardize and enrich the documents they work with. But because (unlike us) many of them focus primarily on publishing case law, privacy concerns alone have given them plenty of reasons not to allow crawlers access to their websites. Choosing not to open their data now for free AI-related consumption is not far from their long-standing practice. Use of their services is free as in beer.
We take a very different approach. Our content is findable primarily because it has always been crawlable, and (almost all) reusable. That policy enabled our site to reach the top of search rankings for vast numbers of search terms. By extension, it enabled our data to become part of the pool of documents used to train most of the large language models. In turn, that openness – and the openness of other U.S.-based free access to law operations – means that even if they’re still having some trouble getting their facts straight today, the large language models we interact with have access to the data they need to get it right tomorrow. Even if we don’t ever see the readers who benefit from that data, it means that our work can serve the public in ways we can’t begin to imagine. As best we can, we offer services that are free as in speech.
But it gets tricky. These crawlers all receive free access to the LII website and in turn make LII’s work usable by millions more people than we would ever reach on our own. But their traffic is not free for us to field. (If you’re wondering why we don’t consider changing our approach, the reality is that we are a very small group, and even if we wished to change our policy to include a paid service for corporate reusers, it would come at the expense of using our technical and administrative resources for direct support of our core mission.) For both principled and practical reasons, everyone – human and crawler alike – receives a standard of access that supports all of our users.