
Thomas R. Bruce

Thomas R. Bruce is co-founder and director of the Legal Information Institute at the Cornell Law School, the first legal-information web site in the world. He was the author of the first Web browser for Microsoft Windows, and has been the principal technical architect for online legal resources ranging from fourteenth-century law texts to the current decisions of the United States Supreme Court. Mr. Bruce has consulted on Internet matters for numerous commercial, governmental, and academic organizations on four continents. He has been a fellow of the Center for Online Dispute Resolution at the University of Massachusetts, and a Senior International Fellow at the University of Melbourne Law School. He is an affiliated researcher in Cornell's program in Information Science, where he works closely with faculty and students who experiment with the application of advanced technologies to legal texts. He currently serves as a member of the ABA Administrative Law Section Special Committee on e-Rulemaking, on Cornell's Faculty Advisory Board for Information Technology, and is a longtime member of the board of directors of the Center for Computer-Assisted Legal Instruction. He has been known to play music at high volume.

[Note: This year marks the LII's 25th year of operation. In honor of the occasion, we're going to offer "25 for 25" — blog posts by 25 leading thinkers in the field of open access to law and legal informatics, published here in the VoxPopuLII blog. Submissions will appear irregularly, approximately twice per month. We'll open with postings from the LII's original co-directors, and conclude with posts from the LII's future leadership. Today's post is by Tom Bruce; Peter Martin's will follow later in the month.]

It all started with two gin-and-tonics, consumed by a third party.  At the time I was the Director of Educational Technologies for the Cornell Law School.  Every Friday afternoon, there was a small gathering of people like me in the bar of the Statler Hotel, maybe 8 or 10 from different holes and corners in Cornell’s computer culture.  A lot of policy issues got solved there, and more than a few technical ones.  I first heard about something called “Perl” there.

The doyen of that group was Steve Worona, then some kind of special-assistant-for-important-stuff in the office of Cornell's VP for Computing.  Knowing that the law school had done some work with CD-ROM based hypertext, he had been trying to get me interested in a Campus-Wide Information System (CWIS) platform called Gopher and, far off into the realm of wild-eyed speculation, this thing called the World-Wide Web.  One Friday in early 1992, noting that Steve was two gin-and-tonics into a generous mood, I asked him if he might have a Sun box lying around that I might borrow to try out a couple of those things.

He did, and the Sun 4-c in question became "fatty" — named after Fatty the Bookkeeper, a leading character in the Brecht-Weill opera "Mahagonny," which tells the story of a "City of Nets."  It was the first institutional web server that had information about something other than high-energy physics, and somewhere around the 30th web server in the world. We still get a fair amount of traffic via links to "fatty," though the machine name has not been in use for a decade and a half (in fact, we maintain a considerable library of redirection code so that most of the links that others have constructed to us over a quarter-century still work).

What did we put there?  First, a Gopher server.  Gopher pages were either menus or full text — it was more of a menuing system than full hypertext, and did not permit internal links.  Our first effort — Peter’s idea — was Title 17 of the US Code, whose structure was an excellent fit with Gopher’s capabilities, and whose content (copyright) was an excellent fit with the obsessions of those who had Internet access in those days.  It got a lot of attention, as did Peter’s first shot at a Supreme Court opinion in HTML form, Two Pesos v. Taco Cabana.

Other things followed rapidly, and later that year we began republishing all Supreme Court opinions in electronic form.  Initially we linked to the Cleveland Freenet site; then we began republishing them from text in ATEX format; later we were to add our own Project Hermes subscription.  Not long after we began publishing, I undertook to develop the first-ever web browser for Microsoft Windows — mostly because at the time it seemed unlikely that anyone else would, anytime soon.  We were just as interested in editorial innovations.  Our first legal commentary — then called "Law About…", and now WEX — was put together in 1993, based on work done by Peter and by Jim Milles in constructing a topics list useful to both lawyers and the general public. A full US Code followed in 1994.  Our work with CD-ROM continued for a surprisingly long time — we offered statutory supplements and leading Supreme Court cases on CD for a number of years, and our version of Title 26 was the basis for a CD distributed by the IRS into the new millennium.  Back in the day when there was, plausibly, a book called "The Whole Internet User's Guide and Catalog", we appeared in it eight times.

To talk about the early days solely in terms of technical firsts or Internet-publishing landmarks is, I think, to miss the more important parts of what we did.  First, as Peter Martin remarks in a promotional video that we made several years ago, we showed that there was tremendous potential for law schools to become creative spaces for innovation in all things related to legal information (they still have that tremendous potential, though very few are exercising it).  We used whatever creativity we could muster to break the stranglehold of commercial publishers not just on legal publishing as a product, but also on thinking about how law should be published, and where, and for whom.  In those days, it was all about caselaw, all about lawyers, and a mile wide and an inch deep. Legal academia, and the commercial publishers, were preoccupied with caselaw and with the relative prestige and authority of courts that publish it; they did not seem to imagine that there was a need for legal information outside the community of legal practitioners.  We thought, and did, differently.  

We were followed in that by others, first in Canada, and then in Australia, and later in a host of other places.  Many of those organizations — around 20 among them, I think — have chosen to use “LII” as part of their names, and “Legal Information Institute” has become a kind of brand name for open access to law.   Many of our namesakes offer comprehensive access to the laws of their countries, and some are de facto official national systems.  Despite recurring fascination with the idea of a “free Westlaw”, a centralized free-to-air system has never been a practical objective for an academically-based operation in the United States. We have, from the outset, seen our work  as a practical exploration of legal information technology, especially as it facilitates integration and aggregation of numerous providers.  The ultimate goal has been to develop new methods that will help people — all people, lawyers or not —  find and understand the law, without fees.

It was obvious to us from the start that effective work in this area would require deep, equal-status collaboration between legal experts and technologists, beginning with the two of us.  My collaboration with Peter was the center of my professional life for 20 years.  I was lucky to have the opportunity.  Legal-academic culture and institutions are often indifferent or hostile to such collaborations, and they are far rarer and much harder to maintain than they should be.  These days, it’s all the rage to talk about “teaching lawyers to code”. I think that lawyers would get better results if they would learn to communicate and collaborate with those who already know how.

Finally, we felt then – as we do now – that the best test of ideas was to implement them in practical, full-scale systems offered to the public in all its Internet-based, newfound diversity.  The resulting work, and the LII itself,  have been defined by the dynamism of opposites — technological expertise vs. legal expertise, practical publishing vs. academic research, bleeding-edge vs. when-the-audience-is-ready,  an audience of lawyers vs. an audience of non-lawyer professionals and private citizens.  That is a complicated, multidirectional balancing act — but we are still on the high-wire after 25 years, and that balancing act has been the most worthwhile thing about the organization, and one that will enable a new set of collaborators to do many more important things in the years to come.

Thomas R. Bruce is the Director of the Legal Information Institute, which he co-founded with Peter W. Martin in 1992.


Van Winkle wakes

In this post, we return to a topic we first visited in a book chapter in 2004.  At that time, one of us (Bruce) was an electronic publisher of Federal court cases and statutes, and the other (Hillmann, herself a former law cataloger) was working with large, aggregated repositories of scientific papers as part of the National Science Digital Library project.  Then, as now, we were concerned that little attention was being paid to the practical tradeoffs involved in publishing high quality metadata at low cost.  There was a tendency to design metadata schemas that said absolutely everything that could be said about an object, often at the expense of obscuring what needed to be said about it while running up unacceptable costs.  Though we did not have a name for it at the time, we were already deeply interested in least-cost, use-case-driven approaches to the design of metadata models, and that naturally led us to wonder what "good" metadata might be.  The result was "The Continuum of Metadata Quality: Defining, Expressing, Exploiting", published as a chapter in an ALA publication, Metadata in Practice.

In that chapter, we attempted to create a framework for talking about (and evaluating) metadata quality.  We were concerned primarily with metadata as we were then encountering it: in aggregations of repositories containing scientific preprints, educational resources, and in caselaw and other primary legal materials published on the Web.   We hoped we could create something that would be both domain-independent and useful to those who manage and evaluate metadata projects.  Whether or not we succeeded is for others to judge.

The Original Framework

At that time, we identified seven major components of metadata quality. Here, we reproduce a part of a summary table that we used to characterize the seven measures. We suggested questions that might be used to draw a bead on the various measures we proposed:

Completeness
  • Does the element set completely describe the objects?
  • Are all relevant elements used for each object?

Provenance
  • Who is responsible for creating, extracting, or transforming the metadata?
  • How was the metadata created or extracted?
  • What transformations have been done on the data since its creation?

Accuracy
  • Have accepted methods been used for creation or extraction?
  • What has been done to ensure valid values and structure?
  • Are default values appropriate, and have they been appropriately used?

Conformance to expectations
  • Does metadata describe what it claims to?
  • Are controlled vocabularies aligned with audience characteristics and understanding of the objects?
  • Are compromises documented and in line with community expectations?

Logical consistency and coherence
  • Is data in elements consistent throughout?
  • How does it compare with other data within the community?

Timeliness
  • Is metadata regularly updated as the resources change?
  • Are controlled vocabularies updated when relevant?

Accessibility
  • Is an appropriate element set for audience and community being used?
  • Is it affordable to use and maintain?
  • Does it permit further value-adds?


There are, of course, many possible elaborations of these criteria, and many other questions that help get at them.  Almost nine years later, we believe that the framework remains both relevant and highly useful, although (as we will discuss in a later section) we need to think carefully about whether and how it relates to the quality standards that the Linked Open Data (LOD) community is discovering for itself, and how it and other standards should affect library and publisher practices and policies.
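To make one of these measures concrete: the completeness questions lend themselves to cheap, automated, collection-level assessment. Here is a minimal sketch, assuming (hypothetically) that metadata records arrive as plain dictionaries and that the element set is a known list of field names; real records and schemas would of course be richer.

```python
# Illustrative sketch only: a least-cost, automatable reading of the
# completeness questions, assuming records are plain dictionaries and the
# element set is a known list of field names.  All field names and record
# values below are hypothetical.

def completeness_report(records, element_set):
    """For each element, report the fraction of records supplying a
    non-empty value -- a collection-level signal, not a per-record audit."""
    report = {}
    for element in element_set:
        filled = sum(
            1 for r in records if r.get(element) not in (None, "", [])
        )
        report[element] = filled / len(records) if records else 0.0
    return report

records = [
    {"title": "Two Pesos v. Taco Cabana", "date": "1992-06-26", "creator": ""},
    {"title": "17 U.S.C. 107", "date": "", "creator": "Office of the Law Revision Counsel"},
]
print(completeness_report(records, ["title", "date", "creator"]))
# {'title': 1.0, 'date': 0.5, 'creator': 0.5}
```

A report like this answers "are all relevant elements used for each object?" only in aggregate, which is exactly the granularity at which shrinking institutional resources make human review practical.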

… and the environment in which it was created

Our work was necessarily shaped by the environment we were in.  Though we never really said so explicitly, we were looking for quality not only in the data itself, but in the methods used to organize, transform, and aggregate it across federated collections.  We did not, however, anticipate the speed or scale at which standards-based methods of data organization would be applied.  Commonly used standards like FOAF and lightweight modelling apparatus like SKOS have emerged into common use since then, and of course the use of Dublin Core — our main focus eight years ago — has continued even as the standard itself has been refined.  This expanded toolset makes it even more important that we have a way to talk about how well the tools fit the job at hand and how well they have been applied; an expanded set of design choices accentuates the need to talk about how well those choices have been made in particular cases.

Although our work took its inspiration from quality standards developed by a government statistical service, we had not really thought through the sheer multiplicity of information services that were available even then.  We were concerned primarily with work that had been done with descriptive metadata in digital libraries, but of course there were, and are, many more people publishing and consuming data in both the governmental and private sectors (to name just two).  Indeed, there was already a substantial literature on data quality that arose from within the management information systems (MIS) community, driven by concerns about the reliability and quality of  mission-critical data used and traded by businesses.  In today’s wider world, where work with library metadata will be strongly informed by the Linked Open Data techniques developed for a diverse array of data publishers, we need to take a broader view.  

Finally, we were driven then, as we are now, by managerial and operational concerns. As practitioners, we were well aware that metadata carries costs, and that human judgment is expensive.  We were looking for a set of indicators that would spark and sustain discussion about costs and tradeoffs.  At that time, we were mostly worried that libraries were not giving costs enough attention, and were designing metadata projects that were unrealistic given the level of detail or human intervention they required.  That is still true.  The world of Linked Data requires well-understood metadata policies and operational practices simply so publishers can know what is expected of them and consumers can know what they are getting. Those policies and practices in turn rely on quality measures that producers and consumers of metadata can understand and agree on.  In today’s world — one in which institutional resources are shrinking rather than expanding —  human intervention in the metadata quality assessment process at any level more granular than that of the entire data collection being offered will become the exception rather than the rule.   

While the methods we suggested at the time were self-consciously domain-independent, they did rest on background assumptions about the nature of the services involved and the means by which they were delivered. Our experience had been with data aggregated by communities where the data producers and consumers were to some extent known to one another, using a fairly simple technology that was easy to run and maintain.  In 2013, that is not the case; producers and consumers are increasingly remote from each other, and the technologies used are both more complex and less mature, though that is changing rapidly.

The remainder of this blog post is an attempt to reconsider our framework in that context.

The New World

The Linked Open Data (LOD) community has begun to consider quality issues; there are some noteworthy online discussions, as well as workshops resulting in a number of published papers and online resources.  It is interesting to see where the work that has come from within the LOD community contrasts with the thinking of the library community on such matters, and where it does not.  

In general, the material we have seen leans toward the traditional data-quality concerns of the MIS community.  LOD practitioners seem to have started out by putting far more emphasis than we might on criteria that are essentially audience-dependent, and on operational concerns having to do with the reliability of publishing and consumption apparatus.   As it has evolved, the discussion features an intellectual move away from those audience-dependent criteria, which are usually expressed as “fitness for use”, “relevance”, or something of the sort (we ourselves used the phrase “community expectations”). Instead, most realize that both audience and usage  are likely to be (at best) partially unknown to the publisher, at least at system design time.  In other words, the larger community has begun to grapple with something librarians have known for a while: future uses and the extent of dissemination are impossible to predict.  There is a creative tension here that is not likely to go away.  On the one hand, data developed for a particular community is likely to be much more useful to that community; thus our initial recognition of the role of “community expectations”.  On the other, dissemination of the data may reach far past the boundaries of the community that develops and publishes it.  The hope is that this tension can be resolved by integrating large data pools from diverse sources, or by taking other approaches that result in data models sufficiently large and diverse that “community expectations” can be implemented, essentially, by filtering.

For the LOD community, the path that began with  “fitness-for-use” criteria led quickly to the idea of maintaining a “neutral perspective”. Christian Fürber describes that perspective as the idea that “Data quality is the degree to which data meets quality requirements no matter who is making the requirements”.  To librarians, who have long since given up on the idea of cataloger objectivity, a phrase like “neutral perspective” may seem naive.  But it is a step forward in dealing with data whose dissemination and user community is unknown. And it is important to remember that the larger LOD community is concerned with quality in data publishing in general, and not solely with descriptive metadata, for which objectivity may no longer be of much value.  For that reason, it would be natural to expect the larger community to place greater weight on objectivity in their quality criteria than the library community feels that it can, with a strong preference for quantitative assessment wherever possible.  Librarians and others concerned with data that involves human judgment are theoretically more likely to be concerned with issues of provenance, particularly as they concern who has created and handled the data.  And indeed that is the case.

The new quality criteria, and how they stack up

Here is a simplified comparison of our 2004 criteria with three views taken from the LOD community.

Each of our 2004 criteria is listed below, followed by the corresponding criteria from Dodds, McDonald, and Flemming:

  • Completeness: Completeness; Amount of data
  • Provenance: History
  • Accuracy: Accuracy; Validity of documents
  • Conformance to expectations: Modeling correctness; Modeling granularity
  • Logical consistency and coherence: Directionality; Modeling correctness; Internal consistency; Referential correspondence
  • Timeliness: Currency; Timeliness
  • Accessibility: Intelligibility; Accessibility (technical); Performance (technical)

Placing the “new” criteria into our framework was no great challenge; it appears that we were, and are, talking about many of the same things. A few explanatory remarks:

  • Boundedness has roughly the same relationship to completeness that precision does to recall in information-retrieval metrics. The data is complete when we have everything we want; its boundedness shows high quality when we have only what we want.
  • Flemming’s amount of data criterion talks about numbers of triples and links, and about the interconnectedness and granularity of the data.  These seem to us to be largely completeness criteria, though things to do with linkage would more likely fall under “Logical coherence” in our world. Note, again, a certain preoccupation with things that are easy to count.  In this case it is somewhat unsatisfying; it’s not clear what the number of triples in a triplestore says about quality, or how it might be related to completeness if indeed that is what is intended.
  • Everyone lists criteria that fit well with our notions about provenance. In that connection, the most significant development has been a great deal of work on formalizing the ways in which provenance is expressed.  This is still an active area of research, with much still to be decided.  In particular, attempts at true domain independence are not fully successful, and probably never will be.  It appears to us that those working on the problem at DCMI are monitoring the other efforts and incorporating the most worthwhile features.
  • Dodds’ typing criterion — which basically says that dereferenceable URIs should be preferred to string literals  — participates equally in completeness and accuracy categories.  While we prefer URIs in our models, we are a little uneasy with the idea that the presence of string literals is always a sign of low quality.  Under some circumstances, for example, they might simply indicate an early stage of vocabulary evolution.
  • Flemming’s verifiability and validity criteria need a little explanation, because the terms used are easily confused with formal usages and so are a little misleading.  Verifiability bundles a set of concerns we think of as provenance.  Validity of documents is about accuracy as it is found in things like class and property usage.  Curiously, none of Flemming’s criteria have anything to do with whether the information being expressed by the data is correct in what it says about the real world; they are all designed to convey technical criteria.  The concern is not with what the data says, but with how it says it.
  • Dodds’ modeling correctness criterion seems to be about two things: whether or not the model is correctly constructed in formal terms, and whether or not it covers the subject domain in an expected way.  Thus, we assign it to both “Community expectations” and “Logical coherence” categories.
  • Isomorphism has to do with the ability to join datasets together, when they describe the same things.  In effect, it is a more formal statement of the idea that a given community will expect different models to treat similar things similarly. But there are also some very tricky (and often abused) concepts of equivalence involved; these are just beginning to receive some attention from Semantic Web researchers.
  • Licensing has become more important to everyone. That is in part because Linked Data as published in the private sector may exhibit some of the proprietary characteristics we saw as access barriers in 2004, and also because even public-sector data publishers are worried about cost recovery and appropriate-use issues.  We say more about this in a later section.
  • A number of criteria listed under Accessibility have to do with the reliability of data publishing and consumption apparatus as used in production.  Linked Data consumers want to know that the endpoints and triple stores they rely on for data are going to be up and running when they are needed.  That brings a whole set of accessibility and technical performance issues into play.  At least one website exists for the sole purpose of monitoring endpoint reliability, an obvious concern of those who build services that rely on Linked Data sources. Recently, the LII made a decision to run its own mirror of the DrugBank triplestore to eliminate problems with uptime and to guarantee low latency; performance and accessibility had become major concerns. For consumers, due diligence is important.
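The "easy to count" flavor of several of these criteria is easy to make concrete. Below is a sketch of Dodds' typing criterion read as a simple ratio, under the (simplifying, hypothetical) assumption that triples arrive as bare 3-tuples of strings; real RDF tooling such as rdflib distinguishes literals from URIs properly, so the prefix test here is only an illustrative heuristic.

```python
# Hedged sketch of one quantitative quality signal: the share of triple
# objects that are URIs rather than plain string literals (Dodds' typing
# criterion).  Triples are modeled as bare 3-tuples of strings, which is a
# simplification; the example triples and namespaces are hypothetical.

def uri_object_ratio(triples):
    """Fraction of triple objects that look like dereferenceable URIs."""
    if not triples:
        return 0.0
    uri_objects = sum(
        1 for _, _, obj in triples
        if obj.startswith("http://") or obj.startswith("https://")
    )
    return uri_objects / len(triples)

triples = [
    ("ex:case1", "dct:title", "Two Pesos, Inc. v. Taco Cabana, Inc."),
    ("ex:case1", "dct:creator", "http://example.org/courts/scotus"),
    ("ex:case1", "ex:citedBy", "http://example.org/cases/another"),
    ("ex:case1", "dct:subject", "trade dress"),
]
print(uri_object_ratio(triples))  # 0.5
```

Note that a low ratio is not automatically a defect: as observed above, string literals may simply mark an early stage of vocabulary evolution, and some objects (titles, for instance) are properly literals.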

For us, there is a distinctly different feel to the examples that Dodds, Flemming, and others have used to illustrate their criteria; they seem to be looking at a set of phenomena that has substantial overlap with ours, but is not quite the same.  Part of it is simply the fact, mentioned earlier, that data publishers in distinct domains have distinct biases. For example, those who can’t fully believe in objectivity are forced to put greater emphasis on provenance. Others who are not publishing descriptive data that relies on human judgment feel they can rely on more  “objective” assessment methods.  But the biggest difference in the “new quality” is that it puts a great deal of emphasis on technical quality in the construction of the data model, and much less on how well the data that populates the model describes real things in the real world.  

There are three reasons for that.  The first has to do with the nature of the discussion itself: all quality discussions, simply as discussions, tend to neglect notions of factual accuracy because factual accuracy seems self-evidently a Good Thing; there's not much to talk about.  Second, the people discussing quality in the LOD world are modelers first, and so quality is seen as adhering primarily to the model itself.  Finally, the world of the Semantic Web rests on the assumption that "anyone can say anything about anything."  For some, the egalitarian interpretation of that statement reaches the level of religion, making it very difficult to measure quality by judging whether something is factual or not; from a purist's perspective, it's opinions all the way down.  There is, then, a tendency to rely on formalisms and modeling technique to hold back the tide.

In 2004, we suggested a set of metadata-quality indicators suitable for managers to use in assessing projects and datasets.  An updated version of that table would look like this:

Completeness
  • Does the element set completely describe the objects?
  • Are all relevant elements used for each object?
  • Does the data contain everything you expect?
  • Does the data contain only what you expect?

Provenance
  • Who is responsible for creating, extracting, or transforming the metadata?
  • How was the metadata created or extracted?
  • What transformations have been done on the data since its creation?
  • Has a dedicated provenance vocabulary been used?
  • Are there authenticity measures (e.g. digital signatures) in place?

Accuracy
  • Have accepted methods been used for creation or extraction?
  • What has been done to ensure valid values and structure?
  • Are default values appropriate, and have they been appropriately used?
  • Are all properties and values valid/defined?

Conformance to expectations
  • Does metadata describe what it claims to?
  • Does the data model describe what it claims to?
  • Are controlled vocabularies aligned with audience characteristics and understanding of the objects?
  • Are compromises documented and in line with community expectations?

Logical consistency and coherence
  • Is data in elements consistent throughout?
  • How does it compare with other data within the community?
  • Is the data model technically correct and well structured?
  • Is the data model aligned with other models in the same domain?
  • Is the model consistent in the direction of relations?

Timeliness
  • Is metadata regularly updated as the resources change?
  • Are controlled vocabularies updated when relevant?

Accessibility
  • Is an appropriate element set for audience and community being used?
  • Is the data and its access methods well-documented, with exemplary queries and URIs?
  • Do things have human-readable labels?
  • Is it affordable to use and maintain?
  • Does it permit further value-adds?
  • Does it permit republication?
  • Is attribution required if the data is redistributed?
  • Are human- and machine-readable licenses available?

Accessibility — technical
  • Are reliable, performant endpoints available?
  • Will the provider guarantee service (e.g. via a service level agreement)?
  • Is the data available in bulk?
  • Are URIs stable?


The differences in the example questions reflect the differences of approach that we discussed earlier. Also, the new approach separates criteria related to technical accessibility from questions that relate to intellectual accessibility. Indeed, we suspect that “accessibility” may have been too broad a notion in the first place. Wider deployment of metadata systems and a much greater, still-evolving variety of producer-consumer scenarios and relationships have created a need to break it down further.  There are as many aspects to accessibility as there are types of barriers — economic, technical, and so on.

As before, our list is not a checklist or a set of must-haves, nor does it contain all the questions that might be asked.  Rather, we intend it as a list of representative questions that might be asked when a new Linked Data source is under consideration.  They are also questions that should inform policy discussion around the uses of Linked Data by consuming libraries and publishers.  
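Some of the technical-accessibility questions, at least, can be probed automatically before any policy discussion starts. Here is a minimal standard-library sketch that times a cheap SPARQL ASK request against a candidate endpoint; the endpoint URL is hypothetical, and a real assessment would probe repeatedly over days and handle retries and authentication rather than taking a single reading.

```python
# Sketch only: probing a candidate SPARQL endpoint for reachability and
# latency, two of the technical-accessibility questions.  The endpoint URL
# is hypothetical; no retries, authentication, or long-term monitoring.

import time
import urllib.parse
import urllib.request

def build_ask_request(endpoint):
    """Build a GET request for the cheapest possible SPARQL query."""
    query = urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
    return urllib.request.Request(
        endpoint + "?" + query,
        headers={"Accept": "application/sparql-results+json"},
    )

def probe(endpoint, timeout=5.0):
    """Return (reachable, latency_in_seconds) for a candidate endpoint."""
    request = build_ask_request(endpoint)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()
        return True, time.monotonic() - start
    except OSError:
        return False, None

# Hypothetical endpoint; in practice, probe on a schedule and keep history.
request = build_ask_request("http://example.org/sparql")
print(request.full_url)
```

Numbers like these feed naturally into the kind of due-diligence assessment described above for the DrugBank mirror decision; the intellectual-accessibility questions, by contrast, still require a human reader.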

That is work that can be formalized and taken further. One intriguing recent development is work toward a Data Quality Management Vocabulary.   Its stated aims are to

  • support the expression of quality requirements in the same language, at web scale;
  • support the creation of consensual agreements about quality requirements;
  • increase transparency around quality requirements and measures;
  • enable checking for consistency among quality requirements; and
  • generally reduce the effort needed for data quality management activities.


The apparatus to be used is a formal representation of “quality-relevant” information.   We imagine that the researchers in this area are looking forward to something like automated e-commerce in Linked Data, or at least a greater ability to do corpus-level quality assessment at a distance.  Of course, “fitness-for-use” and other criteria that can really only be seen from the perspective of the user will remain important, and there will be interplay between standardized quality and performance measures (on the one hand) and audience-relevant features on the other.   One is rather reminded of the interplay of technical specifications and “curb appeal” in choosing a new car.  That would be an important development in a Semantic Web industry that has not completely settled on what a car is really supposed to be, let alone how to steer or where one might want to go with it.
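To give the consistency-checking aim some flavor, here is a toy sketch (emphatically not the actual Data Quality Management Vocabulary, whose terms we do not reproduce here) in which requirements from different consumers are collapsed to the strictest stated threshold per measure and then checked against a dataset's scores. All measures, thresholds, and scores are hypothetical.

```python
# Toy illustration of checking quality requirements for consistency and
# then applying them.  This is NOT the Data Quality Management Vocabulary;
# measures, thresholds, and scores are all hypothetical.

def merge_requirements(requirements):
    """Collapse per-measure minimums to the strictest one stated."""
    strictest = {}
    for measure, minimum in requirements:
        strictest[measure] = max(minimum, strictest.get(measure, minimum))
    return strictest

def check(scores, strictest):
    """Return the measures on which a dataset misses the merged bar."""
    return sorted(
        m for m, minimum in strictest.items() if scores.get(m, 0.0) < minimum
    )

requirements = [            # hypothetical requirements from two consumers
    ("completeness", 0.9),
    ("completeness", 0.7),
    ("uri_objects", 0.5),
]
scores = {"completeness": 0.8, "uri_objects": 0.6}    # hypothetical dataset
print(check(scores, merge_requirements(requirements)))  # ['completeness']
```

Even a mechanism this crude shows why a shared, formal language matters: once requirements are data, subsumption and conflict can be detected mechanically, at web scale, before anyone argues about fitness for use.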


Libraries have always been concerned with quality criteria in their work as creators of descriptive metadata.  One of our purposes here has been to show how those criteria will evolve as libraries become publishers of Linked Data, as we believe they must. That much seems fairly straightforward, and there are many processes and methods by which quality criteria can be embedded in the process of metadata creation and management.

More difficult, perhaps, is deciding how these criteria can be used to construct policies for Linked Data consumption.  As we have said many times elsewhere, we believe that there are tremendous advantages and efficiencies that can be realized by linking to data and descriptions created by others, notably in connecting up information about the people and places that are mentioned in legislative information with outside information pools.  That will require care and judgment, and quality criteria such as these will be the basis for those discussions.  Not all of these criteria have matured — or ever will mature — to the point where hard-and-fast metrics exist.  We are unlikely to ever see rigid checklists or contractual clauses with bullet-pointed performance targets, at least for many of the factors we have discussed here. Some of the new accessibility criteria might be the subject of service-level agreements or other mechanisms used in electronic publishing or database-access contracts.  But the real use of these criteria is in assessments that will be made long before contracts are negotiated and signed.  In that setting, these criteria are simply the lenses that help us know quality when we see it.



Thomas R. Bruce is the Director of the Legal Information Institute at the Cornell Law School.

Diane Hillmann is a principal in Metadata Management Associates, and a long-time collaborator with the Legal Information Institute.  She is currently a member of the Advisory Board for the Dublin Core Metadata Initiative (DCMI), and was co-chair of the DCMI/RDA Task Group.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

We pride ourselves on the murkiness of our authorial invitation process at VoxPop.  How are authors selected, exactly?  Nobody knows, not even the guy who does the selecting.  We’d like to lift the veil briefly, by asking for volunteers to help us with a particular area we’re interested in.

We’d like to run a point-counterpoint style dual entry on the subject of authenticity in legal documents.  Yes, we’ve treated the issue before.  But just today we were fishing around in obscure corners of the LII’s WEX legal dictionary, and we found this definition of the Ancient Document Rule:

Under the Federal Rules of Evidence, a permissible method to authenticate a document. Under the rule, if a document is (1) more than 20 years old; (2) regular on its face with no signs of obvious alterations; and (3) found in a place of natural custody, or in a place where it would be expected to be found, then the document is found to be prima facie authenticated and therefore admissible.

The specific part of FRE involved — Rule 901 —  is here.
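The three-part test has a simple logical shape, which is worth making explicit before debating the parameters. A minimal sketch, with the age threshold as a parameter precisely because the question below is whether 20 years is the right number (the function and field names are our own, not drawn from the rule):

```python
def ancient_document_presumption(age_years: float,
                                 regular_on_face: bool,
                                 natural_custody: bool,
                                 threshold: float = 20) -> bool:
    """Prima facie authentication under the ancient-document rule:
    (1) older than the threshold; (2) regular on its face, with no
    obvious alterations; (3) found where it would be expected."""
    return age_years > threshold and regular_on_face and natural_custody

# A 30-year-old, unaltered document from a place of natural custody:
presumed = ancient_document_presumption(30, True, True)
# A 3-year-old document fails even under a hypothetical 5-year threshold:
too_recent = ancient_document_presumption(3, True, True, threshold=5)
```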

Why would or wouldn’t we apply this thinking to large existing document repositories — such as the backfile of federal documents at GPO?  Is 20 years a reasonable limit?  Should it be 5, or 7?  What does “where found” mean?  We’d like to see two authors — one pro, one con — address these questions in side-by-side posts to VoxPop.

Where does the Prisoner’s Dilemma come in?  Well… if we get no volunteers, we won’t run this.  If we get volunteers on only one side of the issue, we’ll run a one-sided piece.  So, it’s up to you to decide whether both sides will be heard or not.  The window for volunteering will close on Tuesday; send your requests to the murky selector at { tom –  dot –  bruce – att – cornell – dot – edu }.

We’d also be happy to hear from others who want to write for VoxPop — the new year is fast approaching, and we need fresh voices.  Speak up!

As a comparative law academic, I have had an interest in legal translation for some time.  I’m not alone.  In our overseas programs at Nagoya University, we teach students from East and Central Asia who have a keen interest in the workings of other legal systems in the region, including Japan. We would like to supply them with an accessible base of primary resources on which to ground their research projects. At present, we don’t.  We can’t, as a practical matter, because the source for such material, the market for legal translation, is broken at its foundation.  One of my idle dreams is that one day it might be fixed. The desiderata are plain enough, and simple to describe. To be useful as a base for exploring the law (as opposed to explaining it), I reckon that a reference archive based on translated material should have the following characteristics:

  • Intelligibility Text should of course be readable (as opposed to unreadable), and terms of art should be consistent across multiple laws, so that texts can safely be read together.
  • Coverage A critical mass of material must be available. The Civil Code is not of much practical use without the Code of Civil Procedure and supporting instruments.
  • Currency If it is out of date, its academic value is substantially reduced, and its practical value vanishes almost entirely. If it is not known to be up-to-date, the vanishing happens much more quickly.
  • Accessibility Bare text is nice, but a reference resource ought to be enriched with cross-references, indexes, links to relevant cases, the original text on which the translation is based.
  • Sustainability  Isolated efforts are of limited utility.  There must be a sustained incentive to maintain the archive over time.

In an annoying confluence of market incentives, these criteria do not travel well together.  International law firms may have the superb in-house capabilities that they claim, but they are decidedly not in the business of disseminating information.  As for publishers, the large cost of achieving significant coverage means that the incentive to maintain and enhance accuracy and readability declines in proportion to the scope of laws translated by a given service.  As a result, no commercial product performs well on both of the first two criteria, and there is consequently little market incentive to move beyond them and attend to the remaining items in the list. So much for the invisible hand.

When markets fail, government can provide, of course, but a government itself is inevitably driven by well-focused interests (such as foreign investors) more than by wider communities (divorcing spouses, members of a foreign labor force, or, well, my students).  Bureaucratic initiatives tend to take on a life of their own, and without effective market signals, it is hard to measure how well real needs are actually being met.  In any case, barring special circumstances such as those obtaining within the EU, the problem of sustainability ever lurks in the background.

Unfortunately, these impediments to supply on terms truly attractive to the consumer are not limited to a single jurisdiction with particularly misguided policies; the same dismal logic applies everywhere (in a recent article, Carol Lawson provides an excellent and somewhat hopeful review of the status quo in Japan).  At the root of our discomfiture are, I think, two factors: the cookie-cutter application of copyright protection to this category of material; and a lack of adequate, recognized, and meaningful standards for legal translation (and of tools to apply them efficiently in editorial practice). The former raises an unnecessary barrier to entry. The latter saps value by aggravating agency problems, and raises risk for both suppliers and consumers of legal translations.

I first toyed with this problem a decade ago, in a fading conference paper now unknown to search engines (but still available through the kind offices of the Web Archive). At the time, I was preoccupied with the problem of barriers to entry and the dog-in-the-manger business strategies that they foster, and this led me to think of the translation conundrum as an intractable, self-sustaining Gordian knot of conflicting interests, capable of resolution only through a sudden change in the rules of the game. Developments in subsequent years, in Japan and elsewhere, have taught me that both the optimism and the pessimism embedded in that view may have been misplaced. The emergence of standards, slow and uncertain though it be, may be our best hope of improvement over time.

To be clear, the objective is not freedom as in free beer.  Reducing the cost of individual statutory translations is less important than fostering an environment in which (a) scarce resources are not wasted in the competitive generation of identical content within private or protected containers; and (b) there is a reasonably clear and predictable relationship between quality (in terms of the list above) and cost. Resolving such problems is a common role for standards, both formal and informal.  It is not immediately clear how far voluntary standards can penetrate a complex, dispersed, and often closed activity like the legal translation service sector — but one need not look far for cases in which an idea about standardization achieved acceptance on its merits and went on to have a significant impact on behavior in a similarly fragmented and dysfunctional market.  There is at least room for hope.

In 2006, as part of a Japanese government effort to improve the business environment (for that vocal group of foreign investors referred to above), an interdisciplinary research group in my own university led by Yoshiharu Matsuura and Katsuhiko Toyama released the first edition of a standard bilingual dictionary for legal translation (the SBD) to the Web. Aimed initially at easing the burden of the translation initiative on hard-pressed government officials charged with implementing it, the SBD has since gone through successive revisions, and recently found a new home on a web portal providing government-sponsored statutory translations. (This project is one of two major translation initiatives launched in the same period, the other being a funded drive to render a significant number of court decisions into English).
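The mechanical service a standard dictionary performs is consistency: each source term maps to a single sanctioned rendering, so that texts can safely be read together. A minimal sketch of that check (the glossary entries below are invented placeholders, not actual SBD content):

```python
# Hypothetical bilingual glossary enforcing one sanctioned English
# rendering per source term, in the spirit of the SBD.
GLOSSARY = {
    "term_a": "good faith",
    "term_b": "juridical person",
}

def check_consistency(pairs):
    """Return the (source term, rendering) pairs in a draft
    translation that deviate from the glossary's sanctioned form."""
    return [(src, out) for src, out in pairs
            if src in GLOSSARY and out != GLOSSARY[src]]

draft = [("term_a", "good faith"), ("term_b", "legal person")]
deviations = check_consistency(draft)  # flags the second pair
```

Embedding a check like this in editorial workflow is what turns a reference list into a working standard.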

The benefits of the Standard Bilingual Dictionary are evident in new translations emerging in connection with the project. Other contributors to this space will have more to say about the technology and workflows underlying the SBD, and the roadmap for its future development. My personal concern is that it achieve its proper status, not only as a reference and foundation source for side products, but as a community standard. Paradoxically, restricting the licensing terms for distribution may be the simplest and most effective way of disseminating it as an industry standard.  A form of license requiring attribution to the SBD maintainers, and prohibiting modification of the content without permission, would give commercial actors an incentive to return feedback to the project.  I certainly hope that the leaders of the project will consider such a scheme, as it would help assure that their important efforts are not dissipated in a flurry of conflicting marketplace “improvements” affixed, one must assume, with more restrictive licensing policies.

There is certainly something to be said for making changes in the way that copyright applies to translated law more generally.  The peak demand for law in translation is the point of first enactment or revision. Given the limited pool of translator time available, once a translation is prepared and published, there is a case to be made for a compulsory licensing system, as a means of widening the channel of dissemination, while protecting the economic interest of translators and their sponsors.  The current regime, providing (in the case of Japan) for exclusive rights of reproduction for a period extending to fifty years from the death of the author (Japanese Copyright Act, section 51), really makes no sense in this field.  As a practical matter, we must depend on legislatures, of course, for core reform of this kind.  Alas, given the recent track record on copyright reform among influential legislative bodies in the United States and Europe, I fear that we may be in for a very long wait.  In the meantime, we can nonetheless move the game forward by adopting prudent licensing strategies for standards-based products that promise to move this important industry to the next level.

Frank Bennett is an Associate Professor in the Graduate School of Law at Nagoya University.

VoxPopuLII is edited by Judith Pratt.

On the 30th and 31st of October 2008, the 9th International Conference on “Law via the Internet” met in Florence, Italy. The Conference was organized by the Institute of Legal Information Theory and Techniques of the Italian National Research Council (ITTIG-CNR), acting as a member of the Legal Information Institutes network (LIIs). About 300 participants, from 39 countries and five continents, attended the conference.  The conference had previously been held in Montreal, Sydney, Paris, and Vanuatu.

The conference was a special event for ITTIG, which is one of the institutions where legal informatics started in Europe, and which has supported free access to law without interruption since its origin. It was a challenge and privilege for ITTIG to host experts from all over the world as they discussed crucial emerging problems related to new technologies and law.

Despite having dedicated special sessions to wine tasting in the nearby hills (!), the Conference mainly focused on digital legal information, analyzing it in the light of the idea of freedom of access to legal information, and discussing the technological progress that is shaping such access. Within this interaction of technological progress and law, free access to information is only the first step — but it is a fundamental one.
Increased use of digital information in the field of law has played an important role in developing methodologies for both data creation and access. Participants at the conference agreed that complete, reliable legal data is essential for access to law, and that free access to law is a fundamental right, enabling citizens to exercise their rights in a conscious and effective way. In this context, the use of new technologies becomes an essential tool of democracy for the citizens of an e-society.

The contributions of legal experts from all over the world reflected this crucial need for free access to law. Conference participants analysed both the barriers to free access and the techniques that might overcome those barriers, across a range of session topics.

In general, discussions at the conference covered four main points. The first is that official free access to law is not enough. Full free access requires a range of different providers and competitive republishing by third parties, which in turn requires an anti-monopoly policy on the part of the creator of legal information. Each provider will offer different types of services, tailored to various public needs. This means that institutions providing legal data sources have a public duty to offer a copy of their output — their judgments and legislation in the most authoritative form — to anyone who wishes to publish it, whether that publication is for free or for fee.

Second, countries must find a balance between the potential for commercial exploitation of information and the needs of the public. This is particularly relevant to open access to publicly funded research.

The third point concerns effective access to, and re-usability of, legal information. Effective access requires that most governments promote the use of technologies that improve access to law, abandoning past approaches such as technical restrictions on the reuse of legal information. It is important that governments not only allow, but also help others to reproduce and re-use their legal materials, continually removing any impediments to re-publication.

Finally, international cooperation is essential to providing free access to law. One week before the Florence event, the LII community participated in a meeting of experts organised by the Hague Conference on Private International Law’s Permanent Bureau, a meeting entitled “Global Co-operation on the Provision of On-line Legal Information.” Among other things, participants discussed how free, on-line resources can contribute to resolving trans-border disputes. At this meeting, a general consensus was reached on the need for countries to preserve their legal materials in order to make them available. The consensus was that governments should:

  • give access to historical legal material
  • provide translations in other languages
  • develop multi-lingual access functionalities
  • use open standards and metadata for primary materials

All these points were confirmed at the Florence Conference.

The key issue that emerged from the Conference is that the marketplace has changed and we need to find new models to distribute legal information, as well as create equal market opportunities for legal providers. In this context, legal information is considered to be an absolute public good on which everyone should be free to build.

Many speakers at the Conference also tackled multilingualism in the law domain, highlighting the need for semantic tools, such as lexicons and ontologies, that will enhance uniformity of legal language without losing national traditions. The challenge to legal information systems worldwide lies in providing transparent access to the multilingual information contained in distributed archives and, in particular, allowing users to express requests in their preferred language and to obtain meaningful results in that language. Cross-language information retrieval (CLIR) systems can greatly contribute to open access to law, facilitating discovery and interpretation of legal information across different languages and legal orders, thus enabling people to share legal knowledge in a world that is becoming more interconnected every day.

From the technical point of view, the Conference underlined the paramount importance of adopting open standards. Improving the quality of access to legal information requires interoperability among legal information systems across national boundaries. A common, open standard used to identify sources of law on the international level is an essential prerequisite for interoperability.

In order to reach this goal, countries need to adopt a unique identifier for legal information materials. Interest groups within several countries have already expressed their intention to adopt a shared solution based on URI (Uniform Resource Identifier) techniques. Especially among European Union Member States, the need for a unique identifier, based on open standards and providing advanced modalities of document hyper-linking, has been expressed in several conferences by representatives of the Office for Official Publications of the European Communities (OPOCE).
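A sketch of how such an identifier might decompose into neutral components. The segment names and example values below are illustrative, loosely modeled on the general shape of the URN:LEX proposal rather than drawn from any adopted standard:

```python
def parse_lex_urn(urn: str) -> dict:
    """Split a URN:LEX-style identifier into its main segments.
    Segment names here are illustrative: jurisdiction, issuing
    authority, type of measure, and identifying details."""
    prefix = "urn:lex:"
    if not urn.startswith(prefix):
        raise ValueError("not a lex-style URN")
    parts = urn[len(prefix):].split(":")
    keys = ["jurisdiction", "authority", "measure", "details"]
    return dict(zip(keys, parts))

# Illustrative identifier for a hypothetical Italian statute:
doc = parse_lex_urn("urn:lex:it:stato:legge:2006-05-14;22")
```

The point of such a scheme is that the identifier names the source of law itself, independently of any medium or provider, so that any publisher's copy can resolve the same reference.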

Similar concerns about promoting interoperability among national and European information systems have been aired by international groups. The Permanent Bureau of the Hague Conference on Private International Law is considering a resolution that would encourage member states to “adopt neutral methods of citation of their legal materials, including methods that are medium-neutral, provider-neutral and internationally consistent.” ITTIG is particularly involved in this issue, which is currently running in parallel with the pan-European Metalex/CEN initiative to define standards for sources of European law.

The wide-ranging discussions at the Conference are collected in a volume of Proceedings published in April 2009 by European Press Academic Publishing (EPAP).

— E. Francesconi, G. Peruginelli

Ginevra Peruginelli

Ginevra has a degree in law from the University of Florence, an MA/MSc Diploma in Information Science awarded by the University of Northumbria, Newcastle, UK, and a Ph.D. in Telematics and Information Society from the University of Florence. Currently she is a researcher at the Institute of Legal Information Theory and Techniques of the Italian National Research Council (ITTIG-CNR). In 2003, she was admitted to the Bar of the Court of Florence as a lawyer. She carries out her research activities in various sectors, such as standards to represent data and metadata in the legal environment; law and legal language documentation; and open access to law.

Enrico Francesconi

Enrico is a researcher at ITTIG-CNR. His main activities include knowledge representation and ontology learning, legal standards, artificial intelligence techniques for legal document classification and knowledge extraction. He is a member of the Italian and European working groups establishing XML and URI standards for legislation. He has been involved in various projects within the framework programs of DG Information Society & Media of the European Commission and for the Office for Official Publications of the European Communities.

VoxPopuLII is edited by Judith Pratt