
Thomas R. Bruce

Thomas R. Bruce is co-founder and director of the Legal Information Institute at the Cornell Law School, the first legal-information web site in the world. He was the author of the first Web browser for Microsoft Windows, and has been the principal technical architect for online legal resources ranging from fourteenth-century law texts to the current decisions of the United States Supreme Court. Mr. Bruce has consulted on Internet matters for numerous commercial, governmental, and academic organizations on four continents. He has been a fellow of the Center for Online Dispute Resolution at the University of Massachusetts, and a Senior International Fellow at the University of Melbourne Law School. He is an affiliated researcher in Cornell's program in Information Science, where he works closely with faculty and students who experiment with the application of advanced technologies to legal texts. He currently serves as a member of the ABA Administrative Law Section Special Committee on e-Rulemaking, on Cornell's Faculty Advisory Board for Information Technology, and is a longtime member of the board of directors of the Center for Computer-Assisted Legal Instruction. He has been known to play music at high volume.

[Note: This year marks the LII's 25th year of operation. In honor of the occasion, we're going to offer "25 for 25" — blog posts by 25 leading thinkers in the field of open access to law and legal informatics, published here in the VoxPopuLII blog. Submissions will appear irregularly, approximately twice per month. We'll open with postings from the LII's original co-directors, and conclude with posts from the LII's future leadership. Today's post is by Tom Bruce; Peter Martin's will follow later in the month.]

It all started with two gin-and-tonics, consumed by a third party.  At the time I was the Director of Educational Technologies for the Cornell Law School.  Every Friday afternoon, there was a small gathering of people like me in the bar of the Statler Hotel, maybe 8 or 10 from different holes and corners in Cornell’s computer culture.  A lot of policy issues got solved there, and more than a few technical ones.  I first heard about something called “Perl” there.

The doyen of that group was Steve Worona, then some kind of special-assistant-for-important-stuff in the office of Cornell's VP for Computing.  Knowing that the law school had done some work with CD-ROM based hypertext, he had been trying to get me interested in a Campus-Wide Information System (CWIS) platform called Gopher and, far off into the realm of wild-eyed speculation, this thing called the World-Wide Web.  One Friday in early 1992, noting that Steve was two gin-and-tonics into a generous mood, I asked him if he might have a Sun box lying around that I might borrow to try out a couple of those things.

He did, and the Sun 4-c in question became "fatty" — named after Fatty the Bookkeeper, a leading character in the Brecht-Weill opera "Mahagonny," which tells the story of a "City of Nets."  It was the first institutional web server that had information about something other than high-energy physics, and somewhere around the 30th web server in the world. We still get a fair amount of traffic via links to "fatty," though the machine name has not been in use for a decade and a half (in fact, we maintain a considerable library of redirection code so that most of the links that others have constructed to us over a quarter-century still work).

What did we put there?  First, a Gopher server.  Gopher pages were either menus or full text — it was more of a menuing system than full hypertext, and did not permit internal links.  Our first effort — Peter’s idea — was Title 17 of the US Code, whose structure was an excellent fit with Gopher’s capabilities, and whose content (copyright) was an excellent fit with the obsessions of those who had Internet access in those days.  It got a lot of attention, as did Peter’s first shot at a Supreme Court opinion in HTML form, Two Pesos v. Taco Cabana.

Other things followed rapidly, and later that year we began republishing all Supreme Court opinions in electronic form.  Initially we linked to the Cleveland Freenet site; then we began republishing them from text in ATEX format; later we were to add our own Project Hermes subscription.  Not long after we began publishing, I undertook to develop the first-ever web browser for Microsoft Windows — mostly because at the time it seemed unlikely that anyone else would, anytime soon.  We were just as interested in editorial innovations.  Our first legal commentary — then called "Law About…", and now WEX — was put together in 1993, based on work done by Peter and by Jim Milles in constructing a topics list useful to both lawyers and the general public. A full US Code followed in 1994.  Our work with CD-ROM continued for a surprisingly long time — we offered statutory supplements and leading Supreme Court cases on CD for a number of years, and our version of Title 26 was the basis for a CD distributed by the IRS into the new millennium.  Back in the day when there was, plausibly, a book called "The Whole Internet User's Guide and Catalog", we appeared in it eight times.

To talk about the early days solely in terms of technical firsts or Internet-publishing landmarks is, I think, to miss the more important parts of what we did.  First, as Peter Martin remarks in a promotional video that we made several years ago, we showed that there was tremendous potential for law schools to become creative spaces for innovation in all things related to legal information (they still have that tremendous potential, though very few are exercising it).  We used whatever creativity we could muster to break the stranglehold of commercial publishers not just on legal publishing as a product, but also on thinking about how law should be published, and where, and for whom.  In those days, it was all about caselaw, all about lawyers, and a mile wide and an inch deep. Legal academia, and the commercial publishers, were preoccupied with caselaw and with the relative prestige and authority of courts that publish it; they did not seem to imagine that there was a need for legal information outside the community of legal practitioners.  We thought, and did, differently.  

We were followed in that by others, first in Canada, and then in Australia, and later in a host of other places.  Many of those organizations — around 20 among them, I think — have chosen to use “LII” as part of their names, and “Legal Information Institute” has become a kind of brand name for open access to law.   Many of our namesakes offer comprehensive access to the laws of their countries, and some are de facto official national systems.  Despite recurring fascination with the idea of a “free Westlaw”, a centralized free-to-air system has never been a practical objective for an academically-based operation in the United States. We have, from the outset, seen our work  as a practical exploration of legal information technology, especially as it facilitates integration and aggregation of numerous providers.  The ultimate goal has been to develop new methods that will help people — all people, lawyers or not —  find and understand the law, without fees.

It was obvious to us from the start that effective work in this area would require deep, equal-status collaboration between legal experts and technologists, beginning with the two of us.  My collaboration with Peter was the center of my professional life for 20 years.  I was lucky to have the opportunity.  Legal-academic culture and institutions are often indifferent or hostile to such collaborations, and they are far rarer and much harder to maintain than they should be.  These days, it’s all the rage to talk about “teaching lawyers to code”. I think that lawyers would get better results if they would learn to communicate and collaborate with those who already know how.

Finally, we felt then – as we do now – that the best test of ideas was to implement them in practical, full-scale systems offered to the public in all its Internet-based, newfound diversity.  The resulting work, and the LII itself,  have been defined by the dynamism of opposites — technological expertise vs. legal expertise, practical publishing vs. academic research, bleeding-edge vs. when-the-audience-is-ready,  an audience of lawyers vs. an audience of non-lawyer professionals and private citizens.  That is a complicated, multidirectional balancing act — but we are still on the high-wire after 25 years, and that balancing act has been the most worthwhile thing about the organization, and one that will enable a new set of collaborators to do many more important things in the years to come.

Thomas R. Bruce is the Director of the Legal Information Institute, which he co-founded with Peter W. Martin in 1992.


Van Winkle wakes

In this post, we return to a topic we first visited in a book chapter in 2004.  At that time, one of us (Bruce) was an electronic publisher of Federal court cases and statutes, and the other (Hillmann, herself a former law cataloger) was working with large, aggregated repositories of scientific papers as part of the National Science Digital Library project.  Then, as now, we were concerned that little attention was being paid to the practical tradeoffs involved in publishing high quality metadata at low cost.  There was a tendency to design metadata schemas that said absolutely everything that could be said about an object, often at the expense of obscuring what needed to be said about it while running up unacceptable costs.  Though we did not have a name for it at the time, we were already deeply interested in least-cost, use-case-driven approaches to the design of metadata models, and that naturally led us to wonder what "good" metadata might be.  The result was "The Continuum of Metadata Quality: Defining, Expressing, Exploiting", published as a chapter in an ALA publication, Metadata in Practice.

In that chapter, we attempted to create a framework for talking about (and evaluating) metadata quality.  We were concerned primarily with metadata as we were then encountering it: in aggregations of repositories containing scientific preprints, educational resources, and in caselaw and other primary legal materials published on the Web.   We hoped we could create something that would be both domain-independent and useful to those who manage and evaluate metadata projects.  Whether or not we succeeded is for others to judge.

The Original Framework

At that time, we identified seven major components of metadata quality. Here, we reproduce a part of a summary table that we used to characterize the seven measures. We suggested questions that might be used to draw a bead on the various measures we proposed:

Completeness
  • Does the element set completely describe the objects?
  • Are all relevant elements used for each object?

Provenance
  • Who is responsible for creating, extracting, or transforming the metadata?
  • How was the metadata created or extracted?
  • What transformations have been done on the data since its creation?

Accuracy
  • Have accepted methods been used for creation or extraction?
  • What has been done to ensure valid values and structure?
  • Are default values appropriate, and have they been appropriately used?

Conformance to expectations
  • Does metadata describe what it claims to?
  • Are controlled vocabularies aligned with audience characteristics and understanding of the objects?
  • Are compromises documented and in line with community expectations?

Logical consistency and coherence
  • Is data in elements consistent throughout?
  • How does it compare with other data within the community?

Timeliness
  • Is metadata regularly updated as the resources change?
  • Are controlled vocabularies updated when relevant?

Accessibility
  • Is an appropriate element set for audience and community being used?
  • Is it affordable to use and maintain?
  • Does it permit further value-adds?


There are, of course, many possible elaborations of these criteria, and many other questions that help get at them.  Almost nine years later, we believe that the framework remains both relevant and highly useful, although (as we will discuss in a later section) we need to think carefully about whether and how it relates to the quality standards that the Linked Open Data (LOD) community is discovering for itself, and how it and other standards should affect library and publisher practices and policies.
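To make one of these measures concrete: the completeness questions lend themselves to cheap, automated, collection-level assessment. Here is a minimal sketch, assuming (hypothetically) that metadata records arrive as plain dictionaries and that the element set is a known list of field names; real records and schemas would of course be richer.

```python
# Illustrative sketch only: a least-cost, automatable reading of the
# completeness questions, assuming records are plain dictionaries and the
# element set is a known list of field names.  All field names and record
# values below are hypothetical.

def completeness_report(records, element_set):
    """For each element, report the fraction of records supplying a
    non-empty value -- a collection-level signal, not a per-record audit."""
    report = {}
    for element in element_set:
        filled = sum(
            1 for r in records if r.get(element) not in (None, "", [])
        )
        report[element] = filled / len(records) if records else 0.0
    return report

records = [
    {"title": "Two Pesos v. Taco Cabana", "date": "1992-06-26", "creator": ""},
    {"title": "17 U.S.C. 107", "date": "", "creator": "Office of the Law Revision Counsel"},
]
print(completeness_report(records, ["title", "date", "creator"]))
# {'title': 1.0, 'date': 0.5, 'creator': 0.5}
```

A report like this answers "are all relevant elements used for each object?" only in aggregate, which is exactly the granularity at which shrinking institutional resources make human review practical.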

… and the environment in which it was created

Our work was necessarily shaped by the environment we were in.  Though we never really said so explicitly, we were looking for quality not only in the data itself, but in the methods used to organize, transform, and aggregate it across federated collections.  We did not, however, anticipate the speed or scale at which standards-based methods of data organization would be applied.  Commonly used standards like FOAF and lightweight modelling apparatus like SKOS have emerged into common use since then, and of course the use of Dublin Core — our main focus eight years ago — has continued even as the standard itself has been refined.  This expanded toolset makes it even more important that we have a way to talk about how well the tools fit the job at hand and how well they have been applied; an expanded set of design choices accentuates the need to talk about how well those choices have been made in particular cases.

Although our work took its inspiration from quality standards developed by a government statistical service, we had not really thought through the sheer multiplicity of information services that were available even then.  We were concerned primarily with work that had been done with descriptive metadata in digital libraries, but of course there were, and are, many more people publishing and consuming data in both the governmental and private sectors (to name just two).  Indeed, there was already a substantial literature on data quality that arose from within the management information systems (MIS) community, driven by concerns about the reliability and quality of  mission-critical data used and traded by businesses.  In today’s wider world, where work with library metadata will be strongly informed by the Linked Open Data techniques developed for a diverse array of data publishers, we need to take a broader view.  

Finally, we were driven then, as we are now, by managerial and operational concerns. As practitioners, we were well aware that metadata carries costs, and that human judgment is expensive.  We were looking for a set of indicators that would spark and sustain discussion about costs and tradeoffs.  At that time, we were mostly worried that libraries were not giving costs enough attention, and were designing metadata projects that were unrealistic given the level of detail or human intervention they required.  That is still true.  The world of Linked Data requires well-understood metadata policies and operational practices simply so publishers can know what is expected of them and consumers can know what they are getting. Those policies and practices in turn rely on quality measures that producers and consumers of metadata can understand and agree on.  In today’s world — one in which institutional resources are shrinking rather than expanding —  human intervention in the metadata quality assessment process at any level more granular than that of the entire data collection being offered will become the exception rather than the rule.   

While the methods we suggested at the time were self-consciously domain-independent, they did rest on background assumptions about the nature of the services involved and the means by which they were delivered. Our experience had been with data aggregated by communities where the data producers and consumers were to some extent known to one another, using a fairly simple technology that was easy to run and maintain.  In 2013, that is not the case; producers and consumers are increasingly remote from each other, and the technologies used are both more complex and less mature, though that is changing rapidly.

The remainder of this blog post is an attempt to reconsider our framework in that context.

The New World

The Linked Open Data (LOD) community has begun to consider quality issues; there are some noteworthy online discussions, as well as workshops resulting in a number of published papers and online resources.  It is interesting to see where the work that has come from within the LOD community contrasts with the thinking of the library community on such matters, and where it does not.  

In general, the material we have seen leans toward the traditional data-quality concerns of the MIS community.  LOD practitioners seem to have started out by putting far more emphasis than we might on criteria that are essentially audience-dependent, and on operational concerns having to do with the reliability of publishing and consumption apparatus.   As it has evolved, the discussion features an intellectual move away from those audience-dependent criteria, which are usually expressed as “fitness for use”, “relevance”, or something of the sort (we ourselves used the phrase “community expectations”). Instead, most realize that both audience and usage  are likely to be (at best) partially unknown to the publisher, at least at system design time.  In other words, the larger community has begun to grapple with something librarians have known for a while: future uses and the extent of dissemination are impossible to predict.  There is a creative tension here that is not likely to go away.  On the one hand, data developed for a particular community is likely to be much more useful to that community; thus our initial recognition of the role of “community expectations”.  On the other, dissemination of the data may reach far past the boundaries of the community that develops and publishes it.  The hope is that this tension can be resolved by integrating large data pools from diverse sources, or by taking other approaches that result in data models sufficiently large and diverse that “community expectations” can be implemented, essentially, by filtering.

For the LOD community, the path that began with  “fitness-for-use” criteria led quickly to the idea of maintaining a “neutral perspective”. Christian Fürber describes that perspective as the idea that “Data quality is the degree to which data meets quality requirements no matter who is making the requirements”.  To librarians, who have long since given up on the idea of cataloger objectivity, a phrase like “neutral perspective” may seem naive.  But it is a step forward in dealing with data whose dissemination and user community is unknown. And it is important to remember that the larger LOD community is concerned with quality in data publishing in general, and not solely with descriptive metadata, for which objectivity may no longer be of much value.  For that reason, it would be natural to expect the larger community to place greater weight on objectivity in their quality criteria than the library community feels that it can, with a strong preference for quantitative assessment wherever possible.  Librarians and others concerned with data that involves human judgment are theoretically more likely to be concerned with issues of provenance, particularly as they concern who has created and handled the data.  And indeed that is the case.

The new quality criteria, and how they stack up

Here is a simplified comparison of our 2004 criteria with three views taken from the LOD community.

Each of our 2004 criteria is listed below, followed by the corresponding criteria from Dodds, McDonald, and Flemming:

  • Completeness: Completeness; Amount of data
  • Provenance: History
  • Accuracy: Accuracy; Validity of documents
  • Conformance to expectations: Modeling correctness; Modeling granularity
  • Logical consistency and coherence: Directionality; Modeling correctness; Internal consistency; Referential correspondence
  • Timeliness: Currency; Timeliness
  • Accessibility: Intelligibility; Accessibility (technical); Performance (technical)

Placing the “new” criteria into our framework was no great challenge; it appears that we were, and are, talking about many of the same things. A few explanatory remarks:

  • Boundedness has roughly the same relationship to completeness that precision does to recall in information-retrieval metrics. The data is complete when we have everything we want; its boundedness shows high quality when we have only what we want.
  • Flemming’s amount of data criterion talks about numbers of triples and links, and about the interconnectedness and granularity of the data.  These seem to us to be largely completeness criteria, though things to do with linkage would more likely fall under “Logical coherence” in our world. Note, again, a certain preoccupation with things that are easy to count.  In this case it is somewhat unsatisfying; it’s not clear what the number of triples in a triplestore says about quality, or how it might be related to completeness if indeed that is what is intended.
  • Everyone lists criteria that fit well with our notions about provenance. In that connection, the most significant development has been a great deal of work on formalizing the ways in which provenance is expressed.  This is still an active area of research, with much still to be decided.  In particular, attempts at true domain independence are not fully successful, and probably never will be.  It appears to us that those working on the problem at DCMI are monitoring the other efforts and incorporating the most worthwhile features.
  • Dodds’ typing criterion — which basically says that dereferenceable URIs should be preferred to string literals  — participates equally in completeness and accuracy categories.  While we prefer URIs in our models, we are a little uneasy with the idea that the presence of string literals is always a sign of low quality.  Under some circumstances, for example, they might simply indicate an early stage of vocabulary evolution.
  • Flemming’s verifiability and validity criteria need a little explanation, because the terms used are easily confused with formal usages and so are a little misleading.  Verifiability bundles a set of concerns we think of as provenance.  Validity of documents is about accuracy as it is found in things like class and property usage.  Curiously, none of Flemming’s criteria have anything to do with whether the information being expressed by the data is correct in what it says about the real world; they are all designed to convey technical criteria.  The concern is not with what the data says, but with how it says it.
  • Dodds’ modeling correctness criterion seems to be about two things: whether or not the model is correctly constructed in formal terms, and whether or not it covers the subject domain in an expected way.  Thus, we assign it to both “Community expectations” and “Logical coherence” categories.
  • Isomorphism has to do with the ability to join datasets together, when they describe the same things.  In effect, it is a more formal statement of the idea that a given community will expect different models to treat similar things similarly. But there are also some very tricky (and often abused) concepts of equivalence involved; these are just beginning to receive some attention from Semantic Web researchers.
  • Licensing has become more important to everyone. That is in part because Linked Data as published in the private sector may exhibit some of the proprietary characteristics we saw as access barriers in 2004, and also because even public-sector data publishers are worried about cost recovery and appropriate-use issues.  We say more about this in a later section.
  • A number of criteria listed under Accessibility have to do with the reliability of data publishing and consumption apparatus as used in production.  Linked Data consumers want to know that the endpoints and triple stores they rely on for data are going to be up and running when they are needed.  That brings a whole set of accessibility and technical performance issues into play.  At least one website exists for the sole purpose of monitoring endpoint reliability, an obvious concern of those who build services that rely on Linked Data sources. Recently, the LII made a decision to run its own mirror of the DrugBank triplestore to eliminate problems with uptime and to guarantee low latency; performance and accessibility had become major concerns. For consumers, due diligence is important.
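The "easy to count" flavor of several of these criteria is easy to make concrete. Below is a sketch of Dodds' typing criterion read as a simple ratio, under the (simplifying, hypothetical) assumption that triples arrive as bare 3-tuples of strings; real RDF tooling such as rdflib distinguishes literals from URIs properly, so the prefix test here is only an illustrative heuristic.

```python
# Hedged sketch of one quantitative quality signal: the share of triple
# objects that are URIs rather than plain string literals (Dodds' typing
# criterion).  Triples are modeled as bare 3-tuples of strings, which is a
# simplification; the example triples and namespaces are hypothetical.

def uri_object_ratio(triples):
    """Fraction of triple objects that look like dereferenceable URIs."""
    if not triples:
        return 0.0
    uri_objects = sum(
        1 for _, _, obj in triples
        if obj.startswith("http://") or obj.startswith("https://")
    )
    return uri_objects / len(triples)

triples = [
    ("ex:case1", "dct:title", "Two Pesos, Inc. v. Taco Cabana, Inc."),
    ("ex:case1", "dct:creator", "http://example.org/courts/scotus"),
    ("ex:case1", "ex:citedBy", "http://example.org/cases/another"),
    ("ex:case1", "dct:subject", "trade dress"),
]
print(uri_object_ratio(triples))  # 0.5
```

Note that a low ratio is not automatically a defect: as observed above, string literals may simply mark an early stage of vocabulary evolution, and some objects (titles, for instance) are properly literals.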

For us, there is a distinctly different feel to the examples that Dodds, Flemming, and others have used to illustrate their criteria; they seem to be looking at a set of phenomena that has substantial overlap with ours, but is not quite the same.  Part of it is simply the fact, mentioned earlier, that data publishers in distinct domains have distinct biases. For example, those who can’t fully believe in objectivity are forced to put greater emphasis on provenance. Others who are not publishing descriptive data that relies on human judgment feel they can rely on more  “objective” assessment methods.  But the biggest difference in the “new quality” is that it puts a great deal of emphasis on technical quality in the construction of the data model, and much less on how well the data that populates the model describes real things in the real world.  

There are three reasons for that.  The first has to do with the nature of the discussion itself: all quality discussions, simply as discussions, tend to neglect notions of factual accuracy because factual accuracy seems self-evidently a Good Thing; there's not much to talk about.  Second, the people discussing quality in the LOD world are modelers first, and so quality is seen as adhering primarily to the model itself.  Finally, the world of the Semantic Web rests on the assumption that "anyone can say anything about anything."  For some, the egalitarian interpretation of that statement reaches the level of religion, making it very difficult to measure quality by judging whether something is factual or not; from a purist's perspective, it's opinions all the way down.  There is, then, a tendency to rely on formalisms and modeling technique to hold back the tide.

In 2004, we suggested a set of metadata-quality indicators suitable for managers to use in assessing projects and datasets.  An updated version of that table would look like this:

Completeness
  • Does the element set completely describe the objects?
  • Are all relevant elements used for each object?
  • Does the data contain everything you expect?
  • Does the data contain only what you expect?

Provenance
  • Who is responsible for creating, extracting, or transforming the metadata?
  • How was the metadata created or extracted?
  • What transformations have been done on the data since its creation?
  • Has a dedicated provenance vocabulary been used?
  • Are there authenticity measures (e.g. digital signatures) in place?

Accuracy
  • Have accepted methods been used for creation or extraction?
  • What has been done to ensure valid values and structure?
  • Are default values appropriate, and have they been appropriately used?
  • Are all properties and values valid/defined?

Conformance to expectations
  • Does metadata describe what it claims to?
  • Does the data model describe what it claims to?
  • Are controlled vocabularies aligned with audience characteristics and understanding of the objects?
  • Are compromises documented and in line with community expectations?

Logical consistency and coherence
  • Is data in elements consistent throughout?
  • How does it compare with other data within the community?
  • Is the data model technically correct and well structured?
  • Is the data model aligned with other models in the same domain?
  • Is the model consistent in the direction of relations?

Timeliness
  • Is metadata regularly updated as the resources change?
  • Are controlled vocabularies updated when relevant?

Accessibility
  • Is an appropriate element set for audience and community being used?
  • Is the data and its access methods well-documented, with exemplary queries and URIs?
  • Do things have human-readable labels?
  • Is it affordable to use and maintain?
  • Does it permit further value-adds?
  • Does it permit republication?
  • Is attribution required if the data is redistributed?
  • Are human- and machine-readable licenses available?

Accessibility — technical
  • Are reliable, performant endpoints available?
  • Will the provider guarantee service (e.g. via a service level agreement)?
  • Is the data available in bulk?
  • Are URIs stable?


The differences in the example questions reflect the differences of approach that we discussed earlier. Also, the new approach separates criteria related to technical accessibility from questions that relate to intellectual accessibility. Indeed, we suspect that “accessibility” may have been too broad a notion in the first place. Wider deployment of metadata systems and a much greater, still-evolving variety of producer-consumer scenarios and relationships have created a need to break it down further.  There are as many aspects to accessibility as there are types of barriers — economic, technical, and so on.

As before, our list is not a checklist or a set of must-haves, nor does it contain all the questions that might be asked.  Rather, we intend it as a list of representative questions that might be asked when a new Linked Data source is under consideration.  They are also questions that should inform policy discussion around the uses of Linked Data by consuming libraries and publishers.  
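Some of the technical-accessibility questions, at least, can be probed automatically before any policy discussion starts. Here is a minimal standard-library sketch that times a cheap SPARQL ASK request against a candidate endpoint; the endpoint URL is hypothetical, and a real assessment would probe repeatedly over days and handle retries and authentication rather than taking a single reading.

```python
# Sketch only: probing a candidate SPARQL endpoint for reachability and
# latency, two of the technical-accessibility questions.  The endpoint URL
# is hypothetical; no retries, authentication, or long-term monitoring.

import time
import urllib.parse
import urllib.request

def build_ask_request(endpoint):
    """Build a GET request for the cheapest possible SPARQL query."""
    query = urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
    return urllib.request.Request(
        endpoint + "?" + query,
        headers={"Accept": "application/sparql-results+json"},
    )

def probe(endpoint, timeout=5.0):
    """Return (reachable, latency_in_seconds) for a candidate endpoint."""
    request = build_ask_request(endpoint)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()
        return True, time.monotonic() - start
    except OSError:
        return False, None

# Hypothetical endpoint; in practice, probe on a schedule and keep history.
request = build_ask_request("http://example.org/sparql")
print(request.full_url)
```

Numbers like these feed naturally into the kind of due-diligence assessment described above for the DrugBank mirror decision; the intellectual-accessibility questions, by contrast, still require a human reader.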

That is work that can be formalized and taken further. One intriguing recent development is work toward a Data Quality Management Vocabulary.   Its stated aims are to

  • support the expression of quality requirements in the same language, at web scale;
  • support the creation of consensual agreements about quality requirements;
  • increase transparency around quality requirements and measures;
  • enable checking for consistency among quality requirements; and
  • generally reduce the effort needed for data quality management activities.


The apparatus to be used is a formal representation of “quality-relevant” information.   We imagine that the researchers in this area are looking forward to something like automated e-commerce in Linked Data, or at least a greater ability to do corpus-level quality assessment at a distance.  Of course, “fitness-for-use” and other criteria that can really only be seen from the perspective of the user will remain important, and there will be interplay between standardized quality and performance measures (on the one hand) and audience-relevant features on the other.   One is rather reminded of the interplay of technical specifications and “curb appeal” in choosing a new car.  That would be an important development in a Semantic Web industry that has not completely settled on what a car is really supposed to be, let alone how to steer or where one might want to go with it.
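To give the consistency-checking aim some flavor, here is a toy sketch (emphatically not the actual Data Quality Management Vocabulary, whose terms we do not reproduce here) in which requirements from different consumers are collapsed to the strictest stated threshold per measure and then checked against a dataset's scores. All measures, thresholds, and scores are hypothetical.

```python
# Toy illustration of checking quality requirements for consistency and
# then applying them.  This is NOT the Data Quality Management Vocabulary;
# measures, thresholds, and scores are all hypothetical.

def merge_requirements(requirements):
    """Collapse per-measure minimums to the strictest one stated."""
    strictest = {}
    for measure, minimum in requirements:
        strictest[measure] = max(minimum, strictest.get(measure, minimum))
    return strictest

def check(scores, strictest):
    """Return the measures on which a dataset misses the merged bar."""
    return sorted(
        m for m, minimum in strictest.items() if scores.get(m, 0.0) < minimum
    )

requirements = [            # hypothetical requirements from two consumers
    ("completeness", 0.9),
    ("completeness", 0.7),
    ("uri_objects", 0.5),
]
scores = {"completeness": 0.8, "uri_objects": 0.6}    # hypothetical dataset
print(check(scores, merge_requirements(requirements)))  # ['completeness']
```

Even a mechanism this crude shows why a shared, formal language matters: once requirements are data, subsumption and conflict can be detected mechanically, at web scale, before anyone argues about fitness for use.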


Libraries have always been concerned with quality criteria in their work as creators of descriptive metadata.  One of our purposes here has been to show how those criteria will evolve as libraries become publishers of Linked Data, as we believe they must. That much seems fairly straightforward, and there are many processes and methods by which quality criteria can be embedded in the process of metadata creation and management.

More difficult, perhaps, is deciding how these criteria can be used to construct policies for Linked Data consumption.  As we have said many times elsewhere, we believe that there are tremendous advantages and efficiencies that can be realized by linking to data and descriptions created by others, notably in connecting up information about the people and places that are mentioned in legislative information with outside information pools.  That will require care and judgment, and quality criteria such as these will be the basis for those discussions.  Not all of these criteria have matured — or ever will mature — to the point where hard-and-fast metrics exist.  We are unlikely to ever see rigid checklists or contractual clauses with bullet-pointed performance targets, at least for many of the factors we have discussed here. Some of the new accessibility criteria might be the subject of service-level agreements or other mechanisms used in electronic publishing or database-access contracts.  But the real use of these criteria is in assessments that will be made long before contracts are negotiated and signed.  In that setting, these criteria are simply the lenses that help us know quality when we see it.



Thomas R. Bruce is the Director of the Legal Information Institute at the Cornell Law School.

Diane Hillmann is a principal in Metadata Management Associates, and a long-time collaborator with the Legal Information Institute.  She is currently a member of the Advisory Board for the Dublin Core Metadata Initiative (DCMI), and was co-chair of the DCMI/RDA Task Group.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

We pride ourselves on the murkiness of our authorial invitation process at VoxPop.  How are authors selected, exactly?  Nobody knows, not even the guy who does the selecting.  We’d like to lift the veil briefly, by asking for volunteers to help us with a particular area we’re interested in.

We’d like to run a point-counterpoint style dual entry on the subject of authenticity in legal documents.  Yes, we’ve treated the issue before.  But just today we were fishing around in obscure corners of the LII’s WEX legal dictionary, and we found this definition of the Ancient Document Rule:

Under the Federal Rules of Evidence, a permissible method to authenticate a document. Under the rule, if a document is (1) more than 20 years old; (2) regular on its face with no signs of obvious alterations; and (3) found in a place of natural custody, or in a place where it would be expected to be found, then the document is found to be prima facie authenticated and therefore admissible.

The specific part of FRE involved — Rule 901 —  is here.
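The three-part test has a simple logical shape, which is worth making explicit before debating the parameters. A minimal sketch, with the age threshold as a parameter precisely because the question below is whether 20 years is the right number (the function and field names are our own, not drawn from the rule):

```python
def ancient_document_presumption(age_years: float,
                                 regular_on_face: bool,
                                 natural_custody: bool,
                                 threshold: float = 20) -> bool:
    """Prima facie authentication under the ancient-document rule:
    (1) older than the threshold; (2) regular on its face, with no
    obvious alterations; (3) found where it would be expected."""
    return age_years > threshold and regular_on_face and natural_custody

# A 30-year-old, unaltered document from a place of natural custody:
presumed = ancient_document_presumption(30, True, True)
# A 3-year-old document fails even under a hypothetical 5-year threshold:
too_recent = ancient_document_presumption(3, True, True, threshold=5)
```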

Why would or wouldn’t we apply this thinking to large existing document repositories — such as the backfile of federal documents at GPO?  Is 20 years a reasonable limit?  Should it be 5, or 7?  What does “where found” mean?  We’d like to see two authors — one pro, one con — address these questions in side-by-side posts to VoxPop.

Where does the Prisoner’s Dilemma come in?  Well… if we get no volunteers, we won’t run this.  If we get volunteers on only one side of the issue, we’ll run a one-sided piece.  So, it’s up to you to decide whether both sides will be heard or not.  The window for volunteering will close on Tuesday; send your requests to the murky selector at { tom –  dot –  bruce – att – cornell – dot – edu }.

We’d also be happy to hear from others who want to write for VoxPop — the new year is fast approaching, and we need fresh voices.  Speak up!

As a comparative law academic, I have had an interest in legal translation for some time.  I’m not alone.  In our overseas programs at Nagoya University, we teach students from East and Central Asia who have a keen interest in the workings of other legal systems in the region, including Japan. We would like to supply them with an accessible base of primary resources on which to ground their research projects. At present, we don’t.  We can’t, as a practical matter, because the source for such material, the market for legal translation, is broken at its foundation.  One of my idle dreams is that one day it might be fixed. The desiderata are plain enough, and simple to describe. To be useful as a base for exploring the law (as opposed to explaining it), I reckon that a reference archive based on translated material should have the following characteristics:

  • Intelligibility Text should of course be readable (as opposed to unreadable), and terms of art should be consistent across multiple laws, so that texts can safely be read together.
  • Coverage A critical mass of material must be available. The Civil Code is not of much practical use without the Code of Civil Procedure and supporting instruments.
  • Currency If it is out of date, its academic value is substantially reduced, and its practical value vanishes almost entirely. If it is not known to be up-to-date, the vanishing happens much more quickly.
  • Accessibility Bare text is nice, but a reference resource ought to be enriched with cross-references, indexes, links to relevant cases, the original text on which the translation is based.
  • Sustainability  Isolated efforts are of limited utility.  There must be a sustained incentive to maintain the archive over time.

In an annoying confluence of market incentives, these criteria do not travel well together.  International law firms may have the superb in-house capabilities that they claim, but they are decidedly not in the business of disseminating information.  As for publishers, the large cost of achieving significant coverage means that the incentive to maintain and enhance accuracy and readability declines in proportion to the scope of laws translated by a given service.  As a result, no commercial product performs well on both of the first two criteria, and there is consequently little market incentive to move beyond them and attend to the remaining items in the list. So much for the invisible hand.

When markets fail, government can provide, of course, but a government itself is inevitably driven by well-focused interests (such as foreign investors) more than by wider communities (divorcing spouses, members of a foreign labor force, or, well, my students).  Bureaucratic initiatives tend to take on a life of their own, and without effective market signals, it is hard to measure how well real needs are actually being met.  In any case, barring special circumstances such as those obtaining within the EU, the problem of sustainability ever lurks in the background.

Unfortunately, these impediments to supply on terms truly attractive to the consumer are not limited to a single jurisdiction with particularly misguided policies; the same dismal logic applies everywhere (in a recent article, Carol Lawson provides an excellent and somewhat hopeful review of the status quo in Japan).  At the root of our discomfiture are, I think, two factors: the cookie-cutter application of copyright protection to this category of material; and a lack of adequate, recognized, and meaningful standards for legal translation (and of tools to apply them efficiently in editorial practice). The former raises an unnecessary barrier to entry. The latter saps value by aggravating agency problems, and raises risk for both suppliers and consumers of legal translations.

I first toyed with this problem a decade ago, in a fading conference paper now unknown to search engines (but still available through the kind offices of the Web Archive). At the time, I was preoccupied with the problem of barriers to entry and the dog-in-the-manger business strategies that they foster, and this led me to think of the translation conundrum as an intractable, self-sustaining Gordian knot of conflicting interests, capable of resolution only through a sudden change in the rules of the game. Developments in subsequent years, in Japan and elsewhere, have taught me that both the optimism and the pessimism embedded in that view may have been misplaced. The emergence of standards, slow and uncertain though it be, may be our best hope of improvement over time.

To be clear, the objective is not freedom as in free beer.  Reducing the cost of individual statutory translations is less important than fostering an environment in which (a) scarce resources are not wasted in the competitive generation of identical content within private or protected containers; and (b) there is a reasonably clear and predictable relationship between quality (in terms of the list above) and cost. Resolving such problems is a common role for standards, both formal and informal.  It is not immediately clear how far voluntary standards can penetrate a complex, dispersed, and often closed activity like the legal translation service sector — but one need not look far for cases in which an idea about standardization achieved acceptance on its merits and went on to have a significant impact on behavior in a similarly fragmented and dysfunctional market.  There is at least room for hope.

In 2006, as part of a Japanese government effort to improve the business environment (for that vocal group of foreign investors referred to above), an interdisciplinary research group in my own university led by Yoshiharu Matsuura and Katsuhiko Toyama released the first edition of a standard bilingual dictionary for legal translation (the SBD) to the Web. Aimed initially at easing the burden of the translation initiative on hard-pressed government officials charged with implementing it, the SBD has since gone through successive revisions, and recently found a new home on a web portal providing government-sponsored statutory translations. (This project is one of two major translation initiatives launched in the same period, the other being a funded drive to render a significant number of court decisions into English).
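The mechanical service a standard dictionary performs is consistency: each source term maps to a single sanctioned rendering, so that texts can safely be read together. A minimal sketch of that check (the glossary entries below are invented placeholders, not actual SBD content):

```python
# Hypothetical bilingual glossary enforcing one sanctioned English
# rendering per source term, in the spirit of the SBD.
GLOSSARY = {
    "term_a": "good faith",
    "term_b": "juridical person",
}

def check_consistency(pairs):
    """Return the (source term, rendering) pairs in a draft
    translation that deviate from the glossary's sanctioned form."""
    return [(src, out) for src, out in pairs
            if src in GLOSSARY and out != GLOSSARY[src]]

draft = [("term_a", "good faith"), ("term_b", "legal person")]
deviations = check_consistency(draft)  # flags the second pair
```

Embedding a check like this in editorial workflow is what turns a reference list into a working standard.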

The benefits of the Standard Bilingual Dictionary are evident in new translations emerging in connection with the project. Other contributors to this space will have more to say about the technology and workflows underlying the SBD, and the roadmap for its future development. My personal concern is that it achieve its proper status, not only as a reference and foundation source for side products, but as a community standard. Paradoxically, restricting the licensing terms for distribution may be the simplest and most effective way of disseminating it as an industry standard.  A form of license requiring attribution to the SBD maintainers, and prohibiting modification of the content without permission, would give commercial actors an incentive to return feedback to the project.  I certainly hope that the leaders of the project will consider such a scheme, as it would help assure that their important efforts are not dissipated in a flurry of conflicting marketplace “improvements” affixed, one must assume, with more restrictive licensing policies.

There is certainly something to be said for making changes in the way that copyright applies to translated law more generally.  The peak demand for law in translation is the point of first enactment or revision. Given the limited pool of translator time available, once a translation is prepared and published, there is a case to be made for a compulsory licensing system, as a means of widening the channel of dissemination, while protecting the economic interest of translators and their sponsors.  The current regime, providing (in the case of Japan) for exclusive rights of reproduction for a period extending to fifty years from the death of the author (Japanese Copyright Act, section 51), really makes no sense in this field.  As a practical matter, we must depend on legislatures, of course, for core reform of this kind.  Alas, given the recent track record on copyright reform among influential legislative bodies in the United States and Europe, I fear that we may be in for a very long wait.  In the meantime, we can nonetheless move the game forward by adopting prudent licensing strategies for standards-based products that promise to move this important industry to the next level.

Frank Bennett is an Associate Professor in the Graduate School of Law at Nagoya University.

VoxPopuLII is edited by Judith Pratt.

On the 30th and 31st of October 2008, the 9th International Conference on “Law via the Internet” met in Florence, Italy. The Conference was organized by the Institute of Legal Information Theory and Techniques of the Italian National Research Council (ITTIG-CNR), acting as a member of the Legal Information Institutes network (LIIs). About 300 participants, from 39 countries and five continents, attended the conference.  The conference had previously been held in Montreal, Sydney, Paris, and Vanuatu.

The conference was a special event for ITTIG, which is one of the institutions where legal informatics started in Europe, and which has supported free access to law without interruption since its origin. It was a challenge and privilege for ITTIG to host experts from all over the world as they discussed crucial emerging problems related to new technologies and law.

Despite having dedicated special sessions to wine tasting in the nearby hills (!), the Conference mainly focused on digital legal information, analyzing it in the light of the idea of freedom of access to legal information, and discussing the technological progress that is shaping such access. Within this interaction of technological progress and law, free access to information is only the first step — but it is a fundamental one.
Increased use of digital information in the field of law has played an important role in developing methodologies for both data creation and access. Participants at the conference agreed that complete, reliable legal data is essential for access to law, and that free access to law is a fundamental right, enabling citizens to exercise their rights in a conscious and effective way. In this context, the use of new technologies becomes an essential tool of democracy for the citizens of an e-society.

The contributions of legal experts from all over the world reflected this crucial need for free access to law. Conference participants analysed both the barriers to free access and the techniques that might overcome those barriers, across a range of session topics.

In general, discussions at the conference covered four main points. The first is that official free access to law is not enough. Full free access requires a range of different providers and competitive republishing by third parties, which in turn requires an anti-monopoly policy on the part of the creator of legal information. Each provider will offer different types of services, tailored to various public needs. This means that institutions providing legal data sources have a public duty to offer a copy of their output — their judgments and legislation in the most authoritative form — to anyone who wishes to publish it, whether that publication is for free or for fee.

Second, countries must find a balance between the potential for commercial exploitation of information and the needs of the public. This is particularly relevant to open access to publicly funded research.

The third point concerns effective access to, and re-usability of, legal information. Effective access requires that most governments promote the use of technologies that improve access to law, abandoning past approaches such as technical restrictions on the reuse of legal information. It is important that governments not only allow, but also help others to reproduce and re-use their legal materials, continually removing any impediments to re-publication.

Finally, international cooperation is essential to providing free access to law. One week before the Florence event, the LII community participated in a meeting of experts organised by the Hague Conference on Private International Law’s Permanent Bureau, a meeting entitled “Global Co-operation on the Provision of On-line Legal Information.” Among other things, participants discussed how free, on-line resources can contribute to resolving trans-border disputes. At this meeting, a general consensus was reached on the need for countries to preserve their legal materials in order to make them available. The consensus was that governments should:

  • give access to historical legal material
  • provide translations in other languages
  • develop multi-lingual access functionalities
  • use open standards and metadata for primary materials

All these points were confirmed at the Florence Conference.

The key issue that emerged from the Conference is that the marketplace has changed and we need to find new models to distribute legal information, as well as create equal market opportunities for legal providers. In this context, legal information is considered to be an absolute public good on which everyone should be free to build.

Many speakers at the Conference also tackled multilingualism in the law domain, highlighting the need for semantic tools, such as lexicons and ontologies, that will enhance uniformity of legal language without losing national traditions. The challenge to legal information systems worldwide lies in providing transparent access to the multilingual information contained in distributed archives and, in particular, allowing users to express requests in their preferred language and to obtain meaningful results in that language. Cross-language information retrieval (CLIR) systems can greatly contribute to open access to law, facilitating discovery and interpretation of legal information across different languages and legal orders, thus enabling people to share legal knowledge in a world that is becoming more interconnected every day.

From the technical point of view, the Conference underlined the paramount importance of adopting open standards. Improving the quality of access to legal information requires interoperability among legal information systems across national boundaries. A common, open standard used to identify sources of law on the international level is an essential prerequisite for interoperability.

In order to reach this goal, countries need to adopt a unique identifier for legal information materials. Interest groups within several countries have already expressed their intention to adopt a shared solution based on URI (Uniform Resource Identifier) techniques. Especially among European Union Member States, the need for a unique identifier, based on open standards and providing advanced modalities of document hyper-linking, has been expressed in several conferences by representatives of the Office for Official Publications of the European Communities (OPOCE).
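A sketch of how such an identifier might decompose into neutral components. The segment names and example values below are illustrative, loosely modeled on the general shape of the URN:LEX proposal rather than drawn from any adopted standard:

```python
def parse_lex_urn(urn: str) -> dict:
    """Split a URN:LEX-style identifier into its main segments.
    Segment names here are illustrative: jurisdiction, issuing
    authority, type of measure, and identifying details."""
    prefix = "urn:lex:"
    if not urn.startswith(prefix):
        raise ValueError("not a lex-style URN")
    parts = urn[len(prefix):].split(":")
    keys = ["jurisdiction", "authority", "measure", "details"]
    return dict(zip(keys, parts))

# Illustrative identifier for a hypothetical Italian statute:
doc = parse_lex_urn("urn:lex:it:stato:legge:2006-05-14;22")
```

The point of such a scheme is that the identifier names the source of law itself, independently of any medium or provider, so that any publisher's copy can resolve the same reference.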

Similar concerns about promoting interoperability among national and European information systems have been aired by international groups. The Permanent Bureau of the Hague Conference on Private International Law is considering a resolution that would encourage member states to “adopt neutral methods of citation of their legal materials, including methods that are medium-neutral, provider-neutral and internationally consistent.” ITTIG is particularly involved in this issue, which is currently running in parallel with the pan-European Metalex/CEN initiative to define standards for sources of European law.

The wide-ranging discussions at the Conference are collected in a volume of Proceedings published in April 2009 by European Press Academic Publishing (EPAP).

— E. Francesconi, G. Peruginelli

Ginevra Peruginelli

Ginevra has a degree in law from the University of Florence, an MA/MSc Diploma in Information Science awarded by the University of Northumbria, Newcastle, UK, and a Ph.D. in Telematics and Information Society from the University of Florence. Currently she is a researcher at the Institute of Legal Information Theory and Techniques of the Italian National Research Council (ITTIG-CNR). In 2003, she was admitted to the Bar of the Court of Florence as a lawyer. She carries out her research activities in various sectors, such as standards to represent data and metadata in the legal environment; law and legal language documentation; and open access to law.

Enrico Francesconi

Enrico is a researcher at ITTIG-CNR. His main activities include knowledge representation and ontology learning, legal standards, artificial intelligence techniques for legal document classification and knowledge extraction. He is a member of the Italian and European working groups establishing XML and URI standards for legislation. He has been involved in various projects within the framework programs of DG Information Society & Media of the European Commission and for the Office for Official Publications of the European Communities.

VoxPopuLII is edited by Judith Pratt