
Confessions of a Legal Info-holic

In an extraordinary story, Jorge Luis Borges writes of a “Total Library”, organized into ‘hexagons’ that supposedly contained all books:

When it was proclaimed that the Library contained all books, the first impression was one of extravagant happiness. All men felt themselves to be the masters of an intact and secret treasure. . . . At that time a great deal was said about the Vindications: books of apology and prophecy which . . . [contained] prodigious arcana for [the] future. Thousands of the greedy abandoned their sweet native hexagons and rushed up the stairways, urged on by the vain intention of finding their Vindication. These pilgrims disputed in the narrow corridors . . . strangled each other on the divine stairways . . . . Others went mad. . . . The Vindications exist . . . but the searchers did not remember that the possibility of a man’s finding his Vindication, or some treacherous variation thereof, can be computed as zero.  As was natural, this inordinate hope was followed by an excessive depression. The certitude that some shelf in some hexagon held precious books and that these precious books were inaccessible, seemed almost intolerable.

About three years ago I spent almost an entire sleepless month coding OpenJudis - my rather cool, “first-of-its-kind” free online database of Indian Supreme Court cases. The database hosts the full texts of about 25,000 cases decided since 1950. In this post I embark on a somewhat personal reflection on the process of creating OpenJudis - what I learnt about access to law (in India), and about “legal informatics,” along with some meditations on future pathways.

Having, by now, attended my share of FLOSS events, I know it is the invariable tendency of anyone who’s written two lines of free code to consider themselves qualified to pronounce on lofty themes - the nature of freedom and liberty, the commodity, scarcity, etc. With OpenJudis, likewise, I feel like I’ve acquired the necessary license to inflict my theory of the world on hapless readers - such as those at VoxPopuLII!

I begin this post by describing the circumstances under which I began coding OpenJudis. This is followed by some of my reflections on how “legal informatics” relates to and could relate to law.

Online Access to Law in India
India is privileged to have quite a robust ICT architecture. Internet access is relatively inexpensive, and the ubiquity of “cyber cafes” has resulted in extensive Internet penetration, even in the absence of individual subscriptions.

Government bodies at all levels are statutorily obliged to publish, on the Internet, vital information regarding their structure and functioning. The National Informatics Centre (NIC), a public sector corporation, is responsible for hosting, maintaining and updating the websites of government bodies across the country. These include, inter alia, the websites of the Union (federal) Government, the various state governments, union and state ministries, constitutional bodies such as the Election Commission and the Planning Commission, and regulatory bodies such as the Securities and Exchange Board of India (SEBI). These websites typically host a wealth of useful information including, illustratively, the full texts of applicable legislation, subordinate legislation, administrative rulings, reports, census data, application forms, etc.

The NIC has also been commissioned by the judiciary to develop websites for courts at various levels and publish decisions online. As a result, beginning in around the year 2000, the Supreme Court and various high courts have been publishing their decisions on their websites. The full texts of all Supreme Court decisions rendered since 1950 have been made available, which is an invaluable free resource for the public. Most High Court websites, however, have not yet made archival material available online, so at present, access remains limited to decisions from the year 2000 onwards. More recently the NIC has begun setting up websites for subordinate courts, although this process is still at a very embryonic stage.

Apart from free government websites, a handful of commercial enterprises have been providing online access to legal materials. Among them, two deserve special mention. SCCOnline - a product of one of the leading law report publishers in India - provides access to the full texts of decisions of the Indian Supreme Court. The CD version of SCCOnline sells for about INR 70,000 (about US$1,500), which is around the same price the company charges for a full set of print volumes of its reporter. For an additional charge, the company offers updates to the database. The other major commercial venture in the field is Manupatra, which offers access to the full text of decisions of various courts and tribunals as well as the texts of legislation. Access is provided for a basic charge of about US$100, plus a charge of about US$1 per document downloaded. While seemingly modest by international standards, these charges are unaffordable by large sections of the legal profession and the lay public.

OpenJudis
In December 2006, I began coding OpenJudis. My reasons were purely selfish. While the full texts of the decisions of the Supreme Court were already available online for free, the search engine on the government website was unreliable and inadequate for (my) advanced research needs. The formatting of the text of cases themselves was untidy, and it was cumbersome to extract passages from them. Frequently, the website appeared overloaded with users, and alternate free sources were unavailable. I couldn’t afford any of the commercial databases. My own private dissatisfaction with the quality of service, coupled with (in retrospect) my completely naive optimism, led me to attempt OpenJudis. A third crucial factor on the input side was time, and a “room of my own,” which I could afford only because of a generous fellowship I had from the Open Society Institute.

I began rashly, by serially downloading the full texts of the 25,000 decisions on the Supreme Court website. Once that was done (it took about a week), I really had no notion of how to proceed. I remember being quite exhilarated by the sheer fact of being in possession of twenty-five thousand Supreme Court decisions. I don’t think I can articulate the feeling very well. (I have some hope, however, that readers of this blog and my fellow LII-ers will intuitively understand this feeling.) Here I was, an average Joe poking around on the Internet, and just-like-that I now had an archive of 25,000 key documents of our republic, cumulatively representing the articulations of some of the finest (and some not-so-fine) legal minds of the previous half-century, sitting on my laptop. And I could do anything with them.

The word “archive,” incidentally, as Derrida informs us, derives from the Greek arkheion, the residence of the superior magistrates, the archons - those who commanded. The archons both “held and signified political power,” and were considered to possess the right to both “make and represent the law.” “Entrusted to such archons, these documents in effect speak the law: they recall the law and call on or impose the law”. Surely, or I am much mistaken, a very significant transformation has occurred when ordinary citizens become capable of housing archives - when citizens can assume the role of archons at will.

Giddy with power, I had an immediate impulse to find a way to transmit this feeling, to make it portable, to dissipate it - an impulse that will forever mystify economists wedded to “rational” incentive-based models of human behavior. I wasn’t a computer engineer, I didn’t have the foggiest idea how I’d go about it, but I was somehow going to host my own online free database of Indian Supreme Court cases. The audacity of this optimism bears out one of Yochai Benkler’s insights about the changes wrought by the new “networked information economy” we inhabit. According to Benkler,

The belief that it is possible to make something valuable happen in the world, and the practice of actually acting on that belief, represent a qualitative improvement in the condition of individual freedom [because of NIE]. They mark the emergence of new practices of self-directed agency as a lived experience, going beyond mere formal permissibility and theoretical possibility.

Without my intending it, the archive itself suggested my next task. I had to clean up the text and extract metadata. This process occupied me for the longest time during the development of OpenJudis. I was very new to programming and had only just discovered the joys of Regular Expressions. More than my inexperience with programming techniques, however, it was the utter heterogeneity of reporting styles that took me a while to accustom myself to. Both opinion-writing and reporting styles had changed dramatically in the course of the fifty years my database covered, and this made it difficult to find patterns when extracting, say, the names of judges involved. Eventually, I had cleaned up the texts of the decisions and extracted an impressive (I thought) set of metadata, including the names of parties, the names of the judges, and the date the case was decided. To compensate for the absence of headnotes, I extracted names of statutes cited in the cases as a rough indicator of what their case might relate to. I did all this programming in PHP with the data housed in a MySQL database.
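As a rough illustration of the kind of pattern matching described above, here is a minimal sketch in Python (the original work was done in PHP). The header layout, field labels, party and judge names, and the `extract_metadata` helper are all invented for the example; real judgments varied far more across five decades than a single pattern can capture.

```python
import re
from datetime import datetime

# Hypothetical judgment header; the labels and names are made up for illustration.
SAMPLE = """\
PETITIONER: STATE OF EXAMPLE
RESPONDENT: A. B. PLACEHOLDER
DATE OF JUDGMENT: 12/03/1975
BENCH: DOE, J.; ROE, R.

The appellant relied on the Indian Penal Code, 1860 and the Evidence Act, 1872.
"""

# One labelled field per line, e.g. "BENCH: DOE, J.; ROE, R."
FIELD_RE = re.compile(
    r"^(PETITIONER|RESPONDENT|DATE OF JUDGMENT|BENCH):[ \t]*(.+)$",
    re.MULTILINE,
)

# Capitalised words ending in "Act" or "Code", followed by a year.
STATUTE_RE = re.compile(r"\b((?:[A-Z][A-Za-z]+ )+(?:Act|Code), \d{4})")


def extract_metadata(text):
    """Pull parties, judges, decision date and cited statutes out of one judgment."""
    fields = {name: value.strip() for name, value in FIELD_RE.findall(text)}

    decided = None
    if "DATE OF JUDGMENT" in fields:
        decided = datetime.strptime(fields["DATE OF JUDGMENT"], "%d/%m/%Y").date()

    judges = [j.strip() for j in fields.get("BENCH", "").split(";") if j.strip()]

    return {
        "petitioner": fields.get("PETITIONER"),
        "respondent": fields.get("RESPONDENT"),
        "decided": decided,
        "judges": judges,
        # Statutes cited serve as a rough stand-in for a headnote.
        "statutes": sorted(set(STATUTE_RE.findall(text))),
    }


if __name__ == "__main__":
    for key, value in extract_metadata(SAMPLE).items():
        print(key, "->", value)
```

In practice each reporting era would likely need its own set of patterns, with unmatched documents set aside for manual review.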

And then I encountered my first major roadblock that threatened to jeopardize the whole operation: I ran my first full-text Boolean search on the MySQL database and the results took a staggering 20 minutes to display. I was devastated! More elaborate searches took longer. Clearly, this was not a model I could host online. Or do anything useful with. Nobody in their right mind would want to wait 20 minutes for the results of their search. I had to look for a quicker database, or, as I eventually discovered, a super fast, lightweight indexing search engine. After a number of failed attempts with numerous free search engine software programs, none of which offered either the desired speed or the search capability I wanted, I was getting quite desperate. Fortunately, I discovered Swish-e, a lightweight, Perl-based Boolean search engine which was extremely fast and, most importantly, free - exactly what I needed. The final stage of creating the interface, uploading the database, and activating the search engine happened very quickly, and sometime in the early hours of December 22nd, 2006, OpenJudis went live. I sent announcement emails out to several e-groups and waited for the millions to show up at my doorstep.
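To see why an indexing engine answers Boolean queries so much faster than scanning every document, consider the toy inverted index below. It is only a sketch of the general technique, not a description of how Swish-e works internally; the three sample "cases" and the function names are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing every term (Boolean AND)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Documents containing at least one term (Boolean OR)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


# Invented one-line "cases" standing in for full judgments.
DOCS = {
    1: "Right to equality under Article 14",
    2: "Freedom of speech and Article 19",
    3: "Equality and freedom of speech",
}

if __name__ == "__main__":
    idx = build_index(DOCS)
    print(search_and(idx, "freedom", "speech"))
    print(search_or(idx, "equality", "article"))
```

The index is built once, up front; each query then touches only the posting sets for its own terms rather than the full text of all 25,000 decisions.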

They never did. After a week, I had maybe a hundred users. In a month, a few hundred. I received some very complimentary emails, which was nice, but it didn’t compensate for the failure of “millions” to show up. Over the next year, I added some improvements:
1) First, I built an automatic update feature that would periodically check the Supreme Court website for new cases and update the database on its own.
2) In October 2007, I coded a standalone MS Windows application of the database that could be installed on any system running Windows XP. This made sense in a country where PC penetration is higher than Internet penetration. The Windows application became quite popular and I received numerous requests for CDs from different corners of the country.
3) Around the same time, I also coded a similar application for decisions of the Central Information Commission - the apex statutory tribunal for adjudicating disputes under the Right to Information Act.
4) In February 2008, both applications were included in the DVD of Digit Magazine - a popular IT magazine in India.

Unfortunately, in August 2008, the Supreme Court website changed its design so that decisions could no longer be downloaded serially in the manner I had been accustomed to. One can only speculate about what prompted this change, since no improvements were made to the actual presentation of the cases. The new format was far more difficult for me to “hack,” and my work left me with no time to attempt to circumvent it, so I abandoned the attempt.

Fortunately at the same time, an exciting new project called IndianKanoon was started by Sushant Sinha, an Indian computer science graduate at Michigan. In addition to decisions of the Supreme Court, his site covers several high courts and links up to the text of legislation of various kinds. Although I have not abandoned plans to develop OpenJudis, the presence of IndianKanoon has allowed me to step back entirely from this domain - secure in the knowledge that it is being taken forward by abler hands than mine.

Predictions, Observations, Conclusions
I’d like to end this already-too-long post with some reflections, randomly ordered, about legal information online.
1) I think one crucial area commonly neglected by most LIIs is client-side software that enables users to store local copies of entire databases. The urgency of this need is highlighted in the following hypothetical about digital libraries by Siva Vaidhyanathan (from The Anarchist in the Library):

So imagine this: An electronic journal is streamed into a library. A library never has it on its shelf, never owns a paper copy, can’t archive it for posterity. Its patrons can access the material and maybe print it, maybe not. But if the subscription runs out, if the library loses funding and has to cancel that subscription, or if the company itself goes out of business, all the material is gone. The library has no trace of what it bought: no record, no archive. It’s lost entirely.

It may be true that the Internet will be around for some time, but it might be worthwhile for LIIs to stop emulating the commercial database models of restricting control while enabling access. Only then can we begin to take seriously the task of empowering users into archons.

2) My second observation pertains to interface and usability. I have for long been planning to incorporate a set of features including tagging, highlighting, annotating, and bookmarking that I myself would most like to use. Additionally, I have been musing about using Web 2.0 to enable user-participation in maintenance and value-add operations - allowing users to proofread the text of judgments and to compose headnotes. At its most ambitious, in these “visions” of mine, OpenJudis looks like a combination of LII + social networking + Wikipedia.

A common objection to this model is that it would upset the authority of legal texts. In his brilliant essay A Brief History of the Internet from the 15th to the 18th century, the philosopher Lawrence Liang reminds us that the authority of knowledge that we today ascribe to printed text was contested for the longest period in modern history.

Far from ensuring fixity or authority, this early history of Printing was marked by uncertainty, and the constant refrain for a long time was that you could not rely on the book; a French scholar Adrien Baillet warned in 1685 that “the multitude of books which grows every day” would cast Europe into “a state as barbarous as that of the centuries that followed the fall of the Roman Empire.”

Europe’s non-descent into barbarism offers us a degree of comfort in dealing with Adrien Baillet-type arguments made in the context of legal information. The stability that we ascribe to law reports today is a relatively recent historical innovation that began in the mid-19th century. “Modern” law has longer roots than that.

3) While OpenJudis may look like quite a mammoth endeavor for one person, I was at all times intensely aware that this was by no means a solitary undertaking, and that I was “standing on the shoulders of giants.” They included the nameless thousands at the NIC who continue to design websites, scan and upload cases on the court websites - a Sisyphean task - and the thousands whose labor collectively produced the free software I used: Fedora Core 4, PHP, MySQL, Swish-e. And lastly, the nameless millions who toil to make the physical infrastructure of the Internet itself possible. Like the ground beneath our feet, we take it for granted, even as the tragic events in Haiti in recent weeks remind us to be more attentive. (For a truly Herculean endeavor, however, see Sushant Sinha’s IndianKanoon website, about which many ballads may be composed in the decades to come.)

It might be worthwhile for the custodians of LIIs to enable users to become derivative producers themselves, to engage in “practices of self-directed agency” as Benkler suggests. Without sounding immodest, I think the real story of OpenJudis is how the Internet makes it plausible and thinkable for average Joes like me (and better-than-average people like Sushant Sinha) to think of waging unilateral wars against publishing empires.

4) So, what is the impact that all this ubiquitous, instant, free electronic access to legal information is likely to have on the world of law? In a series of lectures titled “Archive Fever,” the philosopher Derrida posed a similar question in a somewhat different context: What would the discipline of psychoanalysis have looked like, he asked, if Sigmund Freud and his contemporaries had had access to computers, televisions, and email? In brief, his answer was that the discipline of psychoanalysis itself would not have been the same - it would have been transformed “from the bottom up” and its very events would have been altered. This is because, in Derrida’s view:

The archive . . . in general is not only the place for stocking and for conserving an archivable content of the past. . . .  No, the technical structure of the archiving archive also determines the structure of the archivable content even in its coming into existence and in its relationship to the future. The archivization produces as much as it records the event.

The implication, following Derrida, is that in the past, law would not have been what it currently is if electronic archives had been possible. And the obverse is true as well: in the future, because of the Internet, “rule of law” will no longer observe the logic of the stable trajectories suggested by its classical “analog” commentators. New trajectories will have to be charted.

5) In the same book, Derrida describes a condition he calls “Archive fever”:

It is to burn with a passion. It is never to rest, interminably, from searching for the archive right where it slips away. It is to run after the archive even if there’s too much of it. It is to have a compulsive, repetitive and nostalgic desire for the archive, an irrepressible desire to return to the origin, a homesickness, a nostalgia for the return to the most archaic place of absolute commencement.

I don’t know about other readers of VoxPopuLII (if indeed you’ve managed to continue reading this far!), but for the longest time during and after OpenJudis, I suffered distinctly from this malady. I downloaded indiscriminately whole sets of data that still sit unused on my computer, not having made it into OpenJudis. For those in a similar predicament, I offer Borges’s quote with which I began this text, as a reminder of the foolishness of the notion of “Total Libraries.”

Prashant Iyengar is a lawyer affiliated with the Alternative Law Forum, Bangalore, India. He is currently pursuing his graduate studies at Columbia University in New York. He runs OpenJudis, a free database of Indian Supreme Court cases.

VoxPopuLII is edited by Judith Pratt. Editor in Chief is Rob Richards.


Preserving Born-Digital Legal Materials…Where to Start?

It’s tempting to begin any discussion of digital preservation and law libraries with a mind-blowing statistic. Something to drive home the fact that the clearly-defined world of information we’ve known since the invention of movable type has evolved into an ephemeral world of bits and bytes, that it’s expanding at a rate that makes it nearly impossible to contain, and that now is the time to invest in digital preservation efforts.

But, at this point, that’s an argument that you and I have already heard. As we begin the second decade of the 21st century, we know with certainty that the digital world is ubiquitous because we ourselves are part of it. Ours is a world where items posted on blogs are cited in landmark court decisions, a former governor and vice-presidential candidate posts her resignation speech and policy positions to Facebook, and a busy 21st-century president is attached at the thumb to his Blackberry.

We have experienced an exhilarating renaissance in information, which, as many have asserted for more than a decade, is threatening to become a digital dark age due to technology obsolescence and other factors. There is no denying the urgent need for libraries to take on the task of preserving our digital heritage. Law libraries specifically have a critically important role to play in this undertaking. Access to legal and law-related information is a core underpinning of our democratic society. Every law librarian knows this to be true. (I believe it’s what drew us to the profession in the first place.)

Frankly speaking, our current digital preservation strategies and systems are imperfect – and they most likely will never be perfected. That’s because digital preservation is a field that will be in a constant state of change and flux for as long as technology continues to progress. Yet, tremendous strides have been made over the past decade to stave off the dreaded digital dark age, and libraries today have a number of viable tools, services, and best practices at our disposal for the preservation of digital content.

Law libraries and the preservation of born-digital content

In 2008, Dana Neacsu, a law librarian at Columbia University Law School, and I decided to explore the extent to which law libraries were actively involved in the preservation of born-digital legal materials. So, we conducted a survey of digital preservation activity and attitudes among state and academic law libraries.

We found an interesting incongruity among our respondent population of library directors who represented 21 law libraries: less than 7 percent of the digital preservation projects being planned or underway at our respondents’ libraries involved the preservation of born-digital materials. The remaining 93 percent involved the preservation of digital files created through the digitization of print or tangible originals. Yet, by a margin of 2 to 1, our respondents expressed that they believed born-digital materials to be in more urgent need of preservation than print materials.

This finding raises an interesting question: If law librarians (at least those represented among our respondents) believe born-digital materials to be in more urgent need of preservation, why were the majority of digital preservation resources being invested in the preservation of files resulting from digitization projects?

I speculate that part of the problem is that we often don’t know where to start when it comes to preserving born-digital content. What needs to be preserved? What systems and formats should we use? How will we pay for it?

What needs to be preserved? A few thoughts…

Determining what needs to be preserved is not as complicated as it may seem. The mechanisms for content selection and collection development that are already in place at most law libraries lend themselves nicely to prioritizing materials for digital preservation, as I have learned through the Georgetown Law Library’s involvement in The Chesapeake Project Legal Information Archive. A collaborative effort between Georgetown and partners at the State Law Libraries of Maryland and Virginia, The Chesapeake Project was established to preserve born-digital legal information published online and available via open-access URLs (as opposed to within subscription databases).

So, how did we approach selection for the digital archive? Within a broad, shared project collection scope (limited to materials that were law- or policy-related, digitally born, and published to the “free Web” per our Collection Plan) each library simply established its own digital archive selection priorities, based on its unique institutional mandates and the research needs of its users. Libraries have historically developed their various print collections in a similar manner.

The Maryland State Library focused on collecting documents relating to public-policy and legal issues affecting Maryland citizens. The Virginia State Library collected the online publications of the Supreme Court of Virginia and other entities within Virginia’s judicial branch of government. As an academic library, the Georgetown Law Library developed topical and thematic collection priorities based on research and educational areas of interest at the Georgetown University Law Center. (Previously, online materials selected for the Georgetown Law Library’s collection had been printed from the Web on acid-free paper, bound, cataloged, and shelved. Digital preservation offered an attractive alternative to this system.)

To build our topical digital archive collections, the Georgetown Law Library assembled a team of staff subject specialists to select content (akin to our collection development selection committee), and, to make things as simple as possible, submissions were made and managed using a Delicious bookmark account, which allowed our busy subject specialists to submit online content for preservation with only a few clicks.

As a research library, we preserved information published to the free Web under a claim of fair use. Permission from copyright holders was sought only for items published either outside of the U.S. or by for-profit entities. Taking our cues from the Internet Archive, we determined to respect the robots.txt protocol in our Web harvesting activities and provide rights holders with instructions for requesting the removal of their content from the archive.
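For readers curious what respecting robots.txt looks like in practice, the sketch below uses Python's standard `urllib.robotparser` module to decide whether a harvester may fetch a given URL. It is an illustration only and not the harvesting tool used by the project; the robots.txt rules, the `ExampleArchiveBot` user agent, and the example.org URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt as a publisher might serve it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_harvest(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.org/reports/2009-annual-report.pdf",
        "https://www.example.org/private/draft.pdf",
    ):
        verdict = "harvest" if may_harvest(ROBOTS_TXT, "ExampleArchiveBot", url) else "skip"
        print(url, "->", verdict)
```

A live harvester would fetch each site's actual robots.txt (for instance with the parser's `set_url` and `read` methods) before requesting any document from that site.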

Fear of duplicating efforts

We have, on occasion, knowingly added digital materials to our archive collection that were already within the purview of other digital preservation programs. There is a fear of duplicating efforts when it comes to digital preservation, but there is also a strong argument to be made for multiple, geographically dispersed entities maintaining duplicate preserved copies of important digital resources.

This philosophy, especially as it relates to duplicating the digital-preservation efforts of the Government Printing Office, is currently being echoed among several Federal Depository Libraries (and prominently by librarians who contribute to the Free Government Information blog) who are supporting the concept of digital deposit to maintain a truly distributed Federal Depository Library Program. Should there ever be a catastrophic failure at GPO, or even a temporary loss of access (such as that caused by the PURL server crash last August), user access to government documents would remain uninterrupted, thanks to this distributed preservation network. Currently there are 156 academic law libraries listed as selective depositories on the Federal Depository Library Directory; each of these would be a candidate for digital deposit should the program come to fruition.

Libraries with perpetual access or post-cancellation access agreements with publishers may also find it worthwhile to invest in digital preservation activities that may be redundant. Some publishers offer easy post-cancellation access to purchased digital content via nonprofit initiatives such as Portico and LOCKSS, both of which function as digital preservation systems. Other publishers, however, may simply provide subscribers with a set of CDs or DVDs containing their purchased subscription content. In these cases, it is worthwhile to actively preserve these files within a locally managed digital archive to ensure long-term accessibility for library patrons, rather than relegating these valuable digital files, stored on an unstable optical medium, to languishing on a shelf.

Law reviews and legal scholarship

It has been suggested that academic law libraries take responsibility for the preservation of digital content cited within their institutions’ law reviews to ensure that future researchers will be able to reference source materials even if they are no longer available at the cited URLs. While there aren’t specific figures relating to the problem of citation link rot in law reviews, research on Web citations appearing in scientific journals has shown that roughly 10 percent of these citations become inactive within 15 months of the citing article’s publication. When it comes to Web-published law and policy information, our own Chesapeake Project evaluation efforts have found that about 14 percent, or 1 out of every 7, Web-based items had disappeared from their original URLs within two years of being archived.

In the near future, we may find ourselves in the position of taking responsibility for the digital preservation of our law reviews themselves, given the call to action in the Durham Statement on Open Access to Legal Scholarship. After all, if law schools end print publication of journals and commit “to keep the electronic versions available in stable, open, digital formats” within open-access online repositories, there is an implicit mandate to ensure that those repositories offer digital preservation functionality, or that a separate dark digital preservation system be used in conjunction with the repository, to ensure long-term access to the digital journal content. (It is important to note that digital repository software and services do not necessarily feature standard digital preservation functionality.)

Speaking of digital repositories, the responsibility for establishing and maintaining institutional repositories most certainly falls to law libraries, as does the responsibility for preserving the digital intellectual output of their law schools’ faculty, institutes, centers, and students (many of whom go on to impressive heights).

At the Georgetown Law Library, we’ve also taken on the task of preserving the intellectual output published to the Law Center’s Web sites.

The Preserv project has compiled an impressive bibliography on digital preservation aimed specifically at preservation services for institutional repositories (but also covering many of the larger issues in digital preservation), which is worth reviewing.

What systems and formats should we use?

Did I mention that our current digital preservation strategies and systems are imperfect? Well, it’s true. That’s the bad news. No matter which system or service you choose, you will surely encounter occasional glitches, endure system updates and migrations, and be forced to revise your processes and workflows from time to time. This is a fledgling, evolving field, and it’s up to us to grow and evolve along with it.

But, take heart! The good news is that there are standards and best practices established to guide us in developing strategies and selecting digital preservation systems, and we have multiple options to choose from. The key to embarking on a digital preservation project is to be versed in the language and standards of digital preservation, and to know what your options are.

The language and standards of digital preservation

I have heard a very convincing argument against standards in digital preservation: Because digital preservation is a new, evolving field, complying with rigid standards can be detrimental to systems that require a certain amount of adaptability in the face of emerging technological challenges. While I agree with this argument, I also believe that it is tremendously useful for those of us who are librarians, as opposed to programmers or IT specialists, to have standards as a starting point from which to identify and evaluate our options in digital preservation software and services.

There are a number of standards to be aware of in digital preservation. Chief among these is the Open Archival Information System (OAIS) Reference Model, which provides the central framework for most work in digital preservation. A basic question to ask when evaluating a digital preservation system or service is, “Does this system conform to the OAIS model?” If not, consider that a red flag.

The Trustworthy Repositories Audit & Certification Criteria and Checklist, or TRAC, is a digital repository evaluation tool currently being incorporated into an international standard for auditing and certifying digital archives. A small number of large repositories have undergone (or are undergoing) TRAC audits, including E-Depot at the Koninklijke Bibliotheek (National Library of the Netherlands), LOCKSS, Portico, and HathiTrust. This number can be expected to increase in the coming years.

The TRAC checklist is also a helpful resource to consult in conducting your own independent evaluations. Last year, for example, the libraries participating in The Chesapeake Project commissioned the Center for Research Libraries to conduct an assessment (as opposed to a formal audit) of our OCLC digital archive system based on TRAC criteria, which provided useful information to strengthen the project.

The PREMIS Data Dictionary provides a core set of preservation metadata elements to support the long-term preservation and future renderability of digital objects stored within a preservation system. The PREMIS working group has created resources and tools to support PREMIS implementation, available via the Library of Congress’s Web site. It is useful to consult the data dictionary when establishing local policy, and to ask about PREMIS compatibility when evaluating digital preservation options.

While we’re on the exciting topic of metadata, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH, not to be confused with OAIS) is another standard to watch for, especially if discovery and access are key components of your preservation initiative. OAI-PMH is a framework for sharing metadata between various “silos” of content. Essentially, the metadata of an OAI-PMH compliant system could be shared with and made discoverable via a single, federated search interface, allowing users to search the contents of multiple, distributed digital archives at the same time.
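For readers who like to see the moving parts, here is a minimal Python sketch of what an OAI-PMH harvest looks like from the harvester’s side. The `ListRecords` verb, the `metadataPrefix` parameter, and the `oai_dc` (Dublin Core) format come from the OAI-PMH specification; the endpoint `https://archive.example.org/oai`, the function names, and the sample response are invented for illustration and do not describe any particular archive’s setup.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Dublin Core element namespace, used by the oai_dc metadata format.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml):
    """Return every Dublin Core title found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iterfind(".//dc:title", DC_NS)]


def federated_titles(responses_by_archive):
    """Merge titles harvested from several archives into one searchable list."""
    merged = []
    for archive_name, response_xml in responses_by_archive.items():
        for title in extract_titles(response_xml):
            merged.append((archive_name, title))
    return merged


# Invented sample response; a real harvester would fetch this over HTTP
# from the URL produced by list_records_url().
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample state agency report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://archive.example.org/oai"))
    print(federated_titles({"Example Archive": SAMPLE_RESPONSE}))
```

A federated search interface is, in essence, the last function run against many archives at once: each “silo” answers the same request in the same format, so the harvester can pool the metadata without knowing anything about how each archive stores its content.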

For an easy-to-read overview of digital preservation practices and standards, I recommend Priscilla Caplan’s The Preservation of Digital Materials, which appeared in the Feb./March 2008 issue of Library Technology Reports. There are also a few good online glossaries available to help decipher digital preservation jargon: the California Digital Library Glossary, the Internet Archive’s Glossary of Web Archiving Terms, and the Digital Preservation Coalition’s Definitions and Concepts.

Open source formats and software

Open source and open standard formats and software play a vital role in the lifecycle management of digital content. In the context of digital preservation, open-source formats, which make their source code and specifications freely available, facilitate the future development of tools that can assist in the migration of files to new formats as technology progresses and older formats become obsolete. PDF, for example, although developed originally as a proprietary format by Adobe Systems, became a published open standard in 2008, meaning that developers will have a foundation for making these files accessible in the future.

Other open source formats commonly used in digital preservation include the TIFF format for digital images, the ARC or WARC file for Web archiving, and the Extensible Markup Language (XML) text format for encoding data or document structure information. Microsoft formats, such as Word Documents, do not comply with open standards; the proprietary nature of these formats will inhibit future access to these documents when these formats become obsolete. The Library of Congress has a useful Web site devoted to digital formats and sustainability (including moving image and sound formats), which is worth reviewing.

Open source software is also looked upon favorably in digital preservation because, as with open source formats, the software development and design process is transparent, allowing current and future developers to build new interfaces to, or updates for, the software over time.

Open source does not necessarily mean free-of-charge, and in fact, many service providers utilize open source software and open standards in developing fee-based or subscription digital preservation solutions.

Digital preservation solutions

There are many factors to consider in selecting a digital preservation solution. What is the nature of the content being preserved, and can the system accommodate it? Is preservation the sole purpose of the system — so that the system need include only a dark archive — or is a user access interface also necessary? How much does the system cost, and what are the expected ongoing maintenance costs, both in terms of budget and staff time? Is the system scalable, and can it accommodate a growing amount of content over time? This list could go on…

Keep in mind that no system will perfectly accommodate your needs. (Have I mentioned that digital preservation systems will always be imperfect?) And there is no use in waiting for the “perfect system” to be developed. We must use what’s available today. In selecting a system, consider its adherence to digital preservation standards, the stability of the institution or organization providing the solution, and the extent to which the digital preservation system has been accepted and adopted by institutions and user communities.

In a perfect world, perhaps every law library would implement a free, build-it-yourself, OAIS-compliant, open-source digital preservation solution with a large and supportive user community, such as DSpace or Fedora. These systems put full control in the hands of the libraries, which are the true custodians of the preserved digital content. But, in practice, our law libraries often do not have the staff and technological expertise to build and maintain an in-house digital preservation system.

As a result, several reputable library vendors and nonprofit organizations have developed fee-based digital preservation solutions, often built using open-source software. The Internet Archive offers the Archive-It service for the preservation of Web sites. The Stanford University-based LOCKSS program provides a decentralized preservation infrastructure for Web-based and other types of digital content, and the MetaArchive Cooperative provides a preservation repository service using the open-source LOCKSS software. The Ex Libris Digital Preservation System and the collaborative HathiTrust repository both support the preservation of digital objects.

For The Chesapeake Project, the Georgetown, Maryland State, and Virginia State Law Libraries use OCLC systems: the Digital Archive for preservation, coupled with a hosted instance of CONTENTdm as an access interface.

In our experience, working with a vendor that hosted our content at a secure offsite location and managed system updates and migrations allowed us to focus our energies on the administrative and organizational aspects of the project, rather than the ongoing management of the system itself. We were able to develop shared project documentation, including preferred file format and metadata policies, and conduct regular project evaluations. Moreover, because our project was collaborative, it worked to our advantage to enlist a third party to store all three libraries’ content, rather than place the burden of hosting the project’s content upon one single institution. In short, working with a vendor can actually benefit your project.

The ultimate question: How will we pay for it?

We still seem to be in the midst of a global economic recession that has impacted university and library budgets. Yet, despite budget stagnation, there has been a steady increase in the production of digital content.

Digital preservation can be expensive, and law library staff members with digital preservation expertise are few. The logical solution to these issues of budget and staff limitations is to seek out opportunities for collaboration, which would allow for the sharing of costs, resources, and expertise among participating institutions.

Collaborative opportunities exist with the Library of Congress, which has created a network of more than 130 preservation partners throughout the U.S. The law library community is also in the process of establishing its own collaborative digital archive, the Legal Information Archive, to be offered through the Legal Information Preservation Alliance, or LIPA.

During the 2009 AALL annual meeting, LIPA’s executive director announced that The Chesapeake Project had become a LIPA-sanctioned project under the umbrella of the new Legal Information Archive. As a collaborative project with expenses shared by three law libraries, The Chesapeake Project’s costs are currently quite low compared to other annual library expenditures, such as those for subscription databases. These annual costs will decrease as more law libraries join this initiative.

I firmly believe that law libraries must invest in digital preservation if we are to remain relevant and true to our purpose in the 21st century. The core reason libraries exist is to build collections, to make those collections accessible, to assist patrons in using our collections, and to preserve our collections forever. No other institution has been created to take on this responsibility. Digital preservation represents an opportunity in the digital age for law libraries to reclaim their traditional roles as stewards of information, and to ensure that our digital legal heritage will be available to legal scholars and the public well into the future.

Sarah Rhodes is the digital collections librarian at the Georgetown Law Library in Washington, D.C., and a project coordinator for The Chesapeake Project Legal Information Archive, a digital preservation initiative of the Georgetown Law Library in collaboration with the State Law Libraries of Maryland and Virginia.

VoxPopuLII is edited by Judith Pratt.  Editor in Chief is Rob Richards.


Managing practical memories of legal organizations: Beyond document and case management

Organizations, memories, and routines

Nowadays, information and knowledge management and retrieval systems are widely used in organizations to support many tasks.  These include dealing with information overload, supporting decision-making processes, or providing the members of these organizations with the kind of knowledge and information they require to solve problems.

In particular, knowledge management systems may also be understood as a way to manage organizational memories.  These are defined, by J.G. March and H.A. Simon, as “repertories of possible solutions to classes of problems that have been encountered in the past and repertories of components of problem solutions.” Thus, these memories reveal proven mechanisms for problem-solving, and bring organizational needs into focus. In turn, these mechanisms enable organizations to reach their goals and help them make decisions in highly uncertain environments.

Therefore, the ability to store and retrieve these mechanisms and to manage organizational memories is central.  It allows organizations to learn from their previous decision-making processes, successes and failures.  In effect, in both public and private organizations, successful mechanisms typically become routines, that is, adaptive rules of behavior.  With such routines available to them, organizations only need to search for further solutions or new alternatives when they fail to detect stored problem-solving or decision-making routines in their memory. However, routines are not only problem-solving devices.  More importantly, they are learning devices to turn inexperienced professionals into experts.

Organizational memories contain expert knowledge that can guide both individual and organizational behavior, thus fostering the essential role of organizations as reliable and stable patterns of events. Moreover, organizations will be able to learn from experience as long as they are capable of managing their own memories.

The effectiveness of such knowledge management systems, once implemented, may be measured by an organization’s ability to:

  1. store ready-made solutions in their memory;
  2. search and retrieve these solutions (or routines) when problems arise;
  3. distribute knowledge about these routines among organization members so that they can make adequate decisions; and
  4. attach new decisions to existing templates, or store them as new ready-made solutions.
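The four abilities listed above can be pictured as operations on a very small data structure. The Python sketch below is only an illustration under stated assumptions: the class and method names (`OrganizationalMemory`, `store`, `retrieve`, `distribute`, `record_decision`) and the sample entries are hypothetical, and the sketch does not describe any system actually deployed in a court or studied in the research discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class Routine:
    """A ready-made solution for one class of problems (hypothetical model)."""
    problem_class: str
    solution: str
    decisions: list = field(default_factory=list)


class OrganizationalMemory:
    """Toy memory supporting the four abilities in the list above."""

    def __init__(self):
        self._routines = {}

    def store(self, problem_class, solution):
        # Ability 1: store a ready-made solution.
        self._routines[problem_class] = Routine(problem_class, solution)

    def retrieve(self, problem_class):
        # Ability 2: search for a routine when a problem arises.
        return self._routines.get(problem_class)

    def distribute(self):
        # Ability 3: share what is known with every member.
        return {r.problem_class: r.solution for r in self._routines.values()}

    def record_decision(self, problem_class, decision):
        # Ability 4: attach a new decision to an existing routine,
        # or store it as a new ready-made solution if none exists.
        routine = self.retrieve(problem_class)
        if routine is None:
            self.store(problem_class, decision)
            return "stored as new routine"
        routine.decisions.append(decision)
        return "attached to existing routine"


if __name__ == "__main__":
    memory = OrganizationalMemory()
    memory.store("protocol not followed", "contact the duty prosecutor")
    print(memory.record_decision("protocol not followed", "logged the incident"))
    print(memory.record_decision("urgent night-time request", "phoned a senior colleague"))
    print(memory.distribute())
```

The point of the sketch is the last method: an organization only learns if the decision taken in a new situation finds its way back into the shared store, rather than staying with the person who made it.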

Organizational Memories

Beyond document and case management

In the legal domain, the design and implementation of knowledge management tools has been stimulated by the need to cope with the volume and complexity of knowledge and information produced by current legal organizations.

Applied to judicial administration, these systems typically help with storing, managing, and keeping track of documents used in or related to legal proceedings. Obviously these documents include judicial decisions (and templates and protocols to produce them), but the systems are specially designed to enable efficient management of secondary but highly relevant documents issued by court personnel, such as court orders, decrees, notifications, and transcripts of declarations. These systems help judicial units maintain the information and knowledge that their members need in order to work effectively at every stage of the procedure.

In fact, legal bureaucratic organizations are usually able to maintain and distribute this type of knowledge among their members.  The implementation of document and case management systems has been critical to this achievement.

Nevertheless, legal organizations (and organizations in general) produce other types of organizational knowledge than those related to document or case management. Systematic loss of such knowledge, or failure to reuse it, may increase uncertainty and lead to inefficient organizational and individual behavior.

In particular, research conducted at the Institute of Law and Technology (IDT) on the integration of junior Spanish judges within their institutional environment shows that, while knowledge management systems implemented in these legal organizations typically deal with document and case management, most of the problems that arise during judicial daily work are of a different nature.

The case of the on-call service in courts

Spain’s legal system is part of the mainstream of European civil law systems, which means that, in contrast to the common law tradition, Spanish judges are specialized civil servants working in a branch of a bigger bureaucratic organization – that is, the State. Ideally, Spanish judges decide cases by applying a corpus of written legal norms that, in principle, should be a complete and coherent repertoire of solutions.

Due to the existence of specialized jurisdictions, the Spanish judicial system is both highly centralized and complex. Courts at the local level, known as Courts of First Instance and Magistrate, are the entry point to the Spanish judicial system. These are the courts where junior judges start their professional careers, after an average of seven years of mainly theoretical training.

These judges handle most civil cases, decide upon minor criminal offenses, and start most preliminary criminal proceedings that will be later tried by higher courts. Due to the nature of their responsibilities, the judges appointed to these courts are on-call for 24 hours during an 8-day period, on a rotation basis, to attend any incoming criminal or civil case.

Spain’s Court System

In 2005, I took part in a research project at the IDT that carried out an empirical study of the main professional problems affecting Spanish junior judges in their daily work.

Most of the interviewed judges felt that the on-call service was the main source of both professional problems and stressful situations. During this service, judges faced problems that could be regarded as being of a behavioral, practical nature. Some of those were triggered by both legal and non-legal actors not following established protocols. However, most problems arose from the lack of specific guidelines to deal with situations under time pressure and urgency.

Regardless of the kind of problem that arises during the on-call service, judges have to make decisions, and they usually do so by informally contacting senior colleagues (which is unlikely when a situation arises at 4 a.m.) or by relying on their intuition and common sense.

With time, judges acquire a great amount of this practical experience. Nevertheless, the Spanish judicial system requires them to move up to upper-level positions after a short period. In these new positions, they will hardly reuse this knowledge, and at the same time new junior judges will fill those lower court positions and will have to start the learning process from scratch. Since the legal organization does not provide its members with the means of sharing this practical knowledge, a part of the memory of these organizational units is systematically lost or only informally distributed.

According to the process for organizational memory management presented above, once certain solutions have been successfully worked out, the knowledge attached to them would enrich the organizational memory with practical routines to deal with similar on-call situations. However, the fact that legal knowledge management systems have typically focused on procedural (that is, written) legal documents hampers their ability to manage this practical knowledge.

In conclusion, the fact that a legal organization neither offers nor maintains a set of ready-made solutions for the usual problems faced by its professionals not only challenges its own members’ professional role, but also handicaps the learning capacity of the organization itself. In turn, it makes particular organizational units more vulnerable to potential problems, such as conflicts of interest or organizational slack.

The challenge of practical knowledge

Important advances in knowledge engineering, text mining, and information retrieval have been made that enable formalizing, interpreting and storing different types of knowledge.  Therefore, knowledge management systems can more easily be implemented to improve organizational design.

However, professional, practical knowledge cannot be easily incorporated into the organizational memory, because it may not be found in documents from which it may be directly acquired, extracted and modeled. Furthermore, most organizations may be unaware of the existence of certain key practical knowledge among the staff, and thus also unaware of the need to include it in their organizational memories.

From my experience in the research involving the on-call service, it seems imperative that serious empirical assessments of the processes by which decisions are made (and thus knowledge is produced) within legal organizations should be incorporated in every project, along with the knowledge modeling techniques that can build more efficient knowledge- and information-management tools.

Joan-Josep Vallbé is a researcher at the Institute of Law and Technology (IDT) of the Autonomous University of Barcelona. He is also a lecturer in political science at the Political Science Department of the University of Barcelona. He recently finished and presented his PhD thesis on decision-making within organizational frameworks. His research interests include the use of text mining and text statistics for organizational analysis, and the development of performance indicators on the management of organizational memories.

VoxPopuLII is edited by Judith Pratt.


VoxPop poses a Prisoner’s Dilemma (sort of)

We pride ourselves on the murkiness of our authorial invitation process at VoxPop.  How are authors selected, exactly?  Nobody knows, not even the guy who does the selecting.  We’d like to lift the veil briefly, by asking for volunteers to help us with a particular area we’re interested in.

We’d like to run a point-counterpoint style dual entry on the subject of authenticity in legal documents.  Yes, we’ve treated the issue before.  But just today we were fishing around in obscure corners of the LII’s WEX legal dictionary, and we found this definition of the Ancient Document Rule:

Under the Federal Rules of Evidence, a permissible method to authenticate a document. Under the rule, if a document is (1) more than 20 years old; (2) regular on its face with no signs of obvious alterations; and (3) found in a place of natural custody, or in a place where it would be expected to be found, then the document is found to be prima facie authenticated and therefore admissible.

The specific part of the FRE involved (Rule 901) is here.

Why would or wouldn’t we apply this thinking to large existing document repositories — such as the backfile of federal documents at GPO?  Is 20 years a reasonable limit?  Should it be 5, or 7?  What does “where found” mean?  We’d like to see two authors — one pro, one con — address these questions in side-by-side posts to VoxPop.
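To make those questions a little more concrete, the three prongs of the WEX definition quoted above can be written as a simple test with the age limit as an adjustable parameter. This is a rough Python sketch for discussion only: the `Document` fields, the function name `prima_facie_authentic`, and the sample document are all hypothetical, and deciding what counts as “regular on its face” or “natural custody” for a digital repository is exactly the open question, so the sketch simply takes those judgments as inputs.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical summary of the facts the three-prong test looks at."""
    age_years: float
    regular_on_face: bool           # no signs of obvious alteration
    found_in_natural_custody: bool  # found where it would be expected


def prima_facie_authentic(doc, min_age_years=20):
    """Apply the three prongs from the WEX definition quoted above.

    min_age_years defaults to the 20-year limit in that definition;
    other values (5, 7, ...) are hypothetical alternatives.
    """
    return (
        doc.age_years > min_age_years
        and doc.regular_on_face
        and doc.found_in_natural_custody
    )


if __name__ == "__main__":
    # Invented example: a 12-year-old file from an agency's own repository.
    backfile_item = Document(age_years=12, regular_on_face=True,
                             found_in_natural_custody=True)
    for limit in (20, 7, 5):
        print(limit, prima_facie_authentic(backfile_item, min_age_years=limit))
```

Changing the limit changes which parts of a backfile qualify, which is why the choice between 20, 7, and 5 years matters so much for large repositories of relatively recent electronic documents.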

Where does the Prisoner’s Dilemma come in?  Well… if we get no volunteers, we won’t run this.  If we get volunteers on only one side of the issue, we’ll run a one-sided piece.  So, it’s up to you to decide whether both sides will be heard or not.  The window for volunteering will close on Tuesday; send your requests to the murky selector at { tom -  dot -  bruce - att - cornell - dot - edu }.

We’d also be happy to hear from others who want to write for VoxPop — the new year is fast approaching, and we need fresh voices.  Speak up!


Duopolies, web usability, and legal research instruction

It’s been a rocky year for West’s relationship with law librarians.

First, the company declined to participate in this year’s American Association of Law Libraries Price Index for Legal Publications. This led AALL to return West’s sponsorship check for the 2009 AALL Annual Meeting. For attendees, this decision was somewhat academic, as West still occupied a large space in the Exhibitor Hall and once again hosted a well-attended Customer Appreciation Party.

Shortly after the conference, West issued an email promotion to customers that asked:

Are you on a first name basis with the librarian? If so, chances are, you’re spending too much time at the library. What you need is fast, reliable research you can access right in your office.

Many law librarians felt publicly insulted by West, expressing their outrage on listservs, blogs, Twitter, Facebook and anywhere legal information professionals could be found that week.

Most recently, West released a video of University of California, Berkeley professor and law librarian Bob Berring explaining the advantages of “free market” premium legal databases over free legal information websites run by “volunteers:”

It’s not like legal information is going to the Safeway or to buy food. You’re not buying a packaged thing. If you say I need to find statutes about this, or what’s the administrative regulations on that, or have the courts spoken about this, you have to go find it. And just saying it’s all out there — I mean, the ocean is all out there, but you need a map, and you need a compass, and… you need a GPS system now. You need someone to tell you how to get there. That’s why librarians are even more important now, because they’ve got the GPS system. But you have to be working with organized information. The value added by folks like West, where the information is edited as it goes in, and it’s classified, and the hooks are put in — easy hooks for the people who I think are sloppy researchers just playing around on the tops, really sophisticated hooks for the people who take the time to learn how to really use the system and understand it. You just can’t say enough about those kind of things, because to say to the average person, “Well, it’s all out there, the law is all out there,” well, it’s a big bunch of goo.

Adding value to the goo

Unfortunately, the West/Lexis duopoly doesn’t provide consumers with the expected advantages of a free market economy. Neither vendor uses price as a marketing strategy, and both negotiate electronic database contracts with customers rather than charge a flat rate. Considering that West has increased its own annual profit margin to 30% or higher in recent years, while raising the cost of supplements at a rate far exceeding inflation, prices are hardly being driven by free market trends, making a price war seem unlikely. (This doesn’t mean consumers aren’t hopping mad about the price of legal information. They are.)

Instead, at least in the database market, both companies rely on content and features to market their products. Each July at the AALL Annual Meeting, both Lexis and Westlaw use their exhibitor space to educate attendees about whatever new databases and customer conveniences will be rolled out in the coming months.

I often compare these annual feature introductions to the evolution of automobile engines, thanks to a childhood spent watching my father work on the family cars. At first Dad knew every nook and cranny of our vehicles, and there was little he couldn’t repair himself over the course of a few nights. As we traded in cars for newer models, his job became more difficult as engines became more complex. None of the automakers seemed to consider ease of access when adding new parts to an automobile engine. They were simply slapped on top of the existing ones, making it harder to perform simple tasks, like replacing belts or spark plugs.

Lexis and Westlaw also add new components on top of the old ones. To generalize, Lexis tends to add new features in the form of tabs (think “Total Litigator”) while Westlaw adds them in sidebars (think “Results Plus”), to the point where once clean interfaces are now littered with disparate elements sharing adjacent screen real estate.

Finding fault with filters

In a talk at last year’s Web 2.0 Expo in New York, author Clay Shirky stated that the fundamental information problem is not “information overload,” but “filter failure.” Shirky summarized this position in a recent interview with Yale Law School’s Jason Eiseman:

As I’ve often said, there’s no such thing as information overload. It’s filter failure, right? From the minute we had more books to read than the average literate person could read in a lifetime, which depending on the region you’re talking about happened someplace between the 16th and 19th century, from that moment on we’ve always had information overload. That’s the modern condition. What’s happening, I think, to our sense that we’re suffering acutely from information overload now is that the old professional filters have broken. They’re simply not adequate to contain a world in which anyone can put material out in the public.

Whether or not you agree with Shirky’s assessment, it provides an interesting framework with which to view the Lexis/Westlaw information problem. If the primary legal information within these systems is “a big bunch of goo,” then secondary resources, headnotes, subject-specific organization, and other finding aids are the filters necessary to cope with information overload.

For West’s “Are you on a first name basis with the librarian?” promotion to work, Westlaw has to provide the “fast, reliable research you can access right in your office” that it advertises. Assuming for purposes of this essay that the presence of relevant content isn’t an issue (an assumption with which many will quibble), this means the system’s filters need to provide reliable information quickly.

There’s no question that both West and Lexis provide an abundance of subject-specific organization, particularly for case law. Headnotes, topics, digests, tables of authority, citators and cross-references to secondary resources all go above and beyond what researchers find in most freely available resources. But these add-ons, or filters, are only effective if presented in a usable manner.

For an assignment in one of my legal research classes this semester, I provided a fact pattern and asked students to perform a Natural Language search in Westlaw of American Law Reports to find a relevant annotation. In a class of only 19 students, six of them answered with citations to resources other than ALR, including articles from American Jurisprudence, Am.Jur. Proof of Facts, and Shepard’s Causes of Action. The problem, it turned out, wasn’t that they had searched the wrong database. Every one of them searched ALR correctly, but those six students mistook Westlaw’s Results Plus, placed at the top of a sidebar on the results page, for their actual search results. Filter failure, indeed.

On another assignment, students were expected to find a particular statutory code section using a secondary resource, view the code section, then navigate to the code’s table of contents to browse related sections codified nearby. This proved nearly impossible for most of them, as the code section they accessed loaded in a pop-up window with no sidebar, thus providing no visible link to the table of contents. The problems didn’t stop there. Even once I told them to click the “Maximize” button at the bottom of the pop-up window, which reloads the code section into the main window with a sidebar, upon clicking the TOC link, anyone using Firefox for Windows loaded a blank page. (To resolve this error, you have to right-click on the frame where the TOC should’ve loaded and select “This Frame -> Reload This Frame.”)

While completing another portion of the statutory code assignment in Lexis, nearly half the students in the class became confused because numerous clickable links throughout the system display as plain black text that appears as a link only when the user hovers over it. Also, within statutory code sections, the navigation links provided within the case annotation index routinely loaded an error page rather than navigating to the proper section further down the page.

This doesn’t even address basic usability issues such as broken back button functionality, heavy usage of frames, lack of permanent document URLs (Lexis and Westlaw each have external workarounds for this), and reliance on pop-up windows (something blocked by default on most browsers). In addition, Lexis still doesn’t support users accessing the system with Firefox for Mac.

The wide availability of secondary resources, annotated codes, and numerous other value-added content provides a clear advantage for Lexis and Westlaw over free and mid-level legal information services, and that’s why everyone continues to pay their steep prices. But so long as the systems themselves don’t provide usable access, each still suffers from filter failure.

Is there an incentive to improve?

There is evidence that the companies have the expertise to provide a better user experience. West has two electronic versions (one for desktop computers and one for the iPhone) of Black’s Law Dictionary available that offer more intuitive functionality than what’s provided for the same text in Westlaw. Don’t expect a price break, however. The desktop version of Black’s has a list price of $99, while the iPhone version costs $49.99. By comparison, the print version of Black’s Standard Ninth Edition, which likely has substantially higher production costs than the electronic equivalents, carries a list price of $75, meaning iPhone users receive a slightly lower price while desktop users pay even more. Worse still, both electronic versions as well as the content in Westlaw contain the text of the outdated 8th Edition.

Lexis also has an iPhone app, and it’s a free download that requires an existing Lexis password to function. Substantially simplified from its traditional web interface, the user experience is clean and easy to understand. Yet while one can retrieve both primary and secondary documents, as well as Shepardize documents, none of the documents in this interface contain links, only plain citations that must be copied and pasted into the search form to be retrieved.

Of course, the bigger problem with these progressive moves is that they don’t address any of the existing problems in the web interfaces for either product. No one is redesigning the engine, so to speak. These are simply variations of the now traditional roll-out of new features and functionality on top of existing ones that still have the same significant issues.

This is the problem with a duopoly. There aren’t enough producers in the economy to exert significant pressure on either to improve usability. Consumer power is also limited because multi-year contracts prevent easy product substitution, and there’s only one true product substitute available. The producers dictate the competition, and thus far they have dictated a content competition (“The Tabs and Sidebars War”), rather than a usability one, or even a price one.

There are events on the horizon that could impact this stalemate. Bloomberg continues to develop its own legal research product, allegedly designed to be a Westlaw/Lexis competitor. Perhaps this third producer will see value in using price or usability to gain market share. Lewis & Clark law student (and VoxPopuLII author) Robb Shecter recently introduced OregonLaws.org, a free repository of Oregon law that currently features the entire Oregon Revised Statutes and a legal glossary. The site’s simple, logical navigation reflects current web usability norms more accurately than either Lexis or Westlaw, and for a “micro-fee” users can bookmark code sections for quick access and save unlimited “human readable” research trails. And, of course, Google Scholar just added “Legal opinions and journals.” It’s far too early to know if it will become a true player in legal information, but Google always has the potential to be a game changer with anything it does.

What can legal research instructors DO?

Despite the presence of these interesting new projects, consumers can’t expect a quick usability turnaround from Lexis and Westlaw, nor the sudden presence of a competitor with the same depth and breadth of content. History doesn’t support such an expectation, leaving legal research instructors in a precarious position.

Many schools leave Lexis/Westlaw training solely in the hands of the companies’ representatives. While a company rep will be knowledgeable about the system, he will also paint the product in the best possible light for the company, glossing over usability issues and emphasizing new features. After all, law students are future customers, so this instruction is part of a long-term sales pitch.

In order to provide a balanced picture of these systems, legal research instructors need to provide their own Lexis and Westlaw training. This can either be in place of or in addition to what’s provided by company reps, but students need to hear the voice of an experienced researcher who doesn’t rely on either company for a paycheck. Some may see this as an implied institutional endorsement of the high-priced systems, but the reality is most students will end up working with one or both of these systems on a daily basis after graduation. Ignoring this would be an educational disservice. Any sense of endorsement can be addressed through thorough coverage of the usability limitations and a short education on the price realities. Instructors can also discuss the availability of lower priced databases for lawyers who simply want access to primary legal materials.

If the market is going to change, it won’t be because Lexis and Westlaw spontaneously decide to improve products that generate significant profits already. Until then, legal researchers need to be better educated on the limitations of these systems so that their work product isn’t compromised by over-reliance on a duopoly disguised as a free market.

Tom Boone is a reference librarian and adjunct professor at Loyola Law School in Los Angeles. He’s also webmaster and a contributing editor for Henderson Valley Eggs, a “themed information collective” website covering law library issues.

VoxPopuLII is edited by Judith Pratt


Surveying is Hard

Where the culture of assessment meets actual learning about users.

These days, anyone with a pulse can sign up for a free Surveymonkey account, ask users a set of questions and call it a survey. The software will tally the results and create attractive charts to send upstairs, reporting on anything you want to know about your users. The technology of running surveys is easy, certainly, but thinking about what the survey should accomplish and ensuring that it meets your needs is not. Finding accurate measures — of the effectiveness of instructional programs, the library’s overall service quality, or efficiency, or of how well we’re serving the law school’s mission — is still something that is very, very hard. But librarians like to know that programs are effective, and Deans, ranking bodies, and prospective students all want to be able to compare libraries, so the draw of survey tools is strong. The logistics are easy, so where are the problems with assessment?

Between user surveys and various external questionnaires, we gather a lot of data about law libraries. Do they provide us with satisfactory methods of evaluating the quality of our libraries? Do they offer satisfactory methods for comparing and ranking libraries? The data we gather is rooted in an old model of the law library where collections could be measured in volumes, and that number was accepted as the basis for comparing library collections. We’ve now rejected that method of assessment, but struggle nevertheless for a more suitable yardstick. The culture of assessment from the broader library community has also entered law librarianship, bringing standardized service quality assessment tools. But despite these tools, and a lot of work on finding the right measurement of library quality, are we actually moving forward, or is some of this work holding us back from improvement? There are two types of measurement widely used to evaluate law libraries: assessments and surveys, which tend to be inward-looking, and the use of data such as budget figures and square footage, which can be used to compare and rank libraries. These are compared below, followed by an introduction to qualitative techniques for studying libraries and users.

(Self)Assessment

There are many tools available for conducting surveys of users, but the tool most familiar to law librarians is probably LibQUAL+®. Distributed as a package by ARL, LibQUAL+® is a “suite of services that libraries use to solicit, track, understand, and act upon users’ opinions of service quality.” The instrument itself is well-vetted, making it possible for libraries to run it without any pre-testing.

The goal is straightforward: to help librarians assess the quality of library services by asking patrons what they think. So, in “22 items and a box,” users can report on whether the library is doing things they expect, and whether the librarians are helpful. LibQUAL+® aligns with the popular “culture of assessment” in libraries, helping administrators to support regular assessment of the quality of their services. Though LibQUAL+® can help libraries assess user satisfaction with what they’re currently doing, it’s important to note that the survey results don’t tell a library what they’re not doing (and/or should be doing). It doesn’t identify gaps in service, or capture opinions on the library’s relevance to users’ work. And as others have noted, such surveys focus entirely on patron satisfaction, which is contextual and constantly shifting. Users with low expectations will be satisfied under very different conditions than users with higher expectations, and the standard instrument can’t fully account for that.

Ranking Statistics

The more visible or external data gathering for law libraries occurs annually, when libraries answer questionnaires from their accrediting bodies. The focus of these instruments is on numbers: quantitative data that can be used to rate and rank law libraries. The ABA’s annual questionnaire counts both space and money. Site visits every seven years add detail and richness to the picture of the institution and provide additional criteria for assessment against the ABA’s standards, but the annually reported data is primarily quantitative. The ABA also asks which methods libraries use “to survey student and faculty satisfaction of library services”, but they don’t gather the results of those surveys.

The ALL-SIS Statistics Committee has been working on developing better measures for the quality of libraries, leading discussions on the AALLNet list (requires login) and inviting input from the wider law librarian community, but this is difficult work, and so far few Big Ideas have emerged. One proposal suggested reporting, via the ALL-SIS supplemental form, responses from students, faculty, and staff regarding how the library’s services, collections and databases contribute to scholarship and teaching/learning, and how the library’s space contributes to their work. This is promising, but it would require more work to build rich qualitative data.

Another major external data gathering initiative is coordinated by the ARL itself, which collects data on law libraries as part of their general data collection for ARL-member (University) libraries. ARL statistics are similarly heavy on numbers, though: their questionnaire counts volumes (dropped just this year from the ABA questionnaire) and current serials, as well as money spent.

Surveys ≠ Innovation

When assessing the quality of libraries, two options for measurement dominate: user satisfaction, and collection size (using dollars spent, volumes, space allocated, or a combination of those). Both present problems: the former is simply insufficient as the sole measure of library quality, and is not useful for comparing libraries, and the latter ignores fundamental differences between the collection development and access issues of different libraries, making the supposedly comparable figures nearly meaningless. A library that is part of a larger university campus will likely have a long list of resources paid for by the main library, and a stand-alone law school won’t. Trying to use the budget figures for these two libraries to compare the size of the collection or the quality of the library would be like comparing apples and apple-shaped things. There’s also something limiting about rating libraries primarily based on their size; is the size of the collection, or the money spent on the collection, the strongest indicator of quality? The Yankees don’t win the World Series every year, after all, despite monetary advantages.

The field of qualitative research (a.k.a. naturalistic or ethnographic research) could offer some hope. The techniques of naturalistic inquiry have deep roots in the social sciences, but have not yet gained a stronghold in library and information science. The use of naturalistic techniques could be particularly useful for understanding the diverse community of law library users. While not necessarily applicable as a means for rating or ranking libraries, the techniques could lead to a greater understanding of users of law libraries and their needs, and help libraries to develop measures that directly address the match between library and users’ needs.

How many of us have learned things about a library simply by having lunch with students, or chatting with faculty at a college event, or visiting another library? Participants in ABA Site Visits, for instance, get to know an institution in a way that numbers and reports can’t convey. Naturalistic techniques formalize the process of getting to know users, their culture and work, and the way that they use the library. Qualitative research could help librarians to see past habits and assumptions, teaching us about what our users do and what they need. Could those discoveries also shape our definition of service quality, and lead to better measures of quality?

In 2007, librarians at the University of Rochester River Campus conducted an ethnographic study with the help of their resident Lead Anthropologist (!). (The Danes did something similar a few years ago, coordinated through DEFF, the Danish libraries’ group.) The Rochester researchers asked: what do students really do when they write papers? The librarians had set goals to do more, reach more students, and better support the University’s educational mission. Through a variety of techniques, including short surveys, photo diaries, and charrette-style workshops, the librarians learned a great deal about how students work, how their work is integrated into their other life activities, and how students view the library. Some results led to immediate pilot programs: a late-night librarian program during crunch times, for instance. But equally important to the researchers was understanding the students’ perspective on space design and layout in preparation for a reading room renovation.

Concerns about how libraries will manage the increased responsibilities that may accrue from such studies are premature. Our service planning should take into account the priorities of our users. Perhaps some longstanding library services just aren’t that important to our users, after all. Carl Yirka recently challenged librarians on the assumption that everything we currently do is still necessary — and so far, few have risen to the challenge. Some of the things that librarians place value on are not ours to value; our patrons decide whether Saturday reference, instructional sessions on using the wireless internet, and routing of print journals are valuable services. Many services provided by librarians are valuable because they’re part of our responsibility as professionals: to select high-quality information, to organize and maintain it, and to help users find what they need. But the specific ways we do that may always be shifting. Having the Federal Reporter in your on-site print collection is not, in and of itself, a valuable thing, or an indicator of the strength of your collection.

“Measuring more is easy; measuring better is hard.”
Charles Handy (from Joseph R. Matthews, Strategic Planning and Management for Library Managers (2005))

Thinking is Hard


Where does this leave us? The possibilities for survey research may be great, and the tools facile, but the discussion is still very difficult. At the ALL-SIS-sponsored “Academic Law Library of 2015” workshop this past July, one small group addressed the question of what users would miss if the library didn’t do what we currently do. If functions like purchasing and management of space were absorbed by other units on campus or in the college, what would be lost? Despite the experience of the group, it was a very challenging question. There were a few concrete ideas that the group could agree were unique values contributed by law librarians, including the following:

  • Assessment of the integrity of legal information
  • Evaluation of technologies and resources
  • Maintaining an eye on the big picture/long term life of information

The exercise was troubling, particularly in light of statements throughout the day by the many attendees who insisted on the necessity of existing services, while unable to articulate the unique value of librarians and libraries to the institution. The Yirka question (and follow-up) was a suggestion to release some tasks in order to absorb new ones, but we ought to be open to the possibility that we need a shift in the kind of services we provide, in addition to balancing the workload. As a professional community, we’re still short on wild fantasies of the library of the future, and our users may be more than happy to help supply some of their own.

Doing Qualitative work

Could good qualitative research move the ball forward? Though good research is time-consuming, it could help us to answer fundamental questions about how patrons use legal information services, how they use the library, and why they do or don’t use the library for their work. Qualitative research could also explore patron expectations in greater detail than quantitative studies like LibQUAL+®, following up on how the library compares to other physical spaces and other sources of legal information that patrons use.

It’s important that librarians tap into resources on campus to support survey research, though, whether qualitative or quantitative. When possible, librarians should use previously vetted instruments, pretested for validity and reliability. This may be a great opportunity for AALL, working with researchers in library and information science to build a survey instrument that could be used by academic law libraries.

Stephanie Davidson is Head of Public Services at the University of Illinois in Champaign. Her research addresses public services in the academic law library, and understanding patron needs and expectations. She is currently preparing a qualitative study of the information behavior of legal scholars.

VoxPopuLII is edited by Judith Pratt


Venture Capital and Peer Production

This blog entry focuses on the need for more and better software to reap the benefits of the legal information treasures available. As you’ll see, this turns out to be more complex than one may think.

For commercial software developers, it is surprisingly hard to stay radically innovative, especially when they are successful. To start with, software development itself is a risky undertaking. Despite five decades of research in managing this development process, projects frequently are late, over budget, and much less impressive than originally envisioned. IBM once famously bet the company on a new computer platform, but the development of the associated operating system was so far behind schedule that it threatened IBM’s existence. Management was tempted to throw ever more human resources at the development problem, only to discover that this in itself causes further delays - leaving us with the useful term “mythical man-month”.

But the difficulty in envisioning hurdles in a complex software engineering project is not the only source of risk for innovative software developers. Successful developers may pride themselves on a large and increasing user base. Such success, however, creates its own unintended constraints.

Customers will dislike rapid change in the software they use, as they will have to relearn how to operate it, may have to expend efforts on converting data to new formats, and/or may need to adjust the preferences and customization options they utilized. This gets worse if the successful software is the platform for a thriving ecosystem of other developers and service providers. Any severe change in the underlying platform means that those living in it have to adapt their code. Each time a customer has to invest time in relearning a software product, it offers competing software providers a chance to nab a customer. This prompts software developers, especially very successful ones, to be relatively conservative in their plans for updates and upgrades. They don’t want to undermine their market success, and thus will be tempted to opt for gradual rather than radical innovation when designing the next version of their successful wares.

We have seen it over and over again: Microsoft’s Word, Powerpoint and Excel have gone through numerous iterations over the past decades, but the basic elements of the user experience have changed relatively little. Similarly, concerns for legacy code by third party developers have been a key holdback for Microsoft’s Windows product team. Don’t break something  -  even if it is utterly ancient and inefficient, buggy and broken  -  as long as it works for the customers.  That’s the understandable, but frustrating, mantra.

Or think of Google: the search engine’s user interface hasn’t seen any major changes since its inception more than a decade ago. Only Apple, it seems, has been getting away with radical innovation that breaks things and forces users to relearn, to convert data, and to expend time. That is the advantage of a small but fervently loyal user base. But even Apple has recently seen the need to take a breather in radical change with Snow Leopard.

And in the legal information context, think of Westlaw and Lexis/Nexis.  Despite direct competition with one another,  when was the last time we saw a truly radical innovation coming from either of these two companies?

Radical innovation requires the will to risk alienating users. As companies grow and pay attention to shareholder expectations, the will-to-risk often wanes. With radical innovation in the marketplace, the challenge lies in the time axis. If one is very successful with a radically new product at time T, it is hard to throw that product away, and try to risk radically reinventing it, for T+1.

On a macro level, we combat this conservative tendency against radical change by providing incentives for innovative entrepreneurs to develop and market competing offerings. If enough customers are unhappy with Excel, perhaps entrepreneurs with radically new and improved concepts of how to crunch and manage numbers in a structured way will seize the opportunity and develop a new killer app that they’ll pit against Excel. That’s enormously risky, but also offers the potential of very steep rewards. Angel investors and venture capitalists thrive on providing the lubricant (in the form of financial resources) for such high risk, high reward propositions. They flourish on the improbable. What they don’t like are “small ideas.”  (It happened to me, too, when I pitched innovative ideas to VCs; they thought my ideas had a very high likelihood of success, but not enough of a lever to reap massive returns. Obviously I was dismayed, but they were right: it is what we need if we want to incentivize radical innovation.)

This also implies, however, that for venture capital to work, markets need to be large enough to offer high rewards for risky ventures. If the market is not large enough, venture capital may not be available for a sufficient number of radical innovators to keep pushing the limit. Therefore, existing providers may survive for a long time with incremental innovations. Perhaps that is why Westlaw and Lexis are still around, even though they could fight the tendency toward piecemeal development if they wanted to.

Other large corporations, realizing the bias towards incremental innovation, have repeatedly resorted to radical steps to remedy the problem. They have established skunk works, departments that are largely disconnected from the rest of the company, freeing the members to try revolutionary rather than evolutionary solutions. Sometimes companies acquire a group of radically innovative engineers from the outside, to inject some fresh thinking into internal development processes that may have become too stale.

Peer production models, almost always based on an open source foundation, are not dependent on market success. (On the drivers of peer production, see Yochai Benkler’s “The Wealth of Networks.”) They are not profit driven, and thus may put less pressure on the developers to abstain from radical change. Because Firefox does not have to win in the marketplace, its developers can, at least in theory, be bolder than their commercial counterparts.

Unfortunately, open-source peer produced software may also lose its appetite for radical innovation over time  -  not because of monetary incentives, but because of the collaborative structures utilized in the design process. If a large number of volunteering bug reporters, testers, and coders with vastly differing values and preferences work on a joint project, it is likely that development will revert towards a common denominator of what needs to be done, and thus be inherently gradual and evolutionary, rather than radical. Of course, a majority of participants may at rare moments get together and agree on a revolution – much like those in what then was a British colony in 1776.  But that is the brilliant exception to a rather boring rule.

Indecisiveness that stems from too small a common ground, however, is not the only danger. On the other end of the spectrum, communities and groups with too many ties among each other cause a mental alignment, or “group think,” that equally stifles radical innovation. Northwestern University professor Brian Uzzi has written eloquently about this problem. Finding the right sweet spot between the two extremes is what’s necessary, but in the absence of an outside mechanism that balance is difficult to achieve for open source peer-producing groups.

If we would like to remedy this situation, how could we offer incentives to peer producing communities to more often give radical rather than incremental innovation a try? What could be the mechanism that takes on the role of venture capitalists and skunk works in the peer production context?

It surely isn’t telling dissenters with a radically new idea to “fork out” of a project. That’s like asking a commercial design group to leave the company and try on their own, but without providing them with enough resources or incentives. Not a good idea if we want to make radical innovation – the experimentation with revolutionary rather than incremental ideas – easier, not harder.

But what is the venture capital/skunk works equivalent in the peer-producing world?

A few thoughts come to mind, but I invite you to add your ideas, because I may not be thinking radically enough.

(1) User: Users, from large to small, could volunteer,  perhaps through a website, to dedicate some modicum of their time to advancing an open source project not by contributing to its design, but by committing to being first adopters of more radical design solutions. One may imagine a website that helps link users (including law firms) willing to dedicate some “risk” to such riskier open source peer produced projects, perhaps on a sectoral basis (Could this be yet another mission for the LII?).

(2) Designers: Quite a number of corporations and organizations explicitly support open source peer producing projects, mostly by dedicating some of their human resources to improving the code base. These organizations could, if they wanted to improve the capability of such projects to push for more radical innovation, set up incentives for employees to select riskier projects.

(3) Tools: The very tools used to organize peer production of software code already offer many techniques for managing a diverse array of contributors. These tools could be altered to evaluate a group’s level of diversity and willingness to take risks, based on the findings of social network theory. Such an approach would at least provide the community with a sense of its potential and propensity for radical innovation, and could help group organizers in influencing group composition and group dynamics. (Yes, this is “data.gov” and the government IT dashboards applied to this context.)

These are nothing more than a few ideas.  Many more are necessary to identify the best ones to implement. But given the rise and importance of peer production, and the constraints inherent in how it is organizing itself, the conversation about how to best provide incentives for radical innovation in the legal information context - and beyond - is one we must have.

[NB:  What do you all think?  How does this apply to the world of legal information, and to specialized software applications that support it — things like point-in-time legislative systems, specialized processing tools, and so on?  Comments please…. (the ed.)]

Viktor Mayer-Schönberger is Associate Professor of Public Policy and Director of the Information + Innovation Policy Research Centre at the LKY School of Public Policy / National University of Singapore. He is also a faculty affiliate of the Belfer Center for Science and International Affairs at Harvard University. He has published many books, most recently “Delete - The Virtue of Forgetting in the Digital Age.” He is a frequent public speaker and a sought-after expert for print and broadcast media worldwide. He is also on the boards of numerous foundations, think tanks and organizations focused on studying the foundations of the new economy, and advises governments, businesses and NGOs on new economy and information society issues. In his spare time, he likes to travel, go to the movies, and learn about architecture.

VoxPopuLII is edited by Judith Pratt.


Pushing the envelope: Innovation in legal search


To take the words of Walt Whitman, when it comes to improving legal information retrieval (IR), lawyers, legal librarians and informaticians are all, to some extent, “Both in and out of the game, and watching and wondering at it”. The reason is that each group holds only a piece of the solution to the puzzle, and as pointed out in an earlier post, they’re talking past each other.

In addition, there appears to be a conservative contingent in each group who actively hinder the kind of cooperation that could generate real progress: lawyers do not always take up technical advances when they are made available, thus discouraging further research, legal librarians cling to indexing when all modern search technologies use free-text search, and informaticians are frequently impatient with, and misunderstand, the needs of legal professionals.

What’s holding progress back?

At root, what has held legal IR back may be the lack of cross-training of professionals in law and informatics, although I’m impressed with the open-mindedness I observe at law and artificial intelligence conferences, and there are some who are breaking out of their comfort zone and neat disciplinary boundaries to address issues in legal informatics.

I recently came back from a visit to the National Institute of Informatics in Japan where I met Ken Satoh, a logician who, late in his professional career, has just graduated from law school. This is not just hard work. I believe it takes a great deal of character for a seasoned academic to maintain students’ respect when they are his seniors in a secondary degree. But the end result is worth it: a lab with an exemplary balance of lawyers and computer scientists, experience and enthusiasm, pulling side-by-side.

Still, I occasionally get the feeling we’re all hoping for some sort of miracle to deliver us from the current predicament posed by the explosion of legal information. Legal professionals hope to be saved by technical wizardry, and informaticians like myself are waiting for data provision, methodical legal feedback on system requirements and performance, and in some cases research funding. In other words, we all want someone other than ourselves to get the ball rolling.


The need to evaluate

Take, for example, the lack of large corpora for study, which is one of the biggest stumbling blocks in informatics. Both IR and natural language processing (NLP) currently thrive on experimentation with vast amounts of data, which is used in statistical processing. More data means better statistical estimates and fewer ‘guesses’ at relevant probabilities. Even commercial legal case retrieval systems, which give the appearance of being Boolean, use statistics and have done so for around 15 years. (They are based on inference networks that simulate Boolean retrieval with weighted indexing by reducing the rigidness associated with conditional probability estimates for the Boolean operators ‘and’, ‘or’ and ‘not’. In this way, document ranking increasingly depends on the number of query constraints met.)
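To make the parenthetical above a little more concrete, here is a minimal sketch of how Boolean operators can be softened into probabilities so that documents are ranked rather than simply accepted or rejected. The product and complement rules used here are one textbook way of doing this in inference-network-style models; the post does not say which formulas any commercial system actually uses, and the function names, query, documents and belief values are all hypothetical.

```python
from math import prod

# Soft versions of the Boolean operators. Each takes "beliefs" in [0, 1]
# (how strongly a document is thought to be about a term) instead of
# hard true/false values. These rules are an illustrative assumption.
def soft_and(beliefs):
    return prod(beliefs)

def soft_or(beliefs):
    return 1.0 - prod(1.0 - b for b in beliefs)

def soft_not(belief):
    return 1.0 - belief

# Hypothetical per-term beliefs for three documents.
docs = {
    "doc_1": {"negligence": 0.9, "contract": 0.8, "maritime": 0.1},
    "doc_2": {"negligence": 0.9, "contract": 0.2, "maritime": 0.1},
    "doc_3": {"negligence": 0.3, "contract": 0.2, "maritime": 0.9},
}

def score(beliefs):
    # Query: negligence AND contract AND NOT maritime
    return soft_and([
        beliefs["negligence"],
        beliefs["contract"],
        soft_not(beliefs["maritime"]),
    ])

# Documents that satisfy more of the query constraints score higher.
ranking = sorted(docs, key=lambda name: score(docs[name]), reverse=True)
for name in ranking:
    print(name, round(score(docs[name]), 3))
```

A strict Boolean system would return only doc_1 here, or nothing at all if the term weights were thresholded harshly. The soft version still ranks doc_2, which meets two of the three constraints, above doc_3, which meets none of them well.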

The problem is that to evaluate new techniques in IR (and thus improve the field), you need not only a corpus of documents to search but also a sample of legal queries and a list of all the relevant documents in response to those queries that exist in your corpus, perhaps even with some indication of how relevant they are. This is not easy to come by. In theory a lot of case data is publicly available, but accumulating and cleaning legal text downloaded from the internet, making it amenable to search, is nothing short of torturous. Then relevance judgments must be given by legal professionals, which is difficult given that we are talking about a community of people who charge their time by the hour.

Of course, the cooperation of commercial search providers, who own both the data and training sets with relevance judgments, would make everyone’s life much easier, but for obvious commercial reasons they keep their data to themselves.

To see the productive effects of a good data set we need only look at the research boom now occurring in e-discovery (discovery of electronic evidence, or DESI). In 2006 the TREC Legal Track, including a large evaluation corpus, was established in response to the number of trials requiring e-discovery: 75% of Fortune 500 company trials, with more than 90% of company information now stored electronically. This has generated so much interest that an annual DESI workshop has been held since 2007.

Qualitative evaluation of IR performance by legal professionals is an alternative to the quantitative evaluation usually applied in informatics. The development of new ways to visualize and browse results seems particularly well suited to this approach, where we want to know whether users perceive new interfaces to be genuine improvements. Considering the history of legal IR, qualitative evaluation may be as important as the traditional IR evaluation metrics of precision and recall. (Precision is the number of relevant items retrieved out of the total number of items retrieved, and recall is the number of relevant items retrieved out of the total number of relevant items in a collection.) However, it should not be the sole basis for evaluation.
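To make the two metrics concrete, here is a minimal sketch in Python. The document identifiers are invented for illustration; in a real evaluation the relevant set would come from relevance judgments made by legal professionals.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers the system returned
    relevant:  identifiers judged relevant in the whole collection
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 5 relevant documents exist.
retrieved = ["case_1", "case_2", "case_3", "case_4"]
relevant = ["case_2", "case_4", "case_7", "case_8", "case_9"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Note that recall cannot be computed without knowing every relevant document in the collection, which is exactly why complete relevance judgments are so costly to obtain.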

A well-known study by Blair and Maron makes this point plain. The authors showed that expert legal researchers retrieve less than 20% of relevant documents when they believe they have found over 75%. In other words, even experts can be very poor judges of retrieval performance.

Context in legal retrieval

Setting this aside, where do we go from here? Dan Dabney argued at the American Association of Law Libraries (AALL) 2005 Annual Meeting that free text search decontextualizes information, and he is right. One thing to notice about current methods in open domain IR, including vector space models, probabilistic models and language models, is that the only context they take into account is proximate terms (phrases). At heart, they treat all terms as independent.

However, it’s risky to conclude what was reported from the same meeting: “Using indexes improves accuracy, eliminates false positive results, and leads to completion in ways that full-text searching simply cannot.” I would be interested to know if this continues to be a general perception amongst legal librarians despite a more recent emphasis on innovating with technologies that don’t encroach upon the sacred ground of indexing. Perhaps there’s a misconception that capitalizing on full-text search methods would necessarily replace the use of index terms. This isn’t the case; inference networks used in commercial legal IR are not applied in the open domain, and one of their advantages is that they can incorporate any number of types of evidence.

Specifically, index numbers, terms, phrases, citations, topics and any other desired information are treated as representation nodes in a directed acyclic graph (the network). This graph is used to estimate the probability of a user’s information need being met given a document.
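As a rough illustration of how such a network can combine different types of evidence, the sketch below uses the probabilistic operators commonly described in the inference network literature (product for "and", complement of products for "or", weighted average for mixed evidence). It is not a description of any commercial system: the node names, belief values and weights are all invented.

```python
from math import prod


def belief_and(beliefs):
    """Soft 'and': high only if every piece of evidence is well supported."""
    return prod(beliefs)


def belief_or(beliefs):
    """Soft 'or': high if at least one piece of evidence is well supported."""
    return 1.0 - prod(1.0 - b for b in beliefs)


def belief_not(belief):
    """Soft 'not': the complement of a belief."""
    return 1.0 - belief


def belief_wsum(beliefs, weights):
    """Weighted average, used here to mix different types of evidence."""
    return sum(w * b for w, b in zip(weights, beliefs)) / sum(weights)


# Hypothetical beliefs that a document supports each representation node.
doc_a = {"term:forest": 0.9, "term:fire": 0.8, "index:environment": 0.7}
doc_b = {"term:forest": 0.9, "term:fire": 0.1, "index:environment": 0.7}


def score(doc):
    """Belief that the information need 'forest AND fire' is met by doc."""
    text_evidence = belief_and([doc["term:forest"], doc["term:fire"]])
    # Index evidence is mixed in with an assumed weight of 1 versus 2 for text.
    return belief_wsum([text_evidence, doc["index:environment"]], [2.0, 1.0])


for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name, round(score(doc), 3))
```

When every belief is exactly 0 or 1 these operators reduce to ordinary Boolean logic. With intermediate values, a document that satisfies only some of the query constraints still receives a lower, non-zero score instead of being dropped.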

For the time being lawyers, unaware of technology under the hood, default to using inference networks in a way that is familiar, via a search interface that easily incorporates index terms and looks like a Boolean search. (Inference nets are not Boolean but they can be made to behave in the same way.) While Boolean search does tend to be more precise than other methods, the more data there is to search the less well the system performs. Further, it’s not all about precision. Recall of relevant documents is also important and this can be a weak point for Boolean retrieval. Eliminating false positives is no accolade when true positives are eliminated at the same time.

Since the current predicament is an explosion of data, arguing for indexing by contrasting it with full-text retrieval without considering how they might work together seems counterproductive.

Perhaps instead we should be looking at revamping legal reliance on a Boolean-style interface so that we can make better use of full-text search. This will be difficult. Lawyers who are charged, and charge, per search, must be able to demonstrate the value of each search to clients; they can’t afford the iterative searching that characterizes open domain browsing. Further, if the intelligence is inside the retrieval system, rather than held by legal researchers in the form of knowledge about how to put complex queries together, how are search costs justified? Although Boolean queries are no longer well-adapted, at least value is easy to demonstrate. A push towards free-text search by either legal professionals or commercial search providers will demand a rethink of billing structures.

Given our current systems, are there incremental ways we can improve results from full-text search? Query expansion is a natural consideration and incidentally overlaps with much of the technology underlying graphical means of data exploration such as word clouds and wonderwheels; the difference is that query expansion goes on behind the scenes, whereas in graphical methods the user is allowed to control the process. Query expansion helps the user find terms they hadn’t thought of, but this doesn’t help with the decontextualization problem identified by Dabney; it simply adds more independent terms or phrases.
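A toy version of query expansion, sketched in Python, shows both the benefit and the limitation. It assumes a simple co-occurrence heuristic (add the terms that most often appear in documents matching the original query); production systems would use weighted statistics and stop-word handling, and the three "documents" here are invented.

```python
from collections import Counter


def expand_query(query_terms, documents, n_extra=2):
    """Add the n_extra terms that co-occur most often with the query terms."""
    query = set(query_terms)
    cooccurring = Counter()
    for doc in documents:
        tokens = set(doc.lower().split())
        if tokens & query:  # the document matches the original query
            cooccurring.update(tokens - query)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(cooccurring.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in ranked[:n_extra]]


# Hypothetical mini-corpus.
documents = [
    "tenant eviction notice landlord",
    "landlord lease tenant deposit",
    "patent claim infringement",
]

print(expand_query(["tenant"], documents))
```

The expanded query contains terms the user had not thought of, but it is still just a bag of independent terms: nothing records how "tenant" and "landlord" relate to each other in the text.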

In order to contextualize information we can marry search using text terms and index numbers as is already applied. Even better would be to do some linguistic analysis of a query to really narrow down the situations in which we want terms to appear. In this way we might get at questions such as “What happened in a case?” or “Why did it happen?” rather than just, “What is this document about?”.

Language processing and IR

Use of linguistic information in IR isn’t a novel idea. In the 1980s, IR researchers started to think about incorporating NLP as an intrinsic part of retrieval. Many of the early approaches attempted to use syntactic information for improving term indexing or weighting. For example, Fagan improved performance by applying syntactic rules to extract similar phrases from queries and documents and then using them for direct matching, but it was held that this was comparable to a less complex, and therefore preferable, statistical approach to language analysis. In fact, Fagan’s work demonstrated early on what is now generally accepted: statistical methods that do not assume any knowledge of word meaning or linguistic role are surprisingly (some would say depressingly) hard to beat for retrieval performance.

Since then there have been a number of attempts to incorporate NLP in IR, but depending on the task involved, there can be a lack of highly accurate methods for automatic linguistic analysis of text that are also robust enough to handle unexpected and complex language constructions. (There are exceptions; for example, part-of-speech tagging is highly accurate.) The result is that improved retrieval performance is often offset by negative effects, resulting in a minimal positive, or even a negative, impact on overall performance. This can make NLP techniques not worth the price of additional computational overheads in time and data storage.

However, just because the task isn’t easy doesn’t mean we should give up. Researchers, including myself, are looking afresh at ways to incorporate NLP into IR. This is being encouraged by organizations such as the NII Test Collection for IR Systems Project (NTCIR), which from 2003 to 2006 amassed excellent test and training data for patent retrieval with corpora in Japanese and English and queries in five languages. Their focus has recently shifted towards NLP tasks associated with retrieval, such as patent data mining and classification. Further, their corpora enable study of cross-language retrieval issues that become important in e-discovery, since only a fraction of a global corporation’s electronic information will be in English.

We stand on the threshold of what will be a period of rapid innovation in legal search driven by the integration of existing knowledge bases with emerging full-text processing technologies. Let’s explore the options.

K. Tamsin Maxwell is a PhD candidate in informatics at the University of Edinburgh, focusing on information retrieval in law. She has an MSc in cognitive science and natural language engineering, and has given guest talks at the University of Bologna, UMass Amherst, and NAIST in Japan. Her areas of interest include text processing, machine learning, information retrieval, data mining and information extraction. Her passion outside academia is Arabic dance.

VoxPopuLII is edited by Judith Pratt.

If the mountain will not come to the prophet, the prophet will go to the mountain

Within the field of legal informatics, discussions often focus on the technical and methodological questions of access to legal information. The topics can range from classification of legal documents to conceptual retrieval methods and Automatic Detection of Argumentation in Legal Cases. Researchers and businesses try to increase both precision and recall in order to improve search results for lawyers, while public administrations open up the process of legislating for the benefit of democracy and openness. Where are, however, the benefits for laypersons not familiar with retrieving legal information? Does clustering of legal documents, for example, yield a legal text any more understandable for a citizen?

To answer these questions, I would like to go back to the beginning: the purpose of law. Unfortunately for us lawyers, law is not created for us, but to serve as the oil that keeps society running smoothly. One can imagine two ways of applying the oil. If the motor has not been taken care of sufficiently, some extra greasy oil might be necessary to get it running again (i.e., if all amicable solutions are exhausted, some sort of dispute resolution is required); this would be the retroactive approach. The other possibility is to add enough oil during driving, so that the engine continues running smoothly without any additional boost (in other words, to try to avoid disputes); this would be the proactive line of thinking.

How can proactive law work for the citizens? The basic assumption would be that in order to avoid disputes, one has to be aware of possible legal risks and how to prevent them. In line with the position of the European Union, we can further assume that the assessment and evaluation of risks requires relevant information about the legal facts at hand. It is only possible for a citizen to reach a decision regarding, for example, social benefits or certain rights as an employee, if she or he is aware of the various legal rights and obligations as well as possible legal outcomes.

Having stipulated that legal information is the core requirement for being able to exercise one’s rights as a citizen, the next questions would include which type of information is actually necessary, who should be responsible for communicating it and how it should be provided. These questions I would like to discuss below. That is, we will talk about why, what, who and how.

Why?

Before we move on to the main theme of access to legal information, I would like to highlight a few more things about the why. As already mentioned, and as many legal philosophers have noted, law is the clockwork that makes society tick. The principle Ignorantia juris neminem excusat (Ignorance of the law is no excuse) is commonly accepted as one of the foundations of modern civilization. But how would we define ignorance in today’s world? What if a citizen has trouble finding the necessary information despite endless efforts? What if she or he, after finding the relevant information, is not able to understand it? Does this mean she or he is still ignorant?

Public access to legal information is also a question of democracy, because citizens’ insight into politics, governmental work and the lawmaking process is a necessary prerequisite for public trust in the legislative body.

“In shifting from infrastructure to integration and then to transformation, a more holistic framework of connected governance is required. Such a framework recognizes the networking presence of e-government as both an internal driver of transformation within the public sector and an external driver of societal learning and collective adaptation for the jurisdiction as a whole.” (UN e-Government Survey 2008)

In this spirit, governments should consider the management of knowledge to be of increasing importance. “The essence of knowledge management (KM) is to provide strategies to get the right knowledge to the right people at the right time and in the right format.” (UN e-Government Survey 2008) What, then, is the right knowledge?

What?

The term legal information is as obvious as the word law. It is both apparent and imprecise, and yet we use it rather often. Several scholars have tried to define legal information and legal knowledge, inter alia, Peter Wahlgren in 1992, Erich Schweighofer in 1999, and Robert Richards in 2009.

If we consider the term from a layperson’s perspective, one could define it as the data, the facts and figures, that are necessary to solve an issue–one that cannot be handled amicably–between two persons (either legal or physical). In order for a layperson to be able to utilize legal information she or he has to be able to access, read, understand and apply the information.

The accessing element is one of the tasks that legal information institutes fulfill so elegantly. The term “reading” is here to be understood as information that can be grasped either with one’s eyes or ears. The complexity begins when it comes to understanding and applying the information. A layperson might have difficulties understanding and applying the Act on income tax even though the law is accessible and readable.

Is this information then still legal information if we assume that the word “information” means that somebody can receive certain signs and data and use this data meaningfully in order to increase her or his knowledge? “Knowledge and information […] influence in a reciprocal way. Information modifies knowledge and knowledge guides potential use of information.” (Schweighofer)

If a layperson does not understand the information provided by official sources, she or he might refer to other information sources, for example by utilizing a Google search. In this case, the question arises how reliable the retrieved information is, however comprehensible. A high ranking in Google search does not automatically relate to high quality of the information even though this might be a common misconception, especially for laypersons not trained in source criticism. Here the importance of providing citizens with some basic and comprehensible information becomes apparent.

This comprehensible information might include more than plain text-based legislation and court decisions. Of interest for the layperson (both in business-consumer as well as government-citizen situations) can furthermore be, inter alia,

  • additional requirements according to terms and conditions or specific procedural rules in public administrations
  • possible legal outcomes and necessary facts that lead to them
  • estimated time of delivery of the product or the decision
  • credibility of the business, including the number of pending cases before the courts or complaints before the consumer protection authorities.

For a citizen it might also be very significant to know how she or he could behave differently in order to reach a desired result. Typically, citizens are only provided with the information as to how the legal situation is, but not what they could do to improve it, unless they contact a lawyer.

Commonly all these types of data already exist, though perhaps not in one location. Technically, the most accessible information is found in traditional legal sources, such as legislation and case-law. Again, the question here is mainly how to provide and utilize the existing information in a fashion understandable to the user.

“Like any other content transmitted through a communication system, primary legal sources can be rendered more or less understandable, locatable, and hence effective by structuring and presenting them differently for different audiences. And secondary sources must of course be constructed for a particular market, audience, or level of understanding.” (Tom Bruce)

Who should then be responsible for structuring, presenting and rendering it understandable, especially in the light of source criticism and trust?

Who?

Ignorantia juris neminem excusat presupposes that the legal information provided is correct and of high quality. Who can guarantee such quality? The state, private entities, research facilities, non-governmental organizations or citizens? My answer would be that all of them could play their part.

One should, however, keep in mind that user-friendliness is not the same as trustworthiness, which leads to the question of how to ensure that citizens are supplied with the right answers. In a world where even governments do not always take responsibility for the correctness of the provided information, such as in the case of online publications for law gazettes, the question remains who, or what entity, should be held liable for the accuracy of its services. But even if a public authority were to accept accountability, to what extent could that influence a legal decision that has already been reached?

The answer of who should provide a certain legal information service could also depend on who the target group of the information is.

“The legal information market is really no longer conceivable as bipolar – it can no longer be seen as a question of lawyers on the one hand versus a largely legally ignorant everyone else on the other. […] Internet-based legal information systems are used by many cases and conditions of people for many different reasons. […] Probably the most interesting group [are] non-lawyer professionals. These are people whose interest in law is vital, ongoing, and professional rather than either being casual and hobby-like or sporadic and trauma-driven. […] Such new and diverse audiences require new and diverse legal information architectures. They will want specialized collections of law of particular relevance to them. They will want those collections organized and presented in ways that reflect their profession or their situation, in ways that collections organized according to the legal abstractions and legal terms in use by lawyers do not. They are concerned with situations and fact-patterns rather than theories, doctrines, and concepts. They are, in short, a very intelligent and exciting type of lay users, and a potentially enormous audience.” (Tom Bruce)

Non-lawyer professionals probably constitute a large market for businesses that can tailor their services to a specific group and therefore render them profitable, as the services are considered of value for these professionals.

Traditional laypersons, however, typically do not represent a large market power simply because they will not always be willing to pay for services of this kind. This leaves them to the hands of other stakeholders such as public administrations, research institutes, non-governmental organizations and private initiatives. As already mentioned, conventionally the raw data is supplied by public administrations.  The question, then, is how to deliver it to the end-user.

How?

The Austrian civil code recognizes two concepts regarding fulfilling one’s part of the contract, Holschuld and Bringschuld. Holschuld means a debt to be collected from the debtor at his residence. Bringschuld constitutes an obligation to be performed at the creditor’s habitual residence. In today’s terminology, one could compare Holschuld with pull technology and Bringschuld with push technology. In other words, should the citizens pick up the relevant legal information or should the government actively deliver it at people’s doorsteps, so to speak?

In the offline paper world, the only way to reach a citizen was to send a letter to her or his house. Obviously, information technology offers many more possibilities when it comes to communicating with citizens, either via a computer or even a mobile phone, taking privacy concerns into consideration.

Several e-government initiatives (such as the video feed from European Parliament sessions and the EU’s channel on YouTube) increase public participation in and insight into politics. While these programs are an important contribution to democracy, they typically neither help citizens deal with everyday legal issues of employment, family, consumer matters, taxes or housing, nor provide them with the information necessary to do so.

In this respect, technologies enabling interactivity and re-use of public information are of greater importance, the latter also being a strategic concern of the European Union.  In particular, semantic technology offers solutions for transforming raw data into comprehensible information for citizens. Here, practical examples that utilize at least part of this technology can also be found within e-government projects as well as in private initiatives.

The next step would be for law to be built into the code itself. Intelligent agents negotiate the most advantageous terms and conditions for their owner, cars refuse to start if the driver exceeds the permitted alcohol level (ignition interlock device), and songs do not play unless your device is authorized (iTunes).

So, from a technological point of view, anything from presenting legal information on a website to implementing law directly into the end device is possible. In practice, though, most governments are content with providing textual legal information, at best in a structured format so it can be re-used more easily. The technical implementation of more advanced functions is often left to other market players and businesses.

There are two initiatives in this respect that are worth mentioning, one being a truly private project in Sweden and the other being provided by the Austrian government.

Lagen.nu (law now) has been around for some time now as a private initiative offering free access to Swedish legislation and case law. Recently the site was extended by adding commentaries to specific statutes, which should enable laypersons to understand certain legislation. The site includes explanations for certain terminology and particular comments are also categorized and include links to other laws and cases.

The other example, HELP, a service provided by the Austrian Government Agency, structures and presents legal information depending on the factual situation, e.g. it contains categories such as employment, housing, education, finances, family and social services. The relevant legal requirements are then explained in plain text and the responsible authority is listed and linked to.  In some cases the necessary procedure can even be initiated through the web site.

Both projects are fine examples of the possible transformation of legal information from pull to push technology. They are not quite there yet, though.

The answer

The question we are faced with now is not so much how or which technique would be the best, but rather in which situation a citizen might need certain legal information. Somebody trying to purchase a book via a web site might need information at that very moment; prompted by a warning text, a checklist or an intelligent agent, the purchaser might then go to another web site that has better ratings, more favorable legal terms and conditions, and no pending lawsuits. In other cases, the citizen might need certain information in a specific situation right on the spot. For example, while filling out a form she or he might want to know what would be the most favorable choice, rather than simply the type of personal data required for the form. Depending on the situation, some approaches might be more valuable than others.

The larger issue at hand is where the information is retrieved and who is the provider of the information. In other words, trust is an important factor, particularly trust of the information provider. As previously stated, legal information is not usually provided by public bodies but instead is rerouted through various other entities, such as businesses, organizations and individual efforts. This increases the importance of source criticism even more.

In many cases citizens will use general portals such as Google or Wikipedia to search for information, rather than going directly to the source, most often because citizens are not aware of the services offered. This underlines the importance for legal information providers to co-operate with other communication channels in order to increase their visibility.

The necessary legal information is out there; it just remains to be seen if and how it reaches the citizens. Or to put it in other words: The prophet still has to come to the mountain, but in time, with the increasing use of technology, maybe the mountain will come a bit closer.

Christine Kirchberger has been a junior lecturer at the Swedish Law and Informatics Research Institute (Stockholm University) since 2001. Besides teaching law and IT, she is currently writing her PhD thesis on legal information as a tool, in which she focuses on legal information retrieval and the concept of legal information within the framework of the doctrine of legal sources, and also takes a look at the information-seeking behavior of lawyers.

VoxPopuLII is edited by Judith Pratt.

The Recipe for Better Legal Information Services

A new style of legal research

An attorney/author in Baltimore is writing an article about state bans of teachers’ religious clothing. She finds one of the tersely written statutes online. The website then does a query of its own and tells her about a useful statute she wasn’t aware of: one setting out the permitted disciplinary actions. When she views it, the site makes the connection clear by showing her where the second statute references the original. This new information makes her article’s thesis stronger.

Meanwhile, 2800 miles away in Oregon, a law student is researching the relationship between the civil and criminal state codes. Browsing a research site, he notices a pattern of civil laws making use of the criminal code, often to enact civil punishments or enable adverse actions. He then engages the website in an interactive text-based dialog, modifying his queries as he considers the previous results. He finally arrives at an interesting discovery: the offenses with the least additional civil burdens are white collar crimes.

A new kind of research system

A new field of computer-assisted legal research is emerging: one that encompasses research in both the academic and the practical “legal research” senses. The two scenarios above both took place earlier this year, enabled by the OregonLaws.org research system that I created and which typifies these new developments.

Interestingly, this kind of work is very recent; it’s distinct from previous uses of computers for researching the law and assisting with legal work. In the past, techniques drawn from computer science have been most often applied to areas such as document management, court administration, and inter-jurisdiction communication. Working to improve administrative systems’ efficiency, people have approached these problem domains through the development of common document formats and methods of data interchange.

The new trend, in contrast, looks in the opposite direction: divergently tackling new problems as opposed to convergently working towards a few focused goals. This organic type of development is occurring because programming and computer science research are vastly cheaper (and much more fun) than they have ever been in the past. Here are a couple of examples of this new trend:

“Computer Programming and the Law”

Law professor Paul Ohm recently wrote a proposal for a new “interdisciplinary research agenda” which he calls “Computer Programming and the Law.” (The law review article is itself also a functioning computer program, written in the literate programming style.) He envisions “researcher-programmers,” enabled by the steadily declining cost of computer-aided research, using computers in revolutionary ways for empirical legal scholarship. He illustrates four new methods for this kind of research: developing computer programs to “gather, create, visualize, and mine data” that can be found in diverse and far-flung sources.

“Computational Legal Studies”

Grad students Daniel Katz and Michael Bommarito (researcher-programmers, as Paul Ohm would call them) created the Computational Legal Studies Blog in March 2009. The web site is a growing collection of visualizations applied to diverse legal and policy issues. The site is part showcase for the authors’ own work and part catalog of the current work of others.

OregonLaws.org

I started the OregonLaws.org project because I wanted faster and easier access to the 2007 Oregon Revised Statutes (ORS) and other primary and secondary sources. I had a couple of very statute-heavy courses (Wills & Trusts, and Criminal Law) and I frequently needed to quickly find an ORS section. But as I got further into the development, I realized that it could become a platform for experimenting with computational analysis of legal information, similar to the work being done on the Computational Legal Studies Blog.

I developed the system using pretty much the steps that Paul Ohm discussed:

  1. Gathering data: I downloaded and cleaned up the ORS source documents, converting them from MS Word/HTML to plain text;
  2. Creating: I parsed the texts, creating a database model reflecting the taxonomy of the ORS: Volumes, Titles, Chapters, etc.;
  3. Creating: I created higher-level database entities based on insights into the documents, for example by modeling textual references between sections explicitly as reference objects, each capturing a relationship between a referrer and a referent; and
  4. Mining and Visualizing: Finally, I’ve begun making web-based views of these newly found objects and relationships.
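The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the OregonLaws.org code: the class and function names, the citation pattern and the sample sections (including their numbers and text) are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    number: str   # section number, e.g. "100.020" (invented)
    chapter: str
    title: str
    text: str


@dataclass(frozen=True)
class Reference:
    referrer: str  # number of the section whose text contains the citation
    referent: str  # number of the section being cited


# Assumed citation form "ORS <chapter>.<section>"; real statutes vary more.
ORS_CITATION = re.compile(r"ORS (\d+\.\d+)")


def extract_references(sections):
    """Turn textual cross-references into explicit Reference objects."""
    known = {s.number for s in sections}
    references = []
    for section in sections:
        for cited in ORS_CITATION.findall(section.text):
            if cited in known and cited != section.number:
                references.append(Reference(section.number, cited))
    return references


def referrers_of(number, references):
    """Which sections cite the given section?"""
    return [r.referrer for r in references if r.referent == number]


# Invented sample data, for illustration only.
sections = [
    Section("100.010", "100", "Example Title", "A teacher may not do X."),
    Section("100.020", "100", "Example Title",
            "A violation of ORS 100.010 is grounds for discipline."),
]

references = extract_references(sections)
print(referrers_of("100.010", references))
```

Once references are first-class objects, the first scenario in this post becomes a simple lookup: when a user views a section, the site can list the other sections that refer to it.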

The object database is the key to intelligent research

By taking the time to go through the steps listed above, powerful new features can be created. Below are some additions to the features described in the introductory scenarios:

We can search smarter. In a previous VoxPopuLII post, Julie Jones advocates dropping our usual search methods and applying techniques like subject-based indexing (a la Factiva’s) to legal content.

This is straightforward to implement with an object model. The Oregon Legislature created the ORS with a conceptual structure similar to most states:  The actual content is found in Sections.  These are grouped into Chapters, which are in turn grouped into Titles.  I was impressed by the organization and the architecture that I was discovering—insights that are obscured by the ways statutes are traditionally presented.

search-filter.png

And so I sought out ways to make use of the legislature’s efforts whenever it made sense.  In the case of search results, the Title organization and naming were extremely useful.  Each Section returned by the search engine “knows” what Chapter and Title it belongs to. A small piece of code can then calculate what Titles are represented in the results, and how frequently. The resulting bar graph doubles as an easy way for users to specify filtering by “subject area”. The screenshot above shows a search for forest.
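The "small piece of code" described above might look something like the following Python sketch. It assumes each search hit is a dictionary with a `title` key; the function names and the sample Title names are hypothetical, not taken from the real site.

```python
from collections import Counter


def title_facets(results):
    """Count how many hits fall under each Title, most frequent first."""
    return Counter(hit["title"] for hit in results).most_common()


def filter_by_title(results, title):
    """Keep only the hits that belong to the chosen Title."""
    return [hit for hit in results if hit["title"] == title]


# Invented search hits, for illustration only.
hits = [
    {"section": "1.010", "title": "Title A"},
    {"section": "1.020", "title": "Title A"},
    {"section": "2.010", "title": "Title B"},
    {"section": "1.030", "title": "Title A"},
]
facets = title_facets(hits)
narrowed = filter_by_title(hits, "Title B")
```

The counts in `facets` are what a bar graph would plot, and clicking a bar would amount to calling `filter_by_title` with that bar's Title.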

The ORS’s framework of Volumes, Titles, and Chapters was essentially a tag cloud waiting to be discovered.

We can get better authentication. In another VoxPopuLII post, John Joergensen discussed the need for authentication of digital resources. One aspect of this is showing the user the chain of custody from the original source to the current presentation. His ideas about using digital signatures are excellent: they point to a scenario in which an electronic document’s legitimacy can be verified with complete assurance.

glossary-citations.png

We can get a good start towards this goal by explicitly modeling content sources. A source is given attributes for everything we’d want to know to create a citation: date last accessed, URL available at, etc. Every content object in the database is linked to one of these source objects. Now, every time we display a document, we can create properly formatted citations to the original sources.
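As a rough sketch of this idea in Python: the class names, fields, citation wording, URL, and date below are all illustrative assumptions, and the citation format is not meant to follow any particular citation manual.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    name: str            # human-readable name of the original source
    url: str             # where the original document was obtained
    last_accessed: date  # when it was last retrieved

    def citation(self):
        """Build a simple citation string from the source attributes."""
        return (f"{self.name}, available at {self.url} "
                f"(last accessed {self.last_accessed.isoformat()})")


@dataclass
class ContentObject:
    heading: str
    body: str
    source: Source  # every content object links to exactly one source


# Placeholder source and content, for illustration only.
statutes = Source("Example Statutes", "https://example.org/statutes", date(2009, 1, 15))
entry = ContentObject("Definitions", "Placeholder text.", statutes)
cite = entry.source.citation()
```

Because many content objects share one `Source`, updating the access date or URL in a single place updates the citation shown on every document derived from it.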

The gather/create/mine/visualize and object-based approaches open up so many new possibilities that they can’t all be discussed in one short article. It sometimes seems that each new step taken enables previously unforeseen features. A few of these are new documents created by re-sorting and aggregating content, web service APIs, and extra annotations that enhance clarity. I believe that in the end, the biggest accomplishment of projects like this will be to raise our expectations for electronic legal research services, increase their quality, and lower their cost.

Robb Shecter is a software engineer and third-year law student at Lewis & Clark Law School in Portland, Oregon. He is Managing Editor for the Animal Law Review, plays jazz bass, and has published articles in Linux Journal, Dr. Dobb’s Journal, and Java Report.

VoxPopuLII is edited by Judith Pratt.
