In May of this year, HeinOnline began taking a new approach to legal research, offering researchers the ability to search or browse varying types of legal research material all related to a specialized area of law in one database. We introduced this concept as a new legal research platform with the release of World Constitutions Illustrated: Contemporary & Historical Documents & Resources, which we’ll discuss in further detail later on in this post. First, we must take a brief look at how HeinOnline started and where it is going. Then, we will continue on by looking at the scope of the new platform and how it is being implemented across HeinOnline’s newest library modules.

This is how we started…
Traditionally, HeinOnline libraries featured one title or a single type of legal research material. For example, the Law Journal Library, HeinOnline’s largest and most used database, contains law and law-related periodicals. The Federal Register Library contains the Federal Register dating back to inception, with select supporting resources. The U.S. Statutes at Large Library contains the U.S. Statutes at Large volumes dating back to inception, with select supporting resources.

This is where we are going…
The new subject-specific legal research platform, introduced earlier this year, has shifted from that traditional approach to a more dynamic approach of offering research libraries focused on a subject area, versus a single title or resource. This platform combines primary and secondary resources, books, law review articles, periodicals, government documents, historical documents, bibliographic references and other supporting resources all related to the same area of law, into one database, thus providing researchers one central place to find what they need.

How is this platform being implemented?
In May, HeinOnline introduced the platform with the release of a new library called World Constitutions Illustrated: Contemporary & Historical Documents & Resources. The platform has since been implemented in every new library that HeinOnline has released including History of Bankruptcy: Taxation & Economic Reform in America, Part III and Intellectual Property Law Collection.

Pilot project: World Constitutions Illustrated
First, let’s look at the pilot project, World Constitutions Illustrated. Our goal when releasing this new library was to present legal researchers with a different scope than what is currently available for those studying constitutional law and political science. To achieve this, the library was built upon the new legal research platform, which brings together: constitutional documents, both current and historical; secondary sources such as the CIA’s World Fact Book, Modern Legal Systems Cyclopedia, the Library of Congress’s Country Studies and British and Foreign State Papers; books; law review articles; bibliographies; and links to external resources on the Web that directly relate to the political and historical development of each country. By presenting the information in this format, researchers no longer have to visit multiple Web sites or pull multiple sources to obtain the documentary history of the development of a country’s constitution and government.

Inside the interface, every country has a dedicated resource page that includes the Constitutions and Fundamental Laws, Commentaries & Other Relevant Sources, Scholarly Articles Chosen by Our Editors, a Bibliography of Select Constitutional Books, External Links, and a news feed. Let’s take a look at France.

France

Constitutions & Fundamental Laws
France has a significant hierarchy of constitutional documents from the current constitution as amended to 2008 all the way back to the Declaration of the Rights of Man and of the Citizen promulgated in 1789. Within the hierarchy of documents, one can find consolidated texts, amending laws, and the original text in multiple languages when translations are available.

Commentaries & Other Relevant Sources
Researchers will find more than 100 commentaries and other relevant sources of information related to the development of the government of France and the French Constitution. These sources include secondary source books and classic constitutional law books. To further connect these sources to the French Constitution, our Editors have reviewed each source book and classic constitutional book and linked researchers to the specific chapters or sections of the works that directly relate to the study of the French Constitution. For example, the work titled American Nation: A History, by Albert Bushnell Hart, has direct links to chapters from within volumes 11 and 13, each of which discusses and relates to the development of the French government.

Scholarly Articles Chosen by Our Editors
This section features more than 40 links to scholarly articles from HeinOnline’s Law Journal Library that are directly related to the study of the French Constitution and the development of the government of France. The Editors hand-selected and included these articles from the thousands of articles in the Law Journal Library due to their significance and relation to the constitutional and political development of the nation. When browsing the list of articles, one will also find Hein’s ScholarCheck integrated, which allows a researcher to view other law review articles that cite that specific article. In order for researchers to access the law review articles, they must be subscribed to the Law Journal Library.

Bibliography of Select Constitutional Books
There are thousands of books related to constitutional law. Our Editors have gone through an extensive list of these resources and hand-selected books relevant to the constitutional development of each country. The selections are presented as a bibliography within each country. France has nearly 100 bibliographic references. Many bibliographic references also contain the ISBN which links to WorldCat, allowing researchers to find the work in a nearby library.

External Links
External links are also selected by the Editors as they are developing the constitutional hierarchies for each country. If there are significant online resources available that support the study of the constitution or the country’s political development, the links are included on the country page.

News Feeds
The last component on each country’s page is a news feed featuring recent articles about the country’s constitution. The news feed is powered by a Google RSS news feed and researchers can easily use the RSS icon to add it to their own RSS readers.

In addition to the significant and comprehensive coverage of every country in World Constitutions Illustrated, the collection also features an abundance of material related to the study of constitutional law at a higher level. This makes it useful for those researching more general or regional constitutional topics.

Searching capabilities on the new platform
To further enhance the capabilities of this platform, researchers are presented with a comprehensive search tool that allows one to search the documents and books by a number of metadata points including the document date, promulgated date, document source, title, and author. For researchers studying the current constitution, the search can be narrowed to include just the current documents that make up the constitution for a country. Furthermore, a search can be generated across all the documents, classic books, or reference books for a specific country, or it can be narrower in scope to include a specific type of resource. After a search is generated, researchers will receive faceted search results, allowing them to quickly and easily drill down their results set by using facets including document type, date, country, and title.
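The drill-down behavior described above can be sketched in a few lines of Python. This is only an illustration of faceting in general, not HeinOnline's implementation: the record fields (`country`, `doc_type`, `year`), the sample hits, and the helper names `facet_counts` and `drill_down` are all hypothetical.

```python
from collections import Counter

# Hypothetical search hits; field names and values are invented for illustration.
HITS = [
    {"title": "Document A", "country": "Country X", "doc_type": "Constitution", "year": 1958},
    {"title": "Document B", "country": "Country X", "doc_type": "Amending Law", "year": 2008},
    {"title": "Document C", "country": "Country X", "doc_type": "Commentary", "year": 1905},
    {"title": "Document D", "country": "Country Y", "doc_type": "Constitution", "year": 1831},
]


def facet_counts(hits, field):
    """Count how many hits fall under each value of one facet field."""
    return Counter(hit[field] for hit in hits)


def drill_down(hits, **selected):
    """Keep only the hits that match every selected facet value."""
    return [
        hit for hit in hits
        if all(hit.get(field) == value for field, value in selected.items())
    ]


if __name__ == "__main__":
    print(facet_counts(HITS, "doc_type"))
    print(drill_down(HITS, country="Country X", doc_type="Constitution"))
```

The counts are what a results page would show beside each facet value; choosing a value simply filters the current result set, so facets can be applied in any order.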

Contributing to the project
An underlying concept behind the new legal research platform is encouraging legal scholars, law libraries, subject area experts, and other professionals to contribute to the project. HeinOnline wants to work with scholars and libraries from all around the world to continue to build upon the collection and to continue developing the constitutional timelines for every country. Several libraries and scholars from around the world have already contributed constitutional works from their libraries to World Constitutions Illustrated.

Extending the platform beyond the pilot project
As mentioned earlier, this platform has been implemented in every new library that HeinOnline has released, including History of Bankruptcy: Taxation & Economic Reform in America, Part III and Intellectual Property Law Collection. A brief look at these two libraries follows.

History of Bankruptcy: Taxation & Economic Reform in America, Part III
The History of Bankruptcy library includes more than 172,000 pages of legislative histories, treatises, documents and more, all related to bankruptcy law in America. The primary resources in this library are the legislative histories, which can be browsed by title, public law number, or popular name. Also included are classic books dating back to the late 1800s and links to scholarly articles that were selected by our editors due to their significance to the study of bankruptcy law in America.

As with the searching capabilities presented in the World Constitutions Illustrated library, researchers can narrow a search by the type of resource, or search across everything in the library. After a search is generated, researchers will receive faceted search results, allowing them to quickly and easily drill down their results set by document type, date, or title.

Intellectual Property Law Collection
The Intellectual Property Law Collection, released just over a month ago, features nearly 2 million pages of legal research material related to patents, trademarks, and copyrights in America. It includes more than 270 books, more than 100 legislative histories, links to more than 50 legal periodicals, federal agency documents, the Manual of Patent Examining Procedure, CFR Title 37, U.S. Code Titles 17 & 35, and links to scholarly articles chosen by our Editors, all related to intellectual property law in America.

Furthermore, this library features a Google Patent Search widget that allows researchers to search across more than 7 million patents made available to the public through an arrangement with Google and the United States Patent and Trademark Office.

Searching in the Intellectual Property Law Collection allows researchers to search across all types of documents, or narrow a search to just books, legislative histories, or federal agency decisions, for example. After a search is generated, researchers will receive faceted search results, allowing them to quickly and easily drill down their results set by using facets including document type, date, country, or title.

HeinOnline is the modern link to legal history, and the new legal research platform bolsters this primary objective. The platform brings together the primary and secondary sources, other supporting documents, books, links to articles, periodicals, and links to other online sources, making it a central stop for researchers to begin their search for legal research material. The Editors have selected the books, articles, and sources that they deem significant to that area of the law. This is then presented in one database, making it easier for researchers to find what they need. With the tremendous growth of digital media and online sources, it can prove difficult for a researcher to quickly navigate to the most significant sources of information. HeinOnline’s goal is to make this navigation easier with the implementation of this new legal research platform.

Marcie Baranich is the Marketing Manager at William S. Hein & Co., Inc. and is responsible for the strategic marketing processes for both Hein’s traditional products and its growing online legal research database, HeinOnline. In addition to her Marketing role, she is also active in the product development, training and support areas for HeinOnline. She is an author of the HeinOnline Blog, Wiki, YouTube channel, Facebook, and Twitter pages, and manages the strategic use of these resources to communicate and assist users with their legal research needs.

VoxPopuLII is edited by Judith Pratt.

Editor-in-Chief is Robert Richards, to whom queries should be directed.

In an extraordinary story, Jorge Luis Borges writes of a “Total Library”, organized into ‘hexagons’ that supposedly contained all books:

When it was proclaimed that the Library contained all books, the first impression was one of extravagant happiness. All men felt themselves to be the masters of an intact and secret treasure. . . . At that time a great deal was said about the Vindications: books of apology and prophecy which . . . [contained] prodigious arcana for [the] future. Thousands of the greedy abandoned their sweet native hexagons and rushed up the stairways, urged on by the vain intention of finding their Vindication. These pilgrims disputed in the narrow corridors . . . strangled each other on the divine stairways . . . . Others went mad. . . . The Vindications exist . . . but the searchers did not remember that the possibility of a man’s finding his Vindication, or some treacherous variation thereof, can be computed as zero.  As was natural, this inordinate hope was followed by an excessive depression. The certitude that some shelf in some hexagon held precious books and that these precious books were inaccessible, seemed almost intolerable.

About three years ago I spent almost an entire sleepless month coding OpenJudis - my rather cool, “first-of-its-kind” free online database of Indian Supreme Court cases. The database hosts the full texts of about 25,000 cases decided since 1950. In this post I embark on a somewhat personal reflection on the process of creating OpenJudis – what I learnt about access to law (in India), and about “legal informatics,” along with some meditations on future pathways.

Having, by now, attended my share of FLOSS events, I know it is the invariable tendency of anyone who’s written two lines of free code to consider themselves qualified to pronounce on lofty themes – the nature of freedom and liberty, the commodity, scarcity, etc. With OpenJudis, likewise, I feel like I’ve acquired the necessary license to inflict my theory of the world on hapless readers – such as those at VoxPopuLII!

I begin this post by describing the circumstances under which I began coding OpenJudis. This is followed by some of my reflections on how “legal informatics” relates to and could relate to law.

Online Access to Law in India
India is privileged to have quite a robust ICT architecture. Internet access is relatively inexpensive, and the ubiquity of “cyber cafes” has resulted in extensive Internet penetration, even in the absence of individual subscriptions.

Government bodies at all levels are statutorily obliged to publish, on the Internet, vital information regarding their structure and functioning. The National Informatics Centre (NIC), a public sector corporation, is responsible for hosting, maintaining and updating the websites of government bodies across the country. These include, inter alia, the websites of the Union (federal) Government, the various state governments, union and state ministries, constitutional bodies such as the Election Commission and the Planning Commission, and regulatory bodies such as the Securities and Exchange Board of India (SEBI). These websites typically host a wealth of useful information including, illustratively, the full texts of applicable legislations, subordinate legislations, administrative rulings, reports, census data, application forms etc.

The NIC has also been commissioned by the judiciary to develop websites for courts at various levels and publish decisions online. As a result, beginning in around the year 2000, the Supreme Court and various high courts have been publishing their decisions on their websites. The full texts of all Supreme Court decisions rendered since 1950 have been made available, which is an invaluable free resource for the public. Most High Court websites however, have not yet made archival material available online, so at present, access remains limited to decisions from the year 2000 onwards. More recently the NIC has begun setting up websites for subordinate courts, although this process is still at a very embryonic stage.

Apart from free government websites, a handful of commercial enterprises have been providing online access to legal materials. Among them, two deserve special mention. SCCOnline – a product of one of the leading law report publishers in India – provides access to the full texts of decisions of the Indian Supreme Court. The CD version of SCCOnline sells for about INR 70,000 (about US$1,500), which is around the same price the company charges for a full set of print volumes of its reporter. For an additional charge, the company offers updates to the database. The other major commercial venture in the field is Manupatra, which offers access to the full text of decisions of various courts and tribunals as well as the texts of legislation. Access is provided for a basic charge of about US$100, plus a charge of about US$1 per document downloaded. While seemingly modest by international standards, these charges are unaffordable by large sections of the legal profession and the lay public.

OpenJudis
In December 2006, I began coding OpenJudis. My reasons were purely selfish. While the full texts of the decisions of the Supreme Court were already available online for free, the search engine on the government website was unreliable and inadequate for (my) advanced research needs. The formatting of the text of cases themselves was untidy, and it was cumbersome to extract passages from them. Frequently, the website appeared overloaded with users, and alternate free sources were unavailable. I couldn’t afford any of the commercial databases. My own private dissatisfaction with the quality of service, coupled with (in retrospect) my completely naive optimism, led me to attempt OpenJudis. A third crucial factor on the input side was time, and a “room of my own,” which I could afford only because of a generous fellowship I had from the Open Society Institute.

I began rashly, by serially downloading the full texts of the 25,000 decisions on the Supreme Court website. Once that was done (it took about a week), I really had no notion of how to proceed. I remember being quite exhilarated by the sheer fact of being in possession of twenty-five thousand Supreme Court decisions. I don’t think I can articulate the feeling very well. (I have some hope, however, that readers of this blog and my fellow LII-ers will intuitively understand this feeling.) Here I was, an average Joe poking around on the Internet, and just-like-that I now had an archive of 25,000 key documents of our republic, cumulatively representing the articulations of some of the finest (and some not-so-fine) legal minds of the previous half-century, sitting on my laptop. And I could do anything with them.

The word “archive,” incidentally, as Derrida informs us, derives from the Greek arkheion, the residence of the superior magistrates, the archons – those who commanded. The archons both “held and signified political power,” and were considered to possess the right to both “make and represent the law.” “Entrusted to such archons, these documents in effect speak the law: they recall the law and call on or impose the law.” Surely, or I am much mistaken, a very significant transformation has occurred when ordinary citizens become capable of housing archives – when citizens can assume the role of archons at will.

Giddy with power, I had an immediate impulse to find a way to transmit this feeling, to make it portable, to dissipate it – an impulse that will forever mystify economists wedded to “rational” incentive-based models of human behavior. I wasn’t a computer engineer, I didn’t have the foggiest idea how I’d go about it, but I was somehow going to host my own online free database of Indian Supreme Court cases. The audacity of this optimism bears out one of Yochai Benkler’s insights about the changes wrought by the new “networked information economy” we inhabit. According to Benkler,

The belief that it is possible to make something valuable happen in the world, and the practice of actually acting on that belief, represent a qualitative improvement in the condition of individual freedom [because of NIE]. They mark the emergence of new practices of self-directed agency as a lived experience, going beyond mere formal permissibility and theoretical possibility.

Without my intending it, the archive itself suggested my next task. I had to clean up the text and extract metadata. This process occupied me for the longest time during the development of OpenJudis. I was very new to programming and had only just discovered the joys of Regular Expressions. More than my inexperience with programming techniques, however, it was the utter heterogeneity of reporting styles that took me a while to accustom myself to. Both opinion-writing and reporting styles had changed dramatically in the course of the fifty years my database covered, and this made it difficult to find patterns when extracting, say, the names of judges involved. Eventually, I had cleaned up the texts of the decisions and extracted an impressive (I thought) set of metadata, including the names of parties, the names of the judges, and the date the case was decided. To compensate for the absence of headnotes, I extracted names of statutes cited in the cases as a rough indicator of what their case might relate to. I did all this programming in PHP with the data housed in a MySQL database.
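To give a flavor of the extraction step described above, here is a minimal sketch in Python rather than the PHP I actually used. It is not the OpenJudis code: the header layout, the party and judge names, the statutes, and the function name `extract_metadata` are all hypothetical, and real judgments were far less uniform than this sample suggests.

```python
import re
from datetime import date

# Hypothetical judgment text; real layouts varied widely across fifty years.
SAMPLE = """\
PETITIONER: A. EXAMPLE
RESPONDENT: STATE OF EXAMPLE
DATE OF JUDGMENT: 25/05/1951
BENCH: RAO, K.; MEHTA, S.

The appeal was heard under the Example Tenancy Act, 1948 and
the Sample Evidence Act, 1872. See also the Example Tenancy Act, 1948.
"""

# One labelled header field per line, e.g. "BENCH: ...".
FIELD_RE = re.compile(
    r"^(PETITIONER|RESPONDENT|DATE OF JUDGMENT|BENCH):\s*(.+)$", re.MULTILINE
)
# A run of capitalised words ending in "Act, <year>".
STATUTE_RE = re.compile(r"\b((?:[A-Z][A-Za-z]+ )+Act, \d{4})")


def extract_metadata(text):
    """Pull parties, judges, decision date and cited statutes out of one judgment."""
    fields = {name: value.strip() for name, value in FIELD_RE.findall(text)}
    day, month, year = (int(part) for part in fields["DATE OF JUDGMENT"].split("/"))
    return {
        "petitioner": fields["PETITIONER"],
        "respondent": fields["RESPONDENT"],
        "decided": date(year, month, day),
        "judges": [name.strip() for name in fields["BENCH"].split(";")],
        # Deduplicated statute names stand in for the missing headnotes.
        "statutes": sorted(set(STATUTE_RE.findall(text))),
    }


if __name__ == "__main__":
    print(extract_metadata(SAMPLE))
```

In practice each decade of reporting needed its own variations on patterns like these, which is where most of the time went.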

And then I encountered my first major roadblock that threatened to jeopardize the whole operation: I ran my first full-text Boolean search on the MySQL database and the results took a staggering 20 minutes to display. I was devastated! More elaborate searches took longer. Clearly, this was not a model I could host online. Or do anything useful with. Nobody in their right mind would want to wait 20 minutes for the results of their search. I had to look for a quicker database, or, as I eventually discovered, a super fast, lightweight indexing search engine. After a number of failed attempts with numerous free search engine software programs, none of which offered either the desired speed or the search capability I wanted, I was getting quite desperate. Fortunately, I discovered Swish-e, a lightweight, Perl-based Boolean search engine which was extremely fast and, most importantly, free – exactly what I needed. The final stage of creating the interface, uploading the database, and activating the search engine happened very quickly, and sometime in the early hours of December 22nd, 2006, OpenJudis went live. I sent announcement emails out to several e-groups and waited for the millions to show up at my doorstep.
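Why an indexing engine is so much faster than scanning every document on each query can be shown with a toy inverted index. The sketch below illustrates the general technique only; it is not how Swish-e is implemented, and the three sample documents and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus keyed by document id.
DOCS = {
    1: "The tenancy was terminated by notice.",
    2: "Notice of appeal under the evidence act.",
    3: "Evidence of tenancy was admitted.",
}


def tokenize(text):
    """Lower-case the text and return its distinct alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing every term: intersect the posting sets."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Documents containing at least one term: union the posting sets."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


if __name__ == "__main__":
    index = build_index(DOCS)
    print(search_and(index, "tenancy", "notice"))
    print(search_or(index, "appeal", "admitted"))
```

The expensive pass over the full texts happens once, when the index is built. A Boolean query then reduces to set operations on short lists of document ids, which is why results come back in a fraction of a second instead of minutes.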

They never did. After a week, I had maybe a hundred users. In a month, a few hundred. I received some very complimentary emails, which was nice, but it didn’t compensate for the failure of “millions” to show up. Over the next year, I added some improvements:
1) First, I built an automatic update feature that would periodically check the Supreme Court website for new cases and update the database on its own.
2) In October 2007, I coded a standalone MS Windows application of the database that could be installed on any system running Windows XP. This made sense in a country where PC penetration is higher than Internet penetration. The Windows application became quite popular and I received numerous requests for CDs from different corners of the country.
3) Around the same time, I also coded a similar application for decisions of the Central Information Commission – the apex statutory tribunal for adjudicating disputes under the Right to Information Act.
4) In February 2008, both applications were included in the DVD of Digit Magazine – a popular IT magazine in India.

Unfortunately, in August 2008, the Supreme Court website changed its design so that decisions could no longer be downloaded serially in the manner I had been accustomed to. One can only speculate about what prompted this change – since no improvements were made to the actual presentation of the cases. The only thing that changed was that one could no longer download cases serially as I’d been doing. The new format was far more difficult for me to “hack” and I abandoned the attempt. My work left me with no time to attempt to circumvent the new format.

Fortunately at the same time, an exciting new project called IndianKanoon was started by Sushant Sinha, an Indian computer science graduate at Michigan. In addition to decisions of the Supreme Court, his site covers several high courts and links up to the text of legislation of various kinds. Although I have not abandoned plans to develop OpenJudis, the presence of IndianKanoon has allowed me to step back entirely from this domain – secure in the knowledge that it is being taken forward by abler hands than mine.

Predictions, Observations, Conclusions
I’d like to end this already-too-long post with some reflections, randomly ordered, about legal information online.
1) I think one crucial area commonly neglected by most LIIs is client-side software that enables users to store local copies of entire databases. The urgency of this need is highlighted in the following hypothetical about digital libraries by Siva Vaidhyanathan (from The Anarchist in the Library):

So imagine this: An electronic journal is streamed into a library. A library never has it on its shelf, never owns a paper copy, can’t archive it for posterity. Its patrons can access the material and maybe print it, maybe not. But if the subscription runs out, if the library loses funding and has to cancel that subscription, or if the company itself goes out of business, all the material is gone. The library has no trace of what it bought: no record, no archive. It’s lost entirely.

It may be true that the Internet will be around for some time, but it might be worthwhile for LIIs to stop emulating the commercial database models of restricting control while enabling access. Only then can we begin to take seriously the task of empowering users into archons.

2) My second observation pertains to interface and usability. I have for long been planning to incorporate a set of features including tagging, highlighting, annotating, and bookmarking that I myself would most like to use. Additionally, I have been musing about using Web 2.0 to enable user-participation in maintenance and value-add operations – allowing users to proofread the text of judgments and to compose headnotes. At its most ambitious, in these “visions” of mine, OpenJudis looks like a combination of LII + social networking + Wikipedia.

A common objection to this model is that it would upset the authority of legal texts. In his brilliant essay A Brief History of the Internet from the 15th to the 18th century, the philosopher Lawrence Liang reminds us that the authority of knowledge that we today ascribe to printed text was contested for the longest period in modern history.

Far from ensuring fixity or authority, this early history of Printing was marked by uncertainty, and the constant refrain for a long time was that you could not rely on the book; a French scholar Adrien Baillet warned in 1685 that “the multitude of books which grows every day” would cast Europe into “a state as barbarous as that of the centuries that followed the fall of the Roman Empire.”

Europe’s non-descent into barbarism offers us a degree of comfort in dealing with Adrien Baillet-type arguments made in the context of legal information. The stability that we ascribe to law reports today is a relatively recent historical innovation that began in the mid-19th century. “Modern” law has longer roots than that.

3) While OpenJudis may look like quite a mammoth endeavor for one person, I was at all times intensely aware that this was by no means a solitary undertaking, and that I was “standing on the shoulders of giants.” They included the nameless thousands at the NIC who continue to design websites, scan and upload cases on the court websites – a Sisyphean task – and the thousands whose labor collectively produced the free software I used: Fedora Core 4, PHP, MySQL, Swish-e. And lastly, the nameless millions who toil to make the physical infrastructure of the Internet itself possible. Like the ground beneath our feet, we take it for granted, even as the tragic events in Haiti in recent weeks remind us to be more attentive. (For a truly Herculean endeavor, however, see Sushant Sinha’s IndianKanoon website, about which many ballads may be composed in the decades to come.)

It might be worthwhile for the custodians of LIIs to enable users to become derivative producers themselves, to engage in “practices of self-directed agency” as Benkler suggests. Without sounding immodest, I think the real story of OpenJudis is how the Internet makes it plausible and thinkable for average Joes like me (and better-than-average people like Sushant Sinha) to think of waging unilateral wars against publishing empires.

4) So, what is the impact that all this ubiquitous, instant, free electronic access to legal information is likely to have on the world of law? In a series of lectures titled “Archive Fever,” the philosopher Derrida posed a similar question in a somewhat different context: What would the discipline of psychoanalysis have looked like, he asked, if Sigmund Freud and his contemporaries had had access to computers, televisions, and email? In brief, his answer was that the discipline of psychoanalysis itself would not have been the same – it would have been transformed “from the bottom up” and its very events would have been altered. This is because, in Derrida’s view:

The archive . . . in general is not only the place for stocking and for conserving an archivable content of the past. . . .  No, the technical structure of the archiving archive also determines the structure of the archivable content even in its coming into existence and in its relationship to the future. The archivization produces as much as it records the event.

The implication, following Derrida, is that in the past, law would not have been what it currently is if electronic archives had been possible. And the obverse is true as well: in the future, because of the Internet, “rule of law” will no longer observe the logic of the stable trajectories suggested by its classical “analog” commentators. New trajectories will have to be charted.

5) In the same book, Derrida describes a condition he calls “Archive fever”:

It is to burn with a passion. It is never to rest, interminably, from searching for the archive right where it slips away. It is to run after the archive even if there’s too much of it. It is to have a compulsive, repetitive and nostalgic desire for the archive, an irrepressible desire to return to the origin, a homesickness, a nostalgia for the return to the most archaic place of absolute commencement.

I don’t know about other readers of VoxPopuLII (if indeed you’ve managed to continue reading this far!), but for the longest time during and after OpenJudis, I suffered distinctly from this malady. I downloaded indiscriminately whole sets of data that still sit unused on my computer, not having made it into OpenJudis. For those in a similar predicament, I offer Borges’s quote with which I began this text, as a reminder of the foolishness of the notion of “Total Libraries.”

Prashant Iyengar is a lawyer affiliated with the Alternative Law Forum, Bangalore, India. He is currently pursuing his graduate studies at Columbia University in New York. He runs OpenJudis, a free database of Indian Supreme Court cases.

VoxPopuLII is edited by Judith Pratt. Editor in Chief is Rob Richards.

It’s tempting to begin any discussion of digital preservation and law libraries with a mind-blowing statistic. Something to drive home the fact that the clearly-defined world of information we’ve known since the invention of movable type has evolved into an ephemeral world of bits and bytes, that it’s expanding at a rate that makes it nearly impossible to contain, and that now is the time to invest in digital preservation efforts.

But, at this point, that’s an argument that you and I have already heard. As we begin the second decade of the 21st century, we know with certainty that the digital world is ubiquitous because we ourselves are part of it. Ours is a world where items posted on blogs are cited in landmark court decisions, a former governor and vice-presidential candidate posts her resignation speech and policy positions to Facebook, and a busy 21st-century president is attached at the thumb to his Blackberry.

We have experienced an exhilarating renaissance in information, which, as many have asserted for more than a decade, is threatening to become a digital dark age due to technology obsolescence and other factors. There is no denying the urgent need for libraries to take on the task of preserving our digital heritage. Law libraries specifically have a critically important role to play in this undertaking. Access to legal and law-related information is a core underpinning of our democratic society. Every law librarian knows this to be true. (I believe it’s what drew us to the profession in the first place.)

Frankly speaking, our current digital preservation strategies and systems are imperfect – and they most likely will never be perfected. That’s because digital preservation is a field that will be in a constant state of change and flux for as long as technology continues to progress. Yet, tremendous strides have been made over the past decade to stave off the dreaded digital dark age, and libraries today have a number of viable tools, services, and best practices at our disposal for the preservation of digital content.

Law libraries and the preservation of born-digital content

In 2008, Dana Neacsu, a law librarian at Columbia University Law School, and I decided to explore the extent to which law libraries were actively involved in the preservation of born-digital legal materials. So, we conducted a survey of digital preservation activity and attitudes among state and academic law libraries.

We found an interesting incongruity among our respondent population of library directors who represented 21 law libraries: less than 7 percent of the digital preservation projects being planned or underway at our respondents’ libraries involved the preservation of born-digital materials. The remaining 93 percent involved the preservation of digital files created through the digitization of print or tangible originals. Yet, by a margin of 2 to 1, our respondents expressed that they believed born-digital materials to be in more urgent need of preservation than print materials.

This finding raises an interesting question: If law librarians (at least those represented among our respondents) believe born-digital materials to be in more urgent need of preservation, why were the majority of digital preservation resources being invested in the preservation of files resulting from digitization projects?

I speculate that part of the problem is that we often don’t know where to start when it comes to preserving born-digital content. What needs to be preserved? What systems and formats should we use? How will we pay for it?

What needs to be preserved? A few thoughts…

Determining what needs to be preserved is not as complicated as it may seem. The mechanisms for content selection and collection development that are already in place at most law libraries lend themselves nicely to prioritizing materials for digital preservation, as I have learned through the Georgetown Law Library’s involvement in The Chesapeake Project Legal Information Archive. A collaborative effort between Georgetown and partners at the State Law Libraries of Maryland and Virginia, The Chesapeake Project was established to preserve born-digital legal information published online and available via open-access URLs (as opposed to within subscription databases).

So, how did we approach selection for the digital archive? Within a broad, shared project collection scope (limited to materials that were law- or policy-related, digitally born, and published to the “free Web” per our Collection Plan) each library simply established its own digital archive selection priorities, based on its unique institutional mandates and the research needs of its users. Libraries have historically developed their various print collections in a similar manner.

The Maryland State Library focused on collecting documents relating to public-policy and legal issues affecting Maryland citizens. The Virginia State Library collected the online publications of the Supreme Court of Virginia and other entities within Virginia’s judicial branch of government. As an academic library, the Georgetown Law Library developed topical and thematic collection priorities based on research and educational areas of interest at the Georgetown University Law Center. (Previously, online materials selected for the Georgetown Law Library’s collection had been printed from the Web on acid-free paper, bound, cataloged, and shelved. Digital preservation offered an attractive alternative to this system.)

To build our topical digital archive collections, the Georgetown Law Library assembled a team of staff subject specialists to select content (akin to our collection development selection committee), and, to make things as simple as possible, submissions were made and managed using a Delicious bookmark account, which allowed our busy subject specialists to submit online content for preservation with only a few clicks.

As a research library, we preserved information published to the free Web under a claim of fair use. Permission from copyright holders was sought only for items published either outside of the U.S. or by for-profit entities. Taking our cues from the Internet Archive, we determined to respect the robots.txt protocol in our Web harvesting activities and provide rights holders with instructions for requesting the removal of their content from the archive.
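For readers curious about what respecting robots.txt looks like in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not the harvesting tool used by the project; the sample robots.txt rules, the crawler name `example-archive-bot`, and the example.gov URLs are all hypothetical, and the rules are parsed from a string so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, supplied as text so nothing is fetched.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_harvest(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit this crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical crawler name and URLs, for illustration only.
allowed = may_harvest(
    SAMPLE_ROBOTS_TXT, "example-archive-bot",
    "https://www.example.gov/reports/2009.pdf",
)
blocked = may_harvest(
    SAMPLE_ROBOTS_TXT, "example-archive-bot",
    "https://www.example.gov/private/draft.pdf",
)
print(allowed, blocked)
```

A harvester built along these lines would skip any URL for which the check returns False before adding the item to its capture queue.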

Fear of duplicating efforts

We have, on occasion, knowingly added digital materials to our archive collection that were already within the purview of other digital preservation programs. There is a fear of duplicating efforts when it comes to digital preservation, but there is also a strong argument to be made for multiple, geographically dispersed entities maintaining duplicate preserved copies of important digital resources.

This philosophy, especially as it relates to duplicating the digital-preservation efforts of the Government Printing Office, is currently being echoed among several Federal Depository Libraries (and prominently by librarians who contribute to the Free Government Information blog) who are supporting the concept of digital deposit to maintain a truly distributed Federal Depository Library Program. Should there ever be a catastrophic failure at GPO, or even a temporary loss of access (such as that caused by the PURL server crash last August), user access to government documents would remain uninterrupted, thanks to this distributed preservation network. Currently there are 156 academic law libraries listed as selective depositories on the Federal Depository Library Directory; each of these would be a candidate for digital deposit should the program come to fruition.

Libraries with perpetual access or post-cancellation access agreements with publishers may also find it worthwhile to invest in digital preservation activities that may be redundant. Some publishers offer easy post-cancellation access to purchased digital content via nonprofit initiatives such as Portico and LOCKSS, both of which function as digital preservation systems. Other publishers, however, may simply provide subscribers with a set of CDs or DVDs containing their purchased subscription content. In these cases, it is worthwhile to actively preserve these files within a locally managed digital archive to ensure long-term accessibility for library patrons, rather than relegating these valuable digital files, stored on an unstable optical medium, to languishing on a shelf.

Law reviews and legal scholarship

It has been suggested that academic law libraries take responsibility for the preservation of digital content cited within their institutions’ law reviews to ensure that future researchers will be able to reference source materials even if they are no longer available at the cited URLs. While there aren’t specific figures relating to the problem of citation link rot in law reviews, research on Web citations appearing in scientific journals has shown that roughly 10 percent of these citations become inactive within 15 months of the citing article’s publication. When it comes to Web-published law and policy information, our own Chesapeake Project evaluation efforts have found that about 14 percent, or 1 out of every 7, Web-based items had disappeared from their original URLs within two years of being archived.

In the near future, we may find ourselves in the position of taking responsibility for the digital preservation of our law reviews themselves, given the call to action in the Durham Statement on Open Access to Legal Scholarship. After all, if law schools end print publication of journals and commit “to keep the electronic versions available in stable, open, digital formats” within open-access online repositories, there is an implicit mandate to ensure that those repositories offer digital preservation functionality, or that a separate dark digital preservation system be used in conjunction with the repository, to ensure long-term access to the digital journal content. (It is important to note that digital repository software and services do not necessarily feature standard digital preservation functionality.)

Speaking of digital repositories, the responsibility for establishing and maintaining institutional repositories most certainly falls to the law library, as does the responsibility for preserving the digital intellectual output of their law schools’ faculty, institutes, centers, and students (many of whom go on to impressive heights).

At the Georgetown Law Library, we’ve also taken on the task of preserving the intellectual output published to the Law Center’s Web sites.

The Preserv project has compiled an impressive bibliography on digital preservation aimed specifically at preservation services for institutional repositories (but also covering many of the larger issues in digital preservation), which is worth reviewing.

What systems and formats should we use?

Did I mention that our current digital preservation strategies and systems are imperfect? Well, it’s true. That’s the bad news. No matter which system or service you choose, you will surely encounter occasional glitches, endure system updates and migrations, and be forced to revise your processes and workflows from time to time. This is a fledgling, evolving field, and it’s up to us to grow and evolve along with it.

But, take heart! The good news is that there are standards and best practices established to guide us in developing strategies and selecting digital preservation systems, and we have multiple options to choose from. The key to embarking on a digital preservation project is to be versed in the language and standards of digital preservation, and to know what your options are.

The language and standards of digital preservation

I have heard a very convincing argument against standards in digital preservation: Because digital preservation is a new, evolving field, complying with rigid standards can be detrimental to systems that require a certain amount of adaptability in the face of emerging technological challenges. While I agree with this argument, I also believe that it is tremendously useful for those of us who are librarians, as opposed to programmers or IT specialists, to have standards as a starting point from which to identify and evaluate our options in digital preservation software and services.

There are a number of standards to be aware of in digital preservation. Chief among these is the Open Archival Information System (OAIS) Reference Model, which provides the central framework for most work in digital preservation. A basic question to ask when evaluating a digital preservation system or service is, “Does this system conform to the OAIS model?” If not, consider that a red flag.

The Trustworthy Repositories Audit & Certification Criteria and Checklist, or TRAC, is a digital repository evaluation tool currently being incorporated into an international standard for auditing and certifying digital archives. A small number of large repositories have undergone (or are undergoing) TRAC audits, including E-Depot at the Koninklijke Bibliotheek (National Library of the Netherlands), LOCKSS, Portico, and HathiTrust. This number can be expected to increase in the coming years.

The TRAC checklist is also a helpful resource to consult in conducting your own independent evaluations. Last year, for example, the libraries participating in The Chesapeake Project commissioned the Center for Research Libraries to conduct an assessment (as opposed to a formal audit) of our OCLC digital archive system based on TRAC criteria, which provided useful information to strengthen the project.

The PREMIS Data Dictionary provides a core set of preservation metadata elements to support the long-term preservation and future renderability of digital objects stored within a preservation system. The PREMIS working group has created resources and tools to support PREMIS implementation, available via the Library of Congress’s Web site. It is useful to consult the data dictionary when establishing local policy, and to ask about PREMIS compatibility when evaluating digital preservation options.
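One concrete kind of preservation metadata is fixity information: a checksum recorded when a file is ingested and recomputed later to confirm the file has not changed. The sketch below is only an illustration of that idea. The dictionary keys borrow the names of the PREMIS fixity and size semantic units (`messageDigestAlgorithm`, `messageDigest`, `size`), but the flat dictionary layout, the function names, and the sample bytes are assumptions made for this example, not a PREMIS-conformant serialization.

```python
import hashlib


def fixity_record(data: bytes, algorithm: str = "sha256") -> dict:
    """Build a small fixity record for a digital object's bytes."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return {
        "messageDigestAlgorithm": algorithm,
        "messageDigest": digest,
        "size": len(data),  # size in bytes
    }


def verify_fixity(data: bytes, record: dict) -> bool:
    """Recompute the checksum and compare it with the stored record."""
    current = fixity_record(data, record["messageDigestAlgorithm"])
    return (
        current["messageDigest"] == record["messageDigest"]
        and current["size"] == record["size"]
    )


# Hypothetical archived object; a real workflow would read the file's bytes.
original = b"Report of the Commission, 2009"
record = fixity_record(original)

print(verify_fixity(original, record))            # unchanged bytes
print(verify_fixity(original + b"!", record))     # altered bytes
```

Running a check like this on a schedule is one simple way a locally managed archive could detect silent corruption of stored files.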

While we’re on the exciting topic of metadata, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH, not to be confused with OAIS) is another protocol to watch for, especially if discovery and access are key components of your preservation initiative. OAI-PMH is a framework for sharing metadata between various “silos” of content. Essentially, the metadata of an OAI-PMH compliant system could be shared with and made discoverable via a single, federated search interface, allowing users to search the contents of multiple, distributed digital archives at the same time.
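To make the harvesting idea a little more tangible, here is a rough sketch of the two halves of an OAI-PMH exchange: building a `ListRecords` request and pulling Dublin Core titles out of a response. The `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the base URL archive.example.org, the sample response, and the function names are hypothetical, and the response is a hand-written string so the sketch runs offline.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def harvested_titles(xml_text: str) -> list:
    """Collect every Dublin Core title found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{DC_NS}}}title")]


# Hand-written sample response (hypothetical repository and record).
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Policy Report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

request_url = list_records_url("https://archive.example.org/oai")
titles = harvested_titles(SAMPLE_RESPONSE)
print(request_url)
print(titles)
```

A federated search service would repeat this against each participating archive and index the combined metadata, which is what lets one search box cover several separate collections.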

For an easy-to-read overview of digital preservation practices and standards, I recommend Priscilla Caplan’s The Preservation of Digital Materials, which appeared in the Feb./March 2008 issue of Library Technology Reports. There are also a few good online glossaries available to help decipher digital preservation jargon: the California Digital Library Glossary, the Internet Archive’s Glossary of Web Archiving Terms, and the Digital Preservation Coalition’s Definitions and Concepts.

Open source formats and software

Open source and open standard formats and software play a vital role in the lifecycle management of digital content. In the context of digital preservation, open-source formats, which make their source code and specifications freely available, facilitate the future development of tools that can assist in the migration of files to new formats as technology progresses and older formats become obsolete. PDF, for example, although developed originally as a proprietary format by Adobe Systems, became a published open standard in 2008, meaning that developers will have a foundation for making these files accessible in the future.

Other open source formats commonly used in digital preservation include the TIFF format for digital images, the ARC or WARC file for Web archiving, and the Extensible Markup Language (XML) text format for encoding data or document structure information. Microsoft formats, such as Word Documents, do not comply with open standards; the proprietary nature of these formats will inhibit future access to these documents when these formats become obsolete. The Library of Congress has a useful Web site devoted to digital formats and sustainability (including moving image and sound formats), which is worth reviewing.

Open source software is also looked upon favorably in digital preservation because, similar to open source formats, the software development and design process is made transparent, allowing current and future developers to build new interfaces and updates to the software over time.

Open source does not necessarily mean free-of-charge, and in fact, many service providers utilize open source software and open standards in developing fee-based or subscription digital preservation solutions.

Digital preservation solutions

There are many factors to consider in selecting a digital preservation solution. What is the nature of the content being preserved, and can the system accommodate it? Is preservation the sole purpose of the system — so that the system need include only a dark archive — or is a user access interface also necessary? How much does the system cost, and what are the expected ongoing maintenance costs, both in terms of budget and staff time? Is the system scalable, and can it accommodate a growing amount of content over time? This list could go on…

Keep in mind that no system will perfectly accommodate your needs. (Have I mentioned that digital preservation systems will always be imperfect?) And there is no use in waiting for the “perfect system” to be developed. We must use what’s available today. In selecting a system, consider its adherence to digital preservation standards, the stability of the institution or organization providing the solution, and the extent to which the digital preservation system has been accepted and adopted by institutions and user communities.

In a perfect world, perhaps every law library would implement a free, build-it-yourself, OAIS-compliant, open-source digital preservation solution with a large and supportive user community, such as DSpace or Fedora. These systems put full control in the hands of the libraries, which are the true custodians of the preserved digital content. But, in practice, our law libraries often do not have the staff and technological expertise to build and maintain an in-house digital preservation system.

As a result, several reputable library vendors and nonprofit organizations have developed fee-based digital preservation solutions, often built using open-source software. The Internet Archive offers the Archive-It service for the preservation of Web sites. The Stanford University-based LOCKSS program provides a decentralized preservation infrastructure for Web-based and other types of digital content, and the MetaArchive Cooperative provides a preservation repository service using the open-source LOCKSS software. The Ex Libris Digital Preservation System and the collaborative HathiTrust repository both support the preservation of digital objects.

For The Chesapeake Project, the Georgetown, Maryland State, and Virginia State Law Libraries use OCLC systems: the Digital Archive for preservation, coupled with a hosted instance of CONTENTdm as an access interface.

In our experience, working with a vendor that hosted our content at a secure offsite location and managed system updates and migrations allowed us to focus our energies on the administrative and organizational aspects of the project, rather than the ongoing management of the system itself. We were able to develop shared project documentation, including preferred file format and metadata policies, and conduct regular project evaluations. Moreover, because our project was collaborative, it worked to our advantage to enlist a third party to store all three libraries’ content, rather than place the burden of hosting the project’s content upon one single institution. In short, working with a vendor can actually benefit your project.

The ultimate question: How will we pay for it?

We still seem to be in the midst of a global economic recession that has impacted university and library budgets. Yet, despite budget stagnation, there has been a steady increase in the production of digital content.

Digital preservation can be expensive, and law library staff members with digital preservation expertise are few. The logical solution to these issues of budget and staff limitations is to seek out opportunities for collaboration, which would allow for the sharing of costs, resources, and expertise among participating institutions.

Collaborative opportunities exist with the Library of Congress, which has created a network of more than 130 preservation partners throughout the U.S. The law library community is also in the process of establishing its own collaborative digital archive, the Legal Information Archive, to be offered through the Legal Information Preservation Alliance, or LIPA.

During the 2009 AALL annual meeting, LIPA’s executive director announced that The Chesapeake Project had become a LIPA-sanctioned project under the umbrella of the new Legal Information Archive. As a collaborative project with expenses shared by three law libraries, The Chesapeake Project’s costs are currently quite low compared to other annual library expenditures, such as those for subscription databases. These annual costs will decrease as more law libraries join this initiative.

I firmly believe that law libraries must invest in digital preservation if we are to remain relevant and true to our purpose in the 21st century. The core reason libraries exist is to build collections, to make those collections accessible, to assist patrons in using our collections, and to preserve our collections forever. No other institution has been created to take on this responsibility. Digital preservation represents an opportunity in the digital age for law libraries to reclaim their traditional roles as stewards of information, and to ensure that our digital legal heritage will be available to legal scholars and the public well into the future.

Sarah Rhodes is the digital collections librarian at the Georgetown Law Library in Washington, D.C., and a project coordinator for The Chesapeake Project Legal Information Archive, a digital preservation initiative of the Georgetown Law Library in collaboration with the State Law Libraries of Maryland and Virginia.

VoxPopuLII is edited by Judith Pratt.  Editor in Chief is Rob Richards.

Recently I, like many law librarians (including Dean Richard Danner, James Donovan, and the panelists at the University of South Carolina School of Law’s colloquium on “The Law Librarian’s Role in the Scholarly Enterprise” [scroll down & click on "Part 9: Roundtable"]), began to devote more thought to disintermediation in legal information services. One way that law librarians can adapt to disintermediation is by learning more about the study of legal information systems, that is, legal informatics. When I began looking closely at legal informatics scholarship last fall, I was dismayed at not being able to locate any single resource that aggregated all of the major scholarly information resources in the field. As a result, I decided to build one; it’s called Legal Information Systems & Legal Informatics Resources. To provide current information, the site has an accompanying blog, the Legal Informatics Blog, and a Twitter feed. Building these sites has allowed me to cast a novice’s eye on the field of legal informatics.

Here is what I’ve glimpsed in the past few months:

I. Surveying the Sources

My exploration of legal informatics has focused initially on information resources. A relatively circumscribed set of scholarly journals, other article sources, preprint services, indexing & abstracting services, blogs, and listservs regularly report research results in legal informatics. A small set of subject headings will retrieve most monographs and dissertations in the field. Accordingly, aggregating access to these resources has been relatively easy, and automating discovery and delivery of many of these sources seems feasible sooner rather than later.

Conferences are trickier.   The number of conferences at which legal informatics issues are addressed is substantial, for several reasons: a large number of researchers from industry as well as academia (see, e.g., the lists of individuals compiled by Dr. Adam Wyner and the organizers of the DEON deontic logic conferences, and this list of departments & institutes), energetically engaged in applied as well as theoretical research, are producing a sizeable output; many of those researchers work in multiple fields; and the pace of technological change is accelerating the research and communication processes.  Several Websites, such as those of the International Association for Artificial Intelligence and Law (IAAIL) and the DEON deontic logic conferences, monitor these meetings, however. Access to proceedings is available from several sources, including ACM’s Portal service, the other major information science indexing services, OCLC WorldCat, and the Legal Information Systems site. As a result, access to most legal informatics conference information and proceedings can be streamlined and hopefully largely automated before too long.

Projects have proven even trickier. Much legal informatics research takes the form of grant-funded projects, of which a great number, particularly in Europe, have been undertaken during the past decade. Political integration in Europe and democratization in many regions encouraged certain governments during the past two decades to fund applied research on legal information systems. Identifying and linking to all of these legal informatics projects seems important for enabling access to legal informatics scholarship. Such a process is quite labor intensive, however, because of the great number of such projects, the lack of a comprehensive list of them, and the many languages in which project documentation is written. A long-term goal of the Legal Information Systems site is to build a database of as many of these projects as can be identified, with links to project Websites, deliverables, and publications.

Since standards and protocols, such as those respecting descriptive metadata and knowledge representation, and data sets constitute additional key resources for legal informatics research, links to many of them have been collected on the Legal Information Systems site. Because many researchers in the field focus on a particular research topic or category of legal information, aggregations of resources on major topics in the field, such as e-rulemaking, evidence, and information behavior, to which the Legal Information Systems site has dedicated pages, and argumentation, to which Dr. Adam Wyner’s blog devotes several pages, may yield efficiencies for researchers. In addition, collections of resources on applied topics such as citation standards, computer-assisted legal research (CALR) services, court technology, the Free Access to Law movement (discussed here by Ginevra Peruginelli & Enrico Francesconi of ITTIG-CNR, with links to resources here), institutional repositories, instructional technology, law practice technology, and open access may be of use to researchers and practitioners alike.

II. Detecting a Communications Gap

From a preliminary scan of the field of legal informatics I’ve learned that legal informaticists and law librarians do not appear to be communicating to any significant extent. For example, law librarians seem to play little or no role at legal informatics conferences and are rarely published in legal informatics journals. (Sarah Rhodes & Dana Neacsu’s recent paper seems an exception.) This seems particularly odd, given that law libraries are developing some of today’s most innovative digital legal information systems, such as the Chesapeake Project Legal Information Archive (a project of the Georgetown University Law Library, the Maryland State Law Library, the Virginia State Law Library, and the Legal Information Preservation Alliance), the Law Library of Congress’s Global Legal Information Network (GLIN), the Harvard Law School Library’s Digital Collections, the digital law libraries created by the Rutgers Camden and Rutgers Newark law libraries, and the USC Law Library’s English Medieval Legal Documents Wiki. Law library scholarship often addresses legal informatics topics such as legal citation (as in studies that reveal information resources utilized by courts), legal information behavior (as in the work of Dean Joan Howland & Nancy Lewis, Dr. Yolanda Jones, and Judith Lihosit), and the functioning or design of legal information systems such as computer assisted legal research (CALR) services (as in recent studies by Julie Jones, John Doyle, and Dean Mason). Yet it rather infrequently refers to legal informatics scholarship. That is, two communities of experts respecting the same subject, legal information systems, seem for the most part to be talking past each other.

Yet information sharing between law librarians and legal informaticists would substantially benefit both groups.   Law librarians would gain valuable insights into the functioning of the legal information systems they use every day and the likely direction of the legal information industry, as may be gleaned from recent monographs collecting conference papers in the field as well as from the program of the 2009 International Conference on Artificial Intelligence and Law (ICAIL 2009).   Those works show that the primary topics of recent legal informatics scholarship include argumentation and deontic logic (as discussed, for example, in recent dissertations by Dr. Adam Wyner & Dr. Régis Riveret); agent/multi-agent systems; decision support systems; document modeling; several natural language processing issues including multi-language systems, text mining including automated classification and indexing, summarization, segmentation, and information retrieval, as, for example, discussed in proceedings of the TREC Legal Track, and notably in the context of electronic discovery; other applied research topics, particularly concerning e-rulemaking, online dispute resolution, negotiation systems, digital rights management, electronic commerce and contracts, and evidence; and the use of XML, ontologies, and the development of the Semantic Web respecting legal information.

By cooperating with law librarians, legal informaticists for their part would gain access to expert users of legal information systems, quality input respecting the contexts of legal information use (ranging from the information lifecycle to the information behavior of lawyers), and ideas for further research.

Here are some specific suggestions respecting how law librarians could make meaningful contributions to legal informatics research.   First, law librarians could continue to perform legal information behavior research, building on the important recent activity in this area. Second, law librarians who are developing innovative legal information systems could present papers on those systems at legal informatics conferences and write articles about those systems for legal informatics journals.

Third, as expert users of legal information systems and close observers of lawyers, judges, law students, and lay users of legal information, law librarians could generate legal informatics research questions based on their experience and observations. For example, law librarians could recommend research on such little-studied but important legal information systems as conflict of interest control systems and bankruptcy claims agents’ Websites, or on the application of information science and computer science concepts to legal information systems errors, such as those arising from faulty legal drafting practices and overly complex statutory and regulatory schemes.

Fourth, law librarians could provide legal informaticists with expert practitioner and policy perspectives on issues that law librarians have prioritized as a profession, such as authentication, digital preservation, metadata content and management, and user interface design.   Fifth, law librarians could furnish legal informatics researchers with input respecting system capabilities from the vantage of an “expert user,” as Dr. Stephann Makri recently did by including law librarians in his study of lawyers’ information behavior.

Sixth, law librarians engaged in developing innovative digital legal information systems could partner with legal informaticists to study those systems. Seventh, law librarians who are also lawyers could contribute their knowledge of substantive and procedural law to legal informatics research projects, particularly where not all of the legal informaticists involved have legal training.

Finally, law librarians could draw on their in-depth knowledge of legal information systems and users to partner with legal informaticists on the design of research studies.   In particular, those law librarians with training in social science research methods could encourage legal informaticists to employ those methods in their studies of legal information systems, which might benefit from increased use of multiple methodologies.

III. Bright Prospects

Greater cooperation between legal informaticists and law librarians would benefit both communities.  The Legal Information Systems site will be developed with an eye toward demonstrating and fostering that cooperation.

[NOTE: This post was updated on 22 August 2011 to reflect new URLs.]

Robert Richards edits Legal Information Systems & Legal Informatics Resources and its accompanying blog, the Legal Informatics Blog, and Twitter feed.

VoxPopuLII is edited by Judith Pratt.