As researchers use materials in libraries, their actions tend to generate records: research trails in digital databases, lists of borrowed books, and correspondence with librarians. Most of the time these records are innocuous, but to facilitate freedom of inquiry, librarians generally treat them as confidential. This confidentiality is especially important in law libraries because legal matters can be very sensitive and stressful. Researchers implicitly trust librarians with at least hints of concerns they would prefer not be generally known. If researchers knew that records of their questions could become known to others, some would avoid using library collections or asking librarians for advice, guidance that may well help them find valuable information.

In her interesting post, Meg Leta points out that, despite some exhortations that information on the Web lasts forever, most information now online will disappear at some point. Websites go down when their owners fail to pay hosting fees. Data is deleted, either on purpose or by mistake. Without maintenance, a file sitting on a drive or disc will eventually become inaccessible because the storage medium has decayed or because the hardware and software needed to read the file have become obsolete. Since information will tend to vanish without action on our part, Leta suggests we should instead focus on actively saving information that is worth keeping.

Leta makes an excellent point, but I’d suggest that in addition to thinking carefully about what information needs to be kept, legal professionals also should consider whether certain types of information warrant purposeful destruction. I’d also suggest that law libraries give patrons the ability to retain records of their use of library resources, either through the library or on their own.

Leaving Breadcrumbs Along the Research Trail

Just as most web browsers keep a history of websites visited and search engines retain logs of search terms, law libraries and their vendors maintain records of some researcher interactions with library resources and staff. A very thorough researcher could generate records by using web browsers on library computers, writing to library staff, borrowing books, and accessing databases that require individual user accounts. Many of the major legal databases, such as Westlaw, LexisNexis, and Bloomberg Law, require users to log in and maintain individual research trails.

Just as Leta said, most of these records will be destroyed over time through the library’s and vendors’ normal procedures. The library computers are probably set to erase their browser histories every so often, and most integrated library systems delete circulation records once books are returned. Legal databases keep research trails, but those trails generally expire. However, the vendors also keep server logs and track users with cookies; those records are probably deleted at some point, though likely later than users lose access to their research trails. Any written records the librarians keep of patron interactions might be covered by an organizational records retention schedule; if not, they are kept at the whim of the librarian.

So this appears to be the present situation: law libraries and their vendors collect a variety of records about their patrons’ research. Through normal business processes, many of those records are eventually discarded. Depending on the researcher’s circumstances, the records may be sensitive, and librarians generally strive to keep all such records confidential as a matter of professional ethics. Is there anything in the status quo worth changing?

Retaining Information Has Risks and Benefits

Almost all the records libraries keep about their patrons have a purpose. Circulation records are kept so libraries do not lose materials and so they can compile usage statistics. Vendors keep research trails so researchers can retrace their steps and so the vendors know how their products are being used. After a certain period of time, these records are generally no longer needed for those reasons.

While records are needed for important reasons, keeping them also involves a risk of harming researchers. The most serious risks are that a researcher’s sensitive legal research records will be revealed to others who should not have that information and that the records will be used for a purpose different from the one for which the information was originally collected. I imagine law libraries are not high-priority targets for criminals and government agents, but then again, library databases and email systems are probably not equipped with state-of-the-art security. Certainly the longer records are retained, the more opportunity there is for security to be compromised.

It is easier to imagine a scenario in which library records are used for a new purpose. Database vendors could decide to use research histories to market products to researchers. This seems possible for law students and attorneys. Publishers could seek to use library or database records to help track researchers committing copyright infringement. I have not heard of any recent attempts by law enforcement to obtain law library records, and it is hard to fathom what relevance the records would have to any investigation. On the other hand, the government has sought library records before.

The risks that library records might be wrongly disclosed or misused exist while the records are still useful, but during that time the beneficial intended uses of the records outweigh the risks. Once that need has ended, though, there is no justification for keeping the records. To minimize risks to patrons, libraries should determine how long they need certain types of records and then destroy the data as soon as it is no longer required.

On the other hand, records of research activity can be used to benefit patrons. Surely many researchers could use a list of every book they have borrowed, or a research trail that covers multiple databases. Perhaps software could be developed that would analyze research histories to help make data-driven collection development decisions or recommend new books and articles to faculty and students. Services like this might require keeping patron records for quite some time.

Librarians thinking of future historians might suggest that patron records should be kept in some form so our descendants can better understand how we conducted research and can look into the thought processes of significant legal scholars.

Giving Patrons Greater Control of Their Records

How these risks and benefits weigh against each other depends to a great degree on the researcher’s circumstances. For many faculty and students, the privacy of their library records is not a matter of great concern. For attorneys and private citizens (and faculty and students when conducting research on their personal legal matters), privacy is very important, and if they knew of a risk that their records might be used in unexpected ways, they might reduce their use of library resources or be deterred from using the library altogether.

I suggest law librarians seek to give researchers greater control over their library records. Records should be retained only for the shortest time needed to provide the services for which the data was collected. After that time, the records should be rendered totally irretrievable or reduced to anonymous statistics that cannot be traced to any individual. However, before the records are destroyed, researchers should be able to easily access and save them for their own use. Researchers who choose this option can then keep their records as they see fit, just as they can download bank statements and export their financial transactions to personal money management software.
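The retain-then-anonymize policy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any actual integrated library system works: the record types, the `PatronRecord` fields, the `destruction_date` and `purge` helpers, and the retention periods are all hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Tuple

# Hypothetical retention schedule: record type -> how long a record is kept
# after it stops being needed (e.g. after a borrowed book is returned).
# These periods are placeholders, not recommendations.
RETENTION = {
    "circulation": timedelta(days=30),
    "reference_email": timedelta(days=90),
}


@dataclass(frozen=True)
class PatronRecord:
    """One record about a patron's research (field names are hypothetical)."""
    record_type: str
    patron_id: str
    item: str
    closed_on: date  # when the record stopped being needed


def destruction_date(record: PatronRecord) -> date:
    """Last day the record may be kept, i.e. the patron's export deadline.

    An unscheduled record type raises KeyError on purpose, so records that
    the schedule does not cover get noticed instead of kept indefinitely.
    """
    return record.closed_on + RETENTION[record.record_type]


def purge(records: Iterable[PatronRecord],
          today: date) -> Tuple[List[PatronRecord], Counter]:
    """Split records into those still within their retention period and
    anonymous usage counts for the rest.

    Expired records are not returned, so once the caller replaces its stored
    records with `kept`, all that remains of them is counts keyed by
    (record type, item), with no patron identifier.
    """
    kept: List[PatronRecord] = []
    usage: Counter = Counter()
    for record in records:
        if today <= destruction_date(record):
            kept.append(record)
        else:
            usage[(record.record_type, record.item)] += 1
    return kept, usage
```

For example, with three circulation records closed on 5 January, 20 January, and 1 March 2012, calling `purge` on 15 March 2012 keeps only the March record and reduces the two January loans of the same title to a single count of 2. Even such counts are not automatically anonymous: a count of one for a rarely borrowed title can still point to a particular researcher, so a library may want to suppress or coarsen small counts before sharing them.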

Below are suggestions for how this might be done.

Make a privacy policy and records retention schedule: Each library should publish a privacy policy that describes how the library collects and retains records of patron interactions. Each library should also make a records retention schedule that details how long each type of record is kept and how researchers can obtain a copy of their records before they are destroyed. Many researchers may choose not to download their records, but in that case the data will be destroyed as soon as it is no longer needed, so the default option is the one most protective of patron privacy.

Make records easy to obtain and use: Researchers who wish to save their records should be able to obtain them easily, in a format compatible with software that organizes, searches, and retrieves the records. For instance, borrowing histories and database research trails could provide citations of accessed materials that are compatible with citation management software like Zotero, CiteULike, and Mendeley. Since most integrated library systems and journal databases are provided by vendors, the best librarians can do is urge vendors to add these functions and subscribe to products that allow privacy-protecting defaults while also giving patrons access to their records.

Convince vendors to do the same: Libraries license most of the systems used to catalog and provide access to their collections. Protecting researcher privacy and providing patron access to their records will require the cooperation of vendors. Librarians should ask vendors to publish privacy policies that tell researchers what records are collected and how long they are retained, and encourage development of software that will give patrons copies of their records that are compatible with leading research management software.
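As a sketch of what a citation-manager-friendly export might look like, the Python below renders a borrowing history as RIS, a plain-text tagged format that Zotero and Mendeley can import. The `BorrowedItem` fields and the `to_ris` function are hypothetical, and only a handful of RIS tags are used (TY for type, AU for author, TI for title, PY for year, PB for publisher, ER to end a record); a production export would follow the full RIS specification and carry identifiers such as ISBNs where available.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BorrowedItem:
    """One entry in a patron's borrowing history (hypothetical fields)."""
    authors: List[str]
    title: str
    year: int
    publisher: str = ""


def to_ris(items: Iterable[BorrowedItem]) -> str:
    """Render a borrowing history as RIS records of type BOOK.

    Each RIS line is a two-letter tag, two spaces, a hyphen, a space, and
    the value; every record ends with an "ER" line.
    """
    lines: List[str] = []
    for item in items:
        lines.append("TY  - BOOK")
        for author in item.authors:  # one AU line per author
            lines.append(f"AU  - {author}")
        lines.append(f"TI  - {item.title}")
        lines.append(f"PY  - {item.year}")
        if item.publisher:  # publisher is optional
            lines.append(f"PB  - {item.publisher}")
        lines.append("ER  - ")
    return "".join(line + "\n" for line in lines)
```

For instance, `to_ris([BorrowedItem(["Solove, Daniel"], "Understanding Privacy", 2008, "Harvard University Press")])` yields a six-line record that a patron could save to a `.ris` file and import into a reference manager. An empty history produces an empty string rather than a stray blank line.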

For further reading on records destruction and privacy, I suggest Daniel Solove’s Understanding Privacy (Harvard University Press, 2008) and Viktor Mayer-Schönberger’s Delete: The Virtue of Forgetting in the Digital Age (Princeton University Press, 2009).

Benjamin Keele is a reference librarian at the Wolf Law Library of the William & Mary Law School. He earned a J.D. and M.L.S. from Indiana University. His research interests include copyright, privacy, and scholarly publishing. His website is benkeele.com.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed. The information above should not be considered legal advice. If you require legal representation, please consult a lawyer.

[Editor’s Note] For topic-related VoxPopuLII posts please see: Meg Leta Ambrose, Accounting for Informatics in the “Right to be Forgotten” Debate.

Editor’s note: This is the first in a 2-part series on issues of content permanence. Benjamin Keele of the William and Mary Law Library will be writing on data deletion principles for VoxPopuLII in April.

A Future Full of the Past?
The current consensus seems to be that information, once online, is permanent. The Disney Channel runs a PSA warning kids to be careful what they put online because “You’re leaving a permanent (and searchable) record any time you post something.” Concerns about content permanence have led many European countries to establish a legal “Right to be Forgotten” to protect citizens from the shackles of the past presented by the Internet. The prospect of adjusting content in the name of privacy has exposed cultural differences in perspectives on the global village[1]. In Europe, the “Right to be Forgotten” has gained traction as a legal mechanism for handling such information issues and has been named a top priority by the European Union Data Privacy Commission. This right essentially transforms public information into private information after a period of time by limiting third parties’ access to it. In the words of one scholar: “[T]he right to silence on past events in life that are no longer occurring.”[2] What is called oblivion in Italy and France, however, is controversial in the U.S., where it has been called “rewriting history”, “personal history revisionism”, and “censorship”.

Benjamin Keele of the William and Mary Law Library has previously addressed the “data” aspect of the “Right to be Forgotten” debate, outlining data deletion principles for organizations privately holding user information: the footprints we leave behind as we interact with sites and devices. This post, and my research generally, focuses on the “information” element in the debate: the content a user posts to the Web. The question often posed in this debate is whether an individual should have the right to manipulate or access content about his or her past that is generated through search engine results. But this is the wrong question. Information Science research tells us that permanence is not a reality and may never be. Information falls off the Web for many reasons. The right question to ask is, “What information should we actively save, and what information should we allow to fade, particularly when it harms an individual?” Fortunately, Information Science research offers wisdom for framing as well as answering this question.

The Information Preservation Paradox

In this age, “[l]ife, it seems, begins not at birth but with online conception. And a child’s name is the link to that permanent record.” You are what Google says you are, and expectant parents search prospective baby names to help their kids yield top search results in the future. Only a few rare parents want their children to be lost in a virtual crowd, but is infamy preferable? In 2003, a Canadian high school student unwittingly became the Star Wars Kid, and according to Google, he still is as of 2011. A New England Patriots cheerleader was fired for blog content, a Millersville University student teacher was not allowed to graduate because of images on Facebook, and UCLA sophomore Alexandra Wallace quit school and made a public apology for a racist video she posted on YouTube, which spurred debate online about a university’s authority to monitor or regulate student speech. Though discoverable through public Google searches, the posted content offered little in the way of context or truth about its owner’s character. In 1993, Jon Venables and Robert Thompson viciously murdered a 2-year-old and became infamous online and off as the youngest people ever to be incarcerated for murder in English history.

These stories deserve varying levels of sympathy, but all are embarrassing and negative, and all have led their subjects to want to disconnect their names from their past transgressions so that the information is harder to discover when they interview for a job, apply to college, or go on a first date. Paradoxically, the only individuals who have been offered oblivion are the two who committed the most heinous social offense: Venables and Thompson were given new identities upon their release from juvenile incarceration. It may actually be easier for two convicted murderers to get a job than it is for Alexandra Wallace.

This paradox is one of many that result from an incomplete and distorted conception of information persistence. The real problem with new forms of access to old information is that, without rhyme or reason, much of it disappears while pieces of harmful content may remain. Time disrupts the information systems and information values upon which U.S. information privacy law has been based, so we must reassess our views and practices in light of this disruption. Objections to the preservation of personal information may be valid: as content ages, it becomes increasingly uncontextualized, poorly duplicated, irrelevant, and/or inaccurate. Basic but difficult questions about the role of the Internet in society today and in the future must be answered, and the answers will be the foundation for resolving disputes that arise from personal information lingering online.

The Crisis of Disappearing Content

Privacy scholars and journalists have embraced the notion of permanence: that we cannot be separated from an identifying piece of online information short of a name change. But information persistence research suggests otherwise, perhaps even showing a decreasing lifespan for content. When articulating the reasons behind the Internet Archive, Brewster Kahle explained that the average lifespan of a webpage was around 100 days. In 2000, Cho and Garcia-Molina found that 77% of content was still alive after a day[3]; Brewington estimated that 50% of content was gone after 100 days[4]. In 2003, Fetterly found 65% of content alive after a week[5], and in 2004, Ntoulas found only 10% of content alive after a year[6]. Recent work suggests, albeit tentatively, that data is becoming less persistent over time; for example, Daniel Gomes and Mario Silva studied the persistence of content between 2006 and 2007 and found only 55% alive after 1 day, 41% after a week, 23% after 100 days, and 15% after a year[7]. Although these studies differ in their goals, designs, and methods, which prevents true synthesis, they all support the well-established principle that the Web is ephemeral[8]. At best, the average lifespan of content is a matter of months or, in rare cases, years; certainly not forever.

The Internet has not defeated time; information, like everything else, gets old, decays, and dies, even online. Far from being permanent, the Web cannot preserve itself[9]. Permanence is not yet upon us, so now is the time to develop practices of information stewardship that will preserve our cultural history as well as protect the privacy rights of those who will live with the information.

Information Stewardship

Old information may be valuable to decision-making or to history. The first has been addressed by laws like the Fair Credit Reporting Act and by database designers who understand that more information does not necessarily, or even usually, result in better decisions, and that old information may have turned into misinformation. The second is more difficult: how do we decide what information may be important when we reflect on the past as researchers and historians? Archival ethics, a developed field in library and information science, offers rich insight. The Society of American Archivists has drafted a Code of Ethics that states, “[Archivists] establish procedures and policies to protect the interests of the donors, individuals, groups, and institutions whose public and private lives and activities are recorded in their holdings. As appropriate, archivists place access restrictions on collections to ensure that privacy and confidentiality are maintained, particularly for individuals and groups who have no voice or role in collections’ creation, retention, or public use.”[10]

The Web, of course, does not have a hierarchy to hand down such decisions. It is a bottom-up structure. Therefore, users must find their own inner archivists. They must protect what is important, assess what may be harmful, and take responsibility for the content they contribute to the Web. For a fascinating example of such Web ethics, go to the Star Wars Kid Wikipedia page and click the “talk” link. You will find that Wikipedia’s biographies of living persons policy has been implemented. This implementation, however, does not prevent the page from being the first listed in Google’s search results for the Star Wars Kid’s real name. Many other sites follow some form of archival ethics; many of them limit access to content by altering how private information may be retrieved by a search, either by not offering full-text search functionality on the site (see the Internet Archive) or by using robots.txt to tell crawlers that the information is off-limits to them (see Public Resource). These access decisions essentially create a card-catalog-like system of access to the private information. Library and information scientists have worked with these issues for a very long time. Their expertise is desperately needed as these difficult policy decisions are made at the user, site, network, national, and international levels.
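To make the robots.txt mechanism concrete, the Python below uses the standard library's `urllib.robotparser` to check which URLs a compliant crawler would skip. The robots.txt rules, the example.org URLs, and the `ExampleBot` user agent are assumptions for illustration, not Public Resource's actual configuration. Note that robots.txt is a request rather than an access control: it keeps pages out of well-behaved crawlers' indexes, but anyone can still fetch them directly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler ("*") to skip /private/
# while leaving the rest of the site open to indexing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://example.org/private/biography.html",
    "https://example.org/public/index.html",
):
    # can_fetch() reports whether a crawler that honors robots.txt
    # would request this URL under the rules above.
    print(url, parser.can_fetch("ExampleBot", url))
```

Run as written, this reports `False` for the /private/ URL and `True` for the /public/ one. The search-engine effect the post describes follows from the same rule: a crawler that never fetches the page cannot index its contents, even though the page itself remains online.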


[1] Marshall McLuhan, The Gutenberg Galaxy: The Making of Typographic Man (1962).

[2] Giorgio Pino, “The Right to Personal Identity in Italian Private Law: Constitutional Interpretation and Judge-Made Rights,” in The Harmonization of Private Law in Europe, M. Van Hoecke and F. Ost (eds.), 237 (2000).

[3] Junghoo Cho and Hector Garcia-Molina, The Evolution of the Web and Implications for an Incremental Crawler, Proceedings of the 26th International Conference on Very Large Data Bases 200-209 (2000).

[4] Brian E. Brewington and George Cybenko, How Dynamic is the Web? Estimating the Information Highway Speed Limit, 33(1-6) Comput. Netw. 257-276 (2000).

[5] Dennis Fetterly, Mark Manasse, Mark Najork, and Janet Wiener, A Large-Scale Study of the Evolution of Web Pages, 34(2) Software Practice and Experience 213-237 (2004).

[6] Alexandros Ntoulas, Junghoo Cho, and Christopher Olston, What’s New on the Web? The Evolution of the Web from a Search Engine Perspective, Proceedings of the 13th International Conference on World Wide Web 1-12 (2004).

[7] Gomes and Silva, supra note 4.

[8] Wallace Koehler, A Longitudinal Study of Web Pages Continued: A Consideration of Document Persistence, 9(2) Information Research 1 (2004).

[9] Julian Masanes, Web Archiving, at 7 (2006).

[10] Society of American Archivists, “Code of Ethics for Archivists,” at http://www2.archivists.org/statements/saa-core-values-statement-and-code-of-ethics (2011).

Editor’s Note: For topic-related VoxPopuLII posts please see: Robert Richards, Context and Legal Informatics Research.

Meg Leta Ambrose is a doctoral student in the University of Colorado’s interdisciplinary Technology, Media, & Society program. She is a fellow with the computer science department, a research assistant with the law school’s Silicon Flatirons Center, and a Provost’s University Library Fellow. She has been awarded the CableLabs fellowship for the remainder of her doctoral work. Meg received a J.D. from the University of Illinois in 2008 and can be found at megleta.com.
