Editor’s note: This is the first in a 2-part series on issues of content permanence. Benjamin Keele of the William and Mary Law Library will be writing on data deletion principles for VoxPopuLII in April.

A Future Full of the Past?
The current consensus seems to be that information, once online, is permanent. The Disney Channel runs a PSA warning kids to be careful what they put online because “You’re leaving a permanent (and searchable) record any time you post something.” Concerns about content permanence have led many European countries to establish a legal “Right to be Forgotten” to protect citizens from the shackles of the past presented by the Internet. The prospect of content adjustment in the name of privacy has exposed cultural differences in perspectives on the global village[1]. In Europe, the “Right to be Forgotten” has gained traction as a legal mechanism for handling such information issues and has been named a top priority by the European Union Data Privacy Commission. This right essentially transforms public information into private information after a period of time by limiting third parties’ access to it; it has been described as “[T]he right to silence on past events in life that are no longer occurring.”[2] What in Italy and France is called oblivion, however, is controversial and has been called “rewriting history”, “personal history revisionism”, and “censorship” in the U.S.

Benjamin Keele of the William and Mary Law Library has previously addressed the “data” aspect of the “Right to be Forgotten” debate, outlining data deletion principles for organizations privately holding user information: the footprints we leave behind as we interact with sites and devices. This post, and my research generally, focuses on the “information” element in the debate: the content a user posts to the Web. The question often posed in this debate is whether an individual should have the right to manipulate or access content about his or her past that is generated through search engine results. But this is actually the wrong question. Information Science research tells us that permanence is not a reality, and it may never be. Information falls off the Web for many reasons. The right question to ask is, “what information should we actively save, and what information should we allow to fade, particularly when it harms an individual?” Fortunately, Information Science research offers wisdom for answering as well as framing this question.

The Information Preservation Paradox

In this age, “[l]ife, it seems, begins not at birth but with online conception. And a child’s name is the link to that permanent record.” You are what Google says you are, and expectant parents search prospective baby names to help their kids yield top search results in the future. Only a few rare parents want their children to be lost in a virtual crowd, but is infamy preferable? In 2003, a Canadian high school student unwittingly became the Star Wars Kid, and according to Google, he still is as of 2011. A New England Patriots cheerleader was fired for blog content, a Millersville University student teacher was not allowed to graduate because of images on Facebook, and UCLA sophomore Alexandra Wallace quit school and made a public apology for a racist video she posted on YouTube that spurred debate online about a university’s authority to monitor or regulate student speech. Though discoverable through public Google searches, the posted content offered little in the way of context or truth about the owner’s character. In 1993, Jon Venables and Robert Thompson viciously murdered a 2-year-old and became infamous online and off as the youngest people ever to be incarcerated for murder in English history.

These stories deserve varying levels of sympathy, but all are embarrassing and negative, and all lead their subjects to want to disconnect their names from their past transgressions so that such information is more difficult to discover when they interview for a job, apply to college, or go on a first date. Paradoxically, the only individuals who have been offered oblivion are the two who committed the most heinous social offense: Venables and Thompson were given new identities upon their release from juvenile incarceration. It may actually be easier for two convicted murderers to get a job than it is for Alexandra Wallace.

This paradox is one of many that result from an incomplete and distorted conception of information persistence. The real problem with new forms of access to old information is that, without rhyme or reason, much of it disappears while pieces of harmful content may remain. Time disrupts the information system and information values upon which U.S. information privacy law has been based, so we must reassess our views and practices in light of this disruption. Objections to the preservation of personal information may be valid; when content has aged, it becomes increasingly uncontextualized, poorly duplicated, irrelevant, and/or inaccurate. Basic but difficult questions about the role of the Internet in society today and for the future must be answered, and these will be the foundation for resolving disputes that arise from personal information lingering online.

The Crisis of Disappearing Content

Privacy scholars and journalists have embraced the notion of permanence – that we cannot be separated from an identifying piece of online information short of a name change. But information persistence research suggests otherwise – perhaps showing even a decreasing lifespan for content. When articulating the reasons behind the Internet Archive, Brewster Kahle explained that the average lifespan of a webpage was around 100 days. In 2000, Cho and Garcia-Molina found that 77% of content was still alive after a day[3]; Brewington estimated that 50% of content was gone after 100 days[4]. In 2003, Fetterly found 65% of content alive after a week[5], and in 2004, Ntoulas found only 10% of content alive after a year[6]. Recent work suggests, albeit tentatively, that data is becoming less persistent over time; for example, Daniel Gomes and Mario Silva studied the persistence of content between 2006 and 2007 and discovered a rate of only 55% alive after 1 day, 41% after a week, 23% after 100 days, and 15% after a year[7]. While these studies had different goals, designs, and methods, which prevents true synthesis, they all contribute to the well-established principle that the Web is ephemeral[8]. At best, the average lifespan of content is a matter of months or, in rare cases, years, and certainly not forever.

The Internet has not defeated time, and information, like everything, gets old, decays, and dies, even online. Quite the opposite of permanent, the Web cannot be self-preserving[9]. Permanence is not yet upon us – now is the time to develop practices of information stewardship that will preserve our cultural history as well as protect the privacy rights of those who will live with the information.

Information Stewardship

Old information may be valuable to decision-making or history. The first has been addressed by laws like the Fair Credit Reporting Act and by database designers, who understand that more information does not necessarily, or even usually, result in better decisions and that old information may have turned into misinformation. The second is more difficult: how do we decide what information may be important when we reflect on the past as researchers and historians? Archival ethics, a developed field in library and information science, offers rich insight. The Society of American Archivists has drafted a Code of Ethics that states, “[Archivists] establish procedures and policies to protect the interests of the donors, individuals, groups, and institutions whose public and private lives and activities are recorded in their holdings. As appropriate, archivists place access restrictions on collections to ensure that privacy and confidentiality are maintained, particularly for individuals and groups who have no voice or role in collections’ creation, retention, or public use.”[10]

The Web, of course, does not have a hierarchy to hand down such decisions. It is a bottom-up structure. Therefore, users must find their own inner archivists. They must protect what is important, assess what may be harmful, and take responsibility for the content they contribute to the Web. For a fascinating example of such Web ethics, go to the Star Wars Kid Wikipedia page, and click the “talk” link. You will find that Wikipedia’s biographies of living persons policy has been implemented. This implementation, however, does not prevent the page from being the first listed in Google’s search results for the Star Wars Kid’s real name. There are many other sites that follow some form of archival ethics; many of them limit access to content by altering how private information may be retrieved by a search, either by not offering full-text search functionality on the site (see the Internet Archive) or by using robots.txt to communicate with crawlers that information is off-limits to them (see Public Resource). These access decisions essentially create a card catalog-like system of access to the private information. Library and information scientists have worked with these issues for a very long time. Their expertise is desperately needed as these difficult policy decisions are made at a user, site, network, national, and international level.
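For readers unfamiliar with the robots.txt mechanism mentioned above, the following is a minimal sketch of how a site can signal to crawlers that certain content is off-limits, and how a well-behaved crawler would check that signal. The paths, the site address, and the crawler name are hypothetical and chosen only for illustration; they do not describe the Internet Archive or Public Resource. Compliance with robots.txt is voluntary on the crawler's part, so this is a request rather than an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site. The two paths are
# assumptions made for this sketch, not rules taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /people/
Disallow: /court-records/
"""

# Parse the rules from the text above instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawler_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules permit the named crawler to fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/about",
        "https://example.org/people/jane-doe",
    ):
        verdict = "allowed" if crawler_may_fetch(url) else "off-limits"
        print(f"{url}: {verdict}")
```

In this sketch the pages under the disallowed paths stay on the site and remain reachable by anyone who has the address; they are simply not offered to search engine crawlers that honor the file, which is what gives the approach its card catalog-like quality.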


[1] Marshall McLuhan, The Gutenberg Galaxy: The Making of Typographic Man (1962).

[2] Giorgio Pino, “The Right to Personal Identity in Italian Private Law: Constitutional Interpretation and Judge-Made Rights,” In The Harmonization of Private Law in Europe, M. Van Hoecke and F. Ost (eds.), 237 (2000).

[3] Junghoo Cho and Hector Garcia-Molina, The Evolution of the Web and Implications for an Incremental Crawler, Proceedings of the 26th International Conference on Very Large Data Bases 200-209 (2000).

[4] Brian E. Brewington and George Cybenko, How Dynamic is the Web? Estimating the Information Highway Speed Limit 33 (1-6) Comput. Netw. 257-276 (2000).

[5] Dennis Fetterly, Mark Manasse, Marc Najork, and Janet Wiener, A Large-Scale Study of the Evolution of Web Pages 34(2) Software Practice and Experience 213-237 (2004).

[6] Alexandros Ntoulas, Junghoo Cho, and Christopher Olston, What’s New on the Web? The Evolution of the Web from a Search Engine Perspective, Proceedings of the 13th International Conference on World Wide Web 1-12 (2004).

[7] Gomes and Silva, supra note 4.

[8] Wallace Koehler, A Longitudinal Study of Web Pages Continued: A Consideration of Document Persistence 9(2) Information Research 1 (2004).

[9] Julien Masanès, Web Archiving, at 7 (2006).

[10] Society of American Archivists, “Code of Ethics for Archivists,” at http://www2.archivists.org/statements/saa-core-values-statement-and-code-of-ethics (2011).

Editor’s Note: For topic-related VoxPopuLII posts please see: Robert Richards, Context and Legal Informatics Research.

Meg Leta Ambrose is a doctoral student at the University of Colorado’s interdisciplinary Technology, Media, & Society program. She is a fellow with the computer science department, a research assistant with the law school’s Silicon Flatirons Center, and a Provost’s University Library Fellow. She has been awarded the CableLabs fellowship for the remainder of her doctoral work. Meg received a J.D. from the University of Illinois in 2008 and can be found at megleta.com.

VoxPopuLII is edited by Judith Pratt.

Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed. The information above should not be considered legal advice. If you require legal representation, please consult a lawyer.