
Editor’s note: This is the first in a 2-part series on issues of content permanence. Benjamin Keele of the William and Mary Law Library will be writing on data deletion principles for VoxPopuLII in April.

A Future Full of the Past?
The current consensus seems to be that information, once online, is permanent. The Disney Channel runs a PSA warning kids to be careful what they put online because “You’re leaving a permanent (and searchable) record any time you post something.” Concerns about content permanence have led many European countries to establish a legal “Right to be Forgotten” to protect citizens from the shackles of the past presented by the Internet. The prospect of content adjustment in the name of privacy has exposed cultural variations on perspectives of the global village[1]. In Europe, the “Right to be Forgotten” has gained traction as a legal mechanism for handling such information issues and has been named a top priority by the European Union Data Privacy Commission. This right essentially transforms public information into private information after a period of time, by limiting the access to third parties, “[T]he right to silence on past events in life that are no longer occurring.”[2] What in Italy and France is called oblivion, however, is controversial and has been called “rewriting history”, “personal history revisionism”, and “censorship” in the U.S.

Benjamin Keele of the William and Mary Law Library will address the “data” aspect of the “Right to be Forgotten” debate in the second post in this series, outlining data deletion principles for organizations privately holding user information — the footprints we leave behind as we interact with sites and devices. This post, and my research generally, focuses on the “information” element in the debate: the content a user posts to the Web. The question often posed in this debate is whether an individual should have the right to manipulate or limit access to content about his or her past that surfaces through search engine results. But this is actually the wrong question. Information Science research tells us that permanence is not a reality, and it may never be. Information falls off the Web for many reasons. The right question to ask is, “What information should we actively save, and what information should we allow to fade, particularly when it harms an individual?” Fortunately, Information Science research offers wisdom for framing as well as answering this question.

The Information Preservation Paradox

In this age, “[l]ife, it seems, begins not at birth but with online conception. And a child’s name is the link to that permanent record.” You are what Google says you are, and expectant parents search prospective baby names to help their kids yield top search results in the future. Few parents want their children to be lost in a virtual crowd, but is infamy preferable? In 2003, a Canadian high school student unwittingly became the Star Wars Kid, and according to Google, he still is as of 2011. A New England Patriots cheerleader was fired for blog content, a Millersville University student teacher was not allowed to graduate because of images on Facebook, and UCLA sophomore Alexandra Wallace quit school and made a public apology for a racist video she posted on YouTube that spurred debate online about a university’s authority to monitor or regulate student speech. Though discoverable through public Google searches, the posted content offered little in the way of context or truth about its subjects’ character. In 1993, Jon Venables and Robert Thompson viciously murdered a two-year-old and became infamous online and off as two of the youngest people ever convicted of murder in English history.

These stories deserve varying levels of sympathy, but all are embarrassing and negative, and all lead their subjects to want to disconnect their names from past transgressions so that the information is harder to discover when they interview for a job, apply to college, or go on a first date. Paradoxically, the only individuals who have been offered oblivion are the two who committed the most heinous social offense: Venables and Thompson were given new identities upon their release from juvenile incarceration. It may actually be easier for two convicted murderers to get a job than it is for Alexandra Wallace.

This paradox is one of many that result from an incomplete and distorted conception of information persistence. The real problem with new forms of access to old information is that, without rhyme or reason, much of it disappears while pieces of harmful content may remain. Time disrupts the information system and the information values upon which U.S. information privacy law has been based, so we must reassess our views and practices in light of this disruption. Objections to the preservation of personal information may be valid; as content ages, it becomes increasingly uncontextualized, poorly duplicated, irrelevant, and/or inaccurate. Basic but difficult questions about the role of the Internet in society, today and for the future, must be answered, and the answers will be the foundation for resolving disputes that arise from personal information lingering online.

The Crisis of Disappearing Content

Privacy scholars and journalists have embraced the notion of permanence – that we cannot be separated from an identifying piece of online information short of a name change. But information persistence research suggests otherwise – perhaps even showing a decreasing lifespan for content. When articulating the reasons behind the Internet Archive, Brewster Kahle explained that the average lifespan of a webpage was around 100 days. In 2000, Cho and Garcia-Molina found that 77% of content was still alive after a day[3]; Brewington and Cybenko estimated that 50% of content was gone after 100 days[4]. In 2003, Fetterly and colleagues found 65% of content alive after a week[5], and in 2004, Ntoulas and colleagues found only 10% of content alive after a year[6]. Recent work suggests, albeit tentatively, that data is becoming less persistent over time; for example, Daniel Gomes and Mario Silva studied the persistence of content between 2006 and 2007 and found only 55% of content alive after one day, 41% after a week, 23% after 100 days, and 15% after a year[7]. While these studies differ in goals, designs, and methods, preventing true synthesis, they all contribute to the well-established principle that the Web is ephemeral[8]. At best, the average lifespan of content is a matter of months or, in rare cases, years — certainly not forever.

The Internet has not defeated time, and information, like everything else, gets old, decays, and dies, even online. Far from being permanent, the Web cannot be self-preserving[9]. Permanence is not yet upon us – now is the time to develop practices of information stewardship that will preserve our cultural history as well as protect the privacy rights of those who will live with the information.

Information Stewardship

Old information may be valuable to decision-making or to history. The first has been addressed by laws like the Fair Credit Reporting Act and by database designers, who understand that more information does not necessarily, or even usually, result in better decisions and that old information may have transformed into misinformation. The second is more difficult: how do we decide what information may be important when we reflect on the past as researchers and historians? Archival ethics, a developed field in library and information science, offers rich insight. The Society of American Archivists has drafted a Code of Ethics that states, “[Archivists] establish procedures and policies to protect the interests of the donors, individuals, groups, and institutions whose public and private lives and activities are recorded in their holdings. As appropriate, archivists place access restrictions on collections to ensure that privacy and confidentiality are maintained, particularly for individuals and groups who have no voice or role in collections’ creation, retention, or public use.”[10]

The Web, of course, does not have a hierarchy to hand down such decisions. It is a bottom-up structure. Therefore, users must find their own inner archivists. They must protect what is important, assess what may be harmful, and take responsibility for the content they contribute to the Web. For a fascinating example of such Web ethics, go to the Star Wars Kid Wikipedia page, and click the “talk” link. You will find that Wikipedia’s biographies of living persons policy has been implemented. This implementation, however, does not prevent the page from being the first listed in Google’s search results for the Star Wars Kid’s real name. There are many other sites that follow some form of archival ethics; many of them limit access to content by altering how private information may be retrieved by a search, either by not offering full-text search functionality on the site (see the Internet Archive) or by using robots.txt to communicate with crawlers that information is off-limits to them (see Public Resource). These access decisions essentially create a card catalog-like system of access to the private information. Library and information scientists have worked with these issues for a very long time. Their expertise is desperately needed as these difficult policy decisions are made at a user, site, network, national, and international level.
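To make the robots.txt convention mentioned above concrete: it is simply a plain-text file at a site’s root listing paths that cooperating crawlers agree not to fetch. The sketch below uses Python’s standard urllib.robotparser with a hypothetical rule set (the paths and URLs are illustrative, not drawn from the Internet Archive or Public Resource) to show how a well-behaved crawler would honor such a restriction.

from urllib import robotparser

# Hypothetical robots.txt rules: a site declares its /people/ archive
# off-limits to all crawlers while leaving the rest of the site crawlable.
rules = """
User-agent: *
Disallow: /people/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# A cooperating crawler checks before fetching. The page itself remains
# publicly reachable; it simply is not collected for a search index.
print(parser.can_fetch("*", "https://example.org/people/jane-doe"))  # False
print(parser.can_fetch("*", "https://example.org/index.html"))       # True

Note that this is a voluntary protocol: it limits findability through search rather than removing the content, which is precisely the card catalog-like access model described above.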


[1] Marshall McLuhan, The Gutenberg Galaxy: The Making of Typographic Man (1962).

[2] Giorgio Pino, “The Right to Personal Identity in Italian Private Law: Constitutional Interpretation and Judge-Made Rights,” in The Harmonization of Private Law in Europe, M. Van Hoecke and F. Ost (eds.), 237 (2000).

[3] Junghoo Cho and Hector Garcia-Molina, The Evolution of the Web and Implications for an Incremental Crawler, Proceedings of the 26th International Conference on Very Large Data Bases 200-209 (2000).

[4] Brian E. Brewington and George Cybenko, How Dynamic is the Web? Estimating the Information Highway Speed Limit 33 (1-6) Comput. Netw. 257-276 (2000).

[5] Dennis Fetterly, Mark Manasse, Mark Najork, and Janet Wiener, A Large-Scale Study of the Evolution of Web Pages 34(2) Software Practice and Experience 213-237 (2004).

[6] Alexandros Ntoulas, Junghoo Cho, and Christopher Olston, What’s New on the Web? The Evolution of the Web from a Search Engine Perspective, Proceedings of the 13th International Conference on World Wide Web 1-12 (2004).

[7] Gomes and Silva, supra note 4.

[8] Wallace Koehler, A Longitudinal Study of Web Pages Continued: A Consideration of Document Persistence 9(2) Information Research 1 (2004).

[9] Julien Masanès, Web Archiving, at 7 (2006).

[10] Society of American Archivists, “Code of Ethics for Archivists,” at http://www2.archivists.org/statements/saa-core-values-statement-and-code-of-ethics (2011).

Editor’s Note: For topic-related VoxPopuLII posts please see: Robert Richards, Context and Legal Informatics Research.

Meg Leta Ambrose is a doctoral student at the University of Colorado’s interdisciplinary Technology, Media, & Society program. She is a fellow with the computer science department, a research assistant with the law school’s Silicon Flatirons Center, and a Provost’s University Library Fellow. She has been awarded the CableLabs fellowship for the remainder of her doctoral work. Meg received a J.D. from the University of Illinois in 2008 and can be found at megleta.com.

VoxPopuLII is edited by Judith Pratt.

Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed. The information above should not be considered legal advice. If you require legal representation, please consult a lawyer.

I first met Jeremy Bentham as a newly arrived philosophy student walking through the South Cloisters of University College London.  Behind the plate glass of a huge mahogany case, I looked in upon a seated life-size wax figure of a man in an 18th Century coat and knee britches, happily wearing a straw hat.  Only it was not Bentham’s wax figure; it was his embalmed corpse – his “auto-icon.”  Apparently, Bentham’s will left his executor no choice but to have his body stuffed and placed on public display.  There he has been ever since.

Bentham famously believed that publicity was the key to truth.  His ideal was a Panoptic  universe, where all in the world would believe themselves to be constantly observed, listened to, and monitored.  Thus all would become good — or at least would behave (which, for Bentham, amounted to the same thing).  Bentham felt that claims to privacy were no more real or substantial than claims to natural rights, which he despised as pernicious fictions.  Both were harmful beliefs that entrenched privilege and maintained humankind in its misery.  Publicity was the key to truth and human happiness. 

It is easy to make fun of Bentham’s ideas.  But much of what Bentham meant to address in the context of his Panoptic structures we now take for granted.  In Bentham’s lifetime, Parliamentary deliberations were confidential.  Bentham’s arguments forced them into the sunlight.  Legal decisions and statute books were accessible only to lawyers and judges.  Bentham’s arguments led to codification of the law, and increasingly accessible legal rules.  Bentham was far ahead of his time — the first modern information theorist.  The idea that all actions of government would be presumptively available for public review did not become part of U.S. law until the passage of the Freedom of Information Act (FOIA)  in 1967.  As we speak, it appears the English parliament is only now learning Bentham’s message about publicity.

Bentham’s contemporary William Blackstone celebrated the fact that “private vices” were beyond the jurisdiction of the state.  Privacy for him was an organizing principle of civilized society.  But Blackstone believed in an all-seeing God to whom we would be accountable even for our private sins and thoughts.  Bentham, a thoroughgoing atheist, hated Blackstone and all he stood for.  For him the logical truth remained that people who believed themselves to be monitored behaved more responsibly than those who believed themselves to be alone.  So Bentham asked himself: in the absence of God, how can a secular society operate without perpetuating Panoptic structures of surveillance?  When Michel Foucault argued that Bentham’s Panoptic structures had become essential to the functioning of a modern secular state, he did not claim originality for the insight.

But why do our intuitions revolt?  What can our brains say to explain this revulsion?  What is so important about privacy?  Judge Posner has pointed out that when people are given a right to privacy, they use it to conceal discrediting information about themselves from others – and consequently mislead and defraud them.  In a world increasingly characterized by exchanges of information, should we not all just abandon the attempt to maintain privacy, and embrace the Panopticon?  We are, of course, all familiar with the dark side of the Panopticon – the fictional surveillance state of George Orwell’s 1984, or the actual surveillance states in Eastern Europe in the second half of the last century.  But as Bentham knew, and his modern disciple David Brin has explained at greater length, the Orwellian nightmare state is impossible when the Panopticon works both ways – when the government itself is watched – when the surveillor knows himself to be surveilled.  Still, our intuitions rebel, but we are unable to respond to Bentham’s utilitarian logic.

So let us return again to the South Cloisters of University College and the question we began with: what could have possessed Bentham to do what he did with his last remains?  The answer seems to be compelled by the same bloodless logic the man applied in all other aspects of his life.  Bentham, the great apostle of publicity, rejected even the privacy of the grave – he remains the eternal observer, continuing his surveillance of the living from his perch among the dead.

Further reading:

Jeremy Bentham, “Of Publicity and Privacy, as Applied to Judicature in General, and to the Collection of the Evidence in Particular,” in The Works of Jeremy Bentham, vol. 6 (1843)

 Bentham’s Panopticon Letters

Peter A. Winn has served as an Assistant U.S. Attorney in the United States Department of Justice since 1994.  He is also a part-time lecturer at the University of Washington Law School, where he teaches privacy law and health care fraud and abuse, and a Senior Fellow at the University of Melbourne, where he teaches cybercrime.  The views represented in this article are Mr. Winn’s personal views and not those of the United States Department of Justice.
