
For some time, Open Access has been a sort of gnat in my office, bugging me periodically, but always just on the edge of getting my full attention. Perhaps due in large part to the fact that our journals simply cost much less than those in other disciplines, law librarians have been able to stay mostly on the outside of this discussion. The marketing benefits of building institutional repositories are just as strong for law schools as other disciplines, however, and many law schools are now boarding the train — with librarians conducting. If you’re new to the discussion of Open Access in general, I suggest Peter Suber’s Open Access Overview for an excellent introduction. This piece is meant to briefly summarize the goals, progress, and future of OA as it applies (mostly) to legal scholarship.

Background and History
Open Access is not merely the buzzword of the moment: Open Access, or OA, describes work that is free to read, by anyone. Though usually tied to discussions of Institutional or Scholarly Repositories, the two do not necessarily have to be connected. Publications can be made “open” via download from an author’s institutional or personal home page, a disciplinary archive such as SSRN or BePress, or through nearly any other type of digital collection – so long as it is provided for free. For readers, free should mean free of cost and free of restrictions. These are sometimes described as gratis OA and libre OA, respectively. As Peter Suber notes, “Gratis OA is free as in beer. Libre OA is free as in beer and free as in speech.”

In addition to the immediate benefits of OA for researchers and for libraries (which would save a great deal of money now spent on collections), strong ethical arguments can be made for OA as a necessary public service, given the enormous public support of research through tax dollars. The argument sharpens when research is explicitly supported by federal or other grant funds. Paying to access grant-funded work amounts to a second charge to the taxpayer, while private publishers profit.

Of course, OA wasn’t an option with print resources; while anyone is “free” to go to a library that subscribes to a journal and read it, physical location itself is a barrier to access. In the networked digital environment, physical location need not be a barrier anymore. For members of the scholarly community who wish to share and discuss work with each other, that might be the end of the story. But while the technology is mature, policies and politics are still developing, and fraught with challenges posed largely by rights holders with significant financial interests in the current publishing system. One vocal segment of that market raises economic objections based on their financial support of the peer review process and other overhead costs related to production and dissemination of scholarly research. Since publishers control the permissions necessary to make OA work most fully, their opposition frustrates the efforts of many OA advocates. Not all publishers are invested in erecting barriers to OA, though; see, e.g., the ROMEO directory of publisher copyright and self-archiving policies. Though some publishers impose embargo periods before posting, many across disciplines allow deposit of the final published version of a work.

In the midst of this conflict, many OA proponents acknowledge that production of scholarship is not without costs; Old Faithful didn’t start spouting Arrogant Bastard Ale one bright morning. Separate from the mechanism for sharing the Open Access version of an article, there are charges associated with its production that must be supported. The OA movement seeks a new model for recovering these costs, rather than eliminating the costs altogether.

Interoperability
So, the “open” part of Open Access is roughly equivalent to “free” (for the reader), which presents economic challenges that remain to be solved. What about the “access” part?

Access to print literature was largely a matter of indexing and physical copies; inclusion in the leading index(es) of a field was an honor (and a potential economic advantage) for journals, and collection development decisions used to be made based in part on whether a journal was indexed. Access to online literature requires more than simply the digital equivalent in order to serve the community sufficiently: researchers need both the ability to download an article and the ability to search across the literature if they are to manage its sheer volume effectively.

As a foundational matter, openness in scholarly communication requires a certain amount of interoperability between the archives that serve up scholarship. The Open Archives Initiative (OAI) develops standards to promote interoperability between archives. Such standards support harvesting and assembling the metadata from multiple OAI-compliant archives to facilitate searching and browsing across collections in an institution, field, or discipline.
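For a concrete sense of what that interoperability looks like, here is a minimal harvesting sketch in Python. The OAI-PMH verbs and the oai_dc metadata format are the standard ones, but the repository endpoint URL below is a hypothetical placeholder, not any particular archive.

    # Minimal OAI-PMH harvesting sketch; the endpoint URL is a placeholder.
    import requests
    import xml.etree.ElementTree as ET

    ENDPOINT = "https://repository.example.edu/oai"  # hypothetical repository
    NS = {
        "oai": "http://www.openarchives.org/OAI/2.0/",
        "dc": "http://purl.org/dc/elements/1.1/",
    }

    def harvest(endpoint, metadata_prefix="oai_dc"):
        """Yield (title, identifier) pairs for every record in the archive."""
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        while True:
            root = ET.fromstring(requests.get(endpoint, params=params).content)
            for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
                title = record.find(".//dc:title", NS)
                ident = record.find(".//dc:identifier", NS)
                yield (title.text if title is not None else "",
                       ident.text if ident is not None else "")
            # Results are paged; a resumptionToken means more records remain.
            token = root.find(".//oai:resumptionToken", NS)
            if token is None or not (token.text or "").strip():
                break
            params = {"verb": "ListRecords", "resumptionToken": token.text}

    for title, link in harvest(ENDPOINT):
        print(title, link)

A harvester along these lines, run against each OAI-compliant archive in a field, is essentially how aggregated, cross-collection search services are assembled.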

Paths to OA
One repeated practical question around Open Access is logistical: Who will build the archive, and how will it be populated on a regular basis? There are several models for implementing Open Access. Disciplinary Archives, Institutional or Unit/Departmental Repositories, and Self Archiving are all paths that can be taken, somewhat separate from publishers’ progress towards OA.

Disciplinary repositories are fairly common across the academic community: PLoS and PubMed Central, for example, provide access to large collections of works in science and medicine. Like SSRN/LSN, they provide a persistent, accessible host for scholarship and a searchable collection of new papers in the field. One difference in the legal community is in the primary publishing outlets: for most law faculty, the most prestigious placement is in a top-20 law school-published law journal. These journals vary in their OA friendliness, but many faculty read their agreements in such a way as to allow this sort of archiving. SSRN has thus provided a low bar for legal scholars to make their work available openly. SSRN also provides a relatively simple, if not entirely useful, metric for scholarly impact in appointments and in promotion and tenure discussions. As of this writing, SSRN’s abstract database held more than 395,000 records and its full-text collection more than 324,000 papers.

Institutional or Unit/Departmental Repositories (IRs) are also becoming a popular choice for institutions seeking to promote their brand and to raise the profile of their faculty. A variety of options are available for creating an IR, from open-source hosting to turnkey or hosted systems like BePress’ Digital Commons. Both avenues tend to offer flexibility in creating communities within the IR for subjects or other series, and in handling embargoes and other specialized needs. BePress’ Digital Commons, for example, can serve as an IR and/or as a publishing system for the peer-review and editing process. As a path to Open Access, the only barriers to IRs are institutional support for the annual licensing/hosting fee and some commitment of staff to populating the IR with publications (or to facilitating deposit, if authors will self-archive).

Self-archiving represents an appeal directly to authors, who are not the tough sell that publishers tend to be. As Suber notes, scholarly publishing lacks the economic disincentive to OA that royalties create for other kinds of authors. Scholarly law journal articles, the bread and butter of the legal academy, do not produce royalties, so authors have nothing to lose from making their work available on OA platforms. One route to OA, therefore, is self-archiving by researchers. But while they might support OA in principle, researchers’ own best interests may push them to publish in “barrier-based” journals to protect their tenure and grant prospects, despite the interests of both the public and their own scholarly community in no-cost, barrier-free access.

What about mandates as part of the path to OA? Recently, some academic institutions and grant agencies have begun instituting some form of open access mandate. The NIH mandate, for example, implemented in 2008, requires that the results of the research it funds be deposited in PubMed Central and made available within twelve months of publication. Others have followed, including Harvard Law School. As a path to OA, both kinds of mandate are useful, though funder mandates alone wouldn’t reach enough of the literature to make a difference in terms of access for researchers. Institutional mandates, however, just might:

“When complemented by funding agency and foundation public-access mandates that capture the work originating with industry and government researchers who may not have faculty status, university mandates will, in time, produce nearly universal access to all the scientific literature.”

David Shulenburger

ROARMAP tracks these mandates and the directed repositories for each. Though other universities and departments have instituted mandates, the 2008 Harvard Law mandate is notable for having originated with the faculty:

“The Harvard Law School is committed to disseminating the fruits of its research and scholarship as widely as possible. In keeping with that commitment, the Faculty adopts the following policy: Each Faculty member grants to the President and Fellows of Harvard College permission to make available his or her scholarly articles and to exercise the copyright in those articles. More specifically, each Faculty member grants to the President and Fellows a nonexclusive, irrevocable, worldwide license to exercise any and all rights under copyright relating to each of his or her scholarly articles, in any medium, and to authorize others to do the same, provided that the articles are not sold for a profit. The policy will apply to all scholarly articles authored or co-authored while the person is a member of the Faculty except for any articles completed before the adoption of this policy and any articles for which the Faculty member entered into an incompatible licensing or assignment agreement before the adoption of this policy. The Dean or the Dean’s designate will waive application of the policy to a particular article upon written request by a Faculty member explaining the need.”

Federal Input
Two recent bills dealt with open access: FRPAA, which would mandate OA for federally-funded research; and the Research Works Act (RWA), which would have prohibited such mandates. RWA (HR 3699) was withdrawn in late February of 2012, following Elsevier’s withdrawal of support. Its sponsors issued this statement:

“As the costs of publishing continue to be driven down by new technology, we will continue to see a growth in open-access publishers. This new and innovative model appears to be the wave of the future. … The American people deserve to have access to research for which they have paid. This conversation needs to continue, and we have come to the conclusion that the Research Works Act has exhausted the useful role it can play in the debate.”

FRPAA (HR 4004 and S 2096), on the other hand, is intended “to provide for Federal agencies to develop public access policies relating to research conducted by employees of that agency or from funds administered by that agency.” FRPAA would require any agency with extramural research expenditures over $100 million annually to make manuscripts of articles published from its funding publicly available within six months of publication. FRPAA puts the burden (and the freedom) on each agency to maintain an archive or to draw on an existing archive (e.g., PMC), and each agency is free to develop its own policy as fits its needs (and perhaps its researchers’ needs). The bill also gives the agency a nonexclusive license to disseminate the work, with no other impact on copyright or patent rights, and requires that the agency have a long-term preservation plan for such publications.

Copyright Tangles
How does copyright limit the effectiveness of mandates and other archiving? Less than the average law librarian might imagine. Except where an author’s publishing agreement specifies otherwise, the scholarly community generally agrees that an author holds copyright in his or her submitted manuscript. That copy, referred to as the pre-refereeing preprint, may generally be deposited in an institutional repository such as the University of Illinois’ IDEALS, posted to an author’s or institution’s SSRN or BePress account, or posted to the author’s own web page.

Ongoing Work
ARL/SPARC encourages universities to voice their approval and support of FRPAA. Researchers around the academy are beginning to show support as well: research has indicated that researchers would self-archive if they were 1) informed about the option, and 2) permitted by their copyright/licensing agreements with publishers to do so. With greater education about the benefits of Open Access for the institution as well as the scholarly community, authors could be encouraged to make better use of institutional and other archives.

In the legal academy, scholarly publishing is somewhat unusual. The preprint distribution culture is strong, and the main publishing outlets are run by the law schools – not by large, publicly-traded U.S. and foreign media corporations. Reprint permission requests are often handled by a member of the law school’s staff – or by a law student – and it’s unclear how much the journals know or care about republication or OA issues in general. But authors and their home institutions aren’t necessarily waiting around for answers; they’re archiving now, and taking down works later if asked. Carol Watson and James Donovan have written extensively about their experience with building and implementing an institutional repository at the University of Georgia, using the Berkeley Electronic Press Digital Commons software. See, e.g., Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age, Carol A. Watson, James M. Donovan, and Pamela Bluh; White Paper: Behind a Law School’s Decision to Implement an Institutional Repository, James M. Donovan and Carol A. Watson; and Implementing BePress’ Digital Commons Institutional Repository Solution: Two Views from the Trenches, James M. Donovan and Carol A. Watson.

Conclusion
The bottom line is that whether you’re an author or a librarian (or some other type of information/knowledge professional), you should be thinking about current and future access to the results of research, and about the logistical, economic, and political challenges involved, whether that research is happening in law or elsewhere in the academy.

Stephanie Davidson is Head of Public Services at the University of Illinois in Champaign. Her research addresses public services in the academic law library and understanding scholarly research methods and behavior.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed. The information above should not be considered legal advice. If you require legal representation, please consult a lawyer.

As researchers use materials in libraries, their actions tend to generate records: research trails in digital databases, lists of borrowed books, and correspondence with librarians. Most of the time these records are innocuous, but to facilitate freedom of inquiry, librarians generally hold them as confidential. This confidentiality is especially important in law libraries because legal matters can be very sensitive and stressful. Researchers implicitly trust librarians with at least hints of concerns they would prefer not be generally known. If researchers knew that any record of their questions could become known to others, some would avoid using library collections or asking librarians for guidance that might well help them find valuable information.

In her interesting post, Meg Leta points out that, despite some exhortations that information on the Web lasts forever, most information now online will disappear at some point. Websites go down when their owners fail to pay hosting fees. Data is deleted, either on purpose or by mistake. A file sitting on a drive or disc will, without maintenance, eventually become inaccessible because the storage media has decayed or because the hardware and software needed to read the file have become obsolete. Since information will tend to vanish without action on our part, Leta suggests we should instead focus on actively saving information that is worth keeping.

Leta makes an excellent point, but I’d suggest that in addition to thinking carefully about what information needs to be kept, legal professionals also should consider whether certain types of information warrant purposeful destruction. I’d also suggest that for law libraries, patrons should be given the ability to retain, either through the library or themselves, records of their use of library resources.

Leaving Breadcrumbs Along the Research Trail

Just as most web browsers keep a history of websites visited and search engines retain logs of search terms, law libraries and their vendors maintain records of some researcher interactions with library resources and staff. A very thorough researcher could generate records by using web browsers on library computers, writing to library staff, borrowing books, and accessing databases that require individual user accounts. Many of the major legal databases, such as Westlaw, LexisNexis, and Bloomberg Law, require users to log in and maintain individual research trails.

Just as Leta said, most of these records will be destroyed over time through the library’s and vendors’ normal procedures. The library computers probably are set to erase their browser histories every so often, and most integrated library systems delete circulation records once books are returned. Legal databases keep research trails, but generally those trails eventually expire. However, the vendors also keep server logs and track users with cookies; those records probably are deleted at some point, but probably later than when users lose access to their research trails. Any written records the librarians keep of patron interactions might be covered by an organizational records retention schedule; if not, they are kept at the whim of the librarian.

So this appears to be the present situation: law libraries and their vendors collect a variety of records about their patrons’ research. Through normal business processes, many of those records are eventually discarded. Depending on the researcher’s circumstances, the records may be sensitive, and librarians generally strive to keep all such records confidential as a matter of professional ethics. Is there anything in the status quo worth changing?

Retaining Information has Risks and Benefits

Almost all the records libraries keep about their patrons have a purpose. Circulation records are kept so libraries do not lose materials and can generate usage statistics. Vendors keep research trails so researchers can retrace their steps and so the vendors know how their products are being used. After a certain period of time, these records are generally no longer needed for those purposes.

While records are needed for important reasons, keeping them also involves risk of harming researchers. The most serious risks are that a researcher’s sensitive legal research records will be revealed to others who should not have that information and that the records will be used for a purpose different from the one for which the information was originally collected. I imagine law libraries are not high-priority targets for criminals and government agents, but then again, library databases and email systems are probably not equipped with state-of-the-art security. Certainly the longer records are retained, the more opportunity there is for security to be compromised.

It is easier to imagine a scenario in which library records are used for a new purpose. Database vendors could decide to use research histories to market products to researchers; this seems especially plausible for law students and attorneys. Publishers could seek to use library or database records to help track researchers committing copyright infringement. I have not heard of any recent attempts by law enforcement to obtain law library records, and it is hard to fathom what relevance the records would have to any investigation. On the other hand, the government has sought library records before.

These risks that library records might be wrongly disclosed or misused exist while the records are useful, but during that period the beneficial intended uses of the records outweigh the risks. Once that need has ended, though, there is no justification for keeping the records. To minimize risks to patrons, libraries should determine how long they need certain types of records and then destroy the data as soon as it is no longer required.

On the other hand, records of research activity can be used to benefit patrons. Surely many researchers could use a list of every book they have borrowed, or a research trail that covers multiple databases. Perhaps software could be developed that would analyze research histories to help make data-driven collection development decisions or recommend new books and articles to faculty and students. Services like this might require keeping patron records for quite some time.
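As a rough illustration of the kind of analysis such a service might perform, here is a small sketch that recommends titles based on co-borrowing patterns in circulation records. Everything in it (the data structure, the sample data, the scoring) is invented for the example and does not reflect any particular library system.

    # Illustrative co-borrowing recommender: titles often borrowed alongside a
    # patron's own borrowings are suggested. All data here is invented.
    from collections import Counter

    # patron id -> set of borrowed titles (a hypothetical circulation export)
    history = {
        "p1": {"Understanding Privacy", "Delete", "Code 2.0"},
        "p2": {"Understanding Privacy", "Delete"},
        "p3": {"Delete", "The Future of Reputation"},
    }

    def recommend(patron, history, top_n=3):
        """Suggest titles borrowed by patrons with overlapping histories."""
        mine = history[patron]
        scores = Counter()
        for other, titles in history.items():
            if other == patron:
                continue
            overlap = len(mine & titles)
            if overlap:
                for title in titles - mine:
                    scores[title] += overlap
        return [title for title, _ in scores.most_common(top_n)]

    print(recommend("p1", history))  # -> ['The Future of Reputation']

Even a toy like this makes the privacy point concrete: it only works if individual borrowing histories are retained in identifiable form for some period.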

Librarians thinking of future historians might suggest that patron records should be kept in some form so that our descendants can have a better understanding of how we conducted research and can look into the thought processes of significant legal scholars.

Giving Patrons Greater Control of Their Records

How these risks and benefits weigh against each other depends to a great extent on the researcher’s circumstances. For many faculty and students, the privacy of their library records is not a matter of great concern. For attorneys and private citizens (and for faculty and students conducting research on their personal legal matters), privacy is very important, and if they knew of a risk that their records might be used in unexpected ways, they might reduce their use of library resources or be deterred from using the library altogether.

I suggest law librarians seek to give researchers greater control over their library records. Records should be retained for the shortest time needed to provide the services for which the data was collected. After that time, the records should be rendered totally irretrievable or reduced to anonymous statistics that cannot be traced to any individual. Before the records are destroyed, however, researchers should be able to easily access and save them for their own use. Researchers who choose this option can then keep their records as they see fit, just as they can download bank statements and export their financial transactions to personal money management software.

Below are suggestions for how this might be done.

Make a privacy policy and records retention schedule — Each library should publish a privacy policy that describes how the library collects and retains records of patron interactions. Each library should also make a records retention schedule that details how long each type of record is kept and how researchers can obtain a copy of their records before they are destroyed. Many researchers may choose not to download their records, but in that case the data will be destroyed as soon as it is not needed. The default option is most protective of patron privacy.

Make records easy to obtain and use — Researchers who wish to save their records should be able to more easily obtain them in a format that is compatible with software that organizes, searches, and retrieves the records. For instance, borrowing histories and database research trails could provide citations of accessed materials that are compatible with citation management software like Zotero, citeulike, and Mendeley. Since most integrated library systems and journal databases are provided by vendors, the best librarians can do is urge vendors to add these functions and subscribe to products that allow privacy-protecting defaults while also giving patrons access to their records.
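As a sketch of what a compatible, machine-readable export might look like, the snippet below turns a single (hypothetical) borrowing-history entry into RIS, a plain-text citation format that Zotero and Mendeley can import. The input fields are invented for illustration; a real export would come from the library system or vendor.

    # Sketch: convert one borrowing-history entry into an RIS record that a
    # citation manager such as Zotero can import. Input fields are invented.
    def to_ris(item):
        lines = [
            "TY  - BOOK",
            f"TI  - {item['title']}",
            *(f"AU  - {author}" for author in item.get("authors", [])),
            f"PY  - {item.get('year', '')}",
            f"PB  - {item.get('publisher', '')}",
            "ER  - ",
        ]
        return "\n".join(lines)

    borrowed = {
        "title": "Delete: The Virtue of Forgetting in the Digital Age",
        "authors": ["Mayer-Schonberger, Viktor"],
        "year": "2009",
        "publisher": "Princeton University Press",
    }
    print(to_ris(borrowed))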

Convince vendors to do the same — Libraries license most of the systems used to catalog and provide access to their collections. Protecting researcher privacy and providing patron access to their records will require the cooperation of vendors. Librarians should ask vendors to publish privacy policies that tell researchers what records are collected and how long they are retained, and encourage development of software that will give patrons copies of their records that are compatible with leading research management software.

For further reading on records destruction and privacy, I suggest Daniel Solove’s Understanding Privacy (Harvard University Press, 2008) and Viktor Mayer-Schonberger’s Delete: The Virtue of Forgetting in the Digital Age (Princeton University Press, 2009).

Benjamin Keele is a reference librarian at the Wolf Law Library of the William & Mary Law School. He earned a J.D. and M.L.S. from Indiana University. His research interests include copyright, privacy, and scholarly publishing. His website is benkeele.com.

 

 


[Editor’s Note] For topic-related VoxPopuLII posts please see: Meg Leta Ambrose, Accounting for Informatics in the “Right to be Forgotten” Debate.

The Swedish legal publisher Notisum AB has been on the Swedish market for online legal publishing since 1996.  Our Internet-based law book at www.notisum.se is read by more than 50,000 persons per week and our customers range from municipalities and government institutions to Swedish multinationals.

Now we are heading for China, and I would like to share with you some practical experiences from this highly dynamic market and our challenges in trying to conquer it.

The case for a legal monitoring tool, codenamed “EnviTool”

In close co-operation with our customers, we had developed a set of specialized Internet-based tools in Sweden for supporting the process of legal compliance and legal information sharing within big organizations. The key driver of these needs was the growing number of certificates issued according to the international environmental management standard ISO 14001:2004.

ISO 14001 is a worldwide industry standard to help companies to improve their environmental performance through the implementation of an environmental management system. There is much to say about management systems. Continuous improvement is the heart of the matter–it is all about doing the right things right. Establish a plan, do what you planned, check your results and then start all over by correcting your plans. Plan, Do, Check, Act.

According to the standard, you have to identify the relevant environmental legislation for your organization. You need access to those laws and regulations, and you have to keep an updated list. You should also make the information available to the people of your organization.

By providing an online legal register, monitored for changes, with a whole set of information sharing and workflow features, Notisum helps the certified companies to comply with the environmental legislation.

We developed this system step by step.  When it came to going outside the borders of the Kingdom of Sweden, we changed the name from Rättsnätet+Miljö to EnviTool.

The case for China

Sweden has a very high penetration of the ISO 14001 standard, and use of the standard is in a mature phase in most organizations. China, on the other hand, is number one in the world, with more than 70,000 certificates issued, and the growth is double-digit. So China is the place to be if you have products for this specific customer group. Users of the standard in China are still maturing, however, so we knew there were some challenges out there.

The market for legal information tools is, overall, immature in China, and legal compliance is not always at the top of managers’ priority lists. Nevertheless, starting in 2009, Notisum took the first steps toward making China our second home market. Many challenges, expected and unexpected, were waiting for EnviTool.

Step one – the product

Like many commercial ventures, the EnviTool project was the result of a randomly started chain of events. Our Swedish CEO was playing golf with a professor at KTH, the Royal Institute of Technology in Stockholm.  The professor was in charge of a student exchange program between National University of Singapore (NUS) and KTH. We were asked to host an internship for an ambitious computer science student in our company for one academic year.

The internship was successful, our student was doing a great job and we learned a lot about Asia and the Chinese culture. We have now hosted three excellent NUS students from Singapore, all good representatives of their university and their country.  And all of them bilingual English and Chinese. That’s when we decided that China would be an interesting market to try. And yes – China is far away from Sweden, it is terribly big and it was really too large a challenge for our company. We wanted to try anyway, with the hope that Singapore could be the bridge for us.

We decided to start a subsidiary in Singapore and so we did. It is easy, by the way. According to the World Bank, Singapore ranks number one in the world in ease of doing business. Coming from Sweden, ranked number 50 in the world in terms of how easily you pay your taxes, I had an almost religious moment when we got a letter of gratitude from the Singaporean tax authorities after paying our taxes. Not so in Sweden, I may add…

With the first NUS intern now as our first employee, we started translating and adapting our Internet tool together with our development manager in Sweden. The technological challenges were there, of course. We base our technology on the Microsoft .NET platform, but support for the simplified Chinese character set was not fully implemented everywhere. Multi-language support was developed, and plenty were the occasions in the beginning when Swedish words popped up unexpectedly. Searching in Chinese works differently in EnviTool, and the relations between legislative documents were so different from Swedish and European law that we had to redesign our database structure.

Step two – the market research

With good help from the Swedish Trade Council in China, we did market research to see if there could be a similar market in China and if our business model could work.

After three journeys and two projects together with the trade council, we decided to give it a try. The EnviTool China project was about to take off. Learning to eat properly with chopsticks was part of the experience. Learning to appreciate the Chinese food was easier although there are some zoological challenges there too, outside the scope of this blog entry.

At this point in time we also employed a Chinese/Swedish project manager with extensive knowledge and experience in the field.

Step three – the content

Translating the tool to Chinese and English was the easy part. When it came to the content, we had to throw out everything from Sweden and put in Chinese legislation and comments. We soon found interesting challenges.

Our first experience of the Chinese legal tradition, which is in many ways different from our own, was the search for a citation standard. In the Swedish databases we had successfully used software to automatically find citations, law titles, cross references and other document data. It became clear to us that there were no such shortcuts in the Chinese material; we had to input all data manually.

We decided to restrict the information to cover relevant legislation in the EHS (Environmental, Health & Safety) and CSR (Corporate Social Responsibility) fields and to concentrate on the national level, with some provincial/municipal areas like Beijing and Shanghai. EHS/CSR users are professionals in their fields and industries; they are not lawyers and are not very used to legal information systems. EnviTool was developed with EHS/CSR managers in mind, and we wrote the editorial content to suit the needs of that target audience.

We realized that we needed a partner in China to provide fast and timely information. In ChinaLawInfo, established by Peking University in association with the university’s Legal Information Center, we found a great partner. They are the most important legal information provider in China and we saw that Notisum of Sweden and ChinaLawInfo had many similarities in experience and way of working. Yes, we are small and they are big, but that goes for Sweden and China all over. So  EnviTool now provides the EHS/CSR laws and regulations from both ChinaLawInfo and government sources. We also have an on-going editorial co-operation in Beijing.

By now we also had good content. The first version of the EnviTool Internet service and database, provided from our Singapore company’s servers, was released in the fall of 2010.

Step four – market introduction

If company start-up was a short track in Singapore, it was a longer journey in the world’s second biggest economy. After having tried 50 other names, Envitool finally was translated to 安纬同 in Chinese and we got the business permit in August 2011.

We employed the people we needed and found a partner to help us with HR and finance issues.  Since then we have started our sales and marketing activities, moving slowly forward. The use of legal information tools served from Singapore is combined with management consulting from our team in Shanghai. We provide training in using the tool and can assist the clients in finding the laws and regulations relevant to their operation.

The second generation of the site is up and running at www.envitool.com and we are proud to have customers from China, the US, Japan and four different European countries.

What we have learned and what we think of the future

To get to know China and the Chinese people is of course one part of the fun. Being a European, you make many mistakes, sometimes because of language, sometimes cultural.

One example of this confusion came when I intervened in the editorial process. In EnviTool we provide bilingual Chinese/English short and long comments to laws and regulations. In the Swedish service, which I am more familiar with, the short comment is rendered in italics with the longer comment below in plain text. In the English version of the comments in EnviTool, the short one was not in italics. I complained and our programmer quickly changed this. Shortly thereafter, at a customer meeting, I showed the comments, now in the Chinese language version. (I don’t understand a word of Chinese.) Can you imagine Chinese characters in italics? I can tell you, it makes no sense and it looks bad. That was the language mistake. The cultural mistake was managerial. A Swedish employee would have told me how stupid I was if I came up with such a bad idea. The Asian employee (highly intelligent and highly educated) probably saw the problem and maybe thought, “the boss is more stupid than usual, but he is my boss so I had better do what he tells me!” A lot to learn, many aspects to consider.

To conclude, the start-up was a bit slow because of the red tape, but so far our government contacts have been smooth. We have felt very welcomed by Chinese authorities such as the Ministry of Environmental Protection and by local governments. In the end, our goals are similar: better environmental and occupational health & safety legal compliance means a better environment and a better life for the citizens.

We know it will take a long time for us to get the knowledge and experience needed to be a significant player in the Chinese market, and we are prepared to stay there and step by step build our presence.  It took many years to build a loyal and substantial customer base in Sweden. It will take even longer in China.

 

Magnus Svernlöv is the founder and chairman of the Swedish online legal publisher Notisum (www.notisum.se) and its Chinese subsidiary Envitool (www.envitool.cn). He holds an MBA from INSEAD, France, an MScEE degree from Chalmers University of Technology, Sweden, and a BA from the School of Business, Economics and Law, University of Gothenburg, Sweden. He welcomes comments and feedback at ms@notisum.se.


 

“To be blunt, there is just too much stuff.” (Robert C. Berring, 1994 [1])

Law is an information profession where legal professionals take on the role of intermediaries towards their clients. Today, those legal professionals routinely use online legal research services like Westlaw and LexisNexis to gain electronic access to legislative, judicial and scholarly legal documents.

Put simply, legal service providers make legal documents available online and enable users to search these text collections in order to find documents relevant to their information needs. For quite some time the main focus of providers has been the addition of more and more documents to their online collections. Quite contrary to other areas, like Web search, where an increase in the number of available documents has been accompanied by major changes in the search technology employed, the search systems used in online legal research services have changed little since the early days of computer-assisted legal research (CALR).

It is my belief, however, that the search technology employed in CALR systems will have to dramatically change in the next years. The future of online legal research services will more and more depend on the systems’ ability to create useful result lists to users’ queries. The continuing need to make additional texts available will only speed up the change. Electronic availability of a sufficient number of potentially relevant texts is no longer the main issue; quick findability of a few highly relevant documents among hundreds or even thousands of other potentially relevant ones is.

To reach that goal, from a search system’s perspective, relevance ranking is key. In a constantly growing number of situations – just as Professor Berring stated almost 20 years ago (see above) – even carefully chosen keywords bring back “too much stuff”. Successful ranking, that is, the ordering of search results according to their estimated relevance, becomes the main issue. A system’s ability to correctly assess the relevance of texts for every individual user, and for every single one of their queries, will quickly become – or has arguably already become in most cases – the next holy grail of computer-assisted legal research.

Until a few years back, providers could still successfully argue that search systems should not be blamed for the lack of “theoretically, maybe, sometimes feasible” relevance-ranking capabilities, but rather that users had to be blamed for their lack of search skills. I do not often hear that line of argumentation any longer, which certainly is not because end users’ (Boolean) search skills have improved. Representatives of service providers do not dare to follow that line of argumentation any longer, I think, because every single day every one of them uses Google by punching in vague, short queries and still mostly gets back sufficiently relevant top results. Why should this not work in CALR systems?

Indeed. Why, one might ask, is there not more Web search technology in contemporary computer-assisted legal research? According to another often-stressed argument of system providers, computer-assisted legal research is certainly different from Web search: in Web search we typically do not care about low recall as long as this guarantees high precision, while in CALR trading off recall for precision is problematic. But even with those clear differences, I have, for example, not heard a single plausible argument why the cornerstone of modern Web search, link analysis, should not be successfully used in every single CALR system out there.

These statements certainly are blunt and provocative generalizations. Erich Schweighofer, for example, showed as early as 1999 (pre-mainstream Web) that there had in fact been technological changes in legal information retrieval, in his well-named piece “The Revolution in Legal Information Retrieval or: The Empire Strikes Back”. And there have also been free CALR systems like PreCYdent that fully employed citation-analysis techniques in computer-assisted legal research and thereby – even if they did not manage to stay profitable – demonstrated “one of the most innovative SE [search engine] algorithms”, according to experts.
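To make the link-analysis point concrete, here is a minimal sketch of what citation analysis could look like in a CALR system: a plain PageRank-style iteration over a toy case-citation graph, so that a case cited by many well-cited cases ranks higher. The cases and citations are invented, and a real system would of course combine this signal with many others.

    # Toy PageRank over a case-citation graph. All case names are invented.
    def pagerank(cites, damping=0.85, iterations=50):
        """cites maps each case to the list of cases it cites."""
        cases = set(cites) | {c for targets in cites.values() for c in targets}
        rank = {c: 1.0 / len(cases) for c in cases}
        for _ in range(iterations):
            new = {c: (1 - damping) / len(cases) for c in cases}
            for case, targets in cites.items():
                if targets:
                    share = damping * rank[case] / len(targets)
                    for cited in targets:
                        new[cited] += share
                else:  # a case citing nothing spreads its weight evenly
                    for c in cases:
                        new[c] += damping * rank[case] / len(cases)
            rank = new
        return rank

    citations = {  # hypothetical citation graph
        "Case A": ["Case C"],
        "Case B": ["Case C", "Case D"],
        "Case C": ["Case D"],
        "Case D": [],
    }
    for case, score in sorted(pagerank(citations).items(), key=lambda kv: -kv[1]):
        print(f"{case}: {score:.3f}")

This is, in spirit, the kind of citation-analysis ranking that PreCYdent experimented with; the point is simply that nothing about legal documents makes such an algorithm inapplicable.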

An exhaustive and objective discussion of the various factors that contribute to the slow technological change in computer-assisted legal research certainly cannot be offered by me alone, nor in this short post. For a whole mix of reasons, there is not (yet) more “Google” in CALR, including system providers’ fear of being held liable for query modifications which might (theoretically) lead to wrong expert advice, and the lack of pressure from potential and existing customers to use more modern search technology.

What I want to highlight, however, is one more general explanation which is seldom put forward explicitly. What slows down technological innovation in online legal research, in my opinion, is also the whole legal profession’s interest in holding on to a conception of “legal relevance” that is immune to any kind of computer algorithm. A successfully employed, Web search-like ranking algorithm in CALR would, after all, not only produce comfortable, highly relevant search results, but would also reveal certain truths about legal research: the search for documents of high “legal relevance” to a specific factual or legal situation is, in most cases, a process which follows clear rules. Many legal research routines follow clear, pre-defined patterns which could be translated into algorithms. The legal profession will have to accept that truth at some point, and will therefore have to define and communicate “legal relevance” much less mystically and more pragmatically.

Again, at this point, one might ask “Why?” I am certain that if the legal profession, that is, legal professionals and their CALR service providers, do not include up-to-date search technology in their CALR systems, someone else will at some point do so, without much involvement of legal professionals. To be blunt: at this point, Google can still serve as an example for our systems; at some point soon, it might simply set an example instead of our systems.

Anton Geist is Law Librarian at WU (Vienna University of Economics and Business) University Library. He holds law degrees from the University of Vienna (2006) and the University of Edinburgh (2010). He is grateful for feedback and discussions and can be contacted at home@antongeist.com.

[1] Berring, Robert C. (1994), Collapse of the Structure of the Legal Research Universe: The Imperative of Digital Information, 69 Wash. L. Rev. 9.


[Editor’s Note] For topic-related VoxPopuLII posts please see: Núria Casellas, Semantic Enhancement of legal information … Are we up for the challenge?; Marcie Baranich, HeinOnline Takes a New Approach to Legal Research With Subject Specific Research Platforms; Elisabetta Fersini, The JUMAS Experience: Extracting Knowledge From Judicial Multimedia Digital Libraries; João Lima, et.al, LexML Brazil Project; Joe Carmel, LegisLink.Org: Simplified Human-Readable URLs for Legislative Citations; Robert Richards, Context and Legal Informatics Research; John Sheridan, Legislation.gov.uk

Editor’s note: This is the first in a 2-part series on issues of content permanence. Benjamin Keele of the William and Mary Law Library will be writing on data deletion principles for VoxPopuLII in April.

A Future Full of the Past?
The current consensus seems to be that information, once online, is permanent. The Disney Channel runs a PSA warning kids to be careful what they put online because “You’re leaving a permanent (and searchable) record any time you post something.” Concerns about content permanence have led many European countries to establish a legal “Right to be Forgotten” to protect citizens from the shackles of the past presented by the Internet. The prospect of content adjustment in the name of privacy has exposed cultural variations on perspectives of the global village[1]. In Europe, the “Right to be Forgotten” has gained traction as a legal mechanism for handling such information issues and has been named a top priority by the European Union Data Privacy Commission. This right essentially transforms public information into private information after a period of time, by limiting access by third parties: “[T]he right to silence on past events in life that are no longer occurring.”[2] What in Italy and France is called oblivion, however, is controversial and has been called “rewriting history”, “personal history revisionism”, and “censorship” in the U.S.

Benjamin Keele of the William and Mary Law Library has previously addressed the “data” aspect of the “Right to be Forgotten” debate, outlining data deletion principles for organizations privately holding user information: the footprints we leave behind as we interact with sites and devices. This post, and my research generally, focuses on the “information” element in the debate: the content a user posts to the Web. The question often posed in this debate is whether an individual should have the right to manipulate or access content about his or her past that is generated through search engine results. But this is actually the wrong question. Information Science research tells us that permanence is not a reality, and it may never be. Information falls off the Web for many reasons. The right question to ask is: “What information should we actively save, and what information should we allow to fade, particularly when it harms an individual?” Fortunately, Information Science research offers wisdom for framing as well as answering this question.

The Information Preservation Paradox

In this age, “[l]ife, it seems, begins not at birth but with online conception. And a child’s name is the link to that permanent record.” You are what Google says you are, and expectant parents search prospective baby names to help their kids yield top search results in the future. Only a few rare parents want their children to be lost in a virtual crowd, but is infamy preferable? In 2003, a Canadian high school student unwittingly became the Star Wars Kid, and according to Google, he still is as of 2011. A New England Patriots cheerleader was fired for blog content, a Millersville University student teacher was not allowed to graduate because of images on Facebook, and UCLA sophomore Alexandra Wallace quit school and made a public apology for a racist video she posted on YouTube that spurred debate online about a university’s authority to monitor or regulate student speech. Though discoverable through public Google searches, the posted content offered little in the way of context or truth about the owner’s character. In 1992, John Venables and Robert Thompson viciously murdered a two-year-old and became infamous online and off as the youngest people ever to be incarcerated for murder in English history.

These stories deserve varying levels of sympathy, but all are embarrassing and negative, and all lead the subjects to want to disconnect their names from their past transgressions, to make such information more difficult to discover when they interview for a job, apply to college, or go on a first date. Paradoxically, the only individuals who have been offered oblivion are the two who committed the most heinous social offense: Venables and Thompson were given new identities upon their release from juvenile incarceration. It may actually be easier for two convicted murderers to get a job than it is for Alexandra Wallace.

This paradox is one of many that result from an incomplete and distorted conception of information persistence. The real problem with new forms of access to old information is that, without rhyme or reason, much of it disappears while pieces of harmful content may remain. Time disrupts the information system and the information values upon which U.S. information privacy law has been based, so we must reassess our views and practices in light of this disruption. Objections to the preservation of personal information may be valid; when content has aged, it becomes increasingly uncontextualized, poorly duplicated, irrelevant, and/or inaccurate. Basic but difficult questions about the role of the Internet in society, today and for the future, must be answered, and these will be the foundation for resolving disputes that arise from personal information lingering online.

The Crisis of Disappearing Content

Privacy scholars and journalists have embraced the notion of permanence – that we cannot be separated from an identifying piece of online information short of a name change. But information persistence research suggests otherwise – perhaps even showing a decreasing lifespan for content. When articulating the reasons behind the Internet Archive, Brewster Kahle explained that the average lifespan of a webpage was around 100 days. In 2000, Cho and Garcia-Molina found that 77% of content was still alive after a day[3]; Brewington estimated that 50% of content was gone after 100 days[4]. In 2003, Fetterly found 65% of content alive after a week[5], and in 2004, Ntoulas found only 10% of content alive after a year[6]. Recent work suggests, albeit tentatively, that data is becoming less persistent over time; for example, Daniel Gomes and Mario Silva studied the persistence of content between 2006 and 2007 and found only 55% alive after 1 day, 41% after a week, 23% after 100 days, and 15% after a year[7]. While these studies employed varying goals, designs, and methods, preventing true synthesis, they all contribute to the well-established principle that the Web is ephemeral[8]. At best, the average lifespan of content is a matter of months or, in rare cases, years; certainly not forever.

The Internet has not defeated time, and information, like everything else, gets old, decays, and dies, even online. Quite the opposite of permanent, the Web cannot be self-preserving[9]. Permanence is not yet upon us; now is the time to develop practices of information stewardship that will preserve our cultural history as well as protect the privacy rights of those who will live with the information.

Information Stewardship

Old information may be valuable to decision-making or to history. The first has been addressed by laws like the Fair Credit Reporting Act and by database designers who understand that more information does not necessarily or usually result in better-quality decisions, and that old information may have transformed into misinformation. The second is more difficult: how do we decide what information may be important when we reflect on the past as researchers and historians? Archival ethics, a developed field in library and information science, offers rich insight. The Society of American Archivists has drafted a Code of Ethics that states, “[Archivists] establish procedures and policies to protect the interests of the donors, individuals, groups, and institutions whose public and private lives and activities are recorded in their holdings. As appropriate, archivists place access restrictions on collections to ensure that privacy and confidentiality are maintained, particularly for individuals and groups who have no voice or role in collections’ creation, retention, or public use.”[10]

The Web, of course, does not have a hierarchy to hand down such decisions. It is a bottom-up structure. Therefore, users must find their own inner archivists. They must protect what is important, assess what may be harmful, and take responsibility for the content they contribute to the Web. For a fascinating example of such Web ethics, go to the Star Wars Kid Wikipedia page, and click the “talk” link. You will find that Wikipedia’s biographies of living persons policy has been implemented. This implementation, however, does not prevent the page from being the first listed in Google’s search results for the Star Wars Kid’s real name. There are many other sites that follow some form of archival ethics; many of them limit access to content by altering how private information may be retrieved by a search, either by not offering full-text search functionality on the site (see the Internet Archive) or by using robots.txt to communicate with crawlers that information is off-limits to them (see Public Resource). These access decisions essentially create a card catalog-like system of access to the private information. Library and information scientists have worked with these issues for a very long time. Their expertise is desperately needed as these difficult policy decisions are made at a user, site, network, national, and international level.
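For readers unfamiliar with the second mechanism, robots.txt is simply a plain-text file placed at a site’s root that asks well-behaved crawlers not to index particular paths. A minimal example, with an invented directory name, might look like this:

    User-agent: *
    Disallow: /personal-records/

A crawler that honors the convention will skip everything under the disallowed path, so the content remains reachable by those who have the address but does not surface in general search results, an access model closer to the card catalog than to full-text search.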


[1] Marshall McLuhan, The Gutenberg Galaxy: The Making of Typographic Man (1962).

[2] Georgio Pino, “The Right to Personal Identity in Italian Private Law: Constitutional Interpretation and Judge-Made Rights,” In The Harmonization of Private Law in Europe, M. Van Hoecke and F. Osts (eds.), 237 (2000).

[3] Junghoo Cho and Hector Garcia-Molina, The Evolution of the Web and Implications for an Incremental Crawler, Proceedings of the 26th International Conference on Very Large Data Bases 200-209 (2000).

[4] Brian E. Brewington and George Cybenko, How Dynamic is the Web? Estimating the Information Highway Speed Limit 33 (1-6) Comput. Netw. 257-276 (2000).

[5] Dennis Fetterly, Mark Manasse, Mark Najork, and Janet Wiener, A Large-Scale Study of the Evolution of Web Pages 34(2) Software Practice and Experience 213-237 (2004).

[6] Alexandros Ntoulas, Junghoo Cho, and Christopher Olston, What’s New on the Web? The Evolution of the Web from a Search Engine Perspective, Proceedings of the 13th International Conference on World Wide Web 1-12 (2004).

[7] Gomes and Silva, supra note 4.

[8] Wallace Koehler, A Longitudinal Study of Web Pages Continued: A Consideration of Document Persistence 9(2) Information Research 1 (2004).

[9] Julian Masanes, Web Archiving, at 7 (2006).

[10] Society of American Archivists, “Code of Ethics for Archivists,” at http://www2.archivists.org/statements/saa-core-values-statement-and-code-of-ethics (2011).

Editor’s Note: For topic-related VoxPopuLII posts please see: Robert Richards, Context and Legal Informatics Research.

Meg Leta Ambrose is a doctoral student at the University of Colorado’s interdisciplinary Technology, Media, & Society program. She is a fellow with the computer science department, a research assistant with the law school’s Silicon Flatirons Center, and Provost’s University Library Fellow. She has been awarded the CableLabs fellowship for remainder of her doctoral work. Meg received a J.D. from the University of Illinois in 2008 and can be found at megleta.com.


JurisPedia, the shared law, is an academic project accessible on the Web and devoted to systems of law as well as legal and political sciences throughout the world. The project aims to offer information about the laws of every country in the world. Based on a wiki, JurisPedia combines the ease of contribution on that platform with academic control of those insertions a posteriori. This international project is the result of a free collaboration of different research teams and law schools[1]. The different websites are accessible in eight languages (Arabic[2], Chinese, Dutch, English, French, German, Spanish and Portuguese). In its seven years of existence, the project has grown to more than 15,000 entries and outlines of articles dealing with the legal systems of thirty countries.

In 2007, Hughes-Jehan approached my colleagues and me, then running the Southern African Legal Information Institute, to host the English-language version of JurisPedia. We were excited at the opportunity to work with JurisPedia to introduce the concept of crowdsourcing legal knowledge to Anglophone universities, where we hoped it would fall on fertile ground amongst students and academics.

Any follower of the Wikipedia story will know that the reality is not as simple.

Wikipedia operates on 5 pillars:

  1. Wikipedia is an online encyclopedia;
  2. Wikipedia is written from a neutral point of view;
  3. Wikipedia is free content that anyone can edit, use, modify, and distribute;
  4. Editors should interact with each other in a respectful and civil manner;
  5. Wikipedia does not have firm rules.

In adopting the MediaWiki software, JurisPedia would appear to follow the same principles. There is one significant difference: JurisPedia is not written from a neutral point of view but from a located point of view. Each jurisdiction has a local perspective on its legal concepts. JurisPedia aims to represent the truth in several languages: the law as it is in a country, not as it could or should be. As a result, we have the basis of a legal encyclopedia representing over 200 legal systems, where each concept is clearly identifiable as part of a national law.

 

Southern African Perspectives

As with Wikipedia, it is the third pillar that seems to strike terror into the hearts of the legal professionals and academics with whom I have spoken.

When describing the idea to one of the trustees of SAFLII, an acting judge on the bench of the Constitutional Court of South Africa, I was alerted to some difficulties that might lie ahead. She is an exceptional, open-minded and forward-thinking legal mind, but she was cautiously horrified at the prospect of crowdsourced legal knowledge. Her concerns, listed below, were later echoed by the deans of the South African law schools whom we approached:

  1. Because there is no formal control over submissions – and therefore their accuracy – JurisPedia cannot be used by students as an official reference tool. Citations linking to JurisPedia will not be accepted in student papers.
  2. Crowdsourced legal information, particularly in common law  jurisdictions, runs a high risk of providing an incorrect interpretation of the law.

The overarching concern appears to be that if legal content is made freely available for editing, use, modification and distribution, the resulting content will be unreliable at best and just plain wrong at worst.

After seven years online, though, there is a substantial amount of feedback about contributors to the project. The open nature of this law-wiki, to which every internet user can contribute, did not lead to a massive surge of uncontrolled and uncontrollable content. On the contrary, although the number of articles continues to grow, it remains manageable. The subject of the project (only the law) and its academic character have certainly led to a self-selection of contributors of a higher caliber in legal studies. Many of the contributors are students pursuing a master’s or Ph.D. degree, but they also include doctors, professors and professionals in law, such as lawyers, notaries and judges from more than thirty jurisdictions (and one member of parliament from the Kingdom of Morocco). All these specialists give the project a solid foundation and make it a reality by contributing from time to time as they can. More than 19,000 users have subscribed to JurisPedia, and in the past year more than 1,000 people, for the most part from Arabic-speaking countries, joined its Facebook group.

JurisPedia content is licensed under a Creative Commons licence that is quite customisable: the content can be freely reused for non-commercial purposes, while commercial reuse depends on the authorization of the particular contributor. This is a fair choice in an information society where the digital divide is an important factor for every international project on the internet: for now, only the most developed jurisdictions are in a position to use such collective creations commercially. And we take pride in counting contributors from Haiti or Sudan (if you want to use the information they provide commercially, please contact them…)

In this context, concerns regarding the integrity of the content of JurisPedia become less alarming.

However, I believe that these concerns also represent a misconception of what JurisPedia is and what it can be in the Anglophone, common law, legal context.

Occasionally, it is easier to understand what something is by describing what it is not. JurisPedia is not:

  1. A law report
  2. A law journal
  3. A prescribed legal text book
  4. A law professor
  5. A judge
  6. A lawyer

Let us imagine for a moment that JurisPedia is also not an online portal but a student tutorial group, led by a master’s student or by an associate or full professor. In the course of the tutorial, a few ideas are put forward, discussed, dissected and amended. Each student (in an ideal world!) leaves the group with a better understanding of a particular point of law which has been discussed. Perhaps the person leading the group has also had occasion to review his or her own position. The group then disperses to research the work further for a more formal submission or interaction.

Now let us imagine that a lay person struggling with a specific legal problem related to what the group has been discussing is given the final, concise description of the law relating to this legal problem prepared by this tutorial group. He or she cannot head into a courtroom armed only with this information, but it may allow them to engage with a legal clinic or lawyer feeling a little less lost.

The thought experiment above applies the read-write meme to the legal context. In this meme, we encourage legal professionals, academics and students to become involved in sharing knowledge in order to create a body of knowledge about the law that is accessible to those groups as well as to the general public.

The risk of inaccuracies is present in all contexts, printed and online, crowdsourced or expert. A review of perceived versus actual risk may be a topic for a future post, but here I would like to propose that the actual risk of inaccuracies can be mitigated by one of two approaches I have considered:

  1. a more active engagement by the legal community and academics in the form of editorial committees; or
  2. through the incorporation of JurisPedia into academic curricula.

My immediate concern with the idea of an editorial committee is that we then begin to morph JurisPedia into what it is not. However, if we can teach students of the law to understand how JurisPedia can be used, and how the concept of self-governance can be applied, then we have created a community of lawyers equipped to deal with a world in which there is some wisdom to the crowds.

The English version of JurisPedia is now hosted by AfricanLII, a project started by some of SAFLII’s founding members and now run as a project of the Southern Africa Litigation Centre. As AfricanLII, we want to help build communities around legal content. We believe that encouraging commentary on the law increases the participation of the people for whom the law is intended and therefore helps to shape what the law should be. JurisPedia represents an angle on this: informed submissions by members (or future members) of the legal community. I have described what JurisPedia is not and alluded to what it could be by way of a thought experiment. I propose that we see JurisPedia as an access point. It may be an access point for a student to assist them to understand a point of law that is opaque to them (including references for further reading); or it may be a way for a lay person to understand a point of law which is currently impacting their lives.

JurisPedia represents a mechanism for bringing relevance in today’s social context to the law. How it is used should be considered creatively by those who could potentially benefit from the legal information diaspora of which it is a part.

Global Perspectives
From the global perspective, JurisPedia gives information about Japanese and Canadian constitutional law in Arabic, and information about Indonesian, Ukrainian and Serbian law in French. It also gives information about experiments like the “legislative theatre”, born in Brazil and experimented with by actors in France and several other countries. JurisPedia is an international project that should follow some simple and unifying guidelines. This is why we tried from the beginning to eliminate any geographical centralization (in order to inform about law as it is, and not as it should be, in a certain state). The observation of law in the world is not necessarily connected to the idea of a universal legal system, and – since we like to highlight the evidence – law is linked to its culture and can be either more or less[3] similar to our own legal system.

Further, one of the latest enhancements to JurisPedia provides access to the law of 80 countries, by using Google Custom Search on a preselection of relevant websites (see family law in Scotland).

This is why shared law is not just a program to ensure that no one need remain ignorant of a legal system. On the contrary, JurisPedia will gradually make it possible to appreciate or react to what is done elsewhere, not only in the West but also in the North, East and South[4].


[1] Currently: the Institut de Recherche et d’Etudes en Droit de l’Information et de la Communication (Paul Cézanne University, France); the Faculty of Law of Can Tho (Vietnam); the Faculty of Law at the University of Groningen (Netherlands); the Institute for Law and Informatics at Saarland University (Germany); and Juris at the Faculty of Political and Legal Sciences at the University of Quebec in Montreal. This list is not definitive, the project being absolutely open, especially to research teams and faculties of law in southern states.

[2] The Arabic version of JurisPedia (جوريسبيديا) is managed for the most part by Me Mostafa Attiya, a member of the Egyptian Bar Association. He has done an amazing job and has actively participated in building a large Arabic-speaking legal community around the project.

[3] An animal is often considered to be movable property. This can seem absurd in some societies where the alliance between humans and nature is conceived differently. History and literature have often recorded this kind of astonishment when cultures observe each other (see, concerning criminal law some 900 years ago, Amin Maalouf, The Crusades Through Arab Eyes (New York: Schocken Books, 1984), on the trials by ordeal during the Frankish period).

[4] This part was written in Europe…

Hughes-Jehan Vibert is a doctor of law from the former IRETIJ (Institute of Research for the Treatment of Legal Information, Montpellier University, France) and a research fellow at the Institute of Law and Informatics (IFRI, http://www.rechtsinformatik.de, Germany). He is the ICT project manager for the Network for Legislative Cooperation between the Ministries of Justice of the European Union and is also working on a report about the diffusion of and access to the law for the International Organization of the Francophonie.

Kerry Anderson is a co-founder of and coordinator for the African Legal Information Institute, a project of the Southern Africa Litigation Center. She has worked variously in web development, research and strategy for an advertising agency, IT startups and financial services corporates. She has a BSc in Computer Science from UCT and an MBA from GIBS. Her MBA dissertation was on the impact of Open Innovation on software research development clusters in South Africa.

[Editor’s Note: For topic-related VoxPopuLII posts please see: Meritxell Fernández-Barrera, Legal Prosumers: How Can Government Leverage User-Generated Content; Isabelle Moncion and Mariya Badeva-Bright, Reaching Sustainability of Free Access to Law Initiatives; and Isabelle Moncion, Building Sustainable LIIs: Or Free Access to Law as Seen Through the Eyes of a Newbie.] VoxPopuLII is edited by Judith Pratt.

Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed. The information above should not be considered legal advice. If you require legal representation, please consult a lawyer.

In the fall of 2009, the American Association of Law Libraries (AALL) put out a call for volunteers to participate in our new state working groups to support one of AALL’s top policy priorities: promoting the need for authentication and preservation of digital legal resources. It is AALL policy that the public have no-fee, permanent public access to authentic online legal information. In addition, AALL believes that government information, including the text of all primary legal materials, must be in the public domain and available without restriction.

The response to our call was overwhelming, with volunteers from all 50 states and the District of Columbia expressing interest in participating. To promote our public policy priorities, the initial goals of AALL’s working groups were to:

  • Take action to oppose any plan in their state to eliminate an official print legal resource in favor of online-only, unless the electronic version is digitally authenticated and will be preserved for permanent public access;
  • Oppose plans to charge fees to access legal information electronically; and
  • Ensure that any legal resources in a state’s raw-data portal include a disclaimer so that users know that the information is not an official or authentic resource (similar to what is included on the Code of Federal Regulations XML on Data.gov).

In late 2009, AALL’s then-Director of Government Relations Mary Alice Baish met twice with Law Librarian of Congress Roberta Shaffer and Carl Malamud of Public.Resource.org to discuss Law.gov and Malamud’s idea for a national inventory of legal materials. The inventory would include legal materials from all three branches of government. Mary Alice volunteered our working groups to lead the ambitious effort to contribute to the groundbreaking national inventory. AALL would use this data to update its 2003 “State-by-State Report on Permanent Public Access to Electronic Government Information” and the 2007 “State-by-State Report on Authentication of Online Legal Resources” and its 2009-2010 updates, which revealed that a significant number of state online legal resources are considered to be “official” but that few are being authenticated. It would also help the Law Library of Congress, which owns the Law.gov domain name, with its own ambitious projects.

Erika Wayne and Paul Lomio at Stanford University’s Robert Crown Law Library developed a prototype for the national inventory that included nearly 30 questions related to scope, copyright, cost to access, and other use restrictions. They worked with the California State Working Group and the Northern California Association of Law Libraries to populate the inventory with impressive speed, adding most titles in about two months.

AALL’s Government Relations Office staff then expanded the California prototype to include questions related to digital authentication, preservation, and permanent public access. Our volunteers used the following definition of “authentication” provided by the Government Printing Office:

An authentic text is one whose content has been verified by a government entity to be complete and unaltered when compared to the version approved or published by the content originator.

Typically, an authentic text will bear a certificate or mark that conveys information as to its certification, the process associated with ensuring that the text is complete and unaltered when compared with that of the content originator.

An authentic text is able to be authenticated, which means that the particular text in question can be validated, ensuring that it is what it claims to be.
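The definition above is conceptual. As a purely illustrative sketch (not GPO’s actual mechanism, which relies on digital signatures), the snippet below shows the simplest form of validation: comparing a document’s cryptographic hash against a value published by the content originator. The file name and the published hash are hypothetical placeholders.

    # Minimal sketch of integrity validation, assuming the content originator
    # publishes a SHA-256 hash of the approved text. Values are placeholders;
    # real authentication schemes typically use digital signatures instead.
    import hashlib

    PUBLISHED_SHA256 = "placeholder-hash-published-by-the-originator"

    def is_unaltered(path: str, published_hash: str) -> bool:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        return digest == published_hash

    if is_unaltered("state_code_title_12.pdf", PUBLISHED_SHA256):
        print("Text matches the version approved by the content originator.")
    else:
        print("Text has been altered or is not the approved version.")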

The “Principles and Core Values Concerning Public Information on Government Websites,” drafted by AALL’s Access to Electronic Legal Information Committee (now the Digital Access to Legal Information Committee) and adopted by the Executive Board in 2007, define AALL’s commitment to equitable, no-fee, permanent public access to authentic online legal information. The principle related to preservation states that:

Information on government Web sites must be preserved by the entity, such as a state library, an archives division, or other agency, within the issuing government that is charged with preservation of government information.

  • Government entities must ensure continued access to all their legal information.
  • Archives of government information must be comprehensive, including all supplements.
  • Snapshots of the complete underlying database content of dynamic Web sites should be taken regularly and archived in order to have a permanent record of all additions, changes, and deletions to the underlying data.
  • Governments must plan effective methods and procedures to migrate information to newer technologies.

In addition, AALL’s 2003 “State-By-State Report on Permanent Public Access to Electronic Government Information” defines permanent public access as, “the process by which applicable government information is preserved for current, continuous and future public access.”

Our volunteers used Google Docs to add print and electronic legal titles at the state, county, and municipal levels to the inventory and to answer a series of questions about each title. AALL’s Government Relations Office set up a Google Group for volunteers to discuss issues and questions. Several of our state coordinators developed materials to help other working groups, such as Six Easy Steps to Populating Your State’s Inventory by Maine State Working Group coordinator Christine Hepler, How to Put on a Successful Work Day for Your Working Group by Florida State Working Group co-coordinators Jenny Wondracek and Jamie Keller, and Tips for AALL State Working Groups with contributions from many coordinators.

In October 2010, AALL held a very successful webinar on how to populate the inventories. More than 200 AALL and chapter members participated in the webinar, which included Kentucky State Working Group coordinator Emily Janoski-Haehlen, Maryland State Working Group coordinator Joan Bellistri, and Indiana State Working Group coordinator Sarah Glassmeyer as speakers. By early 2011, more than 350 volunteers were contributing to the state inventories.

Initial Findings

Our dedicated volunteers added more than 7,000 titles to the inventory in time for AALL’s June 30, 2011 deadline. AALL recognized our hard-working volunteers at our annual Advocacy Training during AALL’s Annual Meeting in Philadelphia, and celebrated their significant accomplishments. Timothy L. Coggins, 2010-11 Chair of the Digital Access to Legal Information Committee, presented these preliminary findings:

  • Authentication: No state reported new resources that have been authenticated since the 2009-2010 Digital Access to Legal Information Committee survey
  • Official status: Several states have designated at least one legal resource as official, including Arizona, Florida, and Maine
  • Copyright assertions in digital version: Twenty-five states assert copyright on at least one legal resource, including Oklahoma, Pennsylvania, and Rhode Island
  • Costs to access official version: Ten states charge fees to access the official version, including Kansas, Vermont, and Wyoming
  • Preservation and Permanent Public Access: Eighteen states require preservation and permanent public access of at least one legal resource, including Tennessee, Virginia, and Washington

Analyzing and Using the Data

In July 2011, AALL’s Digital Access to Legal Information Committee formed a subcommittee that is charged with reviewing the national inventory data collected by the state working groups. The subcommittee includes Elaine Apostola (Maine State Law and Legislative Reference Library), A. Hays Butler (Rutgers University Law School Library), Sarah Gotschall (University of Arizona Rogers College of Law Library), and Anita Postyn (Richmond Supreme Court Library). Subcommittee members have been reviewing the raw data as entered by the working group volunteers in their state inventories. They will soon focus their attention on developing a report that will also act as an updated version of AALL’s State-by-State Report on Authentication of Online Legal Resources.

The report, to be issued later this year, will once again support what law librarians have known for years: there are widespread issues with access to legal resources, and there is an urgent need to halt the trend of eliminating print resources in favor of electronic resources without the proper safeguards in place. It will also include information on: the official status of legal resources; whether states are providing for authentication, permanent public access, and/or preservation of online legal resources; any use restrictions or copyright claims by the state; and whether a universal (public domain) citation format has been adopted by any courts in the state.

In addition to providing valuable information to the Law Library of Congress and related Law.gov projects, this information has already been helpful to various groups as they proceed to advocate for no-fee, permanent public access to government information. The data has already been useful to advocates of the Uniform Electronic Legal Material Act and will continue to be valuable to those seeking introduction and enactment in their states. The inventory has been used as a starting point for organizations that are beginning digitization projects of their state legal materials. The universal citation data will be used to track the progress of courts recognizing the value of citing official online legal materials through adopting a public domain citation system. Many state working group coordinators have also shared data with their judiciaries and legislatures to help expose the need for taking steps to protect our state legal materials.

The Next Steps: Federal Inventory

In December 2010, we launched the second phase of this project, the Federal Inventory. The Federal Inventory will include:

  • Legal research materials
  • Information authored or created by agencies
  • Resources that are publicly accessible

Our goals are the same as with the state inventories: to identify and answer questions about print and electronic legal materials from all three branches of government. Volunteers from Federal agencies and the courts are already adding information such as decisions, reports and digests (Executive); court opinions, court rules, and Supreme Court briefs (Judicial); and bills and resolutions, the Constitution, and Statutes at Large (Legislative). Emily Carr, Senior Legal Research Specialist at the Law Library of Congress, and Judy Gaskell, retired Librarian of the Supreme Court, are coordinating this project.

Thanks to the contributions of an army of AALL and chapter volunteers, the national inventory of legal materials is nearly complete. Keep an eye on AALL’s website for more information as our volunteers complete the Federal Inventory, analyze the data, and promote the findings to Federal, state and local officials.

Tina S. Ching is the Electronic Services Librarian at Seattle University School of Law. She is the 2011-12 Chair of the AALL Digital Access to Legal Information Committee.

 

Emily Feltren is Director of Government Relations for the American Association of Law Libraries.

 
 

[Editor’s Note: For topic-related VoxPopuLII posts please see: Barbara Bintliff, The Uniform Electronic Legal Material Act Is Ready for Legislative Action; Jason Eiseman, Time to Turn the Page on Print Legal Information; John Joergensen, Authentication of Digital Repositories.]

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed. The information above should not be considered legal advice. If you require legal representation, please consult a lawyer.

As a first year law student, a handful of things are given to you (at least where I studied): a pre-fabricated schedule, a non-negotiable slate of professors, and a basic history lesson — illustrated through individual cases.  During my first year, the professor I fought with the most was my property law teacher.  Now, I realize that it wasn’t her that I couldn’t accept; it was the implications of the worldview she presented.  She saw “property law” as a construct through which wealthy people protected their interests at the expense of those who didn’t have the means to defend themselves.  Every case — from “fast fox, loose fox” on down — was an example of someone’s manipulating or changing the rules to exclude the poor from fighting for their interests.  It was a pretty radical position to accept and I, maybe to my own discredit, ignored it.

Then, I graduated. I began looking at legal systems around the world and tried to get a sense of how they actually function in practice. I found something a bit startling: they don’t function.  Or, at least not for most of us.

Justice: Inaccessible

At first glance, that may seem alarmist.  Honestly, it feels a bit radical to say.  But, then consider that in 2008, the United Nations issued a report entitled Making the Law Work for Everyone, which estimated that 4 billion people (of a global population of 6 billion at the time) lacked “meaningful” access to the rule of law.

Stop for a second. Read that again. Two-thirds of the world’s population don’t have access to rule-of-law institutions. This means that they lack, not just substantive representation or equal treatment, but even the most basic access to justice.

Now, before you write me, and the UN, off completely as crackpots, I must make some necessary caveats. “Rule-of-law institutions,” in the UN report, means formal, governmentally sponsored systems.  The term leaves out pluralistic systems, which rely on adapted or traditional actors, many of which exist exclusively outside of the purview of government, to settle civil or small-scale criminal disputes.   Similarly, the word “meaningful,” in this context, is somewhat ambiguous. Making the Law Work for Everyone isn’t clear about what standards it uses to determine what constitutes “access,” “fairness,” or relevant and substantive law (i.e., the number and content of laws). While the report’s major focus was on adapting an appropriate level of formalism in order to create inclusive systems, the strategy of definitionally avoiding pluralism and cultural relativism while assessing a global standard of an internationally (and often constitutionally) protected service significantly complicates the analysis offered in the report.

What’s causing the gap in access to justice?

So, let’s work from the basics. The global population has been rising steadily for, well, a while now, increasing the volume of need addressed by legal systems.  Concurrently, the number of countries has grown, and with them young legal systems, often without precedents or established institutional infrastructures. As the number of legal systems has grown, so too have the public’s expectations of the ability of these systems to provide formalized justice procedures. Within each of these nations, trends like urbanization, the emergence of new technologies, and the expansion of regulatory frameworks add complexity to the number of laws each domestic justice system is charged with enforcing. On top of this, the internationalization and proliferation of treaties, trade agreements, and institutions imposes another layer of complexity on what are often already over-burdened enforcement mechanisms. It’s understandable, then, why just about every government in the world struggles, not only to create justice systems that address all of these very complicated issues, but also to administer these systems so that they offer universally equal access and treatment.

Predictably, private industry observed these trends a long time ago. As a result, it should be no surprise that the cost of legal services has been steadily rising for at least 20 years.  Law is fairly unique in that it is in charge of creating its own complexity, which is also the basis of its business model.  The harder the law is to understand, the more work there is for lawyers. This means fewer people will have the specialized skills and relationships necessary to successfully achieve an outcome through the legal system.

What’s even more confusing is that because clients’ needs and circumstances vary so significantly, it’s very difficult to reliably judge the quality of service a lawyer provides.  The result is a market where people, lacking any other reliable indicator, judge by price, aesthetics, and reputation.  To a limited extent, this enables lawyers to self-create high-end market demand by inflating prices and, well, wearing really nice suits. (Yes, this is an oversimplification. But they do, often, wear REALLY nice suits).   The result is the exclusion (or short-shrifting) of middle- and low-income clients who need the same level of care, but are less concerned with the attire.  Incidentally, the size and spending power of the market being excluded — even despite growing wealth inequality — are enormous.

Redesigning legal services

I don’t mean to be simplistic or to re-state widely understood criticisms of legal systems.  Instead, I want to establish the foundations for my understanding of things. See, I approach this from a design viewpoint. The two perspectives above — namely, that of governments trying to implement systems, and that of law firms trying to capitalize on available service markets — often neglect the one design perspective that determines success: that of the user. When we’re judging the success of legal systems, we don’t spend nearly enough time thinking about what the average person encounters when trying to engage legal systems.  For most people, the accessibility (both physical and intellectual) and procedure of law, are as determinative of participation in the justice system as whether the system meets international standards.

The individuals and organizations on the cutting edge of this thinking, in my understanding, are those tasked with delivering legal services to low-resource and rural populations. Commercial and governmental legal service providers simply haven’t figured out a model that enables them to effectively engage these populations, who are also the world’s largest (relatively) untapped markets.  Legal aid providers, however, encounter the individuals who have to overcome barriers like cost, time, education, and distance to just preserve the status quo, as well as those who seek protection.  From the perspective of legal aid clients, the biggest challenge to accessing the justice system may be the fact that courts are often located dozens of miles away from clients’ homes, over miserable roads.  Or the biggest challenge may be the fact that clients have to appear in court repeatedly to accomplish what seem like small tasks, such as extensions or depositions. Or the biggest challenge may be simply not knowing whom to approach to accomplish their law-related goals.  Each of these challenges represents a barrier to access to justice.  Each barrier to access, when alleviated, represents an opportunity for engagement and, if done correctly, an opportunity for financial sustainability.

Mobile points the way

None of this is intended as criticism — almost every major service industry in the world grapples with the same challenges.  Well, with the exception of at least one: the mobile phone industry.  The emergence of mobile phones presents two amazing opportunities for the legal services industry: 1) the very real opportunity for effective engagement with low-income and rural communities; and 2) an example of how, when service offerings are appropriately priced, these communities can represent immensely profitable commercial opportunities.

Let’s begin with a couple of quick points of information.  Global mobile penetration (the number of people with active cell phone subscriptions) is approximately 5.3 billion, which is 78 percent of the world’s population.  There are two things that every single one of those mobile phone accounts can do: 1) make calls; and 2) send text messages.  Text messaging, or SMS (Short Message Service), is particularly interesting in the context of legal services because it is a way to actively engage with a potential client, or a client, immediately, cheaply, and digitally.  There are 4.3 billion active SMS users in the world and, in 2010, the world sent 6.1 trillion text messages, a figure that has tripled in the last 3 years and is projected to double again by 2013.  That’s more than twice the global Internet population of 2 billion. It’s no exaggeration, at this point, to say that mobile technology is transformative to, basically, everything.  What has not been fully explored is why and how mobile devices can transform service delivery in particular settings.

Why is SMS so promising?

Something well-understood in the technology space is the value of approaching people using the platforms that they’re familiar with.  In fact, in technology, the thing that matters most is use. Everything has to make sense to a user, and make things easier than they would be if the user didn’t use the system.  This thinking largely takes place in technology spaces, in the niche called “user-interface design.” (Forgive the nerdy term, lawyers. Forgive the simplicity, fellow tech nerds.)  These are the people who design the way that people engage with a new piece of technology.

In this way, considering it has 4.3 billion users, SMS has been one of the best, and most simply designed, technologies ever. SMS is instant, (usually) cheap, private, digital, standardized, asynchronous (unlike a phone call, people can respond whenever they want), and very easy to use. These benefits have made it the most used digital text-based communication tool in human history.

User-Interface-Design Principles + SMS + Legal Services = ?

So. What happens when you take user-interface design thinking, and apply it to legal systems?  Recognizing that the assumptions underlying most formal legal systems arose when those systems originated (most of the time hundreds of years ago), how would we update or change what we do to improve the functioning of legal systems?

There are a lot of good answers to those questions, and moves toward transactional representation, form standardization (à la LegalZoom), legal process outsourcing (à la Pangea3), legal information systems (there are a lot), and process automation (such as document assembly) are all tremendously interesting approaches to this work.  Unfortunately, I’m not an expert on any of those.

FrontlineSMS:Legal

I work for an organization called FrontlineSMS, where I also founded our FrontlineSMS:Legal project.  What we do, at FrontlineSMS, is design simple pieces of technology that make it easier to use SMS to do complex and professional engagement.  The FrontlineSMS:Legal project seeks to capitalize on the benefits of SMS to improve access to justice and the efficiency of legal services.  That is, I spend a lot of my time thinking about all the ways in which SMS can be used to provide legal services to more people, more cheaply.

And the good news is, I think, that there are a lot of ways to do this.  Pardon me while I geek out on a few.

Intake and referral

The process of remote legal client intake and referral takes a number of forms, depending on the organization, procedural context, and infrastructure. Within most legal processes, the initial interview between a service provider and a client is an exceptionally important and complex interaction. There are, however, often a number of simpler communications that precede and coordinate the initial interview, such as very basic information collection and appointment scheduling, which could be conducted remotely via SMS.

Given the complexity of legal institutions, providing remote intake and referral can significantly reduce the inefficiencies that so-called “last-mile” populations — i.e., populations who live in “areas …beyond the reach of basic government infrastructure or services” — face in seeking access to services. The issue of complexity is often compounded by the centralization of legal service providers in urban areas, which requires potential clients to travel just to begin these processes. Furthermore, most rural or extension services operate with paper records, which are physically transported to central locations at fixed intervals. These records are not particularly practical from a workflow management perspective and often are left unexamined in unwieldy filing systems. FrontlineSMS:Legal can reduce these barriers by creating mobile interfaces for digital intake and referral systems, which enable clients to undertake simple interactions, such as identifying the appropriate service provider and scheduling an appointment.
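To make the idea concrete, here is a minimal, hypothetical sketch of keyword-based SMS intake and referral. It is not the FrontlineSMS API; the keywords, provider names, and phone number are invented for illustration.

    # Hypothetical sketch of SMS-based intake and referral (not the FrontlineSMS API).
    # An incoming message such as "LAND Jane Musoke" is routed by its keyword to the
    # appropriate service provider, and the sender receives an automatic confirmation.
    PROVIDERS = {
        "LAND": "Community Land Rights Clinic",
        "FAMILY": "Family Law Legal Aid Office",
        "LABOR": "Labor Dispute Advice Desk",
    }

    intake_log = []  # a real deployment would persist this to a database

    def handle_incoming_sms(sender: str, text: str) -> str:
        parts = text.strip().split(maxsplit=1)
        keyword = parts[0].upper() if parts else ""
        provider = PROVIDERS.get(keyword)
        if provider is None:
            return ("Sorry, we did not recognize that request. "
                    "Reply LAND, FAMILY, or LABOR followed by your name.")
        name = parts[1] if len(parts) > 1 else "unknown"
        intake_log.append({"phone": sender, "name": name, "provider": provider})
        return (f"Thank you, {name}. {provider} has received your request "
                "and will text you an appointment time.")

    # Simulate an incoming message from a client's phone.
    print(handle_incoming_sms("+256700000000", "LAND Jane Musoke"))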

Client and case management

After intake, most legal processes require service providers to interact with their clients on multiple occasions, in order to gather follow-up information, prepare the case, and manage successive court hearings. Because each such meeting requires people from last-mile communities to travel significant distances, the iterative nature of these processes often imposes a disproportionate burden on clients, given the desired outcome. In addition, many countries struggle to provide sufficient postal or fixed-line telephone services, meaning that organizing follow-up appointments with clients can be a significant challenge. These challenges become considerably more complicated in cases that have multiple elements requiring coordination between clients and institutions.

Similarly, in order to follow up with clients, service providers must place person-to-person phone calls, which can take significant chunks of time. Moreover, internal case management systems originate from paper records, causing large amounts of duplicative data entry and lags in data availability.

To alleviate these problems, we propose that legal service providers install a FrontlineSMS:Legal hub in a central location, such as a law firm or public defender’s office. During the intake interview, service agents would record the client’s mobile number and use SMS as an ongoing communications platform.

By creating a sustained communications channel between service providers and clients, lawyers and governments could communicate simple information, such as hearing reminders, probation compliance reminders, and simple case details. Additionally, these communications could be automated and sent to entire groups of clients, thereby reducing the amount of time required to manage clients and important case deadlines. This set of tools would reduce the barriers to communication with last-mile clients and create digital records of these interactions, enabling service providers to view all of these exchanges in one easy-to-use interface, reducing duplicative data entry and improving information usability.
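As an illustration, a sketch of what automated, group-wide hearing reminders might look like follows. The send_sms function is a stand-in for whatever SMS hub or gateway a deployment actually uses, and the client records are invented.

    # Illustrative sketch of automated hearing reminders sent to a group of clients.
    # send_sms() is a placeholder for a real SMS hub; the data below is hypothetical.
    from datetime import date, timedelta

    clients = [
        {"phone": "+27820000001", "name": "T. Dlamini", "hearing": date(2012, 3, 14)},
        {"phone": "+27820000002", "name": "P. Nkosi", "hearing": date(2012, 3, 15)},
    ]

    def send_sms(phone: str, message: str) -> None:
        # Placeholder: a real system would hand this message to an SMS gateway.
        print(f"SMS to {phone}: {message}")

    def send_hearing_reminders(today: date, days_ahead: int = 2) -> None:
        """Remind every client whose hearing falls within the next days_ahead days."""
        for c in clients:
            if today <= c["hearing"] <= today + timedelta(days=days_ahead):
                send_sms(c["phone"],
                         f"Reminder: {c['name']}, your court hearing is on "
                         f"{c['hearing']:%d %b %Y}. Reply HELP for assistance.")

    send_hearing_reminders(today=date(2012, 3, 13))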

Caseload- and service-extension agent management

Although this article focuses largely on innovations that improve direct access to legal services for last-mile populations, the same tools also have the effect of improving internal system efficiency by digitizing records and enabling a data-driven approach to measuring outcomes. Both urban and rural service extension programs have a difficult time monitoring their caseloads and agents in the field. The same communication barriers that limit a service provider’s ability to connect with last-mile clients also prevent communication with remote agents. Mobile interfaces have the effect of lowering these barriers, enabling both intake and remote reporting processes to feed digital interfaces. These digital record systems, when used effectively, inform a manager’s ability to allocate cases to the most available service provider.

Applied to legal processes, the same SMS hubs that administer intake and case management can also digitize supervising attorneys’ internal management structures. One central hub, fed by the intake process that information desks often perform and by remote input where service extension agents exist, can allow managers to assign cases to individual service providers and then track them through disposition. In doing so, legal service coordinators will be able to track each employee’s workload in real time. In addition, system administrators will be able to look at the types and frequency of the cases they take on, which will inform their ability to allocate resources effectively. If, for example, one area has a dramatically higher number of cases than another, it may make sense to deploy multiple community legal advisors to adequately address the area of greatest need.
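As a small illustration of the allocation idea, the sketch below assigns each new case to the provider with the fewest open cases. The provider names, caseload counts, and case numbers are invented.

    # Minimal sketch of caseload-aware assignment: each incoming case goes to the
    # provider with the fewest open cases. Names and counts are hypothetical.
    open_cases = {"Advisor A": 7, "Advisor B": 3, "Advisor C": 5}

    def assign_case(case_id: str) -> str:
        provider = min(open_cases, key=open_cases.get)  # most available provider
        open_cases[provider] += 1
        print(f"Case {case_id} assigned to {provider} "
              f"(now handling {open_cases[provider]} open cases)")
        return provider

    assign_case("2011-0042")
    assign_case("2011-0043")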

Ultimately, though, SMS use in legal services remains largely untested.  FrontlineSMS is currently working with several partners to design specific mobile interfaces that meet their needs.  These efforts will definitely turn up new and interesting things that can be done using SMS and, particularly, FrontlineSMS.  These projects, however, are still largely in the design phase.

In addition to practical implementation challenges, there are a large number of challenges that lie ahead, as we begin to consider the implications of the professional use of SMS.  Issues such as security, privacy, identity, and chain of custody will all need to be addressed as systems adapt to include new technologies.  There are a number of brilliant minds well ahead on this, and we’ve even jury-rigged a few solutions ourselves, but there will be plenty to learn along the way.

The potential is great

What is clear, though, is that SMS has the potential to improve cost efficiencies, engage new populations, and, for the first time, build a justice system that works for the people who need it most.

I don’t think any of this will square me with my property-law professor.  I’m not sure I’ll ever fix property law.  But I do think that by reaching out to new populations using the technologies in their pockets, we can make a difference in the way people interact with the law. And even if that’s just a little bit, even if it just enables one percent more people to protect their homes, start a business, or pursue a better life, isn’t that worth it?

[Editor’s Note: For other VoxPopuLII posts on using technology to improve access to justice, please see Judge Dory Reiling, IT and the Access-to-Justice Crisis; Nick Holmes, Accessible Law; and Christine Kirchberger, If the mountain will not come to the prophet, the prophet will go to the mountain.]

Sean Martin McDonald is the Director of Operations at FrontlineSMS and the founding Director of FrontlineSMS:Legal. He holds JD and MA degrees from American University. He is the author, most recently, of The Case for mLegal.

VoxPopuLII is edited by Judith Pratt. Editor-in-Chief is Robert Richards, to whom queries should be directed. The statements above are not legal advice or legal representation. If you require legal advice, consult a lawyer. Find a lawyer in the Cornell LII Lawyer Directory.

The Civic Need

Civic morale in the U.S. is punishingly low and bleeding out. When it comes to recent public approval of the U.S. Congress, we’re talking imminent negative territory, if such were possible. Gallows chuckles were shared over an October 2011 NYT/CBS poll that found approval of the U.S. Congress down to 9% — lower than, yes, communism, the British Petroleum company during the oil spill, and King George III at the time of the American Revolution. The trends are beyond grim: Gallup in November tracked Congress falling to 13% approval, tying an all-time low. For posterity, this is indeed the first branch of the federal government in America’s constitutional republic, the one with “the power of the purse“, our mostly-millionaire law-makers. Also: the branch whose leadership recently attempted to hole up in an anti-democratic, unaccountable “SuperCommittee” to make historic decisions affecting public policy in secret. Members of Congress are the most fallible, despised elected officials in our representative democracy.

OpenCongress: Responding with open technology

Such was the visceral distrust of government (and apathy about the wider political process, in all its messy necessity) that our non-profit organization, the Participatory Politics Foundation (PPF), sought to combat with our flagship Web application, OpenCongress.org. Launched in 2007, its original motto was: “Bringing you the real story about what’s happening in Congress.” Our premise, then as today, is that radical transparency in government will increase public accountability, reduce systemic corruption in government, and result in better legislative outcomes. We believe free and open-source technology can push forward and serve a growing role in a much more deliberative democratic process — with an eye towards comprehensive electoral reform and increased voter participation. The technology buffet includes, in part, the following: software (in the code that powers OpenCongress); Web applications (like the user-friendly OpenCongress pages and engagement tools); mobile (booming, of course, globally); libre data and open standards; copyleft licensing; and more. One articulation of our goal is to encourage government, as the primary source, to comply exhaustively with the community-generated Principles of Open Government Data (which, it should be noted, are continually being revised and amended by #opengov advocates, as one would expect in a healthy, dynamic, and responsive community of watchdogs with itchy social sharing fingers). Another articulation of our goal, put reductively: we’ll know we’re doing better when voter participation rates rise in the U.S. from our current ballpark of 48% to levels comparable to those of other advanced democracies. Indeed, there has been a very strong and positive public demand for user-friendly Web interfaces and open data access to official government information. Since its launch, OpenCongress has grown to become the most-visited not-for-profit government transparency site in the U.S. (and possibly the world), with over one million visits per month, hundreds of thousands of users, and millions of automated data requests filled every week.

OpenGovernment.org: Opening up state legislatures

The U.S. Congress, unfortunately, remains insistently closed-off from the taxpaying public — living, breathing people and interested constituent communities — in its data inputs and outputs, while public approval keeps falling (for a variety of reasons, more than can be gestured towards here). This discouraging sentiment might be familiar to you — even cliché — if you’re an avid consumer of political news media, political blogs, and social media. But what’s happening in your state legislature? What bills in your state House or Senate chambers are affecting issues you care about? What are special interests saying about them, and how are campaign contributions influencing them? Even political junkies might not have conversational knowledge of key votes in state legislatures, which — if I may be reductive — take all the legislative arcane-ness of the federal Congress and boil it down to an even more restrictive group of state capitol “insiders” who really know the landscape. A June 2011 study by the University at Buffalo political science department found that, as summarized on Ballotpedia:

First, the American mass public seems to know little about their state governments. In a survey of Ohio, Patterson, Ripley, and Quinlan (1992) found that 72 percent of respondents could not name their state legislator. More recently, an NCSL-sponsored survey found that only 33 percent of respondents over 26 years old could correctly identify even the party that controlled their state legislature.

Further, state legislative elections are rarely competitive, and frequently feature only one major party candidate on the ballot. In the 2010 elections, 32.7 percent of districts had only one major party candidate running. (Ballotpedia 2010) In 18 of the 46 states holding legislative elections in 2010, over 40 percent of seats faced no major-party challenge, and in only ten states was the proportion of uncontested seats lower than 20 percent. In such an environment, the ability to shirk with limited consequences seems clear.”[1]

To open up state government, PPF created OpenGovernment.org as a joint project with the non-profit Sunlight Foundation and the community-driven Open States Project (of Sunlight Labs). Based on the proven OpenCongress model of transparency, OpenGovernment combines official government information with news and blog coverage, social media mentions, campaign contribution data, public discussion forums, and a suite of free engagement tools. The result, in short, is the most user-friendly page anywhere on the Web for accessing bill information at the state level. The site, launched in a public beta on January 18th, 2011, currently contains information for six U.S. state legislatures: California, Louisiana, Maryland, Minnesota, Texas, and Wisconsin. In March 2011, OpenGovernment was named a semi-finalist in the Accelerator Contest at South by Southwest Interactive conference.

Skimming a state homepage — for example, California — gives a good overview of the site’s offerings: every bill, legislator, vote, and committee, with as much full bill text as is technically available; plus issues, campaign contributions, key vote analysis, special interest group positions, and a raft of social wisdom. A bill page — for example, Wisconsin’s major freedom of association bill, SB 11 — shows how it all comes together in a highly user-friendly interface and, we hope, the best online user experience. Users can track, share, and comment on legislation, and then contact their elected officials over email directly from OpenGovernment pages. OpenGovernment remains in active open-source development. Our developer hub has more information. See also our wish-list and how anyone can help us grow, as we seek to roll out to all 50 U.S. state legislatures before the November 2012 elections.

Opening up state legislative data: The benefits

To make the value proposition for researchers explicit, I believe fundamentally there is clear benefit in having a go-to Web resource to access official, cited information about any and all legislative objects in a given state legislature (as there is with OpenCongress and the U.S. Congress). It’s desirable for researchers to know they have a permalink of easy-to-skim info for bills, votes, and more on OpenGovernment — as opposed to clunky, outmoded official state legislative websites (screenshots of which can be found in our launch blog post, if you’re brave enough to bear them). Full bill text is, of course, vital for citing, as is someday having fully-transparent version-control by legislative assistants and lobbyists and members themselves. For now, the site’s simple abilities to search legislation, sort by “most-viewed,” sort by date, sort by “most-in-the-news,” etc., all offer a highly contemporary user-experience, like those found by citizens elsewhere on the Web (e.g., as online consumers or on social media services). Our open API and code and data repositories ensure that researchers and outside developers (e.g., data specialists) have bulk access to the data we aggregate, in order to remix and sift through for discoveries and insights. Bloggers and journalists can use OpenGovernment (OG) in their political coverage, just as OpenCongress (OC) continues to be frequently cited by major media sites and blog communities. Issue advocates and citizen watchdogs can use OG to find, track, and contact their state legislators, soon with free online organizing features like Contact-Congress on OC. OpenGovernment‘s launch was covered by Alex Howard of O’Reilly Radar, the National Conference of State Legislatures (The Thicket blog), and Governing, with notes as well from many of PPF and Sunlight’s #opengov #nonprofit allies, and later on by Knight Foundation, Unmatched Style, and dozens of smaller state-based political blogs.

The technology that powers OpenGovernment.org

The technology behind OpenGovernment was assembled by PPF’s former Director of Technology (and still good friend-of-PPF, following his amicable transition to personal projects) Carl Tashian. In designing it, Carl and I were driven first by a desire to ensure the code was not only relatively-remixable but also as modular as possible. Remixable, because we hoped and expect that other open-source versions of OpenGovernment will spring up, creating (apologies for the cliché, but it’s one I am loathe to relinquish, as it’s really the richest, most apt description of a desirable state of affairs) a diverse ecosystem of government watchdog sites for state legislatures. Open data and user-focused Web design can bring meaningful public accountability not only to state legislatures, but also to the executive and judicial branches of state government as well. PPF seeks non-profit funding support to bring OpenGovernment down to the municipal level — county, city, and local town councils, as hyper-local and close to the neighborhood block as possible — and up to foreign countries and international institutions like the United Nations. In theory, any government entity with official documents and elected official roles is a candidate for a custom version of OpenGovernment facing the public on the open Web — even those without fully-open data sets, which of course, most countries don’t have. But by making OpenGovernment as modular as possible, we aimed to ensure that the site could work with a variety of data inputs and formats. The software is designed to handle a best-case data stream — an API of legislative info — or less-than-best, such as XML feeds, HTML scraping, or even a static set of uploaded documents and spreadsheets.
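To illustrate the modularity described above (this is not the actual OpenGovernment or GovKit code, which is written in Ruby), here is a small Python sketch of the adapter idea: every data source, whether a JSON API, an XML feed, or a scraped page, exposes the same interface and yields bills in one normalized shape. The class names, bill identifiers, and titles are invented.

    # Illustrative adapter sketch (not the actual Ruby GovKit code): each source,
    # best-case API or scraped HTML, yields bills in one normalized shape.
    from abc import ABC, abstractmethod

    class BillSource(ABC):
        @abstractmethod
        def fetch_bills(self, state: str) -> list[dict]:
            """Return bills as dicts with at least 'id' and 'title'."""

    class JsonApiSource(BillSource):
        def fetch_bills(self, state: str) -> list[dict]:
            # A real implementation would call a legislative API and parse JSON.
            return [{"id": "SB 11", "title": "Example bill title", "state": state}]

    class ScrapedHtmlSource(BillSource):
        def fetch_bills(self, state: str) -> list[dict]:
            # A real implementation would scrape an official site and normalize fields.
            return [{"id": "AB 1", "title": "Example scraped bill", "state": state}]

    def load_state(source: BillSource, state: str) -> None:
        for bill in source.fetch_bills(state):
            print(f"{bill['state']} {bill['id']}: {bill['title']}")

    # Downstream code does not change when a state's source is swapped.
    load_state(JsonApiSource(), "WI")
    load_state(ScrapedHtmlSource(), "CA")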

Speaking of software, OpenGovernment is powered by GovKit, an open-source Ruby gem for aggregating and displaying open government APIs from around the Web. Diagrammed here, these data sources are summarized below with a few notes:

  • Open States – a RESTful API of official government data, e.g. bills, votes, legislators, committees, and more. This data stream forms the backbone of OpenGovernment. A substantially volunteer effort coordinated by the talented and dedicated team at Sunlight Labs, Open States fulfills a gigantic public need for standardized data about state legislation — largely by the time-intensive process of scraping HTML from unstandardized existing official government websites. It is remarkable, precedent-setting public-interest work; updates are posted by James Turk on the Labs Blog. Data is received daily in JSON format, and wherever possible, bill text is displayed in the smooth open-source DocumentCloud text viewer (e.g., WI SB11).
  • OpenCongress – API for federal bills, votes, people, and news and blog coverage. OpenGovernment is primarily focused on finding and tracking state bills and legislators, but one of our premises in designing the public resource was that the vast majority of users would first need to look up their elected officials by street address. (Can you name your state legislators with confidence offhand? I couldn’t before developing OpenCongress in 2007.) So since users were likely to take that action, we used our sibling site OpenCongress to find and display federal legislators underneath state ones (e.g., CA zip 94110).
  • Google News, Google Blog Search, Bing API – we use these methods to aggregate news and blog coverage of bills and members, as on OpenCongress: searching for specific search terms and thereby assembling pages that allow a user to skim down recent mentions of a bill (along with headlines and sources) without straying far from OpenGovernment. One key insight of OpenCongress was that lists of bills “most in the news” and “most-on-blogs” can point users towards what’s likely most-pressing or most-discussed or most-interesting to them, as search engine or even intra-site keyword searches on, say, “climate change bill” don’t always return most-relevant results, even when lightly editorially curated for SEO. On pages of news results for individual bills (e.g., CA SB 9) or members (e.g., WI Sen. Tim Carpenter), it’s certainly useful to get a sense of the latest news by skimming down aggregated headlines, even given known issues with bringing in similarly titled bills (e.g., SB 9 in Texas, not California) or sports statistics or spam. Future enhancements to OpenGovernment will do more to highlight trusted news sources from open data standards — a variety of services like NewsTrust exist on this front, and there’s no shortage of commercial partnerships possible (or via Facebook Connect and other closed social media), but PPF’s focus is on mitigating the “filter bubble” and staying in play on the open Web.
  • Transparency Data API (by Sunlight Labs) to bring in campaign contribution data from FollowTheMoney. If Open States data is the backbone of OpenGovernment, this money-in-politics data is its heart. PPF’s work is first and foremost motivated by a desire to work in the public interest to mitigate the harmful effects of systemic corruption at every level of government, from the U.S. Congress on down. (See, e.g., Lessig, Rootstrikers, and innumerable academic studies and news investigations into the biased outcomes of a system where, for example, federal members of Congress spend between 30 and 70 percent of their time fundraising instead of connecting with constituents.) Part of this is vocally endorsing comprehensive electoral reforms such as non-partisan redistricting, right-to-vote legal frameworks, score voting, parliamentary representation, and the Fair Elections Now Act for full public financing of elections. But the necessary first step is radical transparency of campaign contributions by special interests to elected officials, accompanied by real-time financial disclosure, stronger ethics laws, aggressive oversight, and regulation to stop the revolving door with lobbyists and corporations that results in oligarchical elites and a captured government. Hence “The Money Trail” on OpenGovernment (e.g., for Texas) is a vital resource for connecting bills, votes, and donations. The primary source for money figures is the National Institute on Money in State Politics, our much-appreciated and detail-oriented non-profit partner, which receives data in either electronic or paper files from the state disclosure agencies with which candidates must file their campaign finance reports. Future enhancements to OG will integrate with MAPLight’s unique analysis of the industries supporting and opposing individual bills with their donations. MAPLight has data for CA and WI we’re looking to bring in, with more to come.
  • Project VoteSmart’s API brings in special-interest group ratings for state government and allows OpenGovernment to highlight the most impactful legislation in each state by marking VoteSmart’s non-partisan “key vote” bills (e.g., for TX). VoteSmart does remarkable legislative analysis that neatly ties bills to issue areas, but it doesn’t have a built-in money-in-politics tie-in on its pages, or tools to track and share legislation. (This is just another way in which OpenGovernment, by aggregating the best available data in a more user-focused design, adds value, we hope, as an open-source Web app, about which more below.) Project VoteSmart’s work is hugely valuable, but the data is again ornery — special-interest group ratings are frequently sparse and vary in scale, and are therefore difficult to summarize or average accurately — so for members, where applicable, we show a total of the number of ratings in each category (e.g., for TX Sen. Dan Patrick) and link to a fuller description.
  • Wikipedia – OG first attempts to match a legislator’s full name to a bio page on Wikipedia, with largely good but occasionally false-positive results. Of course, many politicians go by nicknames, so handling those is a straightforward enhancement we’ll make once we can prioritize it with our available resources. See, e.g., TX Sen. Joan Huffman on OG, and her bio on Wikipedia.
  • Twitter – OG has a first-pass implementation that brings in mentions of a state hashtag and bill number, e.g., #txbill #sb7, and for members, state name and legislator name, e.g., Texas Joan Huffman. This is another relatively straightforward engineering enhancement that we can make more responsive and more accurate with additional resources — for example, bringing in more accurate mentions and highlighting ones made by influential publishers on social media. Spending our time working within walled gardens to capture mentions of key votes isn’t inherently pleasant, but bringing vital chatter out onto the open Web and making it available via our open API will be worth the time and investment.
  • Miro Community, free and open-source software from PPF’s sibling non-profit the Participatory Culture Foundation (PCF), makes it possible to crowdsource streaming online video about state legislatures (e.g., CA).
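
As a rough illustration of the kind of data stream GovKit aggregates, the sketch below queries a legislative bill API over HTTP and prints bill numbers and titles. The endpoint, parameters, and field names are assumptions for illustration only; the Open States documentation and the GovKit source define the real interfaces.

    require 'json'
    require 'net/http'
    require 'uri'

    # Hedged sketch: the endpoint, parameters, and field names below are
    # assumptions; consult the Open States API docs for the real ones.
    api_key = ENV.fetch('OPENSTATES_API_KEY', 'demo')
    uri = URI('https://openstates.org/api/v1/bills/')
    uri.query = URI.encode_www_form(state: 'wi', q: 'collective bargaining', apikey: api_key)

    bills = JSON.parse(Net::HTTP.get(uri))

    # Skim the results the way an OpenGovernment bill listing would.
    bills.each do |bill|
      puts "#{bill['bill_id']}: #{bill['title']}"
    end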

The OpenGovernment.org Web app is free, libre, and written in open-source Ruby on Rails code (developer hub). Like OpenCongress, the site is not-for-profit and non-commercial; it promotes #opengovdata and open standards, and offers an open API, with volunteer contributions and remixes welcome and encouraged. Two features worth noting: most pages on the site are available for query via JSON and JSONP, and we offer free lookup of federal and state elected officials by latitude/longitude via URL. PostgreSQL and PostGIS power the back-end — we’ve seen with OpenCongress that the database of aggregated info can become huge, so laying a solid foundation was an early priority. The app uses the terrific open-source GeoServer to display vote maps — many enhancements possible there — and Jammit for asset packaging. For more technical details, see this enjoyable Changelog podcast with Carl from February 2011.
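
For example (the URLs below are hypothetical and invented for illustration; the developer hub documents the actual JSON/JSONP routes and the latitude/longitude lookup endpoint), querying the site’s data programmatically might look like this:

    require 'json'
    require 'net/http'
    require 'uri'

    # Hypothetical URLs for illustration only; see the OpenGovernment developer
    # hub for the documented JSON/JSONP routes and the lat/long lookup endpoint.
    def fetch_json(url)
      JSON.parse(Net::HTTP.get(URI(url)))
    end

    # 1) Request a page's underlying data as JSON rather than HTML.
    bill = fetch_json('http://www.opengovernment.org/texas/bills/sb-7.json')

    # 2) Look up federal and state elected officials by latitude/longitude.
    reps = fetch_json('http://www.opengovernment.org/people/lookup.json?latitude=30.27&longitude=-97.74')

    puts bill['title'] if bill.is_a?(Hash)
    puts reps.length   if reps.is_a?(Array)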

Web design on this beta OG Web app is by PPF and PCF’s former designer (and still good friend after an amicable parting) Morgan Knutson, now a designer with Google. As product manager, I aimed to create a user interface that — like the code base — would be as modular as possible. Lots of placeholder notes remain throughout the beta version, pointing to areas of future enhancement that we can pursue with more resources and open-source volunteer help. Many of the engagement features of the site — from tracking to commenting to social sharing — were summarized brilliantly by Rob Richards in this Slaw.ca interview with me from July 29th, 2011 — viz., walking users up the “chain of engagement.” It’s a terrific, much-appreciated introduction to the civic-engagement goals of our organization and our beliefs about how well-designed Web pages can do more than one might think to improve a real-life community in the near term.

More on open government data and online civic engagement

To briefly run through more academic or data-driven research on the public benefits of #opengovdata and open-source Web tools for civic engagement (not intended to be comprehensive, of course, and with more caveats than I could fit here):

OpenGovernment.org: Some metrics

To wrap up this summary of OpenGovernment in 2011, then, here are some of the metrics we’ve seen on Google Analytics. With limited outreach and no paid advertising or commercial partnerships, the OpenGovernment beta, with its six states, will have received over half a million pageviews in its first year of existence. As with OpenCongress, by far the most-viewed content is bills, found via search engines by their official number; search engines send approximately two-thirds of all traffic (and of that, Google alone sends over half). Hot bills in Texas and the WI organizing bill constitute three of OG’s top ten most-viewed pages sitewide. After hearing about a firearms bill in the news or from a neighbor, for example, users type “texas bill 321” or “sb 321” into Google and end up on OG, where they’re able to skim the bill’s news coverage, view the campaign contributions (for example) and interest group ratings (for example) of its authors and sponsors, and share their opinions by finding and writing their elected officials.

OpenGovernment.org: Next steps, and How you can help

In addition to rolling out to all 50 U.S. states and launching pilot projects in municipal areas, one of our main goals for OpenGovernment is integration with the free organizing features we launched this past summer on OpenCongress version 3. Enabling OG users to email their state reps directly from bill pages will significantly increase the amount of publicly transparent, linkable, query-able constituent communication on the open Web. Allowing issue-advocacy organizations and political blog communities to create campaigns as part of future MyOG Groups will coordinate whipping of state legislators for a more continually connected civic experience. And as always, tweaks to the beta site’s user interface will allow us to highlight the best available information about how money affects politics and votes in state legislatures, to fight systemic corruption, and to bring about a cleaner and more trustworthy democratic process. Help us grow and contact us anytime with questions or feedback. As a public charity, PPF aspires to grow to become more akin to the Wikimedia Foundation (behind Wikipedia), Mozilla (behind Firefox), and MySociety (behind TheyWorkForYou, for the UK Parliament, and other projects). We’re working towards a future where staying in touch with what’s happening in state capitols is just as easy and as immediately rewarding as, for example, seeing photos from friends on Facebook, sharing a joke on Twitter, or loading a movie on Netflix.com.

David Moore is the Executive Director of the Participatory Politics Foundation, a non-profit organization using technology for civic engagement. He lives in Brooklyn, NY.

VoxPopuLII is edited by Judith Pratt. Editor-in-Chief is Robert Richards, to whom queries should be directed. The statements above are not legal advice or legal representation. If you require legal advice, consult a lawyer. Find a lawyer in the Cornell LII Lawyer Directory.

Prosumption: shifting the barriers between information producers and consumers

One of the major revolutions of the Internet era has been the shifting of the frontiers between producers and consumers [1]. Prosumption refers to the emergence of a new category of actors who not only consume but also contribute to content creation and sharing. Under the umbrella of Web 2.0, many sites indeed enable users to share multimedia content, data, experiences [2], views and opinions on different issues, and even to act cooperatively to solve global problems [3]. Web 2.0 has become a fertile terrain for the proliferation of valuable user data enabling user profiling, opinion mining, trend and crisis detection, and collective problem solving [4].

The private sector has long understood the potential of user data and has used it for analysing customer preferences and satisfaction, finding sales opportunities, developing marketing strategies, and driving innovation. Recently, corporations have relied on Web platforms for gathering new ideas from clients on the improvement of existing products and services or the development of new ones (see for instance Dell’s Ideastorm; salesforce’s IdeaExchange; and My Starbucks Idea). Similarly, Lego’s Mindstorms encourages users to share their robot-building projects online, whereby the designs become public knowledge and can be freely reused by Lego (and anyone else), as indicated by the Terms of Service. Furthermore, companies have recently been mining social network data to foresee future actions of the Occupy Wall Street movement.

Even scientists have caught up and adopted collaborative methods that enable the participation of laymen in scientific projects [5].

Now, how far has government gone in taking up this opportunity?

Some recent initiatives indicate that the public sector is aware of the potential of the “wisdom of crowds.” In the domain of public health, MedWatcher is a mobile application that allows the general public to submit information about any experienced drug side effects directly to the US Food and Drug Administration. In other cases, governments have asked for general input and ideas from citizens, such as the brainstorming session organized by the Obama administration, the wiki launched by the New Zealand Police to get suggestions from citizens for the drafting of a new policing act to be presented to parliament, or the Website of the Department of Transport and Main Roads of the State of Queensland, which encourages citizens to share their stories related to road tragedies.

Even in so crucial a task as the drafting of a constitution, government has relied on citizens’ input through crowdsourcing [6]. More recently, several other initiatives have fostered crowdsourcing for constitutional reform in Morocco and in Egypt.

It is thus undeniable that we are witnessing an accelerated redefinition of the frontiers between experts and non-experts, scientists and non-scientists, doctors and patients, public officers and citizens, professional journalists and street reporters. The ‘Net has provided the infrastructure and the platforms for enabling collaborative work. Network connection is hardly a problem anymore. The problem is data analysis.

In other words: how to make sense of the flood of data produced and distributed by heterogeneous users? And more importantly, how to make sense of user-generated data in the light of more institutional sets of data (e.g., scientific, medical, legal)? The efficient use of crowdsourced data in public decision making requires building an informational flow between user experiences and institutional datasets.

Similarly, enhancing user access to public data has to do with matching user case descriptions with institutional data repositories (“What are my rights and obligations in this case?”; “Which public office can help me?”; “What is the delay in the resolution of my case?”; “How many cases like mine have there been in this area in the last month?”).

From the point of view of data processing, we are clearly facing a problem of semantic mapping and data structuring. The challenge is thus to overcome the flood of isolated information while avoiding excessive management costs. There is still a long way to go before tools for content aggregation and semantic mapping are generally available. This is why private firms and governments still mostly rely on the manual processing of user input.

The new producers of legally relevant content: a taxonomy

Before digging deeper into the challenges of efficiently managing crowdsourced data, let us take a closer look at the types of user-generated data flowing through the Internet that have some kind of legal or institutional flavour.

One type of user data emerges spontaneously from citizens’ online activity, and can take the form of:

  • citizens’ forums
  • platforms gathering open public data and comments on them (see for instance data-publica)
  • legal expert blogs (blawgs)
  • or the journalistic coverage of the legal system.

User data can also be prompted by institutions as a result of participatory governance initiatives, such as:

  • crowdsourcing (targeting a specific issue or proposal by government as an open brainstorming session)
  • comments and questions addressed by citizens to institutions through institutional Websites or through e-mail contact.

This variety of media formats and knowledge producers gives rise to a plurality of textual genres, semantically rich but difficult to manage given their heterogeneity and quick evolution.

Managing crowdsourcing

The goal of crowdsourcing in an institutional context is to extract and aggregate content relevant for the management of public issues and for public decision making. Knowledge management strategies vary considerably depending on the ways in which user data have been generated. We can think of three possible strategies for managing the flood of user data:

Pre-structuring: prompting the citizen narrative in a strategic way

A possible solution is to elicit user input in a structured way; that is to say, to impose some constraints on user input. This is the solution adopted by IdeaScale, a software application that was used by the Open Government Dialogue initiative of the Obama Administration. In IdeaScale, users are asked to check whether their idea has already been covered by other users and, if not, to add a new one. They are also invited to vote for the best ideas, so that it is the community itself that rates, and thus indirectly filters, the users’ input.

The MIT Deliberatorium, a technology aimed at supporting large-scale online deliberation, follows a similar strategy. Users are expected to follow a series of rules to enable the correct creation of a knowledge map of the discussion. Each post should be limited to a single idea, should not be redundant, and should be linked to a suitable part of the knowledge map. Furthermore, posts are validated by moderators, who ensure that new posts follow the rules of the system. Other systems that implement the same idea are featurelist and Debategraph [7].

While these systems enhance the creation and visualization of structured argument maps and promote community engagement through rating systems, they present a series of limitations. The most important of these is that human intervention is needed to manually check that posts are correctly structured. Semantic technologies can play an important role in bridging this gap.

Semantic analysis through ontologies and terminologies

Ontology-driven analysis of user-generated text implies finding a way to bridge Semantic Web data structures, such as formal ontologies expressed in RDF or OWL, with the unstructured, implicit ontologies emerging from user-generated content. Sometimes these emergent lightweight ontologies take the form of unstructured lists of terms used by users to tag online content. Several works have dealt with this issue, especially in the field of social tagging of Web resources in online communities. More concretely, different works have proposed models for reconciling so-called top-down metadata structures (ontologies) with bottom-up tagging mechanisms (folksonomies).

The possibilities range from transforming folksonomies into lightly formalized semantic resources (Lux and Dsinger, 2007; Mika, 2005) to mapping folksonomy tags to the concepts and instances of available formal ontologies (Specia and Motta, 2007; Passant, 2007). At the basis of these works is the notion of emergent semantics (Mika, 2005), which questions the autonomy of engineered ontologies and emphasizes the value of meaning emerging from distributed communities working collaboratively through the Web.

We have recently worked on several case studies in which we have proposed a mapping between legal and lay terminologies. We followed the approach proposed by Passant (2007) and enriched the available ontologies with the terminology appearing in lay corpora. For this purpose, OWL classes were complemented with a has_lexicalization property linking them to lay terms.
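
As a minimal sketch of that enrichment step (the namespaces, class names, and terms below are invented for illustration; the actual ONTOMEDIA ontologies and annotation property are defined in the project), lay terms gathered from the corpora can be attached to OWL classes and serialized as triples:

    # Minimal sketch of enriching OWL classes with lay-language terms via a
    # has_lexicalization property. Namespaces, classes, and terms are invented.
    PREFIXES = <<~TTL
      @prefix mco: <http://example.org/mediation-core#> .
      @prefix lex: <http://example.org/lex#> .
    TTL

    # Ontology class => lay terms gathered from the consumer corpus.
    LEXICALIZATIONS = {
      'mco:Seller'   => %w[negoziante commerciante],
      'mco:Contract' => ['contratto di telefonia']
    }

    triples = LEXICALIZATIONS.flat_map do |owl_class, terms|
      terms.map { |term| %(#{owl_class} lex:has_lexicalization "#{term}"@it .) }
    end

    File.write('enriched.ttl', PREFIXES + triples.join("\n") + "\n")
    puts triples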

The first case study that we conducted belongs to the domain of consumer justice, and was framed in the ONTOMEDIA project. We proposed to reuse the available Mediation-Core Ontology (MCO) and Consumer Mediation Ontology (COM) as anchors to legal, institutional, and expert knowledge, and therefore as entry points for the queries posed by consumers in common-sense language.

The user corpus contained around 10,000 consumer questions and 20,000 complaints addressed from 2007 to 2010 to the Catalan Consumer Agency. We applied a traditional terminology extraction methodology to identify candidate terms, which were subsequently validated by legal experts. We then manually mapped the lay terms to the ontological classes. The relations used for this mapping are mostly has_lexicalisation and has_instance.

A second case study in the domain of consumer law was carried out with Italian corpora. In this case, domain terminology was extracted from a normative corpus (the Italian Consumer Code) and from a lay corpus (around 4,000 consumers’ questions).

In order to further explore the particularities of each corpus with respect to the semantic coverage of the domain, terms were gathered together into a common taxonomic structure [8]. This task was performed with the aid of domain experts. When confronted with the two lists of terms, both laypersons and technical experts linked most of the validated lay terms to the technical list of terms through one of the following relations (a small worked sample follows the list):

  • Subclass: the lay term denotes a particular type of legal concept. This is the most frequent case. For instance, in the class objects, telefono cellulare (cell phone) and linea telefonica (phone line) are subclasses of the legal terms prodotto (product) and servizio (service), respectively. Similarly, in the class actors, agente immobiliare (estate agent) can be seen as a subclass of venditore (seller). In other cases, the linguistic structures extracted from the consumers’ corpus denote conflictual situations in which the seller has not fulfilled his obligations and the consumer is therefore entitled to certain rights, such as diritto alla sostituzione (entitlement to a replacement). These types of phrases are subclasses of more general legal concepts such as consumer right.
  • Instance: the lay term denotes a concrete instance of a legal concept. In some cases, terms extracted from the consumer corpus are named entities that denote particular individuals, such as Vodafone, an instance of a domain actor, a seller.
  • Equivalent: a legal term is used in lay discourse. For instance, contratto (contract) or diritto di recessione (withdrawal right).
  • Lexicalisation: the lay term is a lexical variant of the legal concept. This is the case for instance of negoziante, used instead of the legal term venditore (seller) or professionista (professional).
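
The sketch below records a handful of such links under the four relations above (a toy sample drawn from the examples in this post, not the project’s actual dataset or schema):

    # Toy sample of lay-to-legal term links, grouped by the relations above.
    Link = Struct.new(:lay_term, :relation, :legal_term)

    LINKS = [
      Link.new('telefono cellulare', :subclass,       'prodotto'),
      Link.new('agente immobiliare', :subclass,       'venditore'),
      Link.new('Vodafone',           :instance,       'venditore'),
      Link.new('contratto',          :equivalent,     'contratto'),
      Link.new('negoziante',         :lexicalisation, 'venditore')
    ]

    LINKS.group_by(&:relation).each do |relation, links|
      pairs = links.map { |l| "#{l.lay_term} -> #{l.legal_term}" }
      puts "#{relation}: #{pairs.join(', ')}"
    end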

The distribution of normative and lay terms per taxonomic level shows that, whereas normative terms populate mostly the upper levels of the taxonomy [9], deeper levels in the hierarchy are almost exclusively represented by lay terms.

[Figure: Term distribution per taxonomic level]

The result of this type of approach is a set of terminological-ontological resources that provide some insights into the nature of laypersons’ cognition of the law, such as the fact that citizens’ domain knowledge is mainly factual and therefore populates the deeper levels of the taxonomy. Moreover, such resources can be used for the further processing of user input. However, this strategy presents some limitations as well. First, it is mainly driven by domain conceptual systems, which may limit the potential of user-generated corpora. Second, it is not necessarily scalable: these terminological-ontological resources have to be rebuilt for each legal subdomain (such as consumer law, private law, or criminal law), and it is thus difficult to foresee mechanisms for performing an automated mapping between lay terms and legal terms.

Beyond domain ontologies: information extraction approaches

One of the most important limitations of ontology-driven approaches is the lack of scalability. In order to overcome this problem, a possible strategy is to rely on informational structures that occur generally in user-generated content. These informational structures go beyond domain conceptual models and identify mostly discursive, emotional, or event structures.

Discursive structures formalise the way users typically describe a legal case. It is possible to identify stereotypical situations appearing in citizens’ descriptions of legal cases (e.g., the nature of the problem, the conflict-resolution strategies, etc.). The core of these situations is usually a predicate, so they can be formalized as frame structures containing different frame elements. We followed such an approach for mapping the Spanish corpus of consumers’ questions to the classes of the domain ontology (Fernández-Barrera and Casanovas, 2011). The same technique was applied for mapping a set of citizens’ complaints in the domain of acoustic nuisances to a legal domain ontology (Bourcier and Fernández-Barrera, 2011). By describing the general structures of citizens’ case descriptions, we ensure scalability.
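
A hedged sketch of what such a frame might look like (the frame elements, patterns, and example question are invented for illustration; the cited papers define the actual frames and their mapping to the ontology):

    # Invented, minimal complaint frame: each frame element is filled by the
    # first matching span in the citizen's text. The real frames are those
    # described in Fernández-Barrera and Casanovas (2011).
    COMPLAINT_FRAME = {
      object:     /tel[eé]fono|contrato|lavadora/i,
      problem:    /no funciona|averiado|defectuoso/i,
      resolution: /reembolso|sustituci[oó]n|desistimiento/i
    }

    def fill_frame(text)
      COMPLAINT_FRAME.transform_values { |pattern| text[pattern] }
    end

    question = 'El teléfono no funciona y quiero la sustitución del producto.'
    p fill_frame(question)
    # => {:object=>"teléfono", :problem=>"no funciona", :resolution=>"sustitución"}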

Emotional structures are extracted by current algorithms for opinion- and sentiment mining. User data in the legal domain often contain an important number of subjective elements (especially in the case of complaints and feedback on public services) that could be effectively mined and used in public decision making.

Finally, event structures, which have been explored in depth so far, could be useful for extracting information from user complaints and feedback, or for automatically classifying queries according to the situation described.

Crowdsourcing in e-government: next steps (and precautions?)

Legal prosumers’ input currently outstrips the capacity of government to extract meaningful content in a cost-efficient way. Some developments are under way, among them argument-mapping technologies and semantic matching between legal and lay corpora. The scalability of these methodologies is the main obstacle to overcome in order to enable the matching of user data with open public data in several domains.

However, as technologies for the extraction of meaningful content from user-generated data develop and are used in public decision making, a series of issues will have to be dealt with. For instance, should the system developer bear responsibility for an erroneous or biased analysis of the data? Ethical questions arise as well: May governments legitimately analyse any type of user-generated content? Content-analysis systems might be used for trend and crisis detection; but what if they are also used for restricting freedoms?

The “wisdom of crowds” can certainly be valuable in public decision making, but the fact that citizens’ online behaviour can be observed and analysed by governments without citizens’ awareness poses serious ethical issues.

Thus, technical development in this domain will have to be coupled with the definition of ethical guidelines and standards, maybe in the form of a system of quality labels for content-analysis systems.

[Editor’s Note: For earlier VoxPopuLII commentary on the creation of legal ontologies, see Núria Casellas, Semantic Enhancement of Legal Information… Are We Up for the Challenge? For earlier VoxPopuLII commentary on Natural Language Processing and legal Semantic Web technology, see Adam Wyner, Weaving the Legal Semantic Web with Natural Language Processing. For earlier VoxPopuLII posts on user-generated content, crowdsourcing, and legal information, see Matt Baca and Olin Parker, Collaborative, Open Democracy with LexPop; Olivier Charbonneau, Collaboration and Open Access to Law; Nick Holmes, Accessible Law; and Staffan Malmgren, Crowdsourcing Legal Commentary.]


[1] The idea of prosumption actually existed long before the Internet, as highlighted by Ritzer and Jurgenson (2010): the customer of a fast-food restaurant is to some extent also the producer of the meal, since he is expected to be his own waiter, as is the driver who pumps his own gasoline at the filling station.

[2] The experience project enables registered users to share life experiences, and it contained around 7 million stories as of January 2011: http://www.experienceproject.com/index.php.

[3] For instance, the United Nations Volunteers Online platform (http://www.onlinevolunteering.org/en/vol/index.html) helps volunteers to cooperate virtually with non-governmental organizations and other volunteers around the world.

[4] See for instance the experiment run by mathematician Gowers on his blog: he posted a problem and asked a large number of mathematicians to work collaboratively to solve it. They eventually succeeded faster than if they had worked in isolation: http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/.

[5] The Galaxy Zoo project asks volunteers to classify images of galaxies according to their shapes: http://www.galaxyzoo.org/how_to_take_part. See as well Cornell’s projects Nestwatch (http://watch.birds.cornell.edu/nest/home/index) and FeederWatch (http://www.birds.cornell.edu/pfw/Overview/whatispfw.htm), which invite people to introduce their observation data into a Website platform.

[6] http://www.participedia.net/wiki/Icelandic_Constitutional_Council_2011.

[7] See the description of Debategraph in Marta Poblet’s post, Argument mapping: visualizing large-scale deliberations (http://serendipolis.wordpress.com/2011/10/01/argument-mapping-visualizing-large-scale-deliberations-3/).

[8] Terms were organised in the form of a tree whose root nodes are nine previously identified semantic classes. Terms were added as branches and sub-branches depending on their degree of abstraction.

[9] It should be noted that legal terms are mostly situated at the second level of the hierarchy rather than the first one. This is natural if we take into account the nature of the normative corpus (the Italian consumer code), which contains mostly domain specific concepts (for instance, withdrawal right) instead of general legal abstract categories (such as right and obligation).

REFERENCES

Bourcier, D., and Fernández-Barrera, M. (2011). A frame-based representation of citizen’s queries for the Web 2.0. A case study on noise nuisances. E-challenges conference, Florence 2011.

Fernández-Barrera, M., and Casanovas, P. (2011). From user needs to expert knowledge: Mapping laymen queries with ontologies in the domain of consumer mediation. AICOL Workshop, Frankfurt 2011.

Lux, M., and Dsinger, G. (2007). From folksonomies to ontologies: Employing wisdom of the crowds to serve learning purposes. International Journal of Knowledge and Learning (IJKL), 3(4/5): 515-528.

Mika, P. (2005). Ontologies are us: A unified model of social networks and semantics. In Proc. of Int. Semantic Web Conf., volume 3729 of LNCS, pp. 522-536. Springer.

Passant, A. (2007). Using ontologies to strengthen folksonomies and enrich information retrieval in Weblogs. In Int. Conf. on Weblogs and Social Media, 2007.

Poblet, M., Casellas, N., Torralba, S., and Casanovas, P. (2009). Modeling expert knowledge in the mediation domain: A Mediation Core Ontology, in N. Casellas et al. (Eds.), LOAIT- 2009. 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with 2nd Workshop on Semantic Processing of Legal Texts. Barcelona, IDT Series n. 2.

Ritzer, G., and Jurgenson, N. (2010). Production, consumption, prosumption: The nature of capitalism in the age of the digital “prosumer.” Journal of Consumer Culture 10: 13-36.

Specia, L., and Motta, E. (2007). Integrating folksonomies with the Semantic Web. Proc. Euro. Semantic Web Conf., 2007.

Meritxell Fernández-Barrera is a researcher at CERSA (Centre d’Études et de Recherches de Sciences Administratives et Politiques), CNRS, Université Paris 2. She works on the application of natural language processing (NLP) to legal discourse and legal communication, and on the potential of Web 2.0 for participatory democracy.

VoxPopuLII is edited by Judith Pratt. Editor-in-Chief is Robert Richards, to whom queries should be directed. The statements above are not legal advice or legal representation. If you require legal advice, consult a lawyer. Find a lawyer in the Cornell LII Lawyer Directory.