{"id":2061,"date":"2011-11-17T06:41:07","date_gmt":"2011-11-17T11:41:07","guid":{"rendered":"http:\/\/blog.law.cornell.edu\/voxpop\/?p=2061"},"modified":"2025-01-31T14:26:21","modified_gmt":"2025-01-31T19:26:21","slug":"legal-prosumers-how-can-government-leverage-user-generated-content","status":"publish","type":"post","link":"https:\/\/blog.law.cornell.edu\/voxpop\/2011\/11\/17\/legal-prosumers-how-can-government-leverage-user-generated-content\/","title":{"rendered":"Legal Prosumers: How Can Government Leverage User-Generated Content?"},"content":{"rendered":"
Prosumption: shifting the barriers between information producers and consumers<\/strong><\/p>\n One of the major revolutions of the Internet era has been the shifting of the frontiers between producers and consumers [1]. Prosumption<\/em> refers to the emergence of a new category of actors who not only consume but also contribute to content creation and sharing. Under the umbrella of Web 2.0, many sites indeed enable users to share multimedia content, data, experiences [2], views and opinions on different issues, and even to act cooperatively to solve global problems [3]. Web 2.0 has become a fertile terrain for the proliferation of valuable user data enabling user profiling, opinion mining, trend and crisis detection, and collective problem solving [4].<\/p>\n The private sector has long understood the potentialities of user data and has used them for analysing customer preferences and satisfaction, for finding sales opportunities, for developing marketing strategies, and as a driver for innovation. Recently, corporations have relied on Web platforms for gathering new ideas from clients on the improvement or the development of new products and services (see for instance Dell\u2019s Ideastorm<\/a>; salesforce\u2019s IdeaExchange<\/a>; and My Starbucks Idea<\/a>). Similarly, Lego\u2019s Mindstorms <\/a> encourages users to share online their projects on the creation of robots, by which the design becomes public knowledge and can be freely reused by Lego (and anyone else), as indicated by the Terms of Service. 
Furthermore, companies have been recently mining social network data to foresee future action of the Occupy Wall Street movement<\/a>.<\/p>\n Even scientists have caught up and adopted collaborative methods that enable the participation of laymen in scientific projects [5].<\/p>\n Now, how far has government gone in taking up this opportunity?<\/p>\n Some recent initiatives indicate that the public sector is aware of the potential of the \u201cwisdom of crowds.\u201d In the domain of public health, MedWatcher<\/a> is a mobile application that allows the general public to submit information about any experienced drug side effects directly to the US Food and Drug Administration. In other cases, governments have asked for general input and ideas from citizens, such as the brainstorming session<\/a> organized by Obama government, the wiki launched by the New Zealand Police to get suggestions from citizens for the drafting of a new policing act to be presented to the parliament<\/a>, or the Website of the Department of Transport and Main Roads of the State of Queensland<\/a>, which encourages citizens to share their stories related to road tragedies.<\/p>\n Even in so crucial a task as the drafting of a constitution, government has relied on citizens\u2019 input through crowdsourcing [6]. And more recently several other initiatives have fostered crowdsourcing for constitutional reform in Morocco<\/a> and in Egypt <\/a>.<\/p>\n It is thus undeniable that we are witnessing an accelerated redefinition of the frontiers between experts and non-experts, scientists and non-scientists, doctors and patients, public officers and citizens, professional journalists and street reporters. The ‘Net has provided the infrastructure and the platforms for enabling collaborative work. Network connection is hardly a problem anymore. The problem is data analysis.<\/p>\n In other words: how to make sense of the flood of data produced and distributed by heterogeneous users? 
And more importantly, how to make sense of user-generated data in the light of more institutional sets of data (e.g., scientific, medical, legal)? The efficient use of crowdsourced data in public decision making requires building an informational flow between user experiences and institutional datasets.<\/p>\n Similarly, enhancing user access to public data has to do with matching user case descriptions with institutional data repositories (\u201cWhat are my rights and obligations in this case?\u201d; \u201cWhich public office can help me\u201d?; \u201cWhat is the delay in the resolution of my case?”; \u201cHow many cases like mine have there been in this area in the last month?\u201d<\/em>).<\/p>\n From the point of view of data processing, we are clearly facing a problem of semantic mapping and data structuring. The challenge is thus to overcome the flood of isolated information while avoiding excessive management costs. There is still a long way to go before tools for content aggregation and semantic mapping are generally available. 
This is why private firms and governments still mostly rely on the manual processing of user input.<\/p>\n The new producers of legally relevant content: a taxonomy<\/strong><\/p>\n Before digging deeper into the challenges of efficiently managing crowdsourced data, let us take a closer look at the types of user-generated data\u00a0flowing through the Internet that have some kind of legal or institutional flavour.<\/p>\n One type of user data emerges spontaneously from citizens’ online activity, and can take the form of:<\/p>\n User data can as well be prompted by institutions as a result of participatory governance initiatives, such as:<\/p>\n This variety of media supports and knowledge producers gives rise to a plurality of textual genres, semantically rich but difficult to manage given their heterogeneity and quick evolution.<\/p>\n Managing crowdsourcing<\/strong><\/p>\n The goal of crowdsourcing in an institutional context is to extract and aggregate content relevant for the management of public issues and for public decision making. Knowledge management strategies vary considerably depending on the ways in which user data have been generated. We can think of three possible strategies for managing the flood of user data:<\/p>\n Pre-structuring: prompting the citizen narrative in a strategic way<\/strong><\/p>\n A possible solution is to elicit user input in a structured way; that is to say, to impose some constraints on user input. This is the solution adopted by IdeaScale<\/a>, a software application that was used by the Open Government Dialogue<\/em> <\/a> initiative of the Obama Administration. In IdeaScale, users are asked to check whether their idea has already been covered by other users, and alternatively to add a new idea. 
They are also invited to vote for the best ideas, so that it is the community itself that rates and thus indirectly filters the users\u2019 input.<\/p>\n The MIT Deliberatorium<\/a>, a technology aimed at supporting large-scale online deliberation, follows a similar strategy. Users are expected to follow a series of rules to enable the correct creation of a knowledge map of the discussion. Each post should be limited to a single idea, it should not be redundant, and it should be linked to a suitable part of the knowledge map. Furthermore, posts are validated by moderators, who should ensure that new posts follow the rules of the system. Other systems that implement the same idea are featurelist <\/a> and Debategraph <\/a> [7].<\/p>\n While these systems enhance the creation and visualization of structured argument maps and promote community engagement through rating systems, they present a series of limitations. The most important of these is the fact that human intervention is needed to manually check the correct structure of the posts. Semantic technologies can play an important role in bridging this gap.<\/p>\n Semantic analysis through ontologies and terminologies<\/strong><\/p>\n Ontology-driven analysis of user-generated text implies finding a way to bridge Semantic Web data structures, such as formal ontologies expressed in RDF or OWL, with unstructured implicit ontologies emerging from user-generated content. Sometimes these emergent lightweight ontologies take the form of unstructured lists of terms used for tagging online content by users. Accordingly, some works have dealt with this issue, especially in the field of social tagging of Web resources in online communities. 
More concretely, different works have proposed models for making compatible the so-called top-down metadata structures (ontologies) with bottom-up tagging mechanisms (folksonomies).<\/p>\n The possibilities range from transforming folksonomies into lightly formalized semantic resources (Lux and Dsinger, 2007<\/a>; Mika, 2005<\/a>) to mapping folksonomy tags to the concepts and the instances of available formal ontologies (Specia and Motta, 2007<\/a>; Passant, 2007<\/a>). As the basis of these works we find the notion of emergent semantics<\/a> (Mika, 2005<\/a>), which questions the autonomy of engineered ontologies and emphasizes the value of meaning emerging from distributed communities working collaboratively through the Web.<\/p>\n The first case study that we conducted belongs to the domain of consumer justice, and was framed in the ONTOMEDIA project<\/a>. We proposed to reuse the available\u00a0Mediation-Core Ontology (MCO)<\/a> and Consumer Mediation Ontology (COM) as anchors to legal, institutional, and expert knowledge, and therefore as entry points for the queries posed by consumers in common-sense language.<\/p>\n The user corpus contained around 10,000 consumer questions and 20,000 complaints addressed from 2007 to 2010 to the Catalan Consumer Agency<\/a>. We applied a traditional terminology extraction methodology to identify candidate terms, which were subsequently validated by legal experts. We then manually mapped the lay terms to the ontological classes. The relations used for mapping lay terms with ontological classes are mostly has_lexicalisation and has_instance.<\/p>\n A second case study in the domain of consumer law was carried out with Italian corpora. 
In this case domain terminology was extracted from a normative corpus (the Code of Italian Consumer law<\/a>) and from a lay corpus (around 4000 consumers\u2019 questions).<\/p>\n In order to further explore the particularities of each corpus respecting the semantic coverage of the domain, terms were gathered together into a common taxonomic structure [8]. This task was performed with the aid of domain experts. When confronted with the two lists of terms, both laypersons and technical experts would link most of the validated lay terms to the technical list of terms through one of the following relations:<\/p>\n The distribution of normative and lay terms per taxonomic level shows that, whereas normative terms populate mostly the upper levels of the taxonomy [9], deeper levels in the hierarchy are almost exclusively represented by lay terms.<\/p>\n Term distribution per taxonomic level<\/p><\/div>\n The result of this type of approach is a set of terminological-ontological resources that provide some insights on the nature of laypersons’ cognition of the law, such as the fact that citizens\u2019 domain knowledge is mainly factual and therefore populates deeper levels of the taxonomy. Moreover, such resources can be used for the further processing of user input. However, this strategy presents some limitations as well. First, it is mainly driven by domain conceptual systems and, in a way, they might limit the potentialities of user-generated corpora. Second, they are not necessarily scalable. In other words, these terminological-ontological resources have to be rebuilt for each legal subdomain (such as consumer law, private law, or criminal law), and it is thus difficult to foresee mechanisms for performing an automated mapping between lay terms and legal terms.<\/p>\n Beyond domain ontologies: information extraction approaches <\/strong> One of the most important limitations of ontology-driven approaches is the lack of scalability. 
In order to overcome Discursive structures<\/em> formalise the way users typically describe a legal case. It is possible to identify stereotypical situations appearing in the description of legal cases by citizens (i.e., the nature of the problem; the conflict resolution strategies, etc.). The core of those situations is usually predicates, so it is possible to formalize them as frame structures containing different frame elements. We followed such an approach for the mapping of the Spanish corpus of consumers\u2019 questions to the classes of the domain ontology (Fern\u00e1ndez-Barrera and Casanovas, 2011). And the same technique was applied for mapping a set of citizens\u2019 complaints in the domain of acoustic nuisances to a legal domain ontology (Bourcier and Fern\u00e1ndez-Barrera, 2011). By describing general structures of citizen description of legal cases we ensure scalability.<\/p>\n Emotional structures<\/em> are extracted by current algorithms for opinion- and sentiment mining. User data in the legal domain often contain an important number of subjective elements (especially in the case of complaints and feedback on public services) that could be effectively mined and used in public decision making.<\/p>\n Finally, event structures<\/em>, which have been deeply explored so far, could be useful for information extraction from user complaints and feedback, or for automatic classification into specific types of queries according to the described situation.<\/p>\n Crowdsourcing in e-government: next steps (and precautions?)<\/strong><\/p>\n Legal prosumers’ input currently outstrips the capacity of government for extracting meaningful content in a cost-efficient way. Some developments are under way, among which are argument-mapping technologies and semantic matching between legal and lay corpora. 
The scalability of these methodologies is the main obstacle to overcome, in order to enable the matching of user data with open public data in several domains.<\/p>\n However, as technologies for the extraction of meaningful content from user-generated data develop and are used in public-decision making, a series of issues will have to be dealt with. For instance, should the system developer bear responsibility for the erroneous or biased analysis of data? Ethical questions arise as well: May governments legitimately analyse any type of user-generated content? Content-analysis systems might be used for trend- and crisis detection; but what if they are also used for restricting freedoms?<\/p>\n The \u201cwisdom of crowds\u201d can certainly be valuable in public decision making, but the fact that citizens\u2019 online behaviour can be observed and analysed by governments without citizens’ acknowledgement poses serious ethical issues.<\/p>\n Thus, technical development in this domain will have to be coupled with the definition of ethical guidelines and standards, maybe in the form of a system of quality labels for content-analysis systems.<\/p>\n [Editor’s Note<\/em>: For earlier VoxPopuLII<\/em> commentary on the creation of legal ontologies, see N\u00faria Casellas, Semantic Enhancement of Legal Information\u2026 Are We Up for the Challenge?<\/a> For earlier VoxPopuLII<\/em> commentary on Natural Language Processing and legal Semantic Web technology, see Adam Wyner, Weaving the Legal Semantic Web with Natural Language Processing<\/a>. 
For earlier VoxPopuLII<\/em> posts on user-generated content, crowdsourcing, and legal information, see Matt Baca and Olin Parker, Collaborative, Open Democracy with LexPop<\/a>; Olivier Charbonneau, Collaboration and Open Access to Law<\/a>; Nick Holmes, Accessible Law<\/a>; and Staffan Malmgren, Crowdsourcing Legal Commentary<\/a>.]<\/p>\n [1] The idea of prosumption existed actually long before the Internet, as highlighted by Ritzer and Jurgenson (2010)<\/a>: the consumer of a fast food restaurant is to some extent as well the producer of the meal since he is expected to be his own waiter, and so is the driver who pumps his own gasoline at the filling station.<\/p>\n [2] The experience project<\/a> enables registered users to share life experiences, and it contained around 7 million stories as of January 2011: http:\/\/www.experienceproject.com\/index.php<\/a>.<\/p>\n [3] For instance, the United Nations Volunteers Online platform<\/a> (http:\/\/www.onlinevolunteering.org\/en\/vol\/index.html<\/a>) helps volunteers to cooperate virtually with non-governmental organizations and other volunteers around the world.<\/p>\n [4] See for instance the experiment run by mathematician Gowers on his blog: he posted a problem and asked a large number of mathematicians to work collaboratively to solve it. They eventually succeeded faster than if they had worked in isolation: http:\/\/gowers.wordpress.com\/2009\/01\/27\/is-massively-collaborative-mathematics-possible\/<\/a>.<\/p>\n [5] The Galaxy Zoo project<\/a> asks volunteers to classify images of galaxies according to their shapes: http:\/\/www.galaxyzoo.org\/how_to_take_part<\/a>. 
See as well Cornell’s projects Nestwatch<\/a> (http:\/\/watch.birds.cornell.edu\/nest\/home\/index<\/a>) and FeederWatch<\/a> (http:\/\/www.birds.cornell.edu\/pfw\/Overview\/whatispfw.htm<\/a>), which invite people to introduce their observation data into a Website platform.<\/p>\n [6] http:\/\/www.participedia.net\/wiki\/Icelandic_Constitutional_Council_2011<\/a>.<\/p>\n [7] See the description of Debategraph in Marta Poblet’s post, Argument mapping: visualizing large-scale deliberations (http:\/\/serendipolis.wordpress.com\/2011\/10\/01\/argument-mapping-visualizing-large-scale-deliberations-3\/<\/a>).<\/p>\n [8] Terms have been organised in the form of a tree having as root nodes nine semantic classes previously identified. Terms have been added as branches and sub-branches, depending on their degree of abstraction.<\/p>\n [9] It should be noted that legal terms are mostly situated at the second level of the hierarchy rather than the first one. This is natural if we take into account the nature of the normative corpus (the Italian consumer code), which contains mostly domain specific concepts (for instance, withdrawal right<\/em>) instead of general legal abstract categories (such as right<\/em> and obligation<\/em>).<\/p>\n REFERENCES<\/p>\n Bourcier, D., and Fern\u00e1ndez-Barrera, M. (2011). A frame-based representation of citizen’s queries for the Web 2.0. A case study on noise nuisances. E-challenges conference, Florence 2011.<\/p>\n Fern\u00e1ndez-Barrera, M., and Casanovas, P. (2011). From user needs to expert knowledge: Mapping laymen queries with ontologies in the domain of consumer mediation. AICOL Workshop, Frankfurt 2011.<\/p>\n Lux, M., and Dsinger, G. (2007). From folksonomies to ontologies: Employing wisdom of the crowds to serve learning purposes<\/a>. International Journal of Knowledge and Learning (IJKL)<\/em>, 3(4\/5): 515-528.<\/p>\n Mika, P. (2005). Ontologies are us: A unified model of social networks and semantics<\/a>. In Proc. of Int. 
Semantic Web Conf.<\/em>, volume 3729 of LNCS<\/em>, pp. 522-536. Springer.<\/p>\n Passant, A. (2007). Using ontologies to strengthen folksonomies and enrich information retrieval in Weblogs<\/a>. In Int. Conf. on Weblogs and Social Media, 2007.<\/p>\n Poblet, M., Casellas, N., Torralba, S., and Casanovas, P. (2009). Modeling expert knowledge in the mediation domain: A Mediation Core Ontology<\/a>, in N. Casellas et al. (Eds.), LOAIT- 2009. 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with 2nd<\/sup> Workshop on Semantic Processing of Legal Texts<\/em>. Barcelona, IDT Series n. 2.<\/p>\n Ritzer, G., and Jurgenson, N. (2010). Production, consumption, prosumption: The nature of capitalism in the age of the digital “prosumer.”<\/a> In Journal of Consumer Culture<\/em> 10: 13-36.<\/p>\n Specia, L., and Motta, E. (2007). Integrating folksonomies with the Semantic Web<\/a>. Proc. Euro. Semantic Web Conf.<\/em>, 2007.<\/p>\n VoxPopuLII is edited by Judith Pratt.<\/a> Editor-in-Chief is Robert Richards<\/a>, to whom queries should be directed. The statements above are not legal advice or legal representation. If you require legal advice, consult a lawyer. Find a lawyer<\/a> in the Cornell LII Lawyer Directory<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":" Prosumption: shifting the barriers between information producers and consumers One of the major revolutions of the Internet era has been the shifting of the frontiers between producers and consumers [1]. Prosumption refers to the emergence of a new category of actors who not only consume but also contribute to content creation and sharing. 
Under the […]<\/a><\/p>\n","protected":false},"author":75,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[356,294,295,467,471,246,293,4858],"tags":[4863,4859,4865,4862,4864,4861,4860],"_links":{"self":[{"href":"https:\/\/blog.law.cornell.edu\/voxpop\/wp-json\/wp\/v2\/posts\/2061"}],"collection":[{"href":"https:\/\/blog.law.cornell.edu\/voxpop\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.law.cornell.edu\/voxpop\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/voxpop\/wp-json\/wp\/v2\/users\/75"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/voxpop\/wp-json\/wp\/v2\/comments?post=2061"}],"version-history":[{"count":99,"href":"https:\/\/blog.law.cornell.edu\/voxpop\/wp-json\/wp\/v2\/posts\/2061\/revisions"}],"predecessor-version":[{"id":4084,"href":"https:\/\/blog.law.cornell.edu\/voxpop\/wp-json\/wp\/v2\/posts\/2061\/revisions\/4084"}],"wp:attachment":[{"href":"https:\/\/blog.law.cornell.edu\/voxpop\/wp-json\/wp\/v2\/media?parent=2061"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/voxpop\/wp-json\/wp\/v2\/categories?post=2061"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/voxpop\/wp-json\/wp\/v2\/tags?post=2061"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}\n
<\/a>We recently conducted several case studies in which we proposed a mapping between legal and lay terminologies. Following the approach proposed by Passant (2007)<\/a>, we enriched the available ontologies with the terminology found in lay corpora. For this purpose, each OWL class was complemented with a has_lexicalization property linking it to the corresponding lay terms.<\/p>\n
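As a rough illustration of how such an enrichment could be used, the sketch below attaches lay terms to ontology classes and then looks up the classes evoked by a consumer's question. It is only a minimal sketch under stated assumptions: the class names, the lay terms, and the helper functions are hypothetical and are not taken from the MCO or COM ontologies, and a real system would work on OWL data rather than on a plain dictionary.

```python
from collections import defaultdict

# Hypothetical has_lexicalization links: ontology class -> lay terms.
# Class names and terms are invented for illustration only.
HAS_LEXICALIZATION = {
    "WithdrawalRight": ["return the product", "cancel my order"],
    "Guarantee": ["warranty", "broken after a month"],
}


def build_lay_index(lexicalizations):
    """Invert class -> lay terms into lay term -> set of classes."""
    index = defaultdict(set)
    for ontology_class, lay_terms in lexicalizations.items():
        for term in lay_terms:
            index[term.lower()].add(ontology_class)
    return index


def classes_for_query(query, index):
    """Return, sorted, the classes whose lay terms occur in the query."""
    text = query.lower()
    matched = set()
    for term, classes in index.items():
        # Naive substring matching; a real system would normalise terms.
        if term in text:
            matched |= classes
    return sorted(matched)


LAY_INDEX = build_lay_index(HAS_LEXICALIZATION)
```

Calling `classes_for_query("Can I return the product?", LAY_INDEX)` would point the question to the hypothetical `WithdrawalRight` class, which is the sense in which the ontology acts as an entry point for common-sense queries.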
<\/a>
\n<\/em><\/p>\n<\/a> this problem, a possible strategy is to rely on informational structures that recur across user-generated content regardless of domain. These informational structures go beyond domain conceptual models and are mostly discursive, emotional, or event structures.<\/p>\n
\n<\/a>Meritxell Fern\u00e1ndez-Barrera<\/a><\/strong> is a researcher at Cersa<\/em> (Centre d’\u00c9tudes et de Recherches de Sciences Administratives et Politiques), CNRS<\/a>, Universit\u00e9 Paris 2. She works on the application of natural language processing (NLP) to legal discourse and legal communication, and on the potential of Web 2.0 for participatory democracy.<\/p>\n