skip navigation
search

Take a look at your bundle of tags on Delicious. Would you ever believe you’re going to change the law with a handful of them?

You’re going to change the way you research the law. The way you apply it. The way you teach it and, in doing so, shape the minds of future lawyers.

Do you think I’m going too far? Maybe.

But don’t overlook the way taxonomies have changed the law and shaped lawyers’ minds so far. Taxonomies? Yeah, taxonomies.

We, the lawyers, have used extensively taxonomies through the years; Civil lawyers in particular have shown to be particularly prone to them. We’ve used taxonomies for three reasons: to help legal research, to help memorization and teaching, and to apply the law.

 

Taxonomies help legal research.

2959826262_9b724b5a72First, taxonomies help us retrieve what we’ve stored (rules and case law).

Are you looking for a rule about a sales contract? Dive deep into the “Obligations” category and the corresponding book (Recht der Schuldverhältnisse, Obbligazioni, Des contrats ou des obligations conventionnelles en général, you name it ).

If you are a Common Lawyer, and ignore the perverse pleasure of browsing through Civil Code taxonomy, you’ll probably know Westlaw’s classification and its key numbering system. It has much more concrete categories and therefore much longer lists than the Civilians’ classification.
Legal taxonomies are there to help users find the content they’re looking for.

However, taxonomies sometimes don’t reflect the way the users reason; when this happens, you just won’t find what you’re looking for.

The problem with legal taxonomies.

If you are a German lawyer, you’ll probably be searching the “Obligations” book for rules concerning marriage; indeed in the German lawyer’s frame of mind, marriage is a peculiar form of contract. But if you are Italian, like I am, then you will most probably start looking in the “Persons” book; marriage rules are simply there, and we have been taught that marriage is not a contract but an agreement with no economic content (we have been trained to overlook the patrimonial shade in deference to the sentimental one).

So if I, the Italian, look for rules about marriage in the German civil code, I won’t find anything in the “Persons” book.
In other words, taxonomies work when they’re used by someone who reasons like the creator or–-and this happens with lawyers inside a certain legal system–-when users are trained to use the same taxonomy, and lawyers are trained at length.

But let’s take my friend Tim; he doesn’t have a legal education. He’s navigating Westlaw’s key number system looking for some relevant case law on car crashes. By chance he knows he should look below “torts,” but where? Is this injury and damage from act (k439)? Is this injury to a person in general (k425)? Is this injury to property or right of property in general (k429)? Wait, should he look below “crimes” (he is unclear on the distinction between torts and crimes)? And so on. Do these questions sound silly to you, the lawyers? Consider this: the titles we mentioned give no hint of the content, unless you already know what’s in there.

Because Law, complex as it is, needs a map. Lawyers have been trained to use the map. But what about non-lawyers?

In other words, the problems with legal taxonomies occur when the creators and the users don’t share the same frame of mind. And this is most likely to happen when the creators of the taxonomy are lawyers and the users are not lawyers.
Daniel Dabney wrote something similar some time ago. Let’s imagine that I buy a dog, take the little pooch home and find out that it’s mangy. Let’s imagine I’m that kind of aggressively unsatisfied customer and want to sue the seller, but know nothing about law. I go to the library and what will I look for? Rules on dogs sale? A book on Dog’s law? I’m lucky, there’s one, actually: “Dog law”, a book that gathers all laws regarding dogs and dogs owners.
But of course, that’s just luck, and  if I had to browse through legal category in the Westlaw’s index, I would never have found anything regarding “dogs”. I will never find the word “dog”, which is nonetheless the first word a non-legal trained person would think of. A savvy lawyer would look for rules regarding sales and warranties: general categories I may not know of (or think of) if I’m not a lawyer. If I’m not a lawyer I may not know that “the sale of arguably defective dogs are to be governed by the same rules that apply to other arguably defective items, like leaky fountain pens”. Dogs are like pens for a lawyer, but they are just dogs for a dogs-owner: so a dogs owner will look for rules about dogs, not rules about sales and warranties (or at least he would look for sale of dogs). And dog law, a  user aimed, object oriented category would probably fits his needs.

Observation #1: To make legal content available to everyone we must change the information architecture through which legal information are presented.

Will folksonomies make a better job?
Let’s come to folksonomies now. Here, the mismatch between creators (lawyers) and users’ way of reasoning is less likely to occur. The very same users decide which category to create and what to put into it. Moreover, more tags can overlap; that is, the same object can be tagged more than once. This allows the user to consider the same object from different perspectives. Take Delicious. If you search for “Intellectual property” on the Delicious search engine, you find a page about Copyright definition on Wikipedia. It was tagged mainly with “copyright.” But many users also tagged it with “wikipedia,” “law” and “intellectual-property” and even “art”. Maybe it was the non-lawyers out there who found it more useful to tag it with the “law” tag (a lawyer’s tag would have been more specific); maybe it was the lawyers who massively tagged it with “art” (there are a few “art” tags in their libraries). Or was it the other way around? The thing is, it’s up to users to decide where to classify it.

People also tag laws on Delicious using different labels that may or may not be related to law, because Delicious is a general-use website. But instead, let’s take a crowdsourced legal content website like Docracy. Here, people upload and tag their contracts, so it’s only legal content, and they tag them using only legal categories.

On Docracy, I found out that a whole category of documents that was dedicated to Terms of Service. Terms of Service is not a traditional legal category—-like torts, property, and contracts—-but it was a particularly useful category for Docracy users.

Docracy: WordPress Terms of Service are tagged with "TOS" but also with "Website".

Docracy: WordPress Terms of Service are tagged with “TOS” but also with “Website”.

If I browse some more, I see that the WordPress TOS are also tagged with “website.” Right, it makes sense; that is, if I’m a web designer looking for the legal stuff I need to know before deploying my website. If I start looking just from “website,” I’ll find TOS, but also “contract of works for web design or “standard agreements for design services” from AIGA.

You got it? What legal folksonomies bring us is:

  1. User-centered categories
  2. Flexible categorization systems. Many items can be tagged more than once and so be put into different categories. Legal stuff can be retrieved through different routes but also considered under different lights.

Will this enhance findability? I think it will, especially if the users are non-lawyers. And services that target the low-end of the legal market usually target non-lawyers.

Alright, I know what you’re thinking. You’re thinking, oh no, again another naive folksonomy supporter! And then you say: “Folksonomie structures are too flat to constitute something useful for legal research!” and “Law is too a specific sector with highly technical vocabulary and structure. Non-legal trained users would just tag wrongly”.

Let me quickly address these issues.

Objection 1: Folksonomies are too flat to constitute something  useful for legal research

Let’s start from a premise: we have no studies on legal folksonomies yet. Docracy is not a full folksonomy yet ( users can tag but tags are pre-determined by administrators). But we do have examples of folksonomies tout court, so my argument moves analogically from them. Folksonomies do work. Take  the Library of Congress Flickr project. Like an old grandmother, the Library gathered thousands of pictures that no-one ever had the time to review and categorize.  So pictures were uploaded on Flickr and left for the users to tag and comment. They did it en masse, mostly by using descriptive or topical tags (non-subjective) that were useful for retrieval. If folksonomies work for pictures (Flickr), books (Goodreads), questions and answers (Quora), basically everything else (Delicious), why shouldn’t they work for law? Given that premise, let’s move to first objection: folksonomies are flat. Wrong. As folksonomies evolve, we find out that they can have two, three and even more levels of categories. Take a look at the Quora hierarchy.

That’s not flat. Look, there are at least four levels in the screenshot: Classical Musicians & Composers > Pianists > Jazz Pianists > Ray Charles > What’d I Say. Right, Jazz pianists are not classical musicians: but mistakes do occur and the good point in folksonomies is that users can freely correct them.

Second point: findability doesn’t depend only on hierarchies. You can browse the folksonomy’s categories but you can also use free text search to dig into it.  In this case, users’ tags are metadata and so findability is enhanced because the search engine retrieves what users have tagged–not what admins have tagged.

 

Objection 2: Non-legal people will use the wrong tags

Uhm, yes, you’re right. They will tag a criminal law document with “tort” and a tort case involving a car accident with “car crash”. And so? Who cares? What if the majority of users find it useful? We forget too often that law is a social phenomenon, not a tool for technicians. And language is a social phenomenon too. If users consistently tag a legal document with the “wrong” tag X instead of the “right” tag Y, it means that they usually name that legal document with X. So most of them, when looking for that document, will look for X. And they’ll retrieve it, and be happy with that.

Of course, legal-savvy people would like to search by typical legal words (like, maybe, “chattel”?) or by using the legal categories they know so well.  Do we want to compromise? The fact is, in a system where there is only user-generated content, it goes without saying that a traditional top-down taxonomy would not work. But if we have to imagine a system where content is not user-generated, like a legal or case law database, that could happen. There could be, for instance, a mixed taxonomy-folksonomy system where taxonomy is built with traditional legal terms and scheme, whereas folksonomy is built by the users who are free to tag. Search in the end, can be done by browsing the taxonomy, by browsing the folksonomy or by means of a search engine which fishes on content relying both on metadata chosen by system administrators and on metadata chosen by the users who tagged the content.

This may seem like an imaginary system–but it’s happening already. Amazon uses traditional categories and leave the users free to tag. The BBC website followed a similar pattern, moving from full taxonomy system to a hybrid taxonomy-folksonomy one. Resilience, resilience, as Andrea Resmini and Luca Rosati put it in their seminal book on information architecture. Folksonomies and taxonomies can coexist. But this is not what this article is about, so sorry for the digression and let’s move to the first prediction.

Prediction #1: Folksonomies will provide the right information architecture for non-legal users.

Taxonomies and folksonomies help legal teaching.

7797310218_8d42f4743bSecondly, taxonomies help us memorize rules and case law. Put all the things in a box and group them on the basis of a common feature, and you’ll easily remember where they are. For this reason, taxonomies have played a major role in legal teaching. I’ll tell you a little story. Civil lawyers know very well the story of Gaius, the ancient Roman jurist who created a successful taxonomy for his law handbook, the Institutiones. His taxonomy was threefold: all law can be divided into persons, things, and actions. Five centuries later (five centuries!) Emperor Justinian transferred the very same taxonomy into his own Institutiones, a handbook aimed at youth “craving for legal knowledge” (cupida legum iuventes). Why? Because it worked! How powerful, both the slogan and the taxonomy! Indeed more than 1000 years later, we found it again, with a few changes, in German, French, Italian, and Spanish Civil Codes and that, in a whole bunch of nutshells, explains private law following the taxonomy of the Codes.

And now, consider what the taxonomies have done to lawyers’ minds.

Taxonomies have shaped their way of considering facts. Think. Put something into a category and you will lose all the other points of view on the same thing. The category shapes and limits our way to look at that particular thing.

Have you ever noticed how civil lawyers and common lawyers have a totally different way of looking at facts? Common lawyers see and take into account the details. Civil lawyers overlook them because the taxonomy they use has told them to do so.

In Rylands vs Fletcher (a UK tort case) some water escapes from a reservoir and floods a mine nearby. The owner of the reservoir could not possibly foresee the event and prevent it. However, the House of Lords states that the owner of the mine has the right to recover damages, even if there is no negligence. (“The person who for his own purpose brings on his lands and collects and keeps there anything likely to do mischief, if it escapes, must keep it in at his peril, and if he does not do so, is prima facie answerable for all the damage which is the natural consequence of its escape.”)

In Read vs Lyons, however, an employee gets injured during an explosion occurring in the ammunition factory where she is employed. The rule set in Rylands couldn’t be applied, as, according to the House of Lords, the case was very different; there is no escape.

On the contrary, for a Civil lawyer the decision would have been the same in both cases. For instance, under Italian Civil Code (but French and German Codes are not substantially different on this point), one would apply the general rule that grants reward for damages caused by “dangerous activities” and requires no proof of negligence on the plaintiff (art.2050 of the Civil Code), no matter what causes the danger (a big reservoir of water, an ammunition factory, whatever else).

Observation#2: taxonomies are useful for legal teaching and they shape lawyers minds.

Folksonomies for legal teaching?

Okay, and what about folksonomies? What if the way people tag legal concepts makes its way into legal teaching?

Take the Docracy‘s TOS category—have you ever thought about a course on TOS?

Another website, another example: Rocket Lawyer. Its categorization is not based on folksonomy, however; it’s purposely built around a user’s needs, which have been tested over the years, so in a way the taxonomy of the website comes from its users. One category is “identity theft”, which should be quite popular if it is prompted on the first page. What about teaching a course on identity theft? That would merge some material traditionally taught in privacy law, criminal law, and torts courses. Some course areas would overlap, which is good for memorization. Think again to the example of “Dog Law” by Dabney. What about a course about Dog Law, collecting material that refers to dogs across traditional legal categories?

Also, the same topic would be considered from different points of view.

What if students were trained to the specifications of the above-mentioned flexibility of categories? They wouldn’t get trapped into a single way of seeing things. If folksonomies account for different levels of abstractions, they would be trained to consider details. Not only that,  they would develop a very flexible frame of mind.

Prediction #2: legal folksonomies in legal teaching would keep lawyers’ minds flexible.

 

Taxonomies and folksonomies SHAPE the law.

Third, taxonomies make the law apply differently. Think about it. They are the very highways that allow the law to travel down to us. And here it comes, the real revolutionary potential of legal folksonomies, if we were to make them work.

Let’s start from taxonomies, with a couple of examples.

Civil lawyers are taught that Public and Private Law are two distinctive areas of law, to which different rules apply. In common law, the distinction is not that clear-cut. In Rigby vs Chief Constable of Northamptonshire  (a tort case from UK case law) the police—in an attempt to catch a criminal—damage a private shop by accidentally firing a canister of gas and setting the shop ablaze. The Queen’s Bench Division establishes that the police are liable under the tort of negligence only because the plaintiff manages to prove the police’s fault; they apply a private law category to a public body.
How would the same case have been decided under, say, French law? As the division between public and private law is stricter, the category of liability without fault, which is traditionally used when damages are caused by public bodies, would apply. The State would have to indemnify the damage, no matter if there was negligence.

Remember Rylands vs Fletcher and Lyons vs Read? The presence of escape/no escape was determinant, because the English taxonomy is very concrete. Civil lawyers work with taxonomies that have fewer, larger, and more abstract categories. If you cause damages by performing a risky activity, even if conducted without fault, you have to repay them. Period. Abstract taxonomy sweeps out any concrete detail. I think that Robert Berring had something like this in mind–although he referred to legal research–when he said  that “classification  defines the world of thinkable thoughts”. Or, as Dabney puts it, “thoughts that aren’t represented in the system had become unthinkable”.
So taxonomies make the law apply differently. In the former case, by setting a boundary between the public-private spheres; in the latter by creating a different framework for the application of more abstract or more detailed rules.

 

You don’t get it? All right, it’s tough, but do you have two minutes more? Let’s take this example by Dabney. Key number system’s taxonomy distinguishes between Navigable and Non-navigable waters (in the screenshot: waters and water courses). There’s a reason for that: lands under navigable waters presumptively belongs to the state, because “private ownership of the land under navigable waters would (…) compromise the use of those waters for navigation ad commerce”. So there are two categories because different laws apply to each. But now look at this screenshot.avulsion

Find anything strange? Yes:  avulsion rules are “doubled”: they are contained in both categories. But they are the very same: rules concerning avulsion don’t change if the water is navigable or not (check avulsion definition if you, like me, don’t remember what it is ). Dabney: “In this context,(…) there is no difference in the legal rules that are applied that depend on whether or not the water is navigable. Navigability has an effect on a wide range of issues concerning waters, but not on the accretion/avulsion issue. Here, the organization of the system needlessly separates cases from each other on the basis of an irrelevant criterion”. And you think, ok, but as long as we are aware of this error and know the rules concerning avulsion are the same, it’s not biggie. Right, but in the future?

“If searchers, over time, find cases involving navigable waters in one place and non-navigable waters in another, there might develop two distinct bodies of law.” Got it? Dabney foresees it. The way we categorize the law would shape the way we apply it.

Observation #3 Different taxonomies entail different ways to apply the law.

So, what if we substitute taxonomies with folksonomies?

And what if they had the power to shape the way judges, legal scholars, lawmakers and legal operators think?

Legal folksonomies are just starting out, and what I envisage is still yet to come. Which makes this article kind of a visionary one, I admit.

However, what Docracy is teaching us is that users—I didn’t say lawyers, but users—are generating decent legal content. Would you have bet your two cents on this, say, five years ago?
What if users started generating new legal categories (legal folksonomies?)

Berring wrote something really visionary more than ten years ago in his beautiful “Legal Research and the World of Thinkable Thoughts”. He couldn’t have folksonomies in mind, and still, wouldn’t you think he referred to them when writing: “There is simply too much stuff to sort through. No one can write a comprehensive treatise any more, and no one can read all of the new cases. Machines are sorting for us. We need a new set of thinkable thoughts.  We need a new Blackstone. We need someone, or more likely a group of someones, who can reconceptualize the structure of legal information.“?

Prediction #3 Legal folksonomies will make the law apply differently.

Let’s wait and see. Let the users tag. Where this tagging is going to take us is unpredictable, yes, but if you look at where taxonomies have taken us for all these years, you may find a clue.

I have a gut feeling that folksonomies are going to change the way we search, teach, and apply the law.

IMG_1221

 

 

Serena Manzoli is a legal architect and the founder at Wildcat, legal search for curious humans. She has been a Euro bureaucrat, a cadet, an in-house counsel, a bored lawyer. She holds an LLM from University of Bologna. She blogs at Lawyers are boring.  Twitter: SquareLaw

Prosumption: shifting the barriers between information producers and consumers

One of the major revolutions of the Internet era has been the shifting of the frontiers between producers and consumers [1]. Prosumption refers to the emergence of a new category of actors who not only consume but also contribute to content creation and sharing. Under the umbrella of Web 2.0, many sites indeed enable users to share multimedia content, data, experiences [2], views and opinions on different issues, and even to act cooperatively to solve global problems [3]. Web 2.0 has become a fertile terrain for the proliferation of valuable user data enabling user profiling, opinion mining, trend and crisis detection, and collective problem solving [4].

The private sector has long understood the potentialities of user data and has used them for analysing customer preferences and satisfaction, for finding sales opportunities, for developing marketing strategies, and as a driver for innovation. Recently, corporations have relied on Web platforms for gathering new ideas from clients on the improvement or the development of new products and services (see for instance Dell’s Ideastorm; salesforce’s IdeaExchange; and My Starbucks Idea). Similarly, Lego’s Mindstorms encourages users to share online their projects on the creation of robots, by which the design becomes public knowledge and can be freely reused by Lego (and anyone else), as indicated by the Terms of Service. Furthermore, companies have been recently mining social network data to foresee future action of the Occupy Wall Street movement.

Even scientists have caught up and adopted collaborative methods that enable the participation of laymen in scientific projects [5].

Now, how far has government gone in taking up this opportunity?

Some recent initiatives indicate that the public sector is aware of the potential of the “wisdom of crowds.” In the domain of public health, MedWatcher is a mobile application that allows the general public to submit information about any experienced drug side effects directly to the US Food and Drug Administration. In other cases, governments have asked for general input and ideas from citizens, such as the brainstorming session organized by Obama government, the wiki launched by the New Zealand Police to get suggestions from citizens for the drafting of a new policing act to be presented to the parliament, or the Website of the Department of Transport and Main Roads of the State of Queensland, which encourages citizens to share their stories related to road tragedies.

Even in so crucial a task as the drafting of a constitution, government has relied on citizens’ input through crowdsourcing [6]. And more recently several other initiatives have fostered crowdsourcing for constitutional reform in Morocco and in Egypt .

It is thus undeniable that we are witnessing an accelerated redefinition of the frontiers between experts and non-experts, scientists and non-scientists, doctors and patients, public officers and citizens, professional journalists and street reporters. The ‘Net has provided the infrastructure and the platforms for enabling collaborative work. Network connection is hardly a problem anymore. The problem is data analysis.

In other words: how to make sense of the flood of data produced and distributed by heterogeneous users? And more importantly, how to make sense of user-generated data in the light of more institutional sets of data (e.g., scientific, medical, legal)? The efficient use of crowdsourced data in public decision making requires building an informational flow between user experiences and institutional datasets.

Similarly, enhancing user access to public data has to do with matching user case descriptions with institutional data repositories (“What are my rights and obligations in this case?”; “Which public office can help me”?; “What is the delay in the resolution of my case?”; “How many cases like mine have there been in this area in the last month?”).

From the point of view of data processing, we are clearly facing a problem of semantic mapping and data structuring. The challenge is thus to overcome the flood of isolated information while avoiding excessive management costs. There is still a long way to go before tools for content aggregation and semantic mapping are generally available. This is why private firms and governments still mostly rely on the manual processing of user input.

The new producers of legally relevant content: a taxonomy

Before digging deeper into the challenges of efficiently managing crowdsourced data, let us take a closer look at the types of user-generated data flowing through the Internet that have some kind of legal or institutional flavour.

One type of user data emerges spontaneously from citizens’ online activity, and can take the form of:

  • citizens’ forums
  • platforms gathering open public data and comments over them (see for instance data-publica )
  • legal expert blogs (blawgs)
  • or the journalistic coverage of the legal system.

User data can as well be prompted by institutions as a result of participatory governance initiatives, such as:

  • crowdsourcing (targeting a specific issue or proposal by government as an open brainstorming session)
  • comments and questions addressed by citizens to institutions through institutional Websites or through e-mail contact.

This variety of media supports and knowledge producers gives rise to a plurality of textual genres, semantically rich but difficult to manage given their heterogeneity and quick evolution.

Managing crowdsourcing

The goal of crowdsourcing in an institutional context is to extract and aggregate content relevant for the management of public issues and for public decision making. Knowledge management strategies vary considerably depending on the ways in which user data have been generated. We can think of three possible strategies for managing the flood of user data:

Pre-structuring: prompting the citizen narrative in a strategic way

A possible solution is to elicit user input in a structured way; that is to say, to impose some constraints on user input. This is the solution adopted by IdeaScale, a software application that was used by the Open Government Dialogue initiative of the Obama Administration. In IdeaScale, users are asked to check whether their idea has already been covered by other users, and alternatively to add a new idea. They are also invited to vote for the best ideas, so that it is the community itself that rates and thus indirectly filters the users’ input.

The MIT Deliberatorium, a technology aimed at supporting large-scale online deliberation, follows a similar strategy. Users are expected to follow a series of rules to enable the correct creation of a knowledge map of the discussion. Each post should be limited to a single idea, it should not be redundant, and it should be linked to a suitable part of the knowledge map. Furthermore, posts are validated by moderators, who should ensure that new posts follow the rules of the system. Other systems that implement the same idea are featurelist and Debategraph [7].

While these systems enhance the creation and visualization of structured argument maps and promote community engagement through rating systems, they present a series of limitations. The most important of these is the fact that human intervention is needed to manually check the correct structure of the posts. Semantic technologies can play an important role in bridging this gap.

Semantic analysis through ontologies and terminologies

Ontology-driven analysis of user-generated text implies finding a way to bridge Semantic Web data structures, such as formal ontologies expressed in RDF or OWL, with unstructured implicit ontologies emerging from user-generated content. Sometimes these emergent lightweight ontologies take the form of unstructured lists of terms used for tagging online content by users. Accordingly, some works have dealt with this issue, especially in the field of social tagging of Web resources in online communities. More concretely, different works have proposed models for making compatible the so-called top-down metadata structures (ontologies) with bottom-up tagging mechanisms (folksonomies).

The possibilities range from transforming folksonomies into lightly formalized semantic resources (Lux and Dsinger, 2007; Mika, 2005) to mapping folksonomy tags to the concepts and the instances of available formal ontologies (Specia and Motta, 2007; Passant, 2007). As the basis of these works we find the notion of emergent semantics (Mika, 2005), which questions the autonomy of engineered ontologies and emphasizes the value of meaning emerging from distributed communities working collaboratively through the Web.

We have recently worked on several case studies in which we have proposed a mapping between legal and lay terminologies. We followed the approach proposed by Passant (2007) and enriched the available ontologies with the terminology appearing in lay corpora. For this purpose, OWL classes were complemented with a has_lexicalization property linking them to lay terms.

The first case study that we conducted belongs to the domain of consumer justice, and was framed in the ONTOMEDIA project. We proposed to reuse the available Mediation-Core Ontology (MCO) and Consumer Mediation Ontology (COM) as anchors to legal, institutional, and expert knowledge, and therefore as entry points for the queries posed by consumers in common-sense language.

The user corpus contained around 10,000 consumer questions and 20,000 complaints addressed from 2007 to 2010 to the Catalan Consumer Agency. We applied a traditional terminology extraction methodology to identify candidate terms, which were subsequently validated by legal experts. We then manually mapped the lay terms to the ontological classes. The relations used for mapping lay terms with ontological classes are mostly has_lexicalisation and has_instance.

A second case study in the domain of consumer law was carried out with Italian corpora. In this case domain terminology was extracted from a normative corpus (the Code of Italian Consumer law) and from a lay corpus (around 4000 consumers’ questions).

In order to further explore the particularities of each corpus respecting the semantic coverage of the domain, terms were gathered together into a common taxonomic structure [8]. This task was performed with the aid of domain experts. When confronted with the two lists of terms, both laypersons and technical experts would link most of the validated lay terms to the technical list of terms through one of the following relations:

  • Subclass: the lay term denotes a particular type of legal concept. This is the most frequent case. For instance, in the class objects, telefono cellulare (cell phone) and linea telefonica (phone line) are subclasses of the legal terms prodotto (product) and servizio (service), respectively. Similarly, in the class actors agente immobiliare (estate agent) can be seen as subclass of venditore (seller). In other cases, the linguistic structures extracted from the consumers’ corpus denote conflictual situations in which the obligations have not been fulfilled by the seller and therefore the consumer is entitled to certain rights, such as diritto alla sostituzione (entitlement to a replacement). These types of phrases are subclasses of more general legal concepts such as consumer right.
  • Instance: the lay term denotes a concrete instance of a legal concept. In some cases, terms extracted from the consumer corpus are named entities that denote particular individuals, such as Vodafone, an instance of a domain actor, a seller.
  • Equivalent: a legal term is used in lay discourse. For instance, contratto (contract) or diritto di recessione (withdrawal right).
  • Lexicalisation: the lay term is a lexical variant of the legal concept. This is the case for instance of negoziante, used instead of the legal term venditore (seller) or professionista (professional).

The distribution of normative and lay terms per taxonomic level shows that, whereas normative terms populate mostly the upper levels of the taxonomy [9], deeper levels in the hierarchy are almost exclusively represented by lay terms.

Term distribution per taxonomic level

The result of this type of approach is a set of terminological-ontological resources that provide some insights on the nature of laypersons’ cognition of the law, such as the fact that citizens’ domain knowledge is mainly factual and therefore populates deeper levels of the taxonomy. Moreover, such resources can be used for the further processing of user input. However, this strategy presents some limitations as well. First, it is mainly driven by domain conceptual systems and, in a way, they might limit the potentialities of user-generated corpora. Second, they are not necessarily scalable. In other words, these terminological-ontological resources have to be rebuilt for each legal subdomain (such as consumer law, private law, or criminal law), and it is thus difficult to foresee mechanisms for performing an automated mapping between lay terms and legal terms.

Beyond domain ontologies: information extraction approaches

One of the most important limitations of ontology-driven approaches is the lack of scalability. In order to overcome this problem, a possible strategy is to rely on informational structures that occur generally in user-generated content. These informational structures go beyond domain conceptual models and identify mostly discursive, emotional, or event structures.

Discursive structures formalise the way users typically describe a legal case. It is possible to identify stereotypical situations appearing in the description of legal cases by citizens (i.e., the nature of the problem; the conflict resolution strategies, etc.). The core of those situations is usually predicates, so it is possible to formalize them as frame structures containing different frame elements. We followed such an approach for the mapping of the Spanish corpus of consumers’ questions to the classes of the domain ontology (Fernández-Barrera and Casanovas, 2011). And the same technique was applied for mapping a set of citizens’ complaints in the domain of acoustic nuisances to a legal domain ontology (Bourcier and Fernández-Barrera, 2011). By describing general structures of citizen description of legal cases we ensure scalability.

Emotional structures are extracted by current algorithms for opinion- and sentiment mining. User data in the legal domain often contain an important number of subjective elements (especially in the case of complaints and feedback on public services) that could be effectively mined and used in public decision making.

Finally, event structures, which have been deeply explored so far, could be useful for information extraction from user complaints and feedback, or for automatic classification into specific types of queries according to the described situation.

Crowdsourcing in e-government: next steps (and precautions?)

Legal prosumers’ input currently outstrips the capacity of government for extracting meaningful content in a cost-efficient way. Some developments are under way, among which are argument-mapping technologies and semantic matching between legal and lay corpora. The scalability of these methodologies is the main obstacle to overcome, in order to enable the matching of user data with open public data in several domains.

However, as technologies for the extraction of meaningful content from user-generated data develop and are used in public-decision making, a series of issues will have to be dealt with. For instance, should the system developer bear responsibility for the erroneous or biased analysis of data? Ethical questions arise as well: May governments legitimately analyse any type of user-generated content? Content-analysis systems might be used for trend- and crisis detection; but what if they are also used for restricting freedoms?

The “wisdom of crowds” can certainly be valuable in public decision making, but the fact that citizens’ online behaviour can be observed and analysed by governments without citizens’ acknowledgement poses serious ethical issues.

Thus, technical development in this domain will have to be coupled with the definition of ethical guidelines and standards, maybe in the form of a system of quality labels for content-analysis systems.

[Editor’s Note: For earlier VoxPopuLII commentary on the creation of legal ontologies, see Núria Casellas, Semantic Enhancement of Legal Information… Are We Up for the Challenge? For earlier VoxPopuLII commentary on Natural Language Processing and legal Semantic Web technology, see Adam Wyner, Weaving the Legal Semantic Web with Natural Language Processing. For earlier VoxPopuLII posts on user-generated content, crowdsourcing, and legal information, see Matt Baca and Olin Parker, Collaborative, Open Democracy with LexPop; Olivier Charbonneau, Collaboration and Open Access to Law; Nick Holmes, Accessible Law; and Staffan Malmgren, Crowdsourcing Legal Commentary.]


[1] The idea of prosumption existed actually long before the Internet, as highlighted by Ritzer and Jurgenson (2010): the consumer of a fast food restaurant is to some extent as well the producer of the meal since he is expected to be his own waiter, and so is the driver who pumps his own gasoline at the filling station.

[2] The experience project enables registered users to share life experiences, and it contained around 7 million stories as of January 2011: http://www.experienceproject.com/index.php.

[3] For instance, the United Nations Volunteers Online platform (http://www.onlinevolunteering.org/en/vol/index.html) helps volunteers to cooperate virtually with non-governmental organizations and other volunteers around the world.

[4] See for instance the experiment run by mathematician Gowers on his blog: he posted a problem and asked a large number of mathematicians to work collaboratively to solve it. They eventually succeeded faster than if they had worked in isolation: http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/.

[5] The Galaxy Zoo project asks volunteers to classify images of galaxies according to their shapes: http://www.galaxyzoo.org/how_to_take_part. See as well Cornell’s projects Nestwatch (http://watch.birds.cornell.edu/nest/home/index) and FeederWatch (http://www.birds.cornell.edu/pfw/Overview/whatispfw.htm), which invite people to introduce their observation data into a Website platform.

[6] http://www.participedia.net/wiki/Icelandic_Constitutional_Council_2011.

[7] See the description of Debategraph in Marta Poblet’s post, Argument mapping: visualizing large-scale deliberations (http://serendipolis.wordpress.com/2011/10/01/argument-mapping-visualizing-large-scale-deliberations-3/).

[8] Terms have been organised in the form of a tree having as root nodes nine semantic classes previously identified. Terms have been added as branches and sub-branches, depending on their degree of abstraction.

[9] It should be noted that legal terms are mostly situated at the second level of the hierarchy rather than the first one. This is natural if we take into account the nature of the normative corpus (the Italian consumer code), which contains mostly domain specific concepts (for instance, withdrawal right) instead of general legal abstract categories (such as right and obligation).

REFERENCES

Bourcier, D., and Fernández-Barrera, M. (2011). A frame-based representation of citizen’s queries for the Web 2.0. A case study on noise nuisances. E-challenges conference, Florence 2011.

Fernández-Barrera, M., and Casanovas, P. (2011). From user needs to expert knowledge: Mapping laymen queries with ontologies in the domain of consumer mediation. AICOL Workshop, Frankfurt 2011.

Lux, M., and Dsinger, G. (2007). From folksonomies to ontologies: Employing wisdom of the crowds to serve learning purposes. International Journal of Knowledge and Learning (IJKL), 3(4/5): 515-528.

Mika, P. (2005). Ontologies are us: A unified model of social networks and semantics. In Proc. of Int. Semantic Web Conf., volume 3729 of LNCS, pp. 522-536. Springer.

Passant, A. (2007). Using ontologies to strengthen folksonomies and enrich information retrieval in Weblogs. In Int. Conf. on Weblogs and Social Media, 2007.

Poblet, M., Casellas, N., Torralba, S., and Casanovas, P. (2009). Modeling expert knowledge in the mediation domain: A Mediation Core Ontology, in N. Casellas et al. (Eds.), LOAIT- 2009. 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with 2nd Workshop on Semantic Processing of Legal Texts. Barcelona, IDT Series n. 2.

Ritzer, G., and Jurgenson, N. (2010). Production, consumption, prosumption: The nature of capitalism in the age of the digital “prosumer.” In Journal of Consumer Culture 10: 13-36.

Specia, L., and Motta, E. (2007). Integrating folksonomies with the Semantic Web. Proc. Euro. Semantic Web Conf., 2007.

Meritxell Fernández-Barrera is a researcher at the Cersa (Centre d’Études et de Recherches de Sciences Administratives et Politiques) -CNRS, Université Paris 2-. She works on the application of natural language processing (NLP) to legal discourse and legal communication, and on the potentialities of Web 2.0 for participatory democracy.

VoxPopuLII is edited by Judith Pratt. Editor-in-Chief is Robert Richards, to whom queries should be directed. The statements above are not legal advice or legal representation. If you require legal advice, consult a lawyer. Find a lawyer in the Cornell LII Lawyer Directory.

WorldLII[Editor’s Note: We are republishing here, with some corrections, a post by Dr. Núria Casellas that appeared earlier on VoxPopuLII.]

The organization and formalization of legal information for computer processing in order to support decision-making or enhance information search, retrieval and knowledge management is not recent, and neither is the need to represent legal knowledge in a machine-readable form. Nevertheless, since the first ideas of computerization of the law in the late 1940s, the appearance of the first legal information systems in the 1950s, and the first legal expert systems in the 1970s, claims, such as Hafner’s, that “searching a large database is an important and time-consuming part of legal work,” which drove the development of legal information systems during the 80s, have not yet been left behind.

Similar claims may be found nowadays as, on the one hand, the amount of available unstructured (or poorly structured) legal information and documents made available by governments, free access initiatives, blawgs, and portals on the Web will probably keep growing as the Web expands. And, on the other, the increasing quantity of legal data managed by legal publishing companies, law firms, and government agencies, together with the high quality requirements applicable to legal information/knowledge search, discovery, and management (e.g., access and privacy issues, copyright, etc.) have renewed the need to develop and implement better content management tools and methods.

Information overload, however important, is not the only concern for the future of legal knowledge management; other and growing demands are increasing the complexity of the requirements that legal information management systems and, in consequence, legal knowledge representation must face in the future. Multilingual search and retrieval of legal information to enable, for example, integrated search between the legislation of several European countries; enhanced laypersons’ understanding of and access to e-government and e-administration sites or online dispute resolution capabilities (e.g., BATNA determination); the regulatory basis and capabilities of electronic institutions or normative and multi-agent systems (MAS); and multimedia, privacy or digital rights management systems, are just some examples of these demands.

How may we enable legal information interoperability? How may we foster legal knowledge usability and reuse between information and knowledge systems? How may we go beyond the mere linking of legal documents or the use of keywords or Boolean operators for legal information search? How may we formalize legal concepts and procedures in a machine-understandable form?

In short, how may we handle the complexity of legal knowledge to enhance legal information search and retrieval or knowledge management, taking into account the structure and dynamic character of legal knowledge, its relation with common sense concepts, the distinct theoretical perspectives, the flavor and influence of legal practice in its evolution, and jurisdictional and linguistic differences?

These are challenging tasks, for which different solutions and lines of research have been proposed. Here, I would like to draw your attention to the development of semantic solutions and applications and the construction of formal structures for representing legal concepts in order to make human-machine communication and understanding possible.

Semantic metadata

For example, in the search and retrieval area, we still perform nowadays most legal searches in online or application databases using keywords (that we believe to be contained in the document that we are searching for), maybe together with a combination of Boolean operators, or supported with a set of predefined categories (metadata regarding, for example, date, type of court, etc.), a list of pre-established topics, thesauri (e.g., EuroVoc), or a synonym-enhanced search.

These searches rely mainly on syntactic matching, and — with the exception of searches enhanced with categories, synonyms, or thesauri — they will return only documents that contain the exact term searched for. To perform more complex searches, to go beyond the term, we require the search engine to understand the semantic level of legal documents; a shared understanding of the domain of knowledge becomes necessary.

Although the quest for the representation of legal concepts is not new, these efforts have recently been driven by the success of the World Wide Web (WWW) and, especially, by the later development of the Semantic Web. Sir Tim Berners-Lee described it as an extension of the Web “in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

From Web 2.0 to Web 3.0

Thus, the Semantic Web is envisaged as an extension of the current Web, which now comprises collaborative tools and social networks (the Social Web or Web 2.0). The Semantic Web is sometimes also referred to as Web 3.0, although there is no widespread agreement on this matter, as different visions exist regarding the enhancement and evolution of the current Web.

These efforts also include the Web of Data (or Linked Data), which relies on the existence of standard formats (URIs, HTTP and RDF) to allow the access and query of interrelated datasets, which may be granted through a SPARQL endpoint (e.g., Govtrack.us, US census data, etc.). Sharing and connecting data on the Web in compliance with the Linked Data principles enables the exploitation of content from different Web data sources with the development of search, browse, and other mashup applications. (See the Linking Open Data cloud diagram by Cyganiak and Jentzsch below.) [Editor’s Note: Legislation.gov.uk also applies Linked Data principles to legal information, as John Sheridan explains in his recent post.]

LinkedData

Thus, to allow semantics to be added to the current Web, new languages and tools (ontologies) were needed, as the development of the Semantic Web is based on the formal representation of meaning in order to share with computers the flexibility, intuition, and capabilities of the conceptual structures of human natural languages. In the subfield of computer science and information science known as Knowledge Representation, the term “ontology” refers to a consensual and reusable vocabulary of identified concepts and their relationships regarding some phenomena of the world, which is made explicit in a machine-readable language. Ontologies may be regarded as advanced taxonomical structures, Semantic Web Stackwhere concepts are formalized as classes and defined with axioms, enriched with the description of attributes or constraints, and properties.

The task of developing interoperable technologies (ontology languages, guidelines, software, and tools) has been taken up by the World Wide Web Consortium (W3C). These technologies were arranged in the Semantic Web Stack according to increasing levels of complexity (like a layer cake). In this stack, higher layers depend on lower layers (and the latter are inherited from the original Web). These languages include XML (eXtensible Markup Language), a superset of HTML usually used to add structure to documents, and the so-called ontology languages: RDF/RDFS (Resource Description Framework/Schema), OWL, and OWL2 (Ontology Web Language). While the RDF language offers simple descriptive information about the resources on the Web, encoded in sets of triples of subject (a resource), predicate (a property or relation), and object (a resource or a value), RDFS allows the description of sets. OWL offers an even more expressive language to define structured ontologies (e.g. class disjointess, union or equivalence, etc.

Moreover, a specification to support the conversion of existing thesauri, taxonomies or subject headings into RDF triples has recently been published: the SKOS, Simple Knowledge Organization System standard. These specifications may be exploited in Linked Data efforts, such as the New York Times vocabularies. Also, EuroVoc, the multilingual thesaurus for activities of the EU is, for example, now available in this format.

Although there are different views in the literature regarding the scope of the definition or main characteristics of ontologies, the use of ontologies is seen as the key to implementing semantics for human-machine communication. Many ontologies have been built for different purposes and knowledge domains, for example:

  • OpenCyc: an open source version of the Cyc general ontology;
  • SUMO: the Suggested Upper Merged Ontology;
  • the upper ontologies PROTON (PROTo Ontology) and DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering);
  • the FRBRoo model (which represents bibliographic information);
  • the RDF representation of Dublin Core;
  • the Gene Ontology;
  • the FOAF (Friend of a Friend) ontology.

Although most domains are of interest for ontology modeling, the legal domain offers a perfect area for conceptual modeling and knowledge representation to be used in different types of intelligent applications and legal reasoning systems, not only due to its complexity as a knowledge intensive domain, but also because of the large amount of data that it generates. The use of semantically-enabled technologies for legal knowledge management could provide legal professionals and citizens with better access to legal information; enhance the storage, search, and retrieval of legal information; make possible advanced knowledge management systems; enable human-computer interaction; and even satisfy some hopes respecting automated reasoning and argumentation.

Regarding the incorporation of legal knowledge into the Web or into IT applications, or the more complex realization of the Legal Semantic Web, several directions have been taken, such as the development of XML standards for legal documentation and drafting (including Akoma Ntoso, LexML, CEN Metalex, and Norme in Rete), and the construction of legal ontologies.

Ontologizing legal knowledge

During the last decade, research on the use of legal ontologies as a technique to represent legal knowledge has increased and, as a consequence, a very interesting debate about their capacity to represent legal concepts and their relation to the different existing legal theories has arisen. It has even been suggested that ontologies could be the “missing link” between legal theory and Artificial Intelligence.

The literature suggests that legal ontologies may be distinguished by the levels of abstraction of the ideas they represent, the key distinction being between core and domain levels. Legal core ontologies model general concepts which are believed to be central for the understanding of law and may be used in all legal domains. In the past, ontologies of this type were mainly built upon insights provided by legal theory and largely influenced by normativism and legal positivism, especially by the works of Hart and Kelsen. Thus, initial legal ontology development efforts in Europe were influenced by hopes and trends in research on legal expert systems based on syllogistic approaches to legal interpretation.

More recent contributions at that level include the LKIF-Core Ontology, the LRI-Core Ontology, the DOLCE+CLO (Core Legal Ontology), and the Ontology of Fundamental Legal Concepts.Blue Scene Such ontologies usually include references to the concepts of Norm, Legal Act, and Legal Person, and may contain the formalization of deontic operators (e.g., Prohibition, Obligation, and Permission).

Domain ontologies, on the other hand, are directed towards the representation of conceptual knowledge regarding specific areas of the law or domains of practice, and are built with particular applications in mind, especially those that enable communication (shared vocabularies), or enhance indexing, search, and retrieval of legal information. Currently, most legal ontologies being developed are domain-specific ontologies, and some areas of legal knowledge have been heavily targeted, notably the representation of intellectual property rights respecting digital rights management (IPROnto Ontology, the Copyright Ontology, the Ontology of Licences, and the ALIS IP Ontology), and consumer-related legal issues (the Customer Complaint Ontology (or CContology), and the Consumer Protection Ontology). Many other well-documented ontologies have also been developed for purposes of the detection of financial fraud and other crimes; the representation of alternative dispute resolution methods, privacy compliance, patents, cases (e.g., Legal Case OWL Ontology), judicial proceedings, legal systems, and argumentation frameworks; and the multilingual retrieval of European law, among others. (See, for example, the proceedings of the JURIX and ICAIL conferences for further references.)

A socio-legal approach to legal ontology development

Thus, there are many approaches to the development of legal ontologies. Nevertheless, in the current legal ontology literature there are few explicit accounts or insights into the methods researchers use to elicit legal knowledge, and the accounts that are available reflect a lack of consensus as to the most appropriate methodology. For example, some accounts focus solely on the use of text mining techniques towards ontology learning from legal texts; while others concentrate on the analysis of legal theories and related materials to extract and formalize legal concepts. Moreover, legal ontology researchers disagree about the role that legal experts should play in ontology development and validation.

Orange SceneIn this regard, at the Institute of Law and Technology, we are developing a socio-legal approach to the construction of legal conceptual models. This approach stems from our collaboration with firms, government agencies, and nonprofit organizations (and their experts, clients, and other users) for the gathering of either explicit or tacit knowledge according to their needs. This empirically-based methodology may require the modeling of legal knowledge in practice (or professional legal knowledge, PLK), and the acquisition of knowledge through ethnographic and other social science research methods, together with the extraction (and merging) of concepts from a range of different sources (acts, regulations, case law, protocols, technical reports, etc.) and their validation by both legal experts and users.

For example, the Ontology of Professional Judicial Knowledge (OPJK) was developed in collaboration with the Spanish School of the Judicary to enhance search and retrieval capabilities of a Web-based frequentl- asked-question system (IURISERVICE) containing a repository of practical knowledge for Spanish judges in their first appointment. The knowledge was elicited from an ethnographic survey in Spanish First Instance Courts. On the other hand, the Neurona Ontologies, for a data protection compliance application, are based on the knowledge of legal experts and the requirements of enterprise asset management, together with the analysis of privacy and data protection regulations and technical risk management standards.

This approach tries to take into account many of the criticisms that developers of legal knowledge-based systems (LKBS) received during the 1980s and the beginning of the 1990s, including, primarily, the lack of legal knowledge or legal domain understanding of most LKBS development teams at the time. These criticisms were rooted in the widespread use of legal sources (statutes, case law, etc.) directly as the knowledge for the knowledge base, instead of including in the knowledge base the “expert” knowledge of lawyers or law-related professionals.

Further, in order to represent knowledge in practice (PLK), legal ontology engineering could benefit from the use of social science research methods for knowledge elicitation, institutional/organizational analysis (institutional ethnography), as well as close collaboration with legal practitioners, users, experts, and other stakeholders, in order to discover the relevant conceptual models that ought to be represented in the ontologies. Moreover, I understand the participation of these stakeholders in ontology evaluation and validation to be crucial to ensuring consensus about, and the usability of, a given legal ontology.

Challenges and drawbacks

Although the use of ontologies and the implementation of the Semantic Web vision may offer great advantages to information and knowledge management, there are great challenges and problems to be overcome.

First, the problems related to knowledge acquisition techniques and bottlenecks in software engineering are inherent in ontology engineering, and ontology development is quite a time-consuming and complex task. Second, as ontologies are directed mainly towards enabling some communication on the basis of shared conceptualizations, how are we to determine the sharedness of a concept? And how are context-dependencies or (cultural) diversities to be represented? Furthermore, how can we evaluate the content of ontologies?

Collaborative Current research is focused on overcoming these problems through the establishment of gold standards in concept extraction and ontology learning from texts, and the idea of collaborative development of legal ontologies, although these techniques might be unsuitable for the development of certain types of ontologies. Also, evaluation (validation, verification, and assessment) and quality measurement of ontologies are currently an important topic of research, especially ontology assessment and comparison for reuse purposes.

Regarding ontology reuse, the general belief is that the more abstract (or core) an ontology is, the less it owes to any particular domain and, therefore, the more reusable it becomes across domains and applications. This generates a usability-reusability trade-off that is often difficult to resolve.

Finally, once created, how are these ontologies to evolve? How are ontologies to be maintained and new concepts added to them?

Over and above these issues, in the legal domain there are taking place more particularized discussions:  for example, the discussion of the advantages and drawbacks of adopting an empirically based perspective (bottom-up), and the complexity of establishing clear connections with legal dogmatics or general legal theory approaches (top-down). To what extent are these two different perspectives on legal ontology development incompatible? How might they complement each other? What is their relationship with text-based approaches to legal ontology modeling?

I would suggest that empirically based, socio-legal methods of ontology construction constitute a bottom-up approach that enhances the usability of ontologies, while the general legal theory-based approach to ontology engineering fosters the reusability of ontologies across multiple domains.

The scholarly discussion of legal ontology development also embraces more fundamental issues, among them the capabilities of ontology languages for the representation of legal concepts, the possibilities of incorporating a legal flavor into OWL, and the implications of combining ontology languages with the formalization of rules.

Finally, the potential value to legal ontology of other approaches, areas of expertise, and domains of knowledge construction ought to be explored, for example: pragmatics and sociology of law methodologies, experiences in biomedical ontology engineering, formal ontology approaches, salamander.jpgand the relationships between legal ontology and legal epistemology, legal knowledge and common sense or world knowledge, expert and layperson’s knowledge, legal information and Linked Data possibilities, and legal dogmatics and political science (e.g., in e-Government ontologies).

As you may see, the challenges faced by legal ontology engineering are great, and the limitations of legal ontologies are substantial. Nevertheless, the potential of legal ontologies is immense. I believe that law-related professionals and legal experts have a central role to play in the successful development of legal ontologies and legal semantic applications.

[Editor’s Note: For many of us, the technical aspects of ontologies and the Semantic Web are unfamiliar. Yet these technologies are increasingly being incorporated into the legal information systems that we use everyday, so it’s in our interest to learn more about them. For those of us who would like a user-friendly introduction to ontologies and the Semantic Web, here are some suggestions:

Dr. Núria Casellas Dr. Núria Casellas is a visiting researcher at the Legal Information Institute at Cornell University. She is a researcher at the Institute of Law and Technology and an assistant professor at the UAB Law School (on leave). She has participated in several national and European-funded research projects regarding legal ontologies and legal knowledge management: these concern the acquisition of knowledge in judicial settings (IURISERVICE), modeling privacy compliance regulations (NEURONA), drafting legislation (DALOS), and the Legal Case Study of the Semantically Enabled Knowledge Technologies (SEKT VI Framework project), among others. Co-editor of the IDT Series, she holds a Law Degree from the Universitat Autònoma de Barcelona, a Master’s Degree in Health Care Ethics and Law from the University of Manchester, and a PhD (“Modelling Legal Knowledge through Ontologies. OPJK: the Ontology of Professional Judicial Knowledge”).

VoxPopuLII is edited by Judith Pratt. Editor in Chief is Robert Richards.

Ontology?The organization and formalization of legal information for computer processing in order to support decision-making or enhance information search, retrieval and knowledge management is not recent, and neither is the need to represent legal knowledge in a machine-readable form. Nevertheless, since the first ideas of computerization of the law in the late 1940s, the appearance of the first legal information systems in the 1950s, and the first legal expert systems in the 1970s, claims, such as Hafner’s, that “searching a large database is an important and time-consuming part of legal work,” which drove the development of legal information systems during the 80s, have not yet been left behind.

Similar claims may be found nowadays as, on the one hand, the amount of available unstructured (or poorly structured) legal information and documents made available by governments, free access initiatives, blawgs, and portals on the Web will probably keep growing as the Web expands. And, on the other, the increasing quantity of legal data managed by legal publishing companies, law firms, and government agencies, together with the high quality requirements applicable to legal information/knowledge search, discovery, and management (e.g., access and privacy issues, copyright, etc.) have renewed the need to develop and implement better content management tools and methods.

Information overload, however important, is not the only concern for the future of legal knowledge management; other and growing demands are increasing the complexity of the requirements that legal information management systems and, in consequence, legal knowledge representation must face in the future. Multilingual search and retrieval of legal information to enable, for example, integrated search between the legislation of several European countries; enhanced laypersons’ understanding of and access to e-government and e-administration sites or online dispute resolution capabilities (e.g., BATNA determination); the regulatory basis and capabilities of electronic institutions or normative and multi-agent systems (MAS); and multimedia, privacy or digital rights management systems, are just some examples of these demands.

How may we enable legal information interoperability? How may we foster legal knowledge usability and reuse between information and knowledge systems? How may we go beyond the mere linking of legal documents or the use of keywords or Boolean operators for legal information search? How may we formalize legal concepts and procedures in a machine-understandable form?

In short, how may we handle the complexity of legal knowledge to enhance legal information search and retrieval or knowledge management, taking into account the structure and dynamic character of legal knowledge, its relation with common sense concepts, the distinct theoretical perspectives, the flavor and influence of legal practice in its evolution, and jurisdictional and linguistic differences?

These are challenging tasks, for which different solutions and lines of research have been proposed. Here, I would like to draw your attention to the development of semantic solutions and applications and the construction of formal structures for representing legal concepts in order to make human-machine communication and understanding possible.

Semantic metadata

Nowadays, in the search and retrieval area, we still perform most legal searches in online or application databases using keywords (that we believe to be contained in the document that we are searching for), maybe together with a combination of Boolean operators, or supported with a set of predefined categories (metadata regarding, for example, date, type of court, etc.), a list of pre-established topics, thesauri (e.g., EUROVOC), or a synonym-enhanced search.

These searches rely mainly on syntactic matching, and — with the exception of searches enhanced with categories, synonyms, or thesauri — they will return only documents that contain the exact term searched for. To perform more complex searches, to go beyond the term, we require the search engine to understand the semantic level of legal documents; a shared understanding of the domain of knowledge becomes necessary.

Although the quest for the representation of legal concepts is not new, these efforts have recently been driven by the success of the World Wide Web (WWW) and, especially, by the later development of the Semantic Web. Sir Tim Berners-Lee described it as an extension of the Web “in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

FRBRoo screenshot

Thus, the Semantic Web (including Linked Data efforts or the Web of Data) is envisaged as an extension of the current Web, which now also comprises collaborative tools and social networks (the Social Web or Web 2.0). The Semantic Web is sometimes also referred to as Web 3.0, although there is no widespread agreement on this matter, as different visions exist regarding the enhancement and evolution of the current Web.

From Web 2.0 to Web 3.0

Towards that shift, new languages and tools (ontologies) were needed to allow semantics to be added to the current Web, as the development of the Semantic Web is based on the formal representation of meaning in order to share with computers the flexibility, intuition, and capabilities of the conceptual structures of human natural languages. In the subfield of computer science and information science known as Knowledge Representation, the term “ontology” refers to a consensual and reusable vocabulary of identified concepts and their relationships regarding some phenomena of the world, which is made explicit in a machine-readable language. Ontologies may be regarded as advanced taxonomical structures, where concepts formalized as classes (e.g., “Actor”) are defined with axioms, enriched with the description of attributes or constraints (for example, “cardinality”), and linked to other classes through properties (e.g., “possesses” or “is_possessed_by”).
FRBRoo

The task of developing interoperable technologies (ontology languages, guidelines, software, and tools) has been taken up by the World Wide Web Consortium (W3C). These technologies were arranged in the Semantic Web Stack according to increasing levels of complexity (like a layer cake), in the sense that higher layers depend on lower layers (and the latter are inherited from the original Web). The languages include XML (eXtensible Markup Language), a superset of HTML usually used to add structure to documents, and the so-called ontology languages: RDF (Resource Description Framework), OWL, and Semantic Web StackOWL2 (Ontology Web Language). Recently, a specification to support the conversion of existing thesauri, taxonomies or subject headings into RDF has been released (the the SKOS, Simple Knowledge Organization System standard).

Although there are different views in the literature regarding the scope of the definition or main characteristics of ontologies, the use of ontologies is seen as the key to implementing semantics for human-machine communication. Many ontologies have been built for different purposes and knowledge domains, for example:

Although most domains are of interest for ontology modeling, the legal domain offers a perfect area for conceptual modeling and knowledge representation to be used in different types of intelligent applications and legal reasoning systems, not only due to its complexity as a knowledge intensive domain, but also because of the large amount of data that it generates. The use of semantically-enabled technologies for legal knowledge management could provide legal professionals and citizens with better access to legal information; enhance the storage, search, and retrieval of legal information; make possible advanced knowledge management systems; enable human-computer interaction; and even satisfy some hopes respecting automated reasoning and argumentation.

Regarding the incorporation of legal knowledge into the Web or into IT applications, or the more complex realization of the Legal Semantic Web, several directions have been taken, such as the development of XML standards for legal documentation and drafting (including Akoma Ntoso, LexML, CEN Metalex, and Norme in Rete), and the construction of legal ontologies.

Ontologizing legal knowledge

During the last decade, research on the use of legal ontologies as a technique to represent legal knowledge has increased and, as a consequence, a very interesting debate about their capacity to represent legal concepts and their relation to the different existing legal theories has arisen. It has even been suggested that ontologies could be the “missing link” between legal theory and Artificial Intelligence.

The literature suggests that legal ontologies may be distinguished by the levels of abstraction of the ideas they represent, the key distinction being between core and domain levels. Legal core ontologies model general concepts which are believed to be central for the understanding of law and may be used in all legal domains. In the past, ontologies of this type were mainly built upon insights provided by legal theory and largely influenced by normativism and legal positivism, especially by the works of Hart and Kelsen. Thus, initial legal ontology development efforts in Europe were influenced by hopes and trends in research on legal expert systems based on syllogistic approaches to legal interpretation.

More recent contributions at that level include the LRI-Core Ontology, the DOLCE+CLO (Core Legal Ontology), and the Ontology of Fundamental Legal ConceptsBlue Scene (the basis for the LKIF-Core Ontology). Such ontologies usually include references to the concepts of Norm, Legal Act, and Legal Person, and may contain the formalization of deontic operators (e.g., Prohibition, Obligation, and Permission).

Domain ontologies, on the other hand, are directed towards the representation of conceptual knowledge regarding specific areas of the law or domains of practice, and are built with particular applications in mind, especially those that enable communication (shared vocabularies), or enhance indexing, search, and retrieval of legal information. Currently, most legal ontologies being developed are domain-specific ontologies, and some areas of legal knowledge have been heavily targeted, notably the representation of intellectual property rights respecting digital rights management (IPROnto Ontology, the Copyright Ontology, the Ontology of Licences, and the ALIS IP Ontology), and consumer-related legal issues (the Customer Complaint Ontology (or CContology), and the Consumer Protection Ontology). Many other well-documented ontologies have also been developed for purposes of the detection of financial fraud and other crimes; the representation of alternative dispute resolution methods, cases, judicial proceedings, and argumentation frameworks; and the multilingual retrieval of European law, among others. (See, for example, the proceedings of the JURIX and ICAIL conferences for further references.)

A socio-legal approach to legal ontology development

Thus, there are many approaches to the development of legal ontologies. Nevertheless, in the current legal ontology literature there are few explicit accounts or insights into the methods researchers use to elicit legal knowledge, and the accounts that are available reflect a lack of consensus as to the most appropriate methodology. For example, some accounts focus solely on the use of legal text mining and statistical analysis, in which ontologies are built by means of machine learning from legal texts; while others concentrate on the analysis of legal theories and related materials. Moreover, legal ontology researchers disagree about the role that legal experts should play in ontology validation.

Orange SceneIn this regard, at the Institute of Law and Technology, we are developing a socio-legal approach to the construction of legal conceptual models. This approach stems from our collaboration with firms, government agencies, and nonprofit organizations (and their experts, clients, and other users) for the gathering of either explicit or tacit knowledge according to their needs. This empirically-based methodology may require the modeling of legal knowledge in practice (or professional legal knowledge, PLK), and the acquisition of knowledge through ethnographic and other social science research methods, together with the extraction (and merging) of concepts from a range of different sources (acts, regulations, case law, protocols, technical reports, etc.) and their validation by both legal experts and users.

For example, the Ontology of Professional Judicial Knowledge (OPJK) was developed in collaboration with the Spanish School of the Judicary to enhance search and retrieval capabilities of a Web-based frequentl- asked-question system (IURISERVICE) containing a repository of practical knowledge for Spanish judges in their first appointment. The knowledge was elicited from an ethnographic survey in Spanish First Instance Courts. On the other hand, the Neurona Ontologies, for a data protection compliance application, are based on the knowledge of legal experts and the requirements of enterprise asset management, together with the analysis of privacy and data protection regulations and technical risk management standards.

This approach tries to take into account many of the criticisms that developers of legal knowledge-based systems (LKBS) received during the 1980s and the beginning of the 1990s, including, primarily, the lack of legal knowledge or legal domain understanding of most LKBS development teams at the time. These criticisms were rooted in the widespread use of legal sources (statutes, case law, etc.) directly as the knowledge for the knowledge base, instead of including in the knowledge base the “expert” knowledge of lawyers or law-related professionals.

Further, in order to represent knowledge in practice (PLK), legal ontology engineering could benefit from the use of social science research methods for knowledge elicitation, institutional/organizational analysis (institutional ethnography), as well as close collaboration with legal practitioners, users, experts, and other stakeholders, in order to discover the relevant conceptual models that ought to be represented in the ontologies. Moreover, I understand the participation of these stakeholders in ontology evaluation and validation to be crucial to ensuring consensus about, and the usability of, a given legal ontology.

Challenges and drawbacks

Although the use of ontologies and the implementation of the Semantic Web vision may offer great advantages to information and knowledge management, there are great challenges and problems to be overcome.

First, the problems related to knowledge acquisition techniques and bottlenecks in software engineering are inherent in ontology engineering, and ontology development is quite a time-consuming and complex task. Second, as ontologies are directed mainly towards enabling some communication on the basis of shared conceptualizations, how are we to determine the sharedness of a concept? And how are context-dependencies or (cultural) diversities to be represented? Furthermore, how can we evaluate the content of ontologies?

Collaborative Current research is focused on overcoming these problems through the establishment of gold standards in concept extraction and ontology learning from texts, and the idea of collaborative development of legal ontologies, although these techniques might be unsuitable for the development of certain types of ontologies. Also, evaluation (validation, verification, and assessment) and quality measurement of ontologies are currently an important topic of research, especially ontology assessment and comparison for reuse purposes.

Regarding ontology reuse, the general belief is that the more abstract (or core) an ontology is, the less it owes to any particular domain and, therefore, the more reusable it becomes across domains and applications. This generates a usability-reusability trade-off that is often difficult to resolve.

Finally, once created, how are these ontologies to evolve? How are ontologies to be maintained and new concepts added to them?

Over and above these issues, in the legal domain there are taking place more particularized discussions:  for example, the discussion of the advantages and drawbacks of adopting an empirically based perspective (bottom-up), and the complexity of establishing clear connections with legal dogmatics or general legal theory approaches (top-down). To what extent are these two different perspectives on legal ontology development incompatible? How might they complement each other? What is their relationship with text-based approaches to legal ontology modeling?

I would suggest that empirically based, socio-legal methods of ontology construction constitute a bottom-up approach that enhances the usability of ontologies, while the general legal theory-based approach to ontology engineering fosters the reusability of ontologies across multiple domains.

The scholarly discussion of legal ontology development also embraces more fundamental issues, among them the capabilities of ontology languages for the representation of legal concepts, the possibilities of incorporating a legal flavor into OWL, and the implications of combining ontology languages with the formalization of rules.

Finally, the potential value to legal ontology of other approaches, areas of expertise, and domains of knowledge construction ought to be explored, for example: pragmatics and sociology of law methodologies, experiences in biomedical ontology engineering, formal ontology approaches, salamander.jpgand the relationships between legal ontology and legal epistemology, legal knowledge and common sense or world knowledge, expert and layperson’s knowledge, and legal dogmatics and political science (e.g., in e-Government ontologies).

As you may see, the challenges faced by legal ontology engineering are great, and the limitations of legal ontologies are substantial. Nevertheless, the potential of legal ontologies is immense. I believe that law-related professionals and legal experts have a central role to play in the successful development of legal ontologies and legal semantic applications.

[Editor’s Note: For many of us, the technical aspects of ontologies and the Semantic Web are unfamiliar. Yet these technologies are increasingly being incorporated into the legal information systems that we use everyday, so it’s in our interest to learn more about them. For those of us who would like a user-friendly introduction to ontologies and the Semantic Web, here are some suggestions:

Dr. Núria Casellas Dr. Núria Casellas is a researcher at the Institute of Law and Technology and an assistant professor at the UAB Law School. She has participated in several national and European-funded research projects regarding the acquisition of knowledge in judicial settings (IURISERVICE), improving access to multimedia judicial content (E-Sentencias), on Drafting Legislation with Ontology-Based Support (DALOS), or in the Legal Case Study of the Semantically Enabled Knowledge Technologies (SEKT VI Framework project), among others. Her lines of investigation include: legal knowledge representation, legal ontologies, artificial intelligence and law, legal semantic web, law and technology, and bioethics.
She holds a Law Degree from the Universitat Autònoma de Barcelona, a Master’s Degree in Health Care Ethics and Law from the University of Manchester, and a PhD in Public Law and Legal Philosophy (UAB). Her PhD thesis is entitled “Modelling Legal Knowledge through Ontologies. OPJK: the Ontology of Professional Judicial Knowledge”.

VoxPopuLII is edited by Judith Pratt. Editor in Chief is Rob Richards.