Legal ontologies » VoxPopuLII


Legal Research Ontology, Part II

Legal ontologies, legal research

Aug 20, 2015

My blog post last year about developing a legal research ontology was such an optimistic (i.e., naive), linear narrative. This was one of my final notes:

At this point, I am in the beginning stages of taking advantage of all the semantic web has to offer. The ontology’s classes now have subclasses. I am building the relationships between the classes and subclasses and using Protege to bring them all together.

I should have known better.

What I didn’t realize then was that I really didn’t understand anything about the semantic web. While I could use the term in a sentence and reference RDF and OWL and Protege, once you scratched the surface I was lost. Based on Sara Frug’s recommendation during a presentation at CALI Con 2014, I started reading Semantic Web for Dummies.

It has been, and continues to be, slow going. I don’t have a computer science or coding background, and so much of my project feels like trying to teach myself a new language without immersion or much of a guide. But the process of this project has become just as interesting to me as the end product. How are we equipped to teach ourselves anything? At a certain point, you just have to jump in and do something, anything, to get the project moving.

I had already identified the classes:
* Type of research material;
* Type of research problem;
* Source of law;
* Area of law;
* Legal action; and
* Final product.

I knew that each class has subclasses. Yet in my readings, as I learned how ontologies are used for constructing relationships between entities, I missed the part where I had to construct relationships between the entities. They didn’t just magically appear when you enter the terms into Protege.
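A minimal sketch of that gap, in plain Python rather than Protege or OWL: the six classes and the handful of subclasses named in this post, stored as a simple parent lookup. The dictionary layout and the helper name `subclasses_of` are illustrative assumptions, not anything Protege produces.

```python
# Illustrative only: plain Python, not OWL. The subclass list is partial.
TOP_LEVEL_CLASSES = [
    "Type of research material",
    "Type of research problem",
    "Source of law",
    "Area of law",
    "Legal action",
    "Final product",
]

# child -> parent, i.e. "X is a subclass of Y"
SUBCLASS_OF = {
    "Primary source": "Type of research material",
    "Contracts": "Area of law",
    "Torts": "Area of law",
    "Property": "Area of law",
    "Case law": "Source of law",
}


def subclasses_of(parent):
    """Return the subclasses asserted directly under `parent`, sorted by name."""
    return sorted(child for child, p in SUBCLASS_OF.items() if p == parent)


# Entering terms yields a hierarchy and nothing more: no link between
# "Torts" and "Case law" exists until someone asserts one.
print(subclasses_of("Area of law"))  # ['Contracts', 'Property', 'Torts']
```

The hierarchy answers "what kinds of X are there?" but says nothing about how an area of law relates to a source of law; those links have to be added as separate assertions.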

I’m using Web Protege, an open-source product developed by the Stanford Center for Biomedical Informatics Research, using the OWL ontology language.

Ontology engineering is a hot topic these days, and there is a growing body of papers, tutorials, and presentations on OWL and ontology engineering. That’s also part of the problem: There’s a little too much out there. I knew that anything I would do with my ontology would happen in Protege, so I decided to start there with the extensive user documentation and user support. Their user guide takes you through setting up your first ontology with step-by-step illustrations and a few short videos. I also discovered a tutorial on the web titled Pizzas in 10 minutes.

Following the tutorial, you construct a basic ontology of pizza using different toppings and sauces. While it took me longer than 10 minutes to complete, it did give me enough familiarity with constructing relationships to take a stab at it with my ontology and its classes. Here’s what I came up with:

This representation doesn’t list every subclass; e.g., in Type of research material, I only listed primary source and in Area of law, I only listed contracts, torts and property. But it gives you an idea of how the classes relate to each other. Something I learned in building the sample pizza ontology in Protege is the importance of creating two properties: the relational property and the modifier property. The recommendation is to use has or is as prefixes¹ for the properties. You can see how classes relate to each other in the above diagram as well as how classes are modified by subclasses and individuals.
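The has/is naming convention can be sketched as subject, property, object statements. This is plain Python, not OWL, and every property name below (`hasAreaOfLaw`, `hasSourceOfLaw`, `hasFinalProduct`, `isPrimarySource`) is a hypothetical name chosen for illustration, not one taken from the actual ontology.

```python
# Hypothetical property names following the has/is prefix convention.
TRIPLES = [
    ("Research problem", "hasAreaOfLaw", "Torts"),        # relational
    ("Research problem", "hasSourceOfLaw", "Case law"),   # relational
    ("Research problem", "hasFinalProduct", "Memo"),      # relational
    ("Case law", "isPrimarySource", True),                # modifier
]


def objects(subject, prop):
    """Return every object linked to `subject` through property `prop`."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


print(objects("Research problem", "hasAreaOfLaw"))  # ['Torts']
print(objects("Case law", "isPrimarySource"))       # [True]
```

Reading a triple aloud ("Research problem has area of law Torts") is a quick check that the prefix makes the direction of the relationship obvious.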

I’m continuing to read Semantic Web for Dummies, and I’m currently focusing on Chapter 8: Speaking the Web Ontology Language. It has all kinds of nifty Venn diagrams and lines of computer code, and I’m working on understanding it all. This line keeps me going: “However, if you’re looking for a system to draw inferences or to interpret the implications of your assertions (for example, to supply a dynamic view of your data), OWL is for you.”²

One of my concerns is that a few of my subclasses belong to more than one class. But the beauty of the semantic web and OWL is that classes and subclasses are dynamic sets, and when you run the ontology, individual members can move from one set to another. This means that Case Law can be both a subclass of Source of Law and an instance of Primary Source in the class Type of Research Material.

The way in which I set up my classes, subclasses, and the relationships between them relies on simple assertions³. Two equivalent classes would look like a Venn diagram with the two sets completely overlapping. This helps in dealing with synonyms. You can assert equivalence between individuals as well as classes, but it is better to set up each individual’s relationships with its classes, and then let the OWL reasoning system decide if the individuals are truly interchangeable. This is very helpful in a situation in which you are combining ontologies. There are more complicated assertions (equivalence, disjointness, and subsumption), and I am working on applying them and building out the ontology.
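The Venn-diagram picture maps directly onto set operations. Here is a minimal sketch in plain Python, treating each class as the set of its members; the member names and the synonym class `judicial_opinions` are invented for illustration, and a real OWL reasoner works from axioms rather than enumerated sets.

```python
# Each class is modeled as the set of its individual members (names invented).
case_law = {"Case A", "Case B"}
judicial_opinions = {"Case A", "Case B"}            # a synonym for the same set
primary_sources = {"Case A", "Case B", "Statute X"}
secondary_sources = {"Treatise Y"}


def equivalent(a, b):
    """Equivalence: the two circles overlap completely."""
    return a == b


def subsumes(general, specific):
    """Subsumption: every member of `specific` is also in `general`."""
    return specific <= general


def disjoint(a, b):
    """Disjointness: the two circles share no member."""
    return a.isdisjoint(b)


print(equivalent(case_law, judicial_opinions))         # True
print(subsumes(primary_sources, case_law))             # True
print(disjoint(primary_sources, secondary_sources))    # True
```

Note that "Case A" sits in both `case_law` and `primary_sources` without any conflict, which is the multiple-membership situation described above.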

Next I need to figure out the characteristics of the properties relating the classes, subclasses, and individuals in my ontology: inverse, symmetric, transitive, intersection, union, complement, and restriction. As I continue to read (and reread) Semantic Web for Dummies, I am gaining a new appreciation for set theory and description logic. Math seems to always have a way of finding you! I am also continuing to fill in the ontology with terms (using simple assertions), and I also need to figure out SPARQL so I can query the ontology. It still feels like one of those one step forward, two steps back endeavors, but it is interesting.
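A rough sketch of what three of those characteristics (inverse, symmetric, transitive) let a reasoner conclude, written in plain Python. It leaves aside intersection, union, complement, and restriction, which describe how classes are built rather than how properties behave. The function `infer`, the property names, and the sample facts are all invented for illustration; an OWL reasoner is far more general.

```python
def infer(triples, transitive=(), symmetric=(), inverse_of=None):
    """Expand (subject, property, object) triples until nothing new appears."""
    inverse_of = inverse_of or {}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:                  # A rel B  =>  B rel A
                new.add((o, p, s))
            if p in inverse_of:                 # A p B    =>  B inverse(p) A
                new.add((o, inverse_of[p], s))
            if p in transitive:                 # A p B, B p C  =>  A p C
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:                        # fixed point reached
            return facts
        facts |= new


asserted = {
    ("Strict liability", "isSubtopicOf", "Products liability"),
    ("Products liability", "isSubtopicOf", "Torts"),
    ("Torts", "isRelatedTo", "Contracts"),
    ("Memo", "citesSource", "Case law"),
}
inferred = infer(
    asserted,
    transitive={"isSubtopicOf"},
    symmetric={"isRelatedTo"},
    inverse_of={"citesSource": "isCitedBy"},
)

print(("Strict liability", "isSubtopicOf", "Torts") in inferred)  # True
print(("Contracts", "isRelatedTo", "Torts") in inferred)          # True
print(("Case law", "isCitedBy", "Memo") in inferred)              # True
```

None of the three printed facts was asserted by hand; each follows from a declared property characteristic, which is the "drawing inferences" payoff of declaring them.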

I hope to keep you posted, and I am grateful to the Vox PopuLII blog for having me back to write an update.

Amy Taylor is the Access Services Librarian and Adjunct Professor at American University Washington College of Law. Her main research interests are legal ontologies, organization of legal information and the influence of online legal research on the development of precedent. You can reach her on Twitter @taylor_amy or email: amytaylor@wcl.american.edu.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

—
1 Matthew Horridge, A Practical Guide to Building OWL Ontologies, 20, http://phd.jabenitez.com/wp-content/uploads/2014/03/A-Practical-Guide-To-Building-OWL-Ontologies-Using-Protege-4.pdf (last visited May 19, 2015).

2 Jeffrey Pollock, Semantic Web for Dummies 195 (Wiley 2009).

3 Id. at 200.

Folksonomies & Law - Background issues and theoretical perspectives

information retrieval, knowledge management, Law libraries, Legal knowledge management, Legal ontologies, Legal semantic web, Semantic Web and law, Taxonomies

Nov 27, 2014

§.1.- Foreword

«If folksonomies work for pictures (Flickr), books (Goodreads), questions and answers (Quora), basically everything else (Delicious), why shouldn’t they work for law?» (Serena Manzoli)

In a post on this blog, Serena Manzoli distinguishes three uses of taxonomies in law: (1) for research of legal documents, (2) in teaching to law students, and (3) for its practical application.

In regard to her first point, she notes that (observation #1) the increasing availability of legal resources is compelling a change of the whole information architecture, and – correctly, in my opinion – she raises some objections to the heuristic efficiency of folksonomies: (objection #1) they are too “flat” to constitute something useful for legal research and (objection #2) it is likely that non-expert users could “pollute” the set of tags. Notwithstanding these issues, she states (prediction #1) that folksonomies could be helpful with non-legal users.

On the second point, she notes (observation #2) that folksonomies could be beneficial for studying the law, because they could allow one to penetrate more easily into its conceptual frameworks; she also formulates the hypothesis (prediction #2) that this teaching method could shape a more flexible mindset in students.

In discussing the third point, she notes (observation #3) that different taxonomies entail different ways of applying the law, and (prediction #3) she formulates the hypothesis that, in a distant perspective in which folksonomies would replace taxonomies, the result would be a whole new way to apply the law.

I appreciated Manzoli’s post and accepted with pleasure the invitation of Christine Kirchberger – to whom I am grateful – to share my views with the readers of this prestigious blog. Hereinafter I intend to focus on the theoretical profiles that aroused my curiosity. My position is partly different from that of Serena Manzoli.

§.2.- Introduction

In order to detect the issues stemming from folksonomies, I think it is relevant to give some preliminary clarifications.

In collective tagging systems, by tagging we can describe the content of an object – an image, a song or a document – label it using any lexical expression preceded by the “hashtag” (the symbol “#”) and share it with our friends and followers or also recommend it to an audience of strangers.

Folksonomies (a blend of the words “taxonomy” and “folk”) are sets of categories resulting from the use of tags in the description of online resources by the users, allowing a “many to many” connection between tags, users and resources.
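The “many to many” connection can be sketched as a list of tagging acts, each linking one user, one tag, and one resource. This is a minimal illustration in plain Python; the user names, tags, and resource identifiers are invented, and real tagging systems store far more than this.

```python
from collections import defaultdict

# Each tagging act links one user, one tag, and one resource (all invented).
TAGGINGS = [
    ("alice", "#flowers", "photo-1"),
    ("bob", "#flowers", "photo-1"),
    ("bob", "#summer", "photo-1"),
    ("alice", "#summer", "photo-2"),
]


def index_by(position):
    """Group taggings by user (0), tag (1), or resource (2)."""
    index = defaultdict(set)
    for tagging in TAGGINGS:
        index[tagging[position]].add(tagging)
    return index


by_tag = index_by(1)

# One tag reaches many resources and many users, and vice versa.
resources_for_flowers = sorted({r for _, _, r in by_tag["#flowers"]})
users_of_summer = sorted({u for u, _, _ in by_tag["#summer"]})

print(resources_for_flowers)  # ['photo-1']
print(users_of_summer)        # ['alice', 'bob']
```

No category is declared in advance: whatever grouping exists emerges only from the accumulated tagging acts.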

Basic pattern of a folksonomy

Thomas Vander Wal coined the word a decade ago – ten years is really a long time in ICTs – and these technologies, as reported by Serena Manzoli, have now been adopted in most of the social networks and e-commerce systems.

The main feature of folksonomies is that tags aggregate spontaneously in a semantic core; therefore, they are often associated with taxonomies or ontologies, although in these latter cases hierarchies and categories are established before the collection of data, as “a priori”.

Simplifying, I can say that tags may describe three aspects of the resources, using particulars (e.g., a picture of a flowerpot lit by the sun):

(1) The content of the resources (e.g., #flowers),

(2) The interaction with other specific resources and the environment in general (e.g., #sun or #summer),

(3) The effect that these resources have on users having access to them (e.g., #beautiful).

Since it seems to me that none of these aspects should be disregarded in an overall assessment of folksonomies, I will consider all of them.

With regard to law, they tend to match these three major issues:

(1) Law as a “content”. Users select legal documents among others available and choose those that seem most relevant. As a real interest is – normally – the driving criterion of the search, and as this typically is given by the need to solve a legal problem, I designate this profile with the expression «Quid juris?».

(2) Law as a “concept”. This problem emerges because the single legal document cannot be conceived separately from the context in which it appears, namely the relations it has with the legal system to which it belongs. Consequently it becomes inevitable to ask what the law is, as a common feature of all legal documents. Recalling Immanuel Kant in the “Metaphysics of Morals”, here I use the expression «Quid jus?».

(3) Law as a “sentiment”. What emerges in folksonomies is a subjective attitude that regards the meaning to be attributed to the research of resources and that affects the way in which it is performed. To this I intend to refer using the expression «Cur jus?».

§.3.- Folksonomies, Law, and «Quid juris?»: legal information management and collective tagging systems

In this respect, I definitely agree with Serena Manzoli. Folksonomies seem to open very interesting perspectives in the field of legal information management; we must admit, however, that these technologies still have some limitations. For instance: precisely because the resources are tagged freely, it is difficult to use them to build taxonomies or ontologies; inexperienced users classify resources less efficiently than others, diluting all the efforts of more skilled users and “polluting” well-established catalogs; vice versa, even experienced users can make mistakes in the allocation of tags, worsening the quality of information being shared.

Though in some cases these issues can be solved in several ways – e.g., the use of tags can be guided with tag recommendations, hence the distinction between broad and narrow folksonomies – and even if it can reasonably be expected that these tools will work even better in the future, for now we can say that folksonomies are useful just to integrate pre-existing classifications.
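One simple way a system might guide tagging can be sketched as follows: suggest the tags that have already been used most often, after normalizing case so that near-duplicates do not scatter the vocabulary. This is an assumption-laden illustration in plain Python; the function name `recommend_tags` and the sample tags are invented, and real recommenders are considerably more sophisticated.

```python
from collections import Counter


def recommend_tags(existing_tags, limit=3):
    """Suggest the most frequently used tags, breaking ties alphabetically."""
    counts = Counter(tag.lower() for tag in existing_tags)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:limit]]


# Tags previously applied by different users (invented data).
tags_so_far = ["#contract", "#Contract", "#tort", "#contract", "#lease", "#tort"]

print(recommend_tags(tags_so_far, limit=2))  # ['#contract', '#tort']
```

Steering users toward tags that are already popular reduces the “pollution” described above, at the cost of making the vocabulary a little less spontaneous.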

I may add, as an example, that Italian law requires the creation of “user-created taxonomies (folksonomies)”: see the “Guidelines for websites of public administrations” of 29 July 2011, page 20. These guidelines have been issued pursuant to art. 4 of Directive 26th November 2009 n. 8, of the “Minister for Public Administration and Innovation”, according to the Legislative Decree of 7 March 2005, n. 82, “Digital Administration Code” (O.J. n. 112 of 16th May 2005, S.O. n. 93). It may be interesting to point out that in Italian law innovation in administrative bodies is promoted by a specific institution, the Agency for Digital Italy (“Agenzia per l’Italia Digitale”), which coordinates the actions in this field and sets standards for usability and accessibility. Folksonomies indeed fall into this latter category.

Following this path, a municipality (Turin) has recently set up a system of “social bookmarking” for the benefit of citizens called TaggaTO.

§.4.- Folksonomies, Law, and «Quid jus?»: the difference between the “map” and the “territory”

In this regard, my theoretical approach is different from that of Serena Manzoli. Here is the reason our findings are opposite.

Human beings are “tagging animals”, since labelling things is a natural habit. We can see it in everyday life: each of us, indeed, organizes his environment at home (we have jars with “salt” or “pepper” written on the caps) and at work (we use folders with “invoices” or “bank account” printed on the cover). The significance of tags is obvious if we consider using them with other people: they allow us to establish and share a common information framework. For the same reasons of convenience, tags have been included in most of the software applications we use (documents, e-mail, calendars) and, as said above, in many online services. To sum up, labels help us to build a representation of reality: they are tools for our knowledge.

In regard to reality and knowledge, it may be recalled that in the twentieth century there were two philosophical perspectives: the “continental tradition”, focused on the first (reality) and pretty much common in Europe, and the “analytic philosophy”, centered on the second (knowledge) and widespread in the USA, the UK and Scandinavia. More recently, this distinction has lost much of its heuristic value and we have seen the rise of a different approach, the “philosophy of information”, which proposes, developing some theoretical aspects of cybernetics, a synthesis of reality and knowledge in a unifying vision that originates from a naturalistic notion of “information”.

I will try to simplify, saying that if reality is a kind of “territory”, and if taxonomies (and in general ontologies) can be considered as a sort of representation of knowledge, then they can be considered as “maps”.

In light of these premises, I should explain what “sharing resources” and “shared knowledge” mean to me in folksonomies. Folksonomies are a kind of “map”, indeed, but different from ontologies. In a metaphor: ontologies could be seen as “maps” created by a single geographer overlapping the reliefs of many “territories”, and sold indiscriminately to travelers; folksonomies could be seen as “maps” that inhabitants of different territories help each other to draw by telephone or by texting a message. Both solutions have advantages and disadvantages: the former may be detailed but more difficult to consult, while the latter may be always updated but affected by inaccuracies. In this sense, folksonomies could be called “antifragile” – according to the brilliant metaphor of Nassim Nicholas Taleb – because their value improves with increased use, while ontologies could be seen as “fragile”, because of the linearity of the process of production and distribution.

Therefore, as the “map” is not the “territory”, reality does not change depending on the representation. Nevertheless, this does not mean that the “maps” are not helpful to travel to unknown “territories”, or to reach the destination faster even in “territories” that are well known (just like when driving a car with the aid of GPS).

On the application of folksonomies to the field of law, I shall say that, after all, legal science has always been a kind of “natural folksonomy”. Indeed, it has always been a widespread knowledge, ready to be practiced, open to discussion, and above all perfectly “antifragile”: new legal issues to be solved determine a further use of the systems, thus causing an increase in knowledge and therefore a greater accuracy in the description of the legal domain. In this regard, Serena Manzoli in her post also mentioned the Corpus Juris Civilis, which for centuries has been crucial in Western legal culture. Scholars went to Italy from all over Europe to study it, at the beginning by noting a few elucidations in the margins of the text (glossatores), then commenting on what they had learned (commentatores), and using their legal competences to decide cases that were submitted to them as judges or to argue in trials as lawyers.

Modern tradition has rejected all of this, imposing a rationalistic and rigorous view of law. This approach – “fragile”, continuing with the paradigm of Nassim Nicholas Taleb – has spread in different directions, which, simplifying, I can reduce to three:

(1) Legal imperativism: law as embodied in the words of the sovereign.

Leviathan (Thomas Hobbes)

(2) Legal realism: law as embodied in the words of the judge.

Gavel

(3) Legal formalism: law as embodied in administrative procedures.

The Castle (Franz Kafka)

For too long we have been led to pretending to see only the “map” and to ignore the “territory”. In my opinion, the application of folksonomies to law can be very useful to overcome these prejudices emerging from the traditional legal positivism, and to revisit a concept of law that is a step closer to its origin and its nature. I wrote “a step closer”; I’d like to clarify, to emphasize that the “map”, even if obtained through a participatory process, remains a representation of the “territory”, and to suggest that the vision known as the “philosophy of information” seems an attempt to overlay or replace the two terms – hence its “naturalism” – rather than to draw a “map” as similar as possible to the “territory”.

§.5.- Folksonomies, Law, and «Cur jus?»: the user in folksonomies, from “anybody” to “somebody”

This profile does not fall within the topics covered in Manzoli’s post, but I would like to take this opportunity to discuss it because it is the most intriguing to me.

Each of us arranges his resources according to the meaning that he intends to give his world. Think of how each of us arrays the resources containing information that he needs in his work: the books on the desk of a scholar, the files on the bench of a lawyer or a judge, the documents in the archive of a company. We place things around us depending on the problem we have to address: we use the surrounding space to help us find the solution.

With folksonomies, in general, we simply do the same in a context in which the concept of “space” is just a matter of abstraction.

What does it mean? We organize things, then we create “information”. Gregory Bateson in a very famous book, Steps to an Ecology of Mind – in which he wrote on “maps” and “territories”, too – stated that “information” is “the difference that makes the difference”. This definition, brilliant in its simplicity, raises the tremendous problem of the meaning of our existence and the freedom of will. This issue can be explained through an example given by a very interesting app called “Somebody”, recently released by the contemporary artist Miranda July.

The app works as follows: a message addressed to a given person is written and transmitted to another, who delivers it verbally. In other words, the actual recipient receives the message from an individual who is unknown to him. The point that fascinates me is this: someone suddenly comes up to tell you that you “make a difference,” that you are not “anybody” because you are “somebody” for “somebody.” Moreover, at the same time this same person, since he is addressing you, becomes “somebody,” because the sender of the message chose him among others, since he “meant something” to him.

For me, the meaning of this amazing app can be summed up in this simple equation:

“Being somebody” = “Mean something” = “Make a difference”

This formula means that each of us believes he is worth something (“being somebody”), that his life has a meaning (“mean something”), that his choices or actions can change something – even if slightly – in this world (“make a difference”).

Returning to Bateson, if it is important to each of us to “make a difference”, if we all want to be “somebody”, then how could we settle for recognizing ourselves as just an “organizing agent”? Self-consciousness is related to semantics and to the freedom of choice: whoever is not free at all does not create any “difference” in the world. Poetically, Miranda July makes people talk to each other, giving a meaning to humanity and a purpose to freedom: this is what “making a difference” means for humans.

In applying folksonomies to law, we should consider all this. It is true that folksonomies record the way in which each user arrays available legal documents, but the purpose for which this activity is carried out should be emphasized. Therefore, it should be clear that an efficient cataloguing of resources depends on several conditions: certainly that the user shall know the law and remember its ontologies, but also that he shall be focused on what he is doing. This means that the user needs to be well-motivated, in order to recognize the value of what he is doing, so as to give meaning to his activity.

§.6.- Conclusion

I believe that folksonomies can teach us a lot. In them we can find not only an extraordinary technical tool, but also – and most importantly – a reason to overcome the traditional legal positivism – which is “ontological” and therefore “fragile” – and thus rediscover the cooperation not only among experts, but also with non-experts, in the name of an “antifragile” shared legacy of knowledge that is called “law”.

All this will work – or at least, it will work better – if we remember that we are human beings.

Federico Costantini.

I hold a Master’s degree in Law and a Ph.D. in Philosophy of Law from the University of Padua (Italy).
Currently I am Researcher in Philosophy of Law (Legal informatics) in the Department of Legal sciences at the University of Udine (Italy).
My study aims to bridge philosophy, computer science and law, focusing on the strife between human nature and new technologies. Recently I have been investigating the theoretical implications of ICTs on «social ontology», the concept of law as an instrument of social control as emerging from the «peer to peer economy», the use of folksonomies in legal information management and the theoretical aspects of digital evidence.
I teach Legal Informatics in the Faculty of Law of Udine. In my lectures on cyberlaw, which I have studied since 2000, I bring out the critical profiles of the “Information Society” from the discussion of the most recent jurisprudence.
I am also a Lawyer. I am registered in the Bar Association of Udine (Italy) in a special section (full time academic researchers and professors).
My full profile can be visited on www.linkedin.com.
My complete list of publications can be found on https://air.uniud.it.


Building a Legal Research Ontology

Legal metadata, Legal ontologies, legal research

Mar 19, 2014

It’s hard for me to pin down exactly when I knew I wanted to build a legal research ontology. There was no light bulb moment; or perhaps I should say, there was no anvil falling on my head, Wile E. Coyote style. At the beginning of the fall 2012 semester, our Westlaw representative presented the newest features of Westlaw Next, including the new look of the headnotes in case law results. My first glance at it was jarring. At first I thought it was just the font and the streamlined interface, but after taking a closer look at it, I realized it was also the content.

The outline of the headnotes had been compressed. It was substantively different. Previously, each section of the key number system in your headnote was presented in outline form with indented lines and roman numerals, and you could click on any of the outline headings. In the new version, only the main key number heading and the section pertaining to the case are visible. While there is a Change View link in each case result that leads to the classic outline view, I am sceptical of it for a couple of reasons. One is that Westlaw has made the new look the default look and could at some point do away with the Change View option. Second, the new look becomes the look for each new class of law students. If style is all that is communicated by the interface, it would not be of much concern. But there is substance. There is function. How do we now communicate this substance? Should we be so dependent on vendors in legal research teaching? Given the paucity of time we have with first-year students, do we have other viable options?

These questions were in the back of my mind when I attended the LVI conference at Cornell later that semester. On the first day of the conference, I settled in to the Data Organization and Legal Informatics Track. By the end of the day, two of the presentations I heard, one on concept mapping and another on semantic web technologies using RDF and OWL, opened up a door to a new set of possibilities. One of the notes I scribbled during the conference was “ontology for westlaw problem?” I came back from the conference and began researching ontologies and ontology engineering. (I may have gone a little overboard. At last count, I have over 500 articles and book chapters.)

So what is an ontology exactly? Here’s the definition I’ve cobbled together from my readings and my subsequent translation of those readings into words I can actually understand. (Any conceptual errors are mine.) An ontology is a way to take a set of concepts and organize it in a formalized way (i.e., with standards and naming conventions and a machine-readable structure), using an ontology language that takes advantage of the semantic web. The rest of this blog post will be a more detailed description of this definition.

Before you can use the set of concepts to build the ontology, you have to define them. And when I first started thinking about this project, it was on a much grander scale. It didn’t take me long at all to realize that I could not single-handedly create a comprehensive ontology of U.S. law.

I decided to focus on what we do as legal research instructors. I’ve always thought that one of our primary duties is to show our students the big picture so they can be confident in their abilities to research in unfamiliar situations. Our teaching is complicated by the fact that very few of us have the kind of classroom time we would like, and even if we did, we are teaching concepts that students may not put to use for months afterwards. So I wanted this ontology to be something we could use to convey the big picture, as well as a tool our students could use at their point of need.

I further narrowed the focus of the ontology to what we teach 1Ls in basic legal research. We teach them how to research with primary and secondary sources (Type of Research Materials) in the broad categories of law they learn in their 1L classes (Area of Law). We teach them about the types of law they will encounter (Type of Law). I also wondered if I could find a way to incorporate all the topics we teach them implicitly. Under the surface of black letter research is the knowledge that our students will be spending their summers as summer associates or summer interns. They will need to produce something tangible for a partner or a senior associate or a judge (Final Product). We’re not sending them out to do research as an intellectual exercise. Not only is something tangible expected from them, but they will also need to keep in mind that their work stems from some type of legal action (Legal Action). That legal action might be a breach of contract headed for litigation, or it might be the need to draft a contract between two parties, i.e., it could be litigation or a transaction.

Based on this focus, I had five classes: Type of Research Materials; Area of Law; Type of Law; Final Product; and Legal Action. I was fortunate enough to be able to participate in the Sixth Conference on Legal Information: Scholarship and Teaching (known as “The Boulder Conference”) with a working paper on the ontology. Drawing from his work on legal research instruction, Paul Callister suggested I add another class, Type of Research Problem. I took his advice, and I am grateful to him for his generosity. And now the classes number six.

My next task was coming up with the terms for the ontology — filling it in, so to speak. Some of the terms were almost self-evident. Types of Law include case law and statutory law and regulations. Areas of Law include torts and civil procedure and property and contracts. For others, the First Decennial Digest is out of copyright and so those terms can be used. Most volumes are available digitally from either HathiTrust or LLMC. (The rest are on a shelf in my office.) Some of the terms are outdated, but most legal concepts change gradually over time. I am also grateful to Ed Walters for sharing Fastcase search results with me (completely stripped of any identifying user data and also deduped). Between these two sources, I haven’t yet run out of terms.

Selecting the ontology language was the easiest part of the endeavor. I learned about the Web Ontology Language (OWL) at the LII conference. In my readings, I had also run across the World Wide Web Consortium (W3C), and their standards for OWL (now in two versions, OWL 1 and OWL 2). If you really want to let your inner geek out for a romp, go there and happy fun times will be had.

I also needed a program to build the ontology using the W3C standards and naming conventions. Protege is a free and open-source software program developed and distributed by Stanford University. It comes with extensive user guides. It allows for the creation, sharing and publishing of ontologies, and it uses OWL. And fortunately, a voluminous amount has been written and presented on the topic of ontology engineering, from papers and book chapters to slide decks on sites like SlideShare.

At this point, I am in the beginning stages of taking advantage of all the semantic web has to offer. The ontology’s classes now have subclasses. I am building the relationships between the classes and subclasses, and using Protege to bring them all together. I am also prototyping lesson plans that can take advantage of the ontology. For example, if you write a problem for your students that requires them to research strict tort liability for failure to warn of the danger in the use of a product, you can also use the ontology to bring in the Restatement Third of Torts: Products Liability, as well as secondary sources such as treatises. You can also tie this into whatever final product you want your students to produce: a client letter; a memo to the firm; results of research into punitive damages awards, etc. As long as you have the ontology classes set up, you can add anything to them in order to personalize your research problem.

I also hope to host the ontology on a website with a section for instructors to share lesson plans and ontology files. The files from Protege use an .owl extension, so they can be shared as easily as a pdf. All you need is a program like Protege to open the file. You could use the file as-is or modify it for any type of legal research problem. I also hope that the complete ontology, consisting of the permutations of legal research, can be available for students to query when they are researching as associates and interns.

Amy Taylor is the Access Services Librarian and Adjunct Professor at American University Washington College of Law. Her main research interests are legal ontologies, organization of legal information and the influence of online legal research on the development of precedent. You can reach her on Twitter @taylor_amy or email: amytaylor@wcl.american.edu.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

Taxonomies make the law. Will folksonomies change it?

Crowdsourcing and legal information systems, Legal ontologies, Taxonomies 8 Responses »

Apr 29, 2013

Take a look at your bundle of tags on Delicious. Would you ever believe you’re going to change the law with a handful of them?

You’re going to change the way you research the law. The way you apply it. The way you teach it and, in doing so, shape the minds of future lawyers.

Do you think I’m going too far? Maybe.

But don’t overlook the way taxonomies have changed the law and shaped lawyers’ minds so far. Taxonomies? Yeah, taxonomies.

We, the lawyers, have used taxonomies extensively through the years; civil lawyers in particular have proven especially prone to them. We’ve used taxonomies for three reasons: to help legal research, to help memorization and teaching, and to apply the law.

Taxonomies help legal research.

First, taxonomies help us retrieve what we’ve stored (rules and case law).

Are you looking for a rule about a sales contract? Dive deep into the “Obligations” category and the corresponding book (Recht der Schuldverhältnisse, Obbligazioni, Des contrats ou des obligations conventionnelles en général, you name it).

If you are a Common Lawyer, and ignore the perverse pleasure of browsing through Civil Code taxonomy, you’ll probably know Westlaw’s classification and its key numbering system. It has much more concrete categories and therefore much longer lists than the Civilians’ classification.
Legal taxonomies are there to help users find the content they’re looking for.

However, taxonomies sometimes don’t reflect the way the users reason; when this happens, you just won’t find what you’re looking for.

The problem with legal taxonomies.

If you are a German lawyer, you’ll probably be searching the “Obligations” book for rules concerning marriage; indeed in the German lawyer’s frame of mind, marriage is a peculiar form of contract. But if you are Italian, like I am, then you will most probably start looking in the “Persons” book; marriage rules are simply there, and we have been taught that marriage is not a contract but an agreement with no economic content (we have been trained to overlook the patrimonial shade in deference to the sentimental one).

So if I, the Italian, look for rules about marriage in the German civil code, I won’t find anything in the “Persons” book.
In other words, taxonomies work when they’re used by someone who reasons like the creator or (and this happens with lawyers inside a certain legal system) when users are trained to use the same taxonomy, and lawyers are trained at length.

But let’s take my friend Tim; he doesn’t have a legal education. He’s navigating Westlaw’s key number system looking for some relevant case law on car crashes. By chance he knows he should look below “torts,” but where? Is this injury and damage from act (k439)? Is this injury to a person in general (k425)? Is this injury to property or right of property in general (k429)? Wait, should he look below “crimes” (he is unclear on the distinction between torts and crimes)? And so on. Do these questions sound silly to you, the lawyers? Consider this: the titles we mentioned give no hint of the content, unless you already know what’s in there.

Because Law, complex as it is, needs a map. Lawyers have been trained to use the map. But what about non-lawyers?

In other words, the problems with legal taxonomies occur when the creators and the users don’t share the same frame of mind. And this is most likely to happen when the creators of the taxonomy are lawyers and the users are not lawyers.
Daniel Dabney wrote something similar some time ago. Let’s imagine that I buy a dog, take the little pooch home and find out that it’s mangy. Let’s imagine I’m that kind of aggressively unsatisfied customer and want to sue the seller, but know nothing about law. I go to the library and what will I look for? Rules on dog sales? A book on dog law? I’m lucky, there actually is one: “Dog Law”, a book that gathers all laws regarding dogs and dog owners.
But of course, that’s just luck, and if I had to browse through the legal categories in Westlaw’s index, I would never find anything regarding “dogs”. I would never find the word “dog”, which is nonetheless the first word a person without legal training would think of. A savvy lawyer would look for rules regarding sales and warranties: general categories I may not know of (or think of) if I’m not a lawyer. If I’m not a lawyer I may not know that “the sale of arguably defective dogs are to be governed by the same rules that apply to other arguably defective items, like leaky fountain pens”. Dogs are like pens for a lawyer, but they are just dogs for a dog owner: so a dog owner will look for rules about dogs, not rules about sales and warranties (or at least he would look for the sale of dogs). And dog law, a user-aimed, object-oriented category, would probably fit his needs.

Observation #1: To make legal content available to everyone we must change the information architecture through which legal information is presented.

Will folksonomies do a better job?
Let’s come to folksonomies now. Here, the mismatch between creators (lawyers) and users’ way of reasoning is less likely to occur. The very same users decide which category to create and what to put into it. Moreover, tags can overlap; that is, the same object can be tagged more than once. This allows the user to consider the same object from different perspectives. Take Delicious. If you search for “Intellectual property” on the Delicious search engine, you find Wikipedia’s page on the definition of copyright. It was tagged mainly with “copyright.” But many users also tagged it with “wikipedia,” “law” and “intellectual-property” and even “art”. Maybe it was the non-lawyers out there who found it more useful to tag it with the “law” tag (a lawyer’s tag would have been more specific); maybe it was the lawyers who massively tagged it with “art” (there are a few “art” tags in their libraries). Or was it the other way around? The thing is, it’s up to users to decide where to classify it.

People also tag laws on Delicious using different labels that may or may not be related to law, because Delicious is a general-use website. But instead, let’s take a crowdsourced legal content website like Docracy. Here, people upload and tag their contracts, so it’s only legal content, and they tag them using only legal categories.

On Docracy, I found that a whole category of documents was dedicated to Terms of Service. Terms of Service is not a traditional legal category (like torts, property, and contracts), but it was a particularly useful category for Docracy users.

Docracy: WordPress Terms of Service are tagged with “TOS” but also with “Website”.

If I browse some more, I see that the WordPress TOS are also tagged with “website.” Right, it makes sense; that is, if I’m a web designer looking for the legal stuff I need to know before deploying my website. If I start looking just from “website,” I’ll find TOS, but also “contract of works for web design” or “standard agreements for design services” from AIGA.

You got it? What legal folksonomies bring us is:

User-centered categories
Flexible categorization systems. Many items can be tagged more than once and so be put into different categories. Legal stuff can be retrieved through different routes but also considered under different lights.

Will this enhance findability? I think it will, especially if the users are non-lawyers. And services that target the low-end of the legal market usually target non-lawyers.

Alright, I know what you’re thinking. You’re thinking, oh no, yet another naive folksonomy supporter! And then you say: “Folksonomy structures are too flat to constitute something useful for legal research!” and “Law is too specific a sector, with highly technical vocabulary and structure. Users without legal training would just tag wrongly.”

Let me quickly address these issues.

Objection 1: Folksonomies are too flat to constitute something useful for legal research

Let’s start from a premise: we have no studies on legal folksonomies yet. Docracy is not a full folksonomy yet (users can tag but tags are pre-determined by administrators). But we do have examples of folksonomies tout court, so my argument moves analogically from them. Folksonomies do work. Take the Library of Congress Flickr project. Like an old grandmother, the Library gathered thousands of pictures that no one ever had the time to review and categorize. So pictures were uploaded on Flickr and left for the users to tag and comment. They did it en masse, mostly by using descriptive or topical tags (non-subjective) that were useful for retrieval. If folksonomies work for pictures (Flickr), books (Goodreads), questions and answers (Quora), basically everything else (Delicious), why shouldn’t they work for law?

Given that premise, let’s move to the first objection: folksonomies are flat. Wrong. As folksonomies evolve, we find out that they can have two, three and even more levels of categories. Take a look at the Quora hierarchy.

That’s not flat. Look, there are at least four levels in the screenshot: Classical Musicians & Composers > Pianists > Jazz Pianists > Ray Charles > What’d I Say. Right, jazz pianists are not classical musicians: but mistakes do occur, and the good thing about folksonomies is that users can freely correct them.

Second point: findability doesn’t depend only on hierarchies. You can browse the folksonomy’s categories but you can also use free text search to dig into it. In this case, users’ tags are metadata and so findability is enhanced because the search engine retrieves what users have tagged, not what admins have tagged.

Objection 2: Non-legal people will use the wrong tags

Uhm, yes, you’re right. They will tag a criminal law document with “tort” and a tort case involving a car accident with “car crash”. And so? Who cares? What if the majority of users find it useful? We forget too often that law is a social phenomenon, not a tool for technicians. And language is a social phenomenon too. If users consistently tag a legal document with the “wrong” tag X instead of the “right” tag Y, it means that they usually name that legal document with X. So most of them, when looking for that document, will look for X. And they’ll retrieve it, and be happy with that.

Of course, legal-savvy people would like to search by typical legal words (like, maybe, “chattel”?) or by using the legal categories they know so well. Do we want to compromise? The fact is, in a system where there is only user-generated content, it goes without saying that a traditional top-down taxonomy would not work. But if we imagine a system where content is not user-generated, like a legal or case law database, that could happen. There could be, for instance, a mixed taxonomy-folksonomy system where the taxonomy is built with traditional legal terms and schemes, whereas the folksonomy is built by the users, who are free to tag. Search, in the end, can be done by browsing the taxonomy, by browsing the folksonomy, or by means of a search engine that retrieves content by relying both on metadata chosen by system administrators and on metadata chosen by the users who tagged the content.
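To make the mixed system a bit more concrete, here is a minimal Python sketch of a search that looks at both kinds of metadata. Everything in it is an assumption made for illustration: the `Document` structure, the `search` function and the two sample records are invented, and real databases would use far richer matching than exact string comparison.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    taxonomy_path: tuple[str, ...]  # top-down categories, assigned by administrators
    user_tags: set[str] = field(default_factory=set)  # bottom-up tags, added by users


def search(docs: list[Document], query: str) -> list[str]:
    """Return titles whose taxonomy categories or user tags match the query."""
    q = query.strip().lower()
    hits = []
    for doc in docs:
        in_taxonomy = any(q == node.lower() for node in doc.taxonomy_path)
        in_folksonomy = any(q == tag.lower() for tag in doc.user_tags)
        if in_taxonomy or in_folksonomy:
            hits.append(doc.title)
    return hits


# Hypothetical records, invented for illustration only.
docs = [
    Document("Car accident ruling", ("Torts", "Negligence"), {"car crash", "accident"}),
    Document("Mangy dog dispute", ("Contracts", "Sales", "Warranties"), {"dog", "pets"}),
]

print(search(docs, "car crash"))   # matched through a user tag
print(search(docs, "warranties"))  # matched through the administrators' taxonomy
```

The point of the sketch is only that the lawyer's route ("warranties") and the dog owner's route ("dog") can lead to the same document without either vocabulary replacing the other.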

This may seem like an imaginary system, but it’s happening already. Amazon uses traditional categories and leaves the users free to tag. The BBC website followed a similar pattern, moving from a full taxonomy system to a hybrid taxonomy-folksonomy one. Resilience, resilience, as Andrea Resmini and Luca Rosati put it in their seminal book on information architecture. Folksonomies and taxonomies can coexist. But this is not what this article is about, so sorry for the digression and let’s move to the first prediction.

Prediction #1: Folksonomies will provide the right information architecture for non-legal users.

Taxonomies and folksonomies help legal teaching.

Secondly, taxonomies help us memorize rules and case law. Put all the things in a box and group them on the basis of a common feature, and you’ll easily remember where they are. For this reason, taxonomies have played a major role in legal teaching. I’ll tell you a little story. Civil lawyers know very well the story of Gaius, the ancient Roman jurist who created a successful taxonomy for his law handbook, the Institutiones. His taxonomy was threefold: all law can be divided into persons, things, and actions. Five centuries later (five centuries!) Emperor Justinian transferred the very same taxonomy into his own Institutiones, a handbook aimed at youth “craving for legal knowledge” (cupida legum iuventes). Why? Because it worked! How powerful, both the slogan and the taxonomy! Indeed, more than 1000 years later, we find it again, with a few changes, in the German, French, Italian, and Spanish Civil Codes, and in a whole bunch of nutshells that explain private law following the taxonomy of the Codes.

And now, consider what the taxonomies have done to lawyers’ minds.

Taxonomies have shaped their way of considering facts. Think. Put something into a category and you will lose all the other points of view on the same thing. The category shapes and limits our way to look at that particular thing.

Have you ever noticed how civil lawyers and common lawyers have a totally different way of looking at facts? Common lawyers see and take into account the details. Civil lawyers overlook them because the taxonomy they use has told them to do so.

In Rylands vs Fletcher (a UK tort case) some water escapes from a reservoir and floods a mine nearby. The owner of the reservoir could not possibly foresee the event and prevent it. However, the House of Lords states that the owner of the mine has the right to recover damages, even if there is no negligence. (“The person who for his own purpose brings on his lands and collects and keeps there anything likely to do mischief, if it escapes, must keep it in at his peril, and if he does not do so, is prima facie answerable for all the damage which is the natural consequence of its escape.”)

In Read vs Lyons, however, an employee gets injured during an explosion occurring in the ammunition factory where she is employed. The rule set in Rylands couldn’t be applied, as, according to the House of Lords, the case was very different; there is no escape.

On the contrary, for a Civil lawyer the decision would have been the same in both cases. For instance, under the Italian Civil Code (but the French and German Codes are not substantially different on this point), one would apply the general rule that grants compensation for damages caused by “dangerous activities” and requires no proof of negligence from the plaintiff (art. 2050 of the Civil Code), no matter what causes the danger (a big reservoir of water, an ammunition factory, whatever else).

Observation #2: Taxonomies are useful for legal teaching and they shape lawyers’ minds.

Folksonomies for legal teaching?

Okay, and what about folksonomies? What if the way people tag legal concepts makes its way into legal teaching?

Take Docracy’s TOS category: have you ever thought about a course on TOS?

Another website, another example: Rocket Lawyer. Its categorization is not based on folksonomy, however; it’s purposely built around a user’s needs, which have been tested over the years, so in a way the taxonomy of the website comes from its users. One category is “identity theft”, which should be quite popular if it is prompted on the first page. What about teaching a course on identity theft? That would merge some material traditionally taught in privacy law, criminal law, and torts courses. Some course areas would overlap, which is good for memorization. Think again of Dabney’s “Dog Law” example. What about a course on Dog Law, collecting material that refers to dogs across traditional legal categories?

Also, the same topic would be considered from different points of view.

What if students were trained in the flexibility of categories described above? They wouldn’t get trapped into a single way of seeing things. If folksonomies account for different levels of abstraction, they would be trained to consider details. Not only that, they would develop a very flexible frame of mind.

Prediction #2: legal folksonomies in legal teaching would keep lawyers’ minds flexible.

Taxonomies and folksonomies SHAPE the law.

Third, taxonomies make the law apply differently. Think about it. They are the very highways that allow the law to travel down to us. And here it comes, the real revolutionary potential of legal folksonomies, if we were to make them work.

Let’s start from taxonomies, with a couple of examples.

Civil lawyers are taught that Public and Private Law are two distinctive areas of law, to which different rules apply. In common law, the distinction is not that clear-cut. In Rigby vs Chief Constable of Northamptonshire (a tort case from UK case law) the police—in an attempt to catch a criminal—damage a private shop by accidentally firing a canister of gas and setting the shop ablaze. The Queen’s Bench Division establishes that the police are liable under the tort of negligence only because the plaintiff manages to prove the police’s fault; they apply a private law category to a public body.
How would the same case have been decided under, say, French law? As the division between public and private law is stricter, the category of liability without fault, which is traditionally used when damages are caused by public bodies, would apply. The State would have to indemnify the damage, no matter if there was negligence.

Remember Rylands vs Fletcher and Read vs Lyons? The presence of escape/no escape was decisive, because the English taxonomy is very concrete. Civil lawyers work with taxonomies that have fewer, larger, and more abstract categories. If you cause damages by performing a risky activity, even if conducted without fault, you have to repay them. Period. Abstract taxonomy sweeps out any concrete detail. I think that Robert Berring had something like this in mind (although he referred to legal research) when he said that “classification defines the world of thinkable thoughts”. Or, as Dabney puts it, “thoughts that aren’t represented in the system had become unthinkable”.
So taxonomies make the law apply differently. In the former case, by setting a boundary between the public-private spheres; in the latter by creating a different framework for the application of more abstract or more detailed rules.

You don’t get it? All right, it’s tough, but do you have two minutes more? Let’s take this example by Dabney. The key number system’s taxonomy distinguishes between Navigable and Non-navigable waters (in the screenshot: waters and water courses). There’s a reason for that: lands under navigable waters presumptively belong to the state, because “private ownership of the land under navigable waters would (…) compromise the use of those waters for navigation and commerce”. So there are two categories because different laws apply to each. But now look at this screenshot.

Find anything strange? Yes: avulsion rules are “doubled”: they are contained in both categories. But they are the very same: rules concerning avulsion don’t change if the water is navigable or not (check the definition of avulsion if you, like me, don’t remember what it is). Dabney: “In this context, (…) there is no difference in the legal rules that are applied that depend on whether or not the water is navigable. Navigability has an effect on a wide range of issues concerning waters, but not on the accretion/avulsion issue. Here, the organization of the system needlessly separates cases from each other on the basis of an irrelevant criterion”. And you think, ok, but as long as we are aware of this error and know the rules concerning avulsion are the same, it’s no biggie. Right, but in the future?

“If searchers, over time, find cases involving navigable waters in one place and non-navigable waters in another, there might develop two distinct bodies of law.” Got it? Dabney foresees it. The way we categorize the law would shape the way we apply it.

Observation #3: Different taxonomies entail different ways to apply the law.

So, what if we substitute taxonomies with folksonomies?

And what if they had the power to shape the way judges, legal scholars, lawmakers and legal operators think?

Legal folksonomies are just starting out, and what I envisage is still yet to come. Which makes this article kind of a visionary one, I admit.

However, what Docracy is teaching us is that users—I didn’t say lawyers, but users—are generating decent legal content. Would you have bet your two cents on this, say, five years ago?
What if users started generating new legal categories (legal folksonomies)?

Berring wrote something really visionary more than ten years ago in his beautiful “Legal Research and the World of Thinkable Thoughts”. He couldn’t have had folksonomies in mind, and yet wouldn’t you think he was referring to them when he wrote: “There is simply too much stuff to sort through. No one can write a comprehensive treatise any more, and no one can read all of the new cases. Machines are sorting for us. We need a new set of thinkable thoughts. We need a new Blackstone. We need someone, or more likely a group of someones, who can reconceptualize the structure of legal information.”?

Prediction #3: Legal folksonomies will make the law apply differently.

Let’s wait and see. Let the users tag. Where this tagging is going to take us is unpredictable, yes, but if you look at where taxonomies have taken us for all these years, you may find a clue.

I have a gut feeling that folksonomies are going to change the way we search, teach, and apply the law.

Serena Manzoli is a legal architect and the founder at WildLawyer, a design agency for law firms. She has been a Euro bureaucrat, a cadet, an in-house counsel, a bored lawyer. She holds an LLM from University of Bologna. She blogs at Lawyers are boring. Twitter: SquareLaw

Legal Prosumers: How Can Government Leverage User-Generated Content?

Crowdsourcing and legal information systems, Legal knowledge representation, Legal ontologies, Legal text mining, Legal text processing, natural language processing, Semantic Web and law, User-generated content and legal information 1 Response »

Nov 17, 2011

Prosumption: shifting the barriers between information producers and consumers

One of the major revolutions of the Internet era has been the shifting of the frontiers between producers and consumers [1]. Prosumption refers to the emergence of a new category of actors who not only consume but also contribute to content creation and sharing. Under the umbrella of Web 2.0, many sites indeed enable users to share multimedia content, data, experiences [2], views and opinions on different issues, and even to act cooperatively to solve global problems [3]. Web 2.0 has become a fertile terrain for the proliferation of valuable user data enabling user profiling, opinion mining, trend and crisis detection, and collective problem solving [4].

The private sector has long understood the potentialities of user data and has used them for analysing customer preferences and satisfaction, for finding sales opportunities, for developing marketing strategies, and as a driver for innovation. Recently, corporations have relied on Web platforms for gathering new ideas from clients on the improvement or the development of new products and services (see for instance Dell’s Ideastorm; salesforce’s IdeaExchange; and My Starbucks Idea). Similarly, Lego’s Mindstorms encourages users to share online their projects on the creation of robots, by which the design becomes public knowledge and can be freely reused by Lego (and anyone else), as indicated by the Terms of Service. Furthermore, companies have been recently mining social network data to foresee future action of the Occupy Wall Street movement.

Even scientists have caught up and adopted collaborative methods that enable the participation of laymen in scientific projects [5].

Now, how far has government gone in taking up this opportunity?

Some recent initiatives indicate that the public sector is aware of the potential of the “wisdom of crowds.” In the domain of public health, MedWatcher is a mobile application that allows the general public to submit information about any experienced drug side effects directly to the US Food and Drug Administration. In other cases, governments have asked for general input and ideas from citizens, such as the brainstorming session organized by the Obama administration, the wiki launched by the New Zealand Police to get suggestions from citizens for the drafting of a new policing act to be presented to the parliament, or the Website of the Department of Transport and Main Roads of the State of Queensland, which encourages citizens to share their stories related to road tragedies.

Even in so crucial a task as the drafting of a constitution, government has relied on citizens’ input through crowdsourcing [6]. And more recently several other initiatives have fostered crowdsourcing for constitutional reform in Morocco and in Egypt.

It is thus undeniable that we are witnessing an accelerated redefinition of the frontiers between experts and non-experts, scientists and non-scientists, doctors and patients, public officers and citizens, professional journalists and street reporters. The ‘Net has provided the infrastructure and the platforms for enabling collaborative work. Network connection is hardly a problem anymore. The problem is data analysis.

In other words: how to make sense of the flood of data produced and distributed by heterogeneous users? And more importantly, how to make sense of user-generated data in the light of more institutional sets of data (e.g., scientific, medical, legal)? The efficient use of crowdsourced data in public decision making requires building an informational flow between user experiences and institutional datasets.

Similarly, enhancing user access to public data has to do with matching user case descriptions with institutional data repositories (“What are my rights and obligations in this case?”; “Which public office can help me?”; “What is the delay in the resolution of my case?”; “How many cases like mine have there been in this area in the last month?”).

From the point of view of data processing, we are clearly facing a problem of semantic mapping and data structuring. The challenge is thus to overcome the flood of isolated information while avoiding excessive management costs. There is still a long way to go before tools for content aggregation and semantic mapping are generally available. This is why private firms and governments still mostly rely on the manual processing of user input.

The new producers of legally relevant content: a taxonomy

Before digging deeper into the challenges of efficiently managing crowdsourced data, let us take a closer look at the types of user-generated data flowing through the Internet that have some kind of legal or institutional flavour.

One type of user data emerges spontaneously from citizens’ online activity, and can take the form of:

citizens’ forums

platforms gathering open public data and comments over them (see for instance data-publica)

legal expert blogs (blawgs)

or the journalistic coverage of the legal system.

User data can as well be prompted by institutions as a result of participatory governance initiatives, such as:

crowdsourcing (targeting a specific issue or proposal by government as an open brainstorming session)

comments and questions addressed by citizens to institutions through institutional Websites or through e-mail contact.

This variety of media supports and knowledge producers gives rise to a plurality of textual genres, semantically rich but difficult to manage given their heterogeneity and quick evolution.

Managing crowdsourcing

The goal of crowdsourcing in an institutional context is to extract and aggregate content relevant for the management of public issues and for public decision making. Knowledge management strategies vary considerably depending on the ways in which user data have been generated. We can think of three possible strategies for managing the flood of user data:

Pre-structuring: prompting the citizen narrative in a strategic way

A possible solution is to elicit user input in a structured way; that is to say, to impose some constraints on user input. This is the solution adopted by IdeaScale, a software application that was used by the Open Government Dialogue initiative of the Obama Administration. In IdeaScale, users are asked to check whether their idea has already been covered by other users, and alternatively to add a new idea. They are also invited to vote for the best ideas, so that it is the community itself that rates and thus indirectly filters the users’ input.
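The pre-structuring workflow just described (check whether an idea already exists, add it if not, let the community vote, rank by votes) can be sketched in a few lines of Python. This is not IdeaScale's code or API: the `IdeaBoard` class, its method names and the sample ideas are all hypothetical, and duplicate detection here is a naive comparison of normalized text, whereas in practice users decide for themselves whether an idea is already covered.

```python
class IdeaBoard:
    """Toy model of structured idea submission with community voting."""

    def __init__(self):
        self._votes = {}  # normalized idea text -> number of votes
        self._text = {}   # normalized idea text -> text as first submitted

    @staticmethod
    def _normalize(text):
        # Collapse whitespace and case so trivially different wordings match.
        return " ".join(text.lower().split())

    def submit(self, text):
        """Add a new idea; return False if it is already on the board."""
        key = self._normalize(text)
        if key in self._votes:
            return False
        self._votes[key] = 0
        self._text[key] = text
        return True

    def vote(self, text):
        """Register one vote for an existing idea."""
        key = self._normalize(text)
        if key not in self._votes:
            raise KeyError(f"unknown idea: {text!r}")
        self._votes[key] += 1

    def ranked(self):
        """Ideas ordered by votes, highest first (ties keep submission order)."""
        keys = sorted(self._votes, key=lambda k: -self._votes[k])
        return [self._text[k] for k in keys]


# Hypothetical ideas, invented for illustration only.
board = IdeaBoard()
board.submit("Publish court opinions online")
board.submit("publish  court opinions ONLINE")  # rejected as a duplicate
board.submit("Open budget data")
board.vote("Open budget data")
board.vote("Open budget data")
board.vote("Publish court opinions online")
print(board.ranked())
```

The constraint on input does the filtering work up front, and the votes do the ranking, so no analyst has to read every submission to see what the community considers most important.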

The MIT Deliberatorium, a technology aimed at supporting large-scale online deliberation, follows a similar strategy. Users are expected to follow a series of rules to enable the correct creation of a knowledge map of the discussion. Each post should be limited to a single idea, it should not be redundant, and it should be linked to a suitable part of the knowledge map. Furthermore, posts are validated by moderators, who should ensure that new posts follow the rules of the system. Other systems that implement the same idea are featurelist and Debategraph [7].

While these systems enhance the creation and visualization of structured argument maps and promote community engagement through rating systems, they present a series of limitations. The most important of these is the fact that human intervention is needed to manually check the correct structure of the posts. Semantic technologies can play an important role in bridging this gap.

Semantic analysis through ontologies and terminologies

Ontology-driven analysis of user-generated text implies finding a way to bridge Semantic Web data structures, such as formal ontologies expressed in RDF or OWL, with unstructured implicit ontologies emerging from user-generated content. Sometimes these emergent lightweight ontologies take the form of unstructured lists of terms used for tagging online content by users. Accordingly, some works have dealt with this issue, especially in the field of social tagging of Web resources in online communities. More concretely, different works have proposed models for making compatible the so-called top-down metadata structures (ontologies) with bottom-up tagging mechanisms (folksonomies).

The possibilities range from transforming folksonomies into lightly formalized semantic resources (Lux and Dösinger, 2007; Mika, 2005) to mapping folksonomy tags to the concepts and the instances of available formal ontologies (Specia and Motta, 2007; Passant, 2007). At the basis of these works we find the notion of emergent semantics (Mika, 2005), which questions the autonomy of engineered ontologies and emphasizes the value of meaning emerging from distributed communities working collaboratively through the Web.

We have recently worked on several case studies in which we have proposed a mapping between legal and lay terminologies. We followed the approach proposed by Passant (2007) and enriched the available ontologies with the terminology appearing in lay corpora. For this purpose, OWL classes were complemented with a has_lexicalization property linking them to lay terms.

The first case study that we conducted belongs to the domain of consumer justice, and was framed in the ONTOMEDIA project. We proposed to reuse the available Mediation-Core Ontology (MCO) and Consumer Mediation Ontology (COM) as anchors to legal, institutional, and expert knowledge, and therefore as entry points for the queries posed by consumers in common-sense language.

The user corpus contained around 10,000 consumer questions and 20,000 complaints submitted to the Catalan Consumer Agency between 2007 and 2010. We applied a traditional terminology extraction methodology to identify candidate terms, which were subsequently validated by legal experts. We then manually mapped the lay terms to the ontological classes. The relations used for mapping lay terms to ontological classes are mostly has_lexicalisation and has_instance.
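To illustrate how such mappings can serve as entry points for consumer queries, here is a minimal Python sketch. Only the relation names has_lexicalisation and has_instance come from the case study; the class names, the lay terms, and the naive substring matching are hypothetical simplifications, not the ONTOMEDIA implementation.

```python
from collections import defaultdict

# (ontology class, relation, lay term). All values are made-up examples.
MAPPINGS = [
    ("Seller", "has_lexicalisation", "shopkeeper"),
    ("Seller", "has_instance", "Acme Telecom"),
    ("WithdrawalRight", "has_lexicalisation", "return the product"),
]


def build_index(mappings):
    """Index lay terms (lowercased) by the ontology classes they point to."""
    index = defaultdict(set)
    for cls, _relation, term in mappings:
        index[term.lower()].add(cls)
    return index


def classes_for_question(question, index):
    """Return the ontology classes whose lay terms occur in the question."""
    text = question.lower()
    found = set()
    for term, classes in index.items():
        if term in text:  # naive substring match, for illustration only
            found |= classes
    return sorted(found)


INDEX = build_index(MAPPINGS)
print(classes_for_question("Can I return the product to the shopkeeper?", INDEX))
```

A real system would need tokenisation, lemmatisation, and handling of ambiguous terms; the point here is only that a lay term, once linked to a class, lets a common-sense question reach expert knowledge.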

A second case study in the domain of consumer law was carried out with Italian corpora. In this case domain terminology was extracted from a normative corpus (the Code of Italian Consumer law) and from a lay corpus (around 4000 consumers’ questions).

In order to further explore the particularities of each corpus with respect to the semantic coverage of the domain, terms were gathered together into a common taxonomic structure [8]. This task was performed with the aid of domain experts. When confronted with the two lists of terms, both laypersons and technical experts would link most of the validated lay terms to the technical list of terms through one of the following relations:

Subclass: the lay term denotes a particular type of legal concept. This is the most frequent case. For instance, in the class objects, telefono cellulare (cell phone) and linea telefonica (phone line) are subclasses of the legal terms prodotto (product) and servizio (service), respectively. Similarly, in the class actors agente immobiliare (estate agent) can be seen as subclass of venditore (seller). In other cases, the linguistic structures extracted from the consumers’ corpus denote conflictual situations in which the obligations have not been fulfilled by the seller and therefore the consumer is entitled to certain rights, such as diritto alla sostituzione (entitlement to a replacement). These types of phrases are subclasses of more general legal concepts such as consumer right.

Instance: the lay term denotes a concrete instance of a legal concept. In some cases, terms extracted from the consumer corpus are named entities that denote particular individuals, such as Vodafone, an instance of a domain actor, a seller.

Equivalent: a legal term is used in lay discourse. For instance, contratto (contract) or diritto di recessione (withdrawal right).

Lexicalisation: the lay term is a lexical variant of the legal concept. This is the case for instance of negoziante, used instead of the legal term venditore (seller) or professionista (professional).
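The four relations above can be sketched as a small typed data structure. This is a hedged illustration in Python: the term pairs are the examples given in the text, while the names Relation, Link, relation_counts, and lay_terms_for are hypothetical and do not come from the study.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    SUBCLASS = "subclass"
    INSTANCE = "instance"
    EQUIVALENT = "equivalent"
    LEXICALISATION = "lexicalisation"


@dataclass(frozen=True)
class Link:
    lay_term: str
    relation: Relation
    legal_term: str


# Term pairs taken from the examples in the text.
LINKS = [
    Link("telefono cellulare", Relation.SUBCLASS, "prodotto"),
    Link("linea telefonica", Relation.SUBCLASS, "servizio"),
    Link("agente immobiliare", Relation.SUBCLASS, "venditore"),
    Link("Vodafone", Relation.INSTANCE, "venditore"),
    Link("contratto", Relation.EQUIVALENT, "contratto"),
    Link("negoziante", Relation.LEXICALISATION, "venditore"),
]


def relation_counts(links):
    """Count how many links use each relation type."""
    return Counter(link.relation for link in links)


def lay_terms_for(legal_term, links):
    """List the lay terms linked to one legal term, whatever the relation."""
    return sorted(link.lay_term for link in links if link.legal_term == legal_term)


print(relation_counts(LINKS))
print(lay_terms_for("venditore", LINKS))
```

Keeping the relation explicit on each link matters because the same legal term (here venditore) can be reached from a subclass, an instance, and a lexical variant, and each supports different inferences.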

The distribution of normative and lay terms per taxonomic level shows that, whereas normative terms populate mostly the upper levels of the taxonomy [9], deeper levels in the hierarchy are almost exclusively represented by lay terms.

Figure: Term distribution per taxonomic level
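A distribution of this kind can be computed by measuring each term's depth in the taxonomy tree and counting terms by level and by corpus of origin. The Python sketch below uses a toy five-node tree assembled from the examples in the text; the tree shape, the PARENT and ORIGIN tables, and the function names are assumptions made for illustration, not the study's data.

```python
from collections import Counter

# Toy taxonomy: each term points to its parent; the root is a semantic class.
PARENT = {
    "objects": None,
    "prodotto": "objects",
    "servizio": "objects",
    "telefono cellulare": "prodotto",
    "linea telefonica": "servizio",
}

# Corpus of origin for each extracted term (the root class is not a term).
ORIGIN = {
    "prodotto": "normative",
    "servizio": "normative",
    "telefono cellulare": "lay",
    "linea telefonica": "lay",
}


def level(term):
    """Depth of a term in the tree, counting the root class as level 1."""
    depth = 1
    while PARENT[term] is not None:
        term = PARENT[term]
        depth += 1
    return depth


def distribution():
    """Count terms per (taxonomic level, corpus of origin)."""
    return Counter((level(term), origin) for term, origin in ORIGIN.items())


for (lvl, origin), count in sorted(distribution().items()):
    print(f"level {lvl}: {count} {origin} term(s)")
```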

The result of this type of approach is a set of terminological-ontological resources that provide some insights into the nature of laypersons’ cognition of the law, such as the fact that citizens’ domain knowledge is mainly factual and therefore populates deeper levels of the taxonomy. Moreover, such resources can be used for the further processing of user input. However, this strategy presents some limitations as well. First, it is mainly driven by domain conceptual systems, which might limit the potential of user-generated corpora. Second, the resulting resources are not necessarily scalable. In other words, they have to be rebuilt for each legal subdomain (such as consumer law, private law, or criminal law), and it is thus difficult to foresee mechanisms for performing an automated mapping between lay terms and legal terms.

Beyond domain ontologies: information extraction approaches

One of the most important limitations of ontology-driven approaches is the lack of scalability. In order to overcome this problem, a possible strategy is to rely on informational structures that occur generally in user-generated content. These informational structures go beyond domain conceptual models and identify mostly discursive, emotional, or event structures.

Discursive structures formalise the way users typically describe a legal case. It is possible to identify stereotypical situations appearing in the description of legal cases by citizens (i.e., the nature of the problem; the conflict resolution strategies, etc.). The core of those situations is usually predicates, so it is possible to formalize them as frame structures containing different frame elements. We followed such an approach for the mapping of the Spanish corpus of consumers’ questions to the classes of the domain ontology (Fernández-Barrera and Casanovas, 2011). And the same technique was applied for mapping a set of citizens’ complaints in the domain of acoustic nuisances to a legal domain ontology (Bourcier and Fernández-Barrera, 2011). By describing general structures of citizen description of legal cases we ensure scalability.
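As a rough illustration of a frame with frame elements, the Python sketch below fills a single hypothetical frame from one stereotypical sentence pattern. The frame name PurchaseProblem, its elements (goods, seller, problem), and the regular expression are invented for this example; the cited studies do not specify their frames at this level, and a real system would rely on linguistic analysis rather than one pattern.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A frame: a name plus a pattern whose named groups are the frame elements."""
    name: str
    pattern: re.Pattern

    def fill(self, text):
        """Return the frame elements found in the text, or None if no match."""
        match = self.pattern.search(text)
        return match.groupdict() if match else None


# Hypothetical frame built around the predicate "bought".
PURCHASE_PROBLEM = Frame(
    name="PurchaseProblem",
    pattern=re.compile(
        r"I bought (?P<goods>.+?) from (?P<seller>.+?) but (?P<problem>.+?)\.?$"
    ),
)

print(PURCHASE_PROBLEM.fill("I bought a phone from a shop but it stopped working."))
```

Because the frame describes how a case is told rather than what a particular legal subdomain contains, the same frame could in principle be reused across domains, which is the scalability argument made above.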

Emotional structures are extracted by current algorithms for opinion- and sentiment mining. User data in the legal domain often contain an important number of subjective elements (especially in the case of complaints and feedback on public services) that could be effectively mined and used in public decision making.

Finally, event structures, which have been deeply explored so far, could be useful for information extraction from user complaints and feedback, or for automatic classification into specific types of queries according to the described situation.

Crowdsourcing in e-government: next steps (and precautions?)

Legal prosumers’ input currently outstrips the capacity of government for extracting meaningful content in a cost-efficient way. Some developments are under way, among which are argument-mapping technologies and semantic matching between legal and lay corpora. The scalability of these methodologies is the main obstacle to overcome, in order to enable the matching of user data with open public data in several domains.

However, as technologies for the extraction of meaningful content from user-generated data develop and are used in public decision making, a series of issues will have to be dealt with. For instance, should the system developer bear responsibility for the erroneous or biased analysis of data? Ethical questions arise as well: May governments legitimately analyse any type of user-generated content? Content-analysis systems might be used for trend and crisis detection; but what if they are also used for restricting freedoms?

The “wisdom of crowds” can certainly be valuable in public decision making, but the fact that citizens’ online behaviour can be observed and analysed by governments without citizens’ acknowledgement poses serious ethical issues.

Thus, technical development in this domain will have to be coupled with the definition of ethical guidelines and standards, maybe in the form of a system of quality labels for content-analysis systems.

[Editor’s Note: For earlier VoxPopuLII commentary on the creation of legal ontologies, see Núria Casellas, Semantic Enhancement of Legal Information… Are We Up for the Challenge? For earlier VoxPopuLII commentary on Natural Language Processing and legal Semantic Web technology, see Adam Wyner, Weaving the Legal Semantic Web with Natural Language Processing. For earlier VoxPopuLII posts on user-generated content, crowdsourcing, and legal information, see Matt Baca and Olin Parker, Collaborative, Open Democracy with LexPop; Olivier Charbonneau, Collaboration and Open Access to Law; Nick Holmes, Accessible Law; and Staffan Malmgren, Crowdsourcing Legal Commentary.]

[1] The idea of prosumption actually existed long before the Internet, as highlighted by Ritzer and Jurgenson (2010): the consumer at a fast food restaurant is to some extent also the producer of the meal, since he is expected to be his own waiter, and so is the driver who pumps his own gasoline at the filling station.

[2] The experience project enables registered users to share life experiences, and it contained around 7 million stories as of January 2011: http://www.experienceproject.com/index.php.

[3] For instance, the United Nations Volunteers Online platform (http://www.onlinevolunteering.org/en/vol/index.html) helps volunteers to cooperate virtually with non-governmental organizations and other volunteers around the world.

[4] See for instance the experiment run by mathematician Gowers on his blog: he posted a problem and asked a large number of mathematicians to work collaboratively to solve it. They eventually succeeded faster than if they had worked in isolation: http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/.

[5] The Galaxy Zoo project asks volunteers to classify images of galaxies according to their shapes: http://www.galaxyzoo.org/how_to_take_part. See as well Cornell’s projects Nestwatch (http://watch.birds.cornell.edu/nest/home/index) and FeederWatch (http://www.birds.cornell.edu/pfw/Overview/whatispfw.htm), which invite people to introduce their observation data into a Website platform.

[6] http://www.participedia.net/wiki/Icelandic_Constitutional_Council_2011.

[7] See the description of Debategraph in Marta Poblet’s post, Argument mapping: visualizing large-scale deliberations (http://serendipolis.wordpress.com/2011/10/01/argument-mapping-visualizing-large-scale-deliberations-3/).

[8] Terms have been organised in the form of a tree having as root nodes nine semantic classes previously identified. Terms have been added as branches and sub-branches, depending on their degree of abstraction.

[9] It should be noted that legal terms are mostly situated at the second level of the hierarchy rather than the first one. This is natural if we take into account the nature of the normative corpus (the Italian consumer code), which contains mostly domain specific concepts (for instance, withdrawal right) instead of general legal abstract categories (such as right and obligation).

REFERENCES

Bourcier, D., and Fernández-Barrera, M. (2011). A frame-based representation of citizen’s queries for the Web 2.0. A case study on noise nuisances. E-challenges conference, Florence 2011.

Fernández-Barrera, M., and Casanovas, P. (2011). From user needs to expert knowledge: Mapping laymen queries with ontologies in the domain of consumer mediation. AICOL Workshop, Frankfurt 2011.

Lux, M., and Dösinger, G. (2007). From folksonomies to ontologies: Employing wisdom of the crowds to serve learning purposes. International Journal of Knowledge and Learning (IJKL), 3(4/5): 515-528.

Mika, P. (2005). Ontologies are us: A unified model of social networks and semantics. In Proc. of Int. Semantic Web Conf., volume 3729 of LNCS, pp. 522-536. Springer.

Passant, A. (2007). Using ontologies to strengthen folksonomies and enrich information retrieval in Weblogs. In Int. Conf. on Weblogs and Social Media, 2007.

Poblet, M., Casellas, N., Torralba, S., and Casanovas, P. (2009). Modeling expert knowledge in the mediation domain: A Mediation Core Ontology, in N. Casellas et al. (Eds.), LOAIT 2009. 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with 2nd Workshop on Semantic Processing of Legal Texts. Barcelona, IDT Series n. 2.

Ritzer, G., and Jurgenson, N. (2010). Production, consumption, prosumption: The nature of capitalism in the age of the digital “prosumer.” In Journal of Consumer Culture 10: 13-36.

Specia, L., and Motta, E. (2007). Integrating folksonomies with the Semantic Web. Proc. Euro. Semantic Web Conf., 2007.

Meritxell Fernández-Barrera is a researcher at the Cersa (Centre d’Études et de Recherches de Sciences Administratives et Politiques), CNRS, Université Paris 2. She works on the application of natural language processing (NLP) to legal discourse and legal communication, and on the potential of Web 2.0 for participatory democracy.

VoxPopuLII is edited by Judith Pratt. Editor-in-Chief is Robert Richards, to whom queries should be directed. The statements above are not legal advice or legal representation. If you require legal advice, consult a lawyer. Find a lawyer in the Cornell LII Lawyer Directory.

Is it Time for Law Libraries to Collaborate on Description for Their Own Institutions’ Legal Scholarship?

Legal knowledge representation, Legal metadata, Legal ontologies, Legal publishing, Legal semantic web, Linked Data and law, Semantic annotation of legal texts, Semantic Web and law 2 Responses »

Sep 30, 2011

Over the past couple of years, there has been a great deal of discussion, particularly in relation to the Durham Statement [1], about technical standards and preservation issues for law reviews that publish openly and exclusively online. Other colleagues have already blogged or written more formally about the lack of metadata generated in the production of law reviews, and about problems in indexing open access law journal literature. [2] In a previous VoxPopuLII post, Dr. Núria Casellas discussed the significance of semantic enhancement and how it affects how we should be thinking about providing access to legal information. [3] So I would like to marry the two discussions, by formally asking my academic law librarian colleagues whether the time has come for us to work together to develop an ontology [4], a substantive knowledge system that could be used by our law schools’ legal journals in “marking up” content for consumption on the Internet.

This ontology should be applied not only to what we think of as traditional law journal content, but also to “related” content — such as companion blogs, video, data, etc. This related content will inevitably grow, as what we think of as a “journal” evolves. Indeed, a typical “journal” will most likely look very different years from now than it does today. As members of the institutions that publish one of the major forms of literature in law, and as members of organizations that possess significant legal metadata and subject expertise, law librarians are uniquely positioned to facilitate the discoverability and utility of law reviews published on the Web. Such a project also has the potential to support additional projects such as new metrics and ways in which to look at scholarship.

If this system were indeed widely adopted, it could facilitate a type of access to law journal content that has not been accomplished with existing, centralized means of access, such as Google Scholar, the ABA’s project, and commercial databases. Ideally, I would like to see our community develop a cooperative that could provide a hosted technical infrastructure to be used by institutions that lack the financial or technical resources to invest in a major repository service or open source solutions. While this idea seems “utopian” at this point, I think that our community could realistically pursue standards and language for the more “substantive” aspects of the metadata, even if we are unable to agree on the ultimate solutions for preservation or platforms that “serve up” the content. Such an ontology could also serve as a precursor to even more ambitious, collaborative projects to make legal information more accessible and discoverable.

What do you mean by an ontology? Don’t you just mean a taxonomy or “shared vocabulary”?

I frame the idea as an “ontology project” because publishing on the Web has increasingly become about structured, open, Linked Data and marking up content for the Semantic Web [4]. As the significance of Linked Data grows, it is important for us to think about access to legal scholarship in terms of knowledge systems that contemplate access to information in those terms. Structured data and schemas used in publishing law reviews would work best if human knowledge and expertise shaped the concepts and language expressed in that data. We need to think about subject access beyond the standard, familiar hierarchical “subjects” that we have come to use in our existing taxonomies, indexing, and classification systems. [5] We should be thinking about these “subjects” in a way that shows deeper interrelationships between concepts and “types” and that “interacts” and is “interoperable” with other systems. The ontology approach contemplates access to information from a variety of perspectives in relational and situational ways.

Law reviews could be published in a way that incorporates a particular ontology that could also be mapped to other ontologies. Linking ontologies in this way would yield useful connections across systems, bodies of knowledge, and perspectives, including multilingual thesauri, interdisciplinary knowledge, and practice-oriented and “pro se” consumer perspectives. Thinking about the project as an “ontology” also brings to mind three other important features of the system: (1) the “philosophical” definition of the term “ontology”; (2) the significance of “language” and subject expertise; and (3) the flexibility that would allow us to build something dynamic and responsive to the ever-changing nature of law. Such a project reflects the approach to legal information advocated in Dr. Casellas’s piece. [6]
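To make the idea of mapping one ontology to others more concrete, here is a minimal Python sketch. It borrows the SKOS mapping property names skos:exactMatch and skos:closeMatch as one possible vocabulary for such links; every concept identifier and prefix shown (lro:, thes-fr:, prose:) is hypothetical, and nothing here is a proposal for the actual design.

```python
# (concept, mapping property, concept in another vocabulary).
# All identifiers are invented for illustration.
MAPPINGS = [
    ("lro:ConsumerProtection", "skos:exactMatch", "thes-fr:ProtectionDuConsommateur"),
    ("lro:ConsumerProtection", "skos:closeMatch", "prose:ProblemsWithAPurchase"),
    ("lro:LandlordTenant", "skos:closeMatch", "prose:EvictionHelp"),
]


def linked_concepts(concept, mappings=MAPPINGS):
    """Return every concept linked to the given one, in either direction."""
    linked = set()
    for source, _prop, target in mappings:
        if source == concept:
            linked.add(target)
        elif target == concept:
            linked.add(source)
    return sorted(linked)


print(linked_concepts("lro:ConsumerProtection"))
print(linked_concepts("prose:EvictionHelp"))
```

With links like these, an article tagged once with a law review concept could be found from a multilingual thesaurus entry or from a consumer-oriented vocabulary without re-indexing the article.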

Don’t legal indexes already do this? Why reinvent the wheel?

When some of these issues were raised last October at the workshop entitled “Implementing the Durham Statement: Best Practices for Open Access Law Journals”, someone asked why we should “reinvent the wheel” when other longstanding systems (e.g., law journal indexes) are already doing this. Most of these longstanding systems are based on paid subscription models and are not open in a way that facilitates rapid response to evolving developments in the law, or use by those who consume legal information. More importantly, this project would really be about facilitating publishing and improving access to online content by providing a quality, substantive, open knowledge structure for journals to use for marking up content and building access into publication. This project would not be an attempt to displace or “usurp” indexes which focus on access to content from the “outside” perspective of the publication itself (and which are increasingly concerned with marketable enhancements like full-text access, search features, user interface, Web 2.0 functionality, etc.).

The “wheels” we might be accused of reinventing also include “federated searching” and Web-scale discovery systems being purchased by libraries, but I think similar arguments about cost, perspective on the content, and scope would still apply. Development of our project and adoption of Web-scale discovery systems are not mutually exclusive. Web-scale discovery systems could potentially integrate and map to our system. In any event, the point is not to “throw out” existing systems, but to create an additional knowledge structure that is open and potentially informed by and interoperable with other existing systems.

How would we proceed?

There are many approaches to ontology development [7], including derivation from or text-mining of legal texts [8], top-down development by humans, and building upon or extracting from existing ontologies. [9] Any of these methods (or a combination thereof) could work in the case of developing an ontological structure that could be applied to law review content.

A “top-down” approach based on the knowledge of individuals could start with librarians. But it should also involve working with law school faculty and scholars having expertise in particular subject areas, as well as with authors of law journal articles, and editors of law journals themselves (particularly those focusing on specialized legal topics). [10] Each law school has faculty and librarians who possess specialized legal subject knowledge, as well as collections in particular areas of law, that could enrich the project. In addition to contributing their substantive knowledge, librarians would have an opportunity to develop a language and a system that reflect how they think about and look for information.

Other colleagues have already suggested greater engagement with law school faculty, for purposes of learning about how faculty conduct and think about research. [11] A project like this would give us the opportunity to engage our stakeholders respecting how they think about, contextualize, and relate topics. (Perhaps we could learn more about the way law library stakeholders think about information by presenting them with samplings of articles, and inquiring as to how the stakeholders would “expect” to find those articles.) Instead of forcing those knowledgeable about their field to learn the taxonomy and structure we have been given by traditional systems, we would be harvesting the expertise of those subject specialists in order to create richer metadata that contemplates their habits and knowledge. Faculty, authors, and journal editors with subject expertise coupled with law librarians could potentially provide a very sophisticated, dynamic, and responsive system.

We should also consider looking at existing ontologies and other systems, including Library of Congress and other popular and relevant systems used in law. There are several ontologies related to law that could inform the project and that could also potentially be mapped. [12] Systems to be consulted (and mapped) could include ontologies designed for primary law and local knowledge management in legal settings, as well as ontologies in subject areas outside the law. Also, some law schools might have their own local systems that could inform ours.

Finally, while we would probably want to avoid using text mining as the only method, the project should also contemplate doing some mining and extraction from law journal literature itself. Such an approach might be particularly helpful in grappling with older legal concepts and appreciating the use of certain terms/language over time.

Whatever method(s) we select, we have a wealth of inspiration from other projects in legal informatics and from projects in other disciplines (particularly in the sciences) that strive to provide naming conventions within disciplines and to map knowledge across systems through coordinated efforts. Although it is a much more ambitious project than ours, John Wilbanks’ Neurocommons project provides us with a model of how such a project could garner participation and grow, particularly if we were to coordinate with other projects and ontologies being developed. [13]

If we build it, who will use it?

If we do develop such a system, who would actually apply it in publishing law reviews? Hopefully, libraries will take the lead and realize that this is a role that they themselves should be fulfilling. While many libraries facilitate repositories and other platforms for publishing law journals and provide training and reference/research support for cite checking and preemption, many do not provide markup and metadata work on the articles themselves. In a recent survey by Benjamin Keele and me in relation to a paper we have been writing, only 1 in 57 respondents reported doing any work on article metadata for their journals. [14] Librarians are already cataloging books and spending time grappling with metadata development and changes in the ways in which we describe our cataloged resources (RDA, FRBR, etc.). Further, librarians today spend a lot of time and money purchasing or building “repositories” or other platforms for their law journals. Greater support of metadata development for our own institutional output (beyond provision of simplistic taxonomies) is a natural outgrowth of such activities. As other librarians have commented, providing institutional repositories is not sufficient. [15]

Such activity could contemplate new roles for catalogers. A recent NISO Webinar on the impact of Linked Data on library cataloging suggested that library catalogers will be less focused on creating “records” and more concerned with “graphs”. The presenters commented that catalogers will enhance the increasing amount of minimal metadata coming directly from publishers, and provide access to original and local content. [16] Some libraries have already integrated metadata work into their workflows for cataloged resources, and it is possible for a law library to integrate journal work into its technical services workflow. [17] From a reference perspective, librarians are already often exposed to journal content in the early stages of publication through support of preemption checking, student note/comment research help, and cite checking support. Librarians are thereby in a good position to understand the “aboutness” of the content. Such involvement could provide law librarians with a natural progression toward being more involved in helping journals “mark up” their content. It is an opportunity for us to embrace more wholeheartedly the role of law libraries as publishers and knowledge managers. Many of our colleagues in the open law movement, in knowledge management in legal practice, and in other disciplines have made forays into this area. [18]

Some would probably argue that libraries do not have sufficient staff to get involved in law journal publishing activities, particularly in markup. In addition, some institutions have entire offices outside the library that support publication activities. Even if libraries feel that they are not in a position to manage the workflow of the application of this knowledge system, the most important contribution librarians can make lies in expertise or intellectual input. Application of this ontology could also be performed by law students themselves or other law school staff. Further, authors themselves are potential users and providers of metadata. In many other disciplines, especially the sciences, more authors are using author add-in tools and other software programs to help mark up their manuscripts for publication. Specialized tools could be developed to facilitate authors’ adding metadata to their own law review articles. (Many law authors are already used to contributing keywords to SSRN papers.)

“But…”: Obstacles and opportunities

As I write this piece, I anticipate comments such as, “That would be too big of an undertaking,” and: “Is that really our role?” While libraries are feeling the pressure of more limited resources and time, I would argue that this project would synergize with libraries’ existing interactions with our primary users (faculty and students) and could be built into other outreach activities. In the end, it could actually help to create an organic system responsive to users’ needs. Pursuing this project in tandem with other coordinated activities to facilitate open access law journals, law librarians would join many of our university library colleagues in thinking of ourselves in the role of producer/publisher and in providing new opportunities for our library staff (both technical services and reference/subject specialists).

I envision a host of other issues and problems (too many to enumerate in this posting) that might arise in relation to a project such as this, but I consider none of them “insurmountable.” Below is a sample of some issues that come to mind:

Coordination/governance: Who would control the project? Who would be the final arbiter of what is adopted? Past discussions of the Durham Statement have suggested the possibility of an organization providing support for journals that tried to “comply” with the Durham Statement. [19] Such an organization might consider taking on a project such as this. Perhaps leadership for this project could evolve in some way out of institutional and personal relationships, such as those that have evolved for collection development, [20] or possibly through some coordinated efforts of American Association of Law Libraries (AALL) Special Interest Sections (particularly ALL-SIS, TS-SIS, and RIPS-SIS). If our own institutions are not willing to support such a project, individual librarians on their own (myself included!) might be willing to contribute time and energy to the project. We are also fortunate to have a supportive community of technologists in the open law and knowledge management fields, who could serve as potential partners. The important aspects of the project are that it should be owned by an entity with diverse representation and interests, and that it should be established as something that will be free.

Target content and scope: Would we be framing subjects as they tend to exist in U.S. law review literature? While the structure would be designed for use by law reviews, if it were kept open and without restrictions, it could potentially be adopted by peer-reviewed journals and mapped to other indexing systems, either through Web-scale discovery or other systems. How do we frame an ontology that contemplates incorporation of multiple legal systems and relation to multiple languages? How do we deal with the translation issues that may arise? How would the ontology map to other systems and multilingual thesauri? Should we be contemplating ontologies in other disciplines that have addressed these issues?

Could it be for naught? One might ask, “If we build it, will they come?” Even if provided with such a system (as well as other best practices and support), would law reviews actually adopt it? Even if they do not apply such a system to their data structures, the substantive system that evolves could also be applied from the “outside” by third parties if the content itself is open. While one could argue that this would truly be “reinventing the wheel” in duplicating the efforts of existing indexing systems, one could argue alternatively that the scope, nature, and openness of the resulting system would offer a unique contribution to the indexing environment and would at least provide an additional alternative to the existing systems.

Technical questions: Which particular tools should we use to work collaboratively? What machine-readable formats would we contemplate using? How would we deal technically with systemic changes to the ontology and its application? There is a long list of tools and formats suitable for this project, and of methods for dealing with changes to metadata resources such as ontologies.
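One of the simplest of these methods is to keep a record of which retired terms have been replaced by which current ones, so that older markup can still be resolved. The Python sketch below assumes such a record exists; the term identifiers and the REPLACED_BY table are hypothetical.

```python
# Retired term -> term that replaced it. All identifiers are invented.
REPLACED_BY = {
    "lro:ComputerLaw": "lro:InternetLaw",
    "lro:InternetLaw": "lro:TechnologyLaw",
}


def current_term(term, replaced_by=REPLACED_BY):
    """Follow the replacement chain until reaching a term still in use."""
    seen = set()
    while term in replaced_by:
        if term in seen:  # guard against a badly edited, circular chain
            raise ValueError(f"cycle in replacement chain at {term}")
        seen.add(term)
        term = replaced_by[term]
    return term


print(current_term("lro:ComputerLaw"))
print(current_term("lro:Torts"))
```

Articles marked up years ago would keep their original tags, and the lookup would translate them at query time rather than requiring every journal to re-tag its back issues.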

How would we contemplate application of this ontology in existing publishing platforms? What tools would we contemplate journals using to mark up documents with metadata from the ontology? Many of the repositories and platforms libraries are currently using permit enhancement of metadata with keywords, user-generated tags, or existing basic subject categories. But existing repositories and platforms do not necessarily facilitate markup that is optimized for the Semantic Web.
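As one possible answer, a journal's publishing platform could embed a small machine-readable record in each article page. The Python sketch below builds such a record as JSON-LD using the schema.org types ScholarlyArticle, Person, and DefinedTerm. The choice of schema.org and JSON-LD is an assumption made for illustration, not a recommendation from this post, and the function name, article details, and vocabulary URL are hypothetical.

```python
import json


def article_jsonld(title, authors, concepts, vocabulary_url):
    """Build a JSON-LD description of an article tagged with ontology concepts."""
    record = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": title,
        "author": [{"@type": "Person", "name": name} for name in authors],
        # Each subject points back to the shared vocabulary it was drawn from.
        "about": [
            {"@type": "DefinedTerm", "name": concept, "inDefinedTermSet": vocabulary_url}
            for concept in concepts
        ],
    }
    return json.dumps(record, indent=2, ensure_ascii=False)


print(
    article_jsonld(
        title="An Example Law Review Article",
        authors=["A. Author"],
        concepts=["Consumer protection"],
        vocabulary_url="https://example.org/law-review-ontology",  # placeholder URL
    )
)
```

The output could be placed in a script element of type application/ld+json on the article's page, which keeps the markup separate from the article text and leaves existing repository templates largely untouched.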

Who is the audience? Who is looking for such an ontology? If the language and concepts are at least in part based on the needs and knowledge of our faculty and students, do we develop something tailored to their use instead of developing something that serves broader norms? How could we take into consideration how others (pro se’s, court personnel, etc.) might be looking for information, and map or relate our ontology to other systems that incorporate those users’ perspectives? How could we develop an ontology that contemplates relating to primary law?

Rights issues: Are there rights issues involved in adaptations or derivations of others’ ontologies? How would we want to handle rights issues/licenses respecting the ontology that we develop? [21] Hopefully, the answer is freely and openly!

So what do you think?

Hopefully, this post will spur a discussion that could be continued on this blog or in another forum. In any event, law libraries should be rethinking their roles in the production of law review metadata. Law libraries should be considering how the evolution of the Semantic Web and cataloging standards might impact how they provide support for their own institution’s journals.

NOTES

This post is based in part on two draft papers: Benjamin Keele and Michelle Pearse, How Law Libraries Can Help Law Journals Publish Better (poster session presented during the 2011 AALL Annual Meeting in Philadelphia, PA on July 23-26, 2011), and Michelle Pearse, Whither the Future of Law Journal Indexing?.

[1] Richard A. Danner, Kelly Leong, and Wayne Miller, The Durham Statement Two Years Later: Open Access in the Law School Journal Environment, 103 Law Library Journal 39, 52 (2011), http://scholarship.law.duke.edu/faculty_scholarship/2358/; Implementing the Durham Statement: Best Practices for Open Access Law Journals Conference, http://www.law.duke.edu/libtech/openaccess/conference2010 (October 22, 2010).

[2] Tom Boone, Librarians Key to Open Access Electronic Law Reviews, http://tomboone.com/library-laws/2009/09/librarians-key-open-access-electronic-law-reviews (September 3, 2009); Sarah Glassmeyer, Getting to Durham Compliance, SarahGlassmeyer(Dot)Com, http://sarahglassmeyer.com/?p=442 (April 26, 2010); Edward T. Hart, Indexing Open Access Law Journals…or Maybe Not, 38 International Journal of Legal Information 19 (2010), http://scholarship.law.cornell.edu/ijli/vol38/iss1/5/.

[3] Dr. Nuria Casellas, Semantic Enhancement of Legal Information: Are We Up for the Challenge?, VoxPopuLII, http://blog.law.cornell.edu/voxpop/2010/02/15/semantic-enhancement-of-legal-information%e2%80%a6-are-we-up-for-the-challenge/ (February 15, 2010).

[4] Some resources related to this topic appear at http://schema.org and http://linkeddata.org. Some argue that the Semantic Web might already be ill-fated: Janna Quitney Anderson and Lee Rainie, The Fate of the Semantic Web, http://www.pewinternet.org/~/media//Files/Reports/2010/PIP-Future-of-the-Internet-Semantic-web.pdf (Pew Research Center 2010). Tom Gruber defines an ontology as “a specification of a conceptualization”: http://www-ksl.stanford.edu/kst/what-is-an-ontology.html; Tom Gruber, in the Encyclopedia of Database Systems, Ling Liu and M. Tamer Ozsu (Eds.), Springer-Verlag, 2009 http://tomgruber.org/writing/ontology-definition-2007.htm and http://semanticweb.org/wiki/Ontology. Joost Breuker and colleagues elaborate: “The term ‘ontology’ may have different meanings: (i) philosophical discipline; (ii) informal conceptual system; (iii) a formal semantic account; (iv) a specification of a conceptualization; (v) a representation of a conceptual system via logical theory, (vi) the vocabulary used by a logical theory, (vii) a meta-level specification of a logical theory.” J. Breuker et al., “The Flood, the Channels and the Dykes,” in Joost Breuker, Pompeu Casanovas, Michel C.A. Klein and Enrico Francesconi, eds., Law, Ontologies and the Semantic Web: Channeling the Legal Information Flood (IOS Press 2009), at 11. Adam Wyner defines “ontology” in the following way: “An ontology represents a common vocabulary and organization of information that explicitly, formally, and generally specifies a conceptualization of a given domain. Ontologies are related to knowledge management (cf. Rusanow’s ‘Knowledge Management and the Smarter Lawyer’) and taxonomies (cf. Sherwin’s article ‘Legal Taxonomies’). But an ontology is a more specific, explicit and formal representation of knowledge than provided by KM [knowledge management]; and it is richer and more flexible than a taxonomy….In making an ontology, one turns tacit expert knowledge into explicit representations that can be shared, tested and modified by people as well as processed by a computer.” Dr. Adam Z. Wyner, “Legal Concepts Spin a Semantic Web”, Law Technology News, http://www.law.com/jsp/lawtechnologynews10/PubArticleLTN.jsp?id=1202431256007&slreturn=1 (June 8, 2009). Dr. Núria Casellas gives a good explanation of the Semantic Web and ontologies: Dr. Núria Casellas, Semantic Enhancement of Legal Information: Are We Up for the Challenge? http://blog.law.cornell.edu/voxpop/2010/02/15/semantic-enhancement-of-legal-information%e2%80%a6-are-we-up-for-the-challenge/ (February 15, 2010).

[5] Christopher A. Welty and Jessica Jenkins, Formal Ontology for Subject, 31 Journal of Knowledge and Data Engineering 155 (1999) (also available at http://www.cs.vassar.edu/~weltyc/papers/subjects/subject.html); Hope A. Olson, The Power to Name: Locating the Limits of Subject Representation in Libraries (Kluwer Academic Publishers, 2002); Knowledge Representation with Ontologies: Present Challenges – Future Possibilities, 65 International Journal of Human-Computer Studies 563 (2007), doi: 10.1016/j.ijhcs.2007.04.003.

[6] “In the subfield of computer science and information science known as Knowledge Representation, the term ‘ontology’ refers to a consensual and reusable vocabulary of identified concepts and their relationships regarding some phenomena of the world, which is made explicit in a machine-readable language. Ontologies may be regarded as advanced taxonomical structures, where concepts are formalized as classes and defined with axioms, enriched with description of attributes or constraints, and properties.” Dr. Núria Casellas, Semantic Enhancement of Legal Information: Are We Up for the Challenge? http://blog.law.cornell.edu/voxpop/2010/02/15/semantic-enhancement-of-legal-information%e2%80%a6-are-we-up-for-the-challenge/. See also Dr. Adam Z. Wyner, “Legal Concepts Spin a Semantic Web”, Law Technology News, http://www.law.com/jsp/lawtechnologynews10/PubArticleLTN.jsp?id=1202431256007&slreturn=1 (June 8, 2009) (suggesting Web-based collaborative ontology development where legal professionals contribute to a free, open ontology for law); Dr. Adam Z. Wyner, Weaving the Legal Semantic Web with Natural Language Processing, VoxPopuLII, http://blog.law.cornell.edu/voxpop/2010/05/17/weaving-the-legal-semantic-web-with-natural-language-processing/ (May 17, 2010).

[7] Bill Cope, Mary Kalantzis and Liam Magee, Towards a Semantic Web: Connecting Knowledge in Academic Research (Chandos Publishing 2011), at 72 (noting several studies on investigating approaches and software); A Holistic Approach to Collaborative Ontology Development Based on Change Management, 9 Web Semantics: Science, Services and Agents on the World Wide Web 299 (2011), doi:10.1016/j.websem.2011.06.007; “Ontologies can be designed by means of methods such as…encompassing top-down expertise elicitation from humans, bottom-up learning from documents, and middle-out application of design patterns, which can be specialized from domain-independent ontologies, extracted from best practices, existing ontologies or other knowledge sources, as well as learnt from conceptual invariances found in experts’ documents.” Aldo Gangemi, “Introducing Pattern-Based Design for Legal Ontologies,” in Joost Breuker, Pompeu Casanovas, Michel C.A. Klein and Enrico Francesconi, eds., Law, Ontologies and the Semantic Web: Channelling the Information Flood (IOS Press, 2009), at 53.

[8] Enrico Francesconi, Semantic Processing of Legal Texts: Where the Language of Law Meets the Law of Language (Springer 2008).

[9] “Creating and developing ontologies requires domain expertise and the ability to capture this knowledge in a clean conceptual model.” Roberta Cruel, Olga Morozova, Markus Rhode, Elena Simperl, Katharina Siorapes, Oksana Tokarchuk, Torben Wiedenhoefer, and Fahri Yetim, Motivation Mechanisms for Participation in Human-Driven Semantic Content Creation, 1 International Journal of Knowledge Engineering and Data Mining 331 (2011), doi: 10.1504/IJKEDM.2011.040653.

[10] This approach of working with faculty and other scholars from the legal academy would be similar to the “socio-legal” approach referenced by Dr. Casellas in her post regarding her Institute of Law and Technology project. Dr. Adam Z. Wyner has also advocated web-based collaborative ontology development where legal professionals contribute to a free, open ontology for law. Dr. Adam Z. Wyner, “Legal Concepts Spin a Semantic Web”, Law Technology News, http://www.law.com/jsp/lawtechnologynews10/PubArticleLTN.jsp?id=1202431256007&slreturn=1 (June 8, 2009).

[11] Stephanie Davidson, Way Beyond Legal Research: Understanding the Research Habits of Legal Scholars, 102 Law Library Journal 561 (2010), http://www.aallnet.org/main-menu/Publications/llj/LLJ-Archives/Vol-102/publljv102n04/2010-32.pdf; Richard A. Danner, Supporting Scholarship: Thoughts on the Role of the Academic Librarian, 39 Journal of Law & Education 365-386 (2010), http://scholarship.law.duke.edu/faculty_scholarship/2071/ .

[12] Robert Richards, Legal Information Systems & Legal Informatics Resources: Knowledge Representation: Legal (Selected) http://www.personal.psu.edu/rcr5122/Ontologies.html; Robert Richards, Legal Information Systems & Legal Informatics Resources: General Resources for Application to Law, http://www.personal.psu.edu/rcr5122/OntologiesGeneral.html; Joost Breuker, Pompeu Casanovas, Michel C.A. Klein, and Enrico Francesconi, eds., Law, Ontologies and the Semantic Web: Channeling the Legal Information Flood (IOS Press 2009), at 12 (table of 23 ontologies).

[13] Alan Ruttenberg et al., Life Sciences on the Semantic Web: The Neurocommons and Beyond. Briefings in Bioinformatics, 10(2): 193-204 (2009), doi: 10.1093/bib/bbp004 (“The NeuroCommons project seeks to make all scientific research materials – research articles, knowledge bases, research data, physical materials – as available and as usable as they can be. We do this by fostering practices that render information in a form that promotes uniform access by computational agents – sometimes called ‘interoperability’. We want knowledge sources to combine easily and meaningfully, enabling semantically precise queries that span multiple information sources.”).

[14] Benjamin Keele and Michelle Pearse, How Law Libraries Can Help Law Journals Publish Better (poster session presented during the 2011 AALL Annual Meeting in Philadelphia, PA on July 23-26, 2011, http://scholarship.law.wm.edu/libpubs/25/).

[15] Tom Boone, Librarians Key to Open Access Electronic Law Reviews, http://tomboone.com/library-laws/2009/09/librarians-key-open-access-electronic-law-reviews (September 3, 2009).

[16] NISO/DCMI, International Bibliographic Standards, Linked Data and the Impact on Library Cataloging (Webinar), http://www.niso.org/news/events/2011/dcmi/linked (August 24, 2011).

[17] Valeri Craigle, Legal Scholarship in the Digital Domain: A Technical Roadmap for Implementing the Durham Statement, Technical Services Law Librarian, at 1 (December 2010), http://www.library.illinois.edu/archives/e-records/aall/8501591a/news/TSLLdecember2010.pdf.

[18] See Dr. Adam Z. Wyner, “Legal Concepts Spin a Semantic Web,” Law Technology News, http://www.law.com/jsp/lawtechnologynews10/PubArticleLTN.jsp?id=1202431256007&slreturn=1 (June 8, 2009) (suggesting Web-based collaborative ontology development where legal professionals contribute to a free, open ontology for law).

[19] Wayne Miller, A Foundational Proposal for Making the Durham Statement Real, http://scholarship.law.duke.edu/faculty_scholarship/2325/ (suggesting founding an organization “whose mission is to guarantee the ongoing viability and availability of all publications that adhere to the Durham Statement’s call to action, hereinafter called the Durham Statement Foundation.”); Richard A. Danner, Kelly Leong and Wayne Miller, The Durham Statement Two Years Later: Open Access in the Law School Journal Environment, 103 Law Library Journal 39, 52 (2011), http://scholarship.law.duke.edu/faculty_scholarship/2358/ (noting that the Durham Statement “calls for law schools to end print publication in a planned and coordinated effort led by the legal education community”).

[20] Some examples include the Northeast Foreign Law Libraries Cooperative Group and “B2F2” (currently in the process of being renamed) with Boston area law librarians.

[21] John Wilbanks, “Licensing and Ontologies: Research from Creative Commons,” http://ontolog.cim3.net/file/work/IPR/OOR-IPR-01_IPR-landscape_2010-09-09/licensing-n-ontologies–JohnWilbanks-CC_20100909.pdf (September 9, 2010).

Michelle Pearse is the Research Librarian for Open Access Initiatives and Scholarly Communication at the Harvard Law School Library where she manages implementation of the law school’s open access policy for its faculty, and other projects related to scholarly communication and open access to legal information and scholarship. She is also involved in efforts to archive born-digital content for the collection, and provides research services to faculty and staff.

VoxPopuLII is edited by Judith Pratt. Editor-in-Chief is Robert Richards, to whom queries should be directed. The statements above are not legal advice or legal representation. If you require legal advice, consult a lawyer. Find a lawyer in the Cornell LII Lawyer Directory.

Semantic Enhancement of Legal Information… Are We Up for the Challenge? [Revised Repost]

Cross-language legal information retrieval, information retrieval, knowledge management, Legal knowledge representation, Legal ontologies, Legal semantic web, Linked Data, Linked Data and law, Multilingual legal information retrieval, Semantic Web and law 1 Response »

Jan 18, 2011

[Editor’s Note: We are republishing here, with some corrections, a post by Dr. Núria Casellas that appeared earlier on VoxPopuLII.]

The organization and formalization of legal information for computer processing, in order to support decision-making or enhance information search, retrieval, and knowledge management, is not new, and neither is the need to represent legal knowledge in a machine-readable form. Nevertheless, despite the first ideas about computerizing the law in the late 1940s, the appearance of the first legal information systems in the 1950s, and the first legal expert systems in the 1970s, claims such as Hafner’s that “searching a large database is an important and time-consuming part of legal work,” which drove the development of legal information systems during the 1980s, have not yet been left behind.

Similar claims may be found nowadays. On the one hand, the amount of unstructured (or poorly structured) legal information and documents made available by governments, free access initiatives, blawgs, and portals on the Web will probably keep growing as the Web expands. On the other, the increasing quantity of legal data managed by legal publishing companies, law firms, and government agencies, together with the high quality requirements applicable to legal information/knowledge search, discovery, and management (e.g., access and privacy issues, copyright, etc.), has renewed the need to develop and implement better content management tools and methods.

Information overload, however important, is not the only concern for the future of legal knowledge management; other and growing demands are increasing the complexity of the requirements that legal information management systems and, in consequence, legal knowledge representation must face in the future. Multilingual search and retrieval of legal information to enable, for example, integrated search between the legislation of several European countries; enhanced laypersons’ understanding of and access to e-government and e-administration sites or online dispute resolution capabilities (e.g., BATNA determination); the regulatory basis and capabilities of electronic institutions or normative and multi-agent systems (MAS); and multimedia, privacy or digital rights management systems, are just some examples of these demands.

How may we enable legal information interoperability? How may we foster legal knowledge usability and reuse between information and knowledge systems? How may we go beyond the mere linking of legal documents or the use of keywords or Boolean operators for legal information search? How may we formalize legal concepts and procedures in a machine-understandable form?

In short, how may we handle the complexity of legal knowledge to enhance legal information search and retrieval or knowledge management, taking into account the structure and dynamic character of legal knowledge, its relation with common sense concepts, the distinct theoretical perspectives, the flavor and influence of legal practice in its evolution, and jurisdictional and linguistic differences?

These are challenging tasks, for which different solutions and lines of research have been proposed. Here, I would like to draw your attention to the development of semantic solutions and applications and the construction of formal structures for representing legal concepts in order to make human-machine communication and understanding possible.

Semantic metadata

For example, in the search and retrieval area, we still perform most legal searches in online or application databases using keywords (that we believe to be contained in the document that we are searching for), perhaps combined with Boolean operators, or supported by a set of predefined categories (metadata regarding, for example, date, type of court, etc.), a list of pre-established topics, thesauri (e.g., EuroVoc), or a synonym-enhanced search.

These searches rely mainly on syntactic matching, and — with the exception of searches enhanced with categories, synonyms, or thesauri — they will return only documents that contain the exact term searched for. To perform more complex searches, to go beyond the term, we require the search engine to understand the semantic level of legal documents; a shared understanding of the domain of knowledge becomes necessary.
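The difference between exact-term matching and a thesaurus-enhanced search can be sketched in a few lines of Python. This is only an illustration: the three documents, the synonym table, and the function names are invented for the example and do not come from any real legal database or thesaurus.

```python
# Toy corpus; the documents and the synonym table are invented for illustration.
DOCUMENTS = {
    "doc1": "The tenant must vacate the premises after eviction.",
    "doc2": "A lessee may contest removal from the property.",
    "doc3": "The court awarded damages for breach of contract.",
}

# Hypothetical thesaurus mapping a term to related terms.
THESAURUS = {
    "tenant": {"lessee"},
    "eviction": {"removal"},
}


def tokenize(text):
    """Lowercase the text and split it into bare words."""
    return {word.strip(".,;:").lower() for word in text.split()}


def keyword_search(term):
    """Return ids of documents that contain the exact term."""
    term = term.lower()
    return sorted(
        doc_id for doc_id, text in DOCUMENTS.items() if term in tokenize(text)
    )


def expanded_search(term):
    """Return ids of documents containing the term or a thesaurus synonym."""
    terms = {term.lower()} | THESAURUS.get(term.lower(), set())
    return sorted(
        doc_id for doc_id, text in DOCUMENTS.items() if terms & tokenize(text)
    )


print(keyword_search("tenant"))   # exact match only
print(expanded_search("tenant"))  # also finds the document that says "lessee"
```

The expanded search still matches strings; it simply matches more of them. It has no notion that a tenant and a lessee stand in the same legal relationship to a landlord, which is the kind of knowledge a shared conceptual model is meant to supply.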

Although the quest for the representation of legal concepts is not new, these efforts have recently been driven by the success of the World Wide Web (WWW) and, especially, by the later development of the Semantic Web. Sir Tim Berners-Lee described it as an extension of the Web “in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

Thus, the Semantic Web is envisaged as an extension of the current Web, which now comprises collaborative tools and social networks (the Social Web or Web 2.0). The Semantic Web is sometimes also referred to as Web 3.0, although there is no widespread agreement on this matter, as different visions exist regarding the enhancement and evolution of the current Web.

These efforts also include the Web of Data (or Linked Data), which relies on the existence of standard formats (URIs, HTTP and RDF) to allow the access and query of interrelated datasets, which may be granted through a SPARQL endpoint (e.g., Govtrack.us, US census data, etc.). Sharing and connecting data on the Web in compliance with the Linked Data principles enables the exploitation of content from different Web data sources with the development of search, browse, and other mashup applications. (See the Linking Open Data cloud diagram by Cyganiak and Jentzsch below.) [Editor’s Note: Legislation.gov.uk also applies Linked Data principles to legal information, as John Sheridan explains in his recent post.]
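To give a rough feel for why shared URIs matter, here is a minimal Python sketch in which two independently published datasets are combined and then queried across the join. Everything in it is an assumption made for illustration: the example.org URIs, the vocabulary terms, and the data are invented, and the small match function only imitates the basic triple pattern of a query language such as SPARQL rather than implementing it.

```python
# All URIs below are invented for illustration (example.org is a reserved domain).
EX = "http://example.org/"
SPONSOR = EX + "vocab/sponsor"
REPRESENTS = EX + "vocab/represents"
TITLE = EX + "vocab/title"

# Dataset published by a hypothetical legislative tracker.
BILLS = {
    (EX + "bill/hr1", SPONSOR, EX + "person/smith"),
    (EX + "bill/hr1", TITLE, "Open Courts Act"),
}

# Dataset published by a different hypothetical source about legislators.
PEOPLE = {
    (EX + "person/smith", REPRESENTS, EX + "district/ny-12"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def districts_of_sponsors(graph, bill):
    """Follow bill -> sponsor -> district links across the graph."""
    districts = set()
    for _, _, person in match(graph, s=bill, p=SPONSOR):
        for _, _, district in match(graph, s=person, p=REPRESENTS):
            districts.add(district)
    return sorted(districts)


# Because both sources use the same URI for the same person,
# merging the datasets is a plain set union.
merged = BILLS | PEOPLE
print(districts_of_sponsors(merged, EX + "bill/hr1"))
```

Neither dataset alone can answer the question; the answer appears only once the two are merged, and the merge is trivial because both sources identify the sponsor with the same URI.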

Thus, to allow semantics to be added to the current Web, new languages and tools (ontologies) were needed, as the development of the Semantic Web is based on the formal representation of meaning in order to share with computers the flexibility, intuition, and capabilities of the conceptual structures of human natural languages. In the subfield of computer science and information science known as Knowledge Representation, the term “ontology” refers to a consensual and reusable vocabulary of identified concepts and their relationships regarding some phenomena of the world, which is made explicit in a machine-readable language. Ontologies may be regarded as advanced taxonomical structures, where concepts are formalized as classes and defined with axioms, enriched with the description of attributes or constraints, and properties.

The task of developing interoperable technologies (ontology languages, guidelines, software, and tools) has been taken up by the World Wide Web Consortium (W3C). These technologies were arranged in the Semantic Web Stack according to increasing levels of complexity (like a layer cake). In this stack, higher layers depend on lower layers (and the latter are inherited from the original Web). These languages include XML (eXtensible Markup Language), a markup language usually used to add structure to documents, and the so-called ontology languages: RDF/RDFS (Resource Description Framework/Schema), OWL, and OWL2 (Web Ontology Language). While the RDF language offers simple descriptive information about the resources on the Web, encoded in sets of triples of subject (a resource), predicate (a property or relation), and object (a resource or a value), RDFS allows the description of classes (sets of resources) and their properties. OWL offers an even more expressive language to define structured ontologies (e.g., class disjointness, union or equivalence, etc.).
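The triple structure, and the extra step that RDFS adds on top of it, can be illustrated with plain Python tuples. In this sketch the rdf:type and rdfs:subClassOf URIs are the standard W3C ones, but the example.org class names, the act42 resource, and the helper functions are assumptions made up for the example; a real project would normally use an RDF library and a reasoner rather than hand-written loops.

```python
# Standard W3C property URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical namespace and classes, invented for this sketch.
EX = "http://example.org/legal#"

# Each triple is (subject, predicate, object).
TRIPLES = {
    (EX + "Statute", RDFS_SUBCLASS_OF, EX + "Legislation"),
    (EX + "Legislation", RDFS_SUBCLASS_OF, EX + "LegalSource"),
    (EX + "act42", RDF_TYPE, EX + "Statute"),
}


def superclasses(cls, triples):
    """Collect every class reachable from cls through subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_types(resource, triples):
    """Return the asserted types of a resource plus all their superclasses."""
    asserted = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred


for cls in sorted(inferred_types(EX + "act42", TRIPLES)):
    print(cls)
```

Only one type is stated for act42, yet the class hierarchy lets a program conclude that it is also legislation and a legal source. OWL adds further kinds of statements (such as disjointness or equivalence between classes) that this sketch does not attempt to model.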

Moreover, a specification to support the conversion of existing thesauri, taxonomies or subject headings into RDF triples has recently been published: the SKOS, Simple Knowledge Organization System standard. These specifications may be exploited in Linked Data efforts, such as the New York Times vocabularies. Also, EuroVoc, the multilingual thesaurus for activities of the EU is, for example, now available in this format.
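As a rough sketch of what such a conversion involves, the following Python turns two thesaurus entries into SKOS-style triples. The SKOS namespace and the property names (prefLabel, altLabel, broader, Concept) are the standard ones; the thesaurus entries, the example.org base URI, and the thesaurus_to_skos function are hypothetical and far simpler than a real thesaurus such as EuroVoc, which also carries labels in many languages.

```python
# Standard W3C URIs.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base URI and thesaurus entries, invented for this sketch.
BASE = "http://example.org/thesaurus/"
THESAURUS = [
    {"id": "contract", "label": "contract",
     "synonyms": ["agreement"], "broader": "obligation"},
    {"id": "obligation", "label": "obligation",
     "synonyms": [], "broader": None},
]


def thesaurus_to_skos(entries):
    """Turn thesaurus entries into (subject, predicate, object) triples."""
    triples = []
    for entry in entries:
        concept = BASE + entry["id"]
        triples.append((concept, RDF_TYPE, SKOS + "Concept"))
        # The preferred term becomes the preferred label.
        triples.append((concept, SKOS + "prefLabel", entry["label"]))
        # "Use for" terms become alternative labels.
        for synonym in entry["synonyms"]:
            triples.append((concept, SKOS + "altLabel", synonym))
        # A broader term becomes a link to another concept.
        if entry["broader"]:
            triples.append((concept, SKOS + "broader", BASE + entry["broader"]))
    return triples


for triple in thesaurus_to_skos(THESAURUS):
    print(triple)
```

The familiar thesaurus relationships (preferred term, use-for term, broader term) map almost one to one onto SKOS properties, which is a large part of why existing subject vocabularies are comparatively easy to publish this way.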

Although there are different views in the literature regarding the scope of the definition or main characteristics of ontologies, the use of ontologies is seen as the key to implementing semantics for human-machine communication. Many ontologies have been built for different purposes and knowledge domains, for example:

OpenCyc: an open source version of the Cyc general ontology;
SUMO: the Suggested Upper Merged Ontology;
the upper ontologies PROTON (PROTo Ontology) and DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering);
the FRBRoo model (which represents bibliographic information);
the RDF representation of Dublin Core;
the Gene Ontology;
the FOAF (Friend of a Friend) ontology.

Although most domains are of interest for ontology modeling, the legal domain offers a perfect area for conceptual modeling and knowledge representation to be used in different types of intelligent applications and legal reasoning systems, not only due to its complexity as a knowledge intensive domain, but also because of the large amount of data that it generates. The use of semantically-enabled technologies for legal knowledge management could provide legal professionals and citizens with better access to legal information; enhance the storage, search, and retrieval of legal information; make possible advanced knowledge management systems; enable human-computer interaction; and even satisfy some hopes respecting automated reasoning and argumentation.

Regarding the incorporation of legal knowledge into the Web or into IT applications, or the more complex realization of the Legal Semantic Web, several directions have been taken, such as the development of XML standards for legal documentation and drafting (including Akoma Ntoso, LexML, CEN Metalex, and Norme in Rete), and the construction of legal ontologies.

Ontologizing legal knowledge

During the last decade, research on the use of legal ontologies as a technique to represent legal knowledge has increased and, as a consequence, a very interesting debate about their capacity to represent legal concepts and their relation to the different existing legal theories has arisen. It has even been suggested that ontologies could be the “missing link” between legal theory and Artificial Intelligence.

The literature suggests that legal ontologies may be distinguished by the levels of abstraction of the ideas they represent, the key distinction being between core and domain levels. Legal core ontologies model general concepts which are believed to be central for the understanding of law and may be used in all legal domains. In the past, ontologies of this type were mainly built upon insights provided by legal theory and largely influenced by normativism and legal positivism, especially by the works of Hart and Kelsen. Thus, initial legal ontology development efforts in Europe were influenced by hopes and trends in research on legal expert systems based on syllogistic approaches to legal interpretation.

More recent contributions at that level include the LKIF-Core Ontology, the LRI-Core Ontology, the DOLCE+CLO (Core Legal Ontology), and the Ontology of Fundamental Legal Concepts. Such ontologies usually include references to the concepts of Norm, Legal Act, and Legal Person, and may contain the formalization of deontic operators (e.g., Prohibition, Obligation, and Permission).
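To make these notions a little more concrete, here is a small Python sketch of a norm that combines a deontic operator, an addressee, and an action. It is an assumption-laden toy and does not reproduce the structure of LKIF-Core or of any other ontology named above: the class names, the fields, and the violates function are invented for illustration, and real core ontologies are expressed in languages such as OWL, not in program code.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    """The three deontic operators mentioned in the text."""
    OBLIGATION = "obligation"
    PROHIBITION = "prohibition"
    PERMISSION = "permission"


@dataclass(frozen=True)
class LegalPerson:
    name: str


@dataclass(frozen=True)
class Norm:
    modality: Modality
    addressee: LegalPerson
    action: str


def violates(norm, person, action, performed):
    """Return True if the observed behaviour breaches the norm."""
    if norm.addressee != person or norm.action != action:
        return False  # the norm does not apply to this person or action
    if norm.modality is Modality.OBLIGATION:
        return not performed  # failing to act breaches an obligation
    if norm.modality is Modality.PROHIBITION:
        return performed  # acting breaches a prohibition
    return False  # a permission is not breached either way


landlord = LegalPerson("landlord")
no_entry = Norm(Modality.PROHIBITION, landlord, "enter without notice")
print(violates(no_entry, landlord, "enter without notice", performed=True))
```

Even this toy shows why core concepts are reusable: nothing in Norm or Modality is specific to landlord and tenant law, so the same structure could describe rules in any legal domain.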

Domain ontologies, on the other hand, are directed towards the representation of conceptual knowledge regarding specific areas of the law or domains of practice, and are built with particular applications in mind, especially those that enable communication (shared vocabularies), or enhance indexing, search, and retrieval of legal information. Currently, most legal ontologies being developed are domain-specific ontologies, and some areas of legal knowledge have been heavily targeted, notably the representation of intellectual property rights respecting digital rights management (IPROnto Ontology, the Copyright Ontology, the Ontology of Licences, and the ALIS IP Ontology), and consumer-related legal issues (the Customer Complaint Ontology (or CContology), and the Consumer Protection Ontology). Many other well-documented ontologies have also been developed for purposes of the detection of financial fraud and other crimes; the representation of alternative dispute resolution methods, privacy compliance, patents, cases (e.g., Legal Case OWL Ontology), judicial proceedings, legal systems, and argumentation frameworks; and the multilingual retrieval of European law, among others. (See, for example, the proceedings of the JURIX and ICAIL conferences for further references.)

A socio-legal approach to legal ontology development

Thus, there are many approaches to the development of legal ontologies. Nevertheless, in the current legal ontology literature there are few explicit accounts or insights into the methods researchers use to elicit legal knowledge, and the accounts that are available reflect a lack of consensus as to the most appropriate methodology. For example, some accounts focus solely on the use of text mining techniques towards ontology learning from legal texts; while others concentrate on the analysis of legal theories and related materials to extract and formalize legal concepts. Moreover, legal ontology researchers disagree about the role that legal experts should play in ontology development and validation.

In this regard, at the Institute of Law and Technology, we are developing a socio-legal approach to the construction of legal conceptual models. This approach stems from our collaboration with firms, government agencies, and nonprofit organizations (and their experts, clients, and other users) for the gathering of either explicit or tacit knowledge according to their needs. This empirically-based methodology may require the modeling of legal knowledge in practice (or professional legal knowledge, PLK), and the acquisition of knowledge through ethnographic and other social science research methods, together with the extraction (and merging) of concepts from a range of different sources (acts, regulations, case law, protocols, technical reports, etc.) and their validation by both legal experts and users.

For example, the Ontology of Professional Judicial Knowledge (OPJK) was developed in collaboration with the Spanish School of the Judiciary to enhance search and retrieval capabilities of a Web-based frequently-asked-question system (IURISERVICE) containing a repository of practical knowledge for Spanish judges in their first appointment. The knowledge was elicited from an ethnographic survey in Spanish First Instance Courts. On the other hand, the Neurona Ontologies, for a data protection compliance application, are based on the knowledge of legal experts and the requirements of enterprise asset management, together with the analysis of privacy and data protection regulations and technical risk management standards.

This approach tries to take into account many of the criticisms that developers of legal knowledge-based systems (LKBS) received during the 1980s and the beginning of the 1990s, including, primarily, the lack of legal knowledge or legal domain understanding of most LKBS development teams at the time. These criticisms were rooted in the widespread use of legal sources (statutes, case law, etc.) directly as the knowledge for the knowledge base, instead of including in the knowledge base the “expert” knowledge of lawyers or law-related professionals.

Further, in order to represent knowledge in practice (PLK), legal ontology engineering could benefit from the use of social science research methods for knowledge elicitation, institutional/organizational analysis (institutional ethnography), as well as close collaboration with legal practitioners, users, experts, and other stakeholders, in order to discover the relevant conceptual models that ought to be represented in the ontologies. Moreover, I understand the participation of these stakeholders in ontology evaluation and validation to be crucial to ensuring consensus about, and the usability of, a given legal ontology.

Challenges and drawbacks

Although the use of ontologies and the implementation of the Semantic Web vision may offer great advantages to information and knowledge management, there are great challenges and problems to be overcome.

First, the problems related to knowledge acquisition techniques and bottlenecks in software engineering are inherent in ontology engineering, and ontology development is quite a time-consuming and complex task. Second, as ontologies are directed mainly towards enabling some communication on the basis of shared conceptualizations, how are we to determine the sharedness of a concept? And how are context-dependencies or (cultural) diversities to be represented? Furthermore, how can we evaluate the content of ontologies?

Current research is focused on overcoming these problems through the establishment of gold standards in concept extraction and ontology learning from texts, and the idea of collaborative development of legal ontologies, although these techniques might be unsuitable for the development of certain types of ontologies. Also, evaluation (validation, verification, and assessment) and quality measurement of ontologies are currently an important topic of research, especially ontology assessment and comparison for reuse purposes.

Regarding ontology reuse, the general belief is that the more abstract (or core) an ontology is, the less it owes to any particular domain and, therefore, the more reusable it becomes across domains and applications. This generates a usability-reusability trade-off that is often difficult to resolve.

Finally, once created, how are these ontologies to evolve? How are ontologies to be maintained and new concepts added to them?

Over and above these issues, more particularized discussions are taking place in the legal domain: for example, the discussion of the advantages and drawbacks of adopting an empirically based perspective (bottom-up), and the complexity of establishing clear connections with legal dogmatics or general legal theory approaches (top-down). To what extent are these two different perspectives on legal ontology development incompatible? How might they complement each other? What is their relationship with text-based approaches to legal ontology modeling?

I would suggest that empirically based, socio-legal methods of ontology construction constitute a bottom-up approach that enhances the usability of ontologies, while the general legal theory-based approach to ontology engineering fosters the reusability of ontologies across multiple domains.

The scholarly discussion of legal ontology development also embraces more fundamental issues, among them the capabilities of ontology languages for the representation of legal concepts, the possibilities of incorporating a legal flavor into OWL, and the implications of combining ontology languages with the formalization of rules.

Finally, the potential value to legal ontology of other approaches, areas of expertise, and domains of knowledge construction ought to be explored, for example: pragmatics and sociology of law methodologies, experiences in biomedical ontology engineering, formal ontology approaches, and the relationships between legal ontology and legal epistemology, legal knowledge and common sense or world knowledge, expert and layperson’s knowledge, legal information and Linked Data possibilities, and legal dogmatics and political science (e.g., in e-Government ontologies).

As you may see, the challenges faced by legal ontology engineering are great, and the limitations of legal ontologies are substantial. Nevertheless, the potential of legal ontologies is immense. I believe that law-related professionals and legal experts have a central role to play in the successful development of legal ontologies and legal semantic applications.

[Editor’s Note: For many of us, the technical aspects of ontologies and the Semantic Web are unfamiliar. Yet these technologies are increasingly being incorporated into the legal information systems that we use every day, so it’s in our interest to learn more about them. For those of us who would like a user-friendly introduction to ontologies and the Semantic Web, here are some suggestions:

Tom Gruber, Where the Social Web Meets the Semantic Web (video);
Sandro Hawke, How the Semantic Web Works;
Kevin Hemenway, The Semantic Web: 1-2-3;
Jim Hendler et al., Introduction to the Semantic Web (video);
Ivan Herman, Introduction to the Semantic Web;
Brian Lowe, Introduction to Ontologies: Adding Meaning to Metadata;
Marek Obitko, Introduction to Ontologies and Semantic Web;
Sean B. Palmer, The Semantic Web: An Introduction;
Ioana Robu et al., An Introduction to the Semantic Web for Health Sciences Librarians;
Barry Smith, Ontology: An Introduction: Video: How to Build an Ontology;
University of Manchester, CO-ODE, Tutorial: A Practical Introduction to Ontologies and OWL;
Dr. Adam Z. Wyner, Legal Ontologies Spin a Semantic Web.]

Dr. Núria Casellas is a visiting researcher at the Legal Information Institute at Cornell University. She is a researcher at the Institute of Law and Technology and an assistant professor at the UAB Law School (on leave). She has participated in several national and European-funded research projects regarding legal ontologies and legal knowledge management: these concern the acquisition of knowledge in judicial settings (IURISERVICE), modeling privacy compliance regulations (NEURONA), drafting legislation (DALOS), and the Legal Case Study of the Semantically Enabled Knowledge Technologies (SEKT VI Framework project), among others. Co-editor of the IDT Series, she holds a Law Degree from the Universitat Autònoma de Barcelona, a Master’s Degree in Health Care Ethics and Law from the University of Manchester, and a PhD (“Modelling Legal Knowledge through Ontologies. OPJK: the Ontology of Professional Judicial Knowledge”).

VoxPopuLII is edited by Judith Pratt. Editor in Chief is Robert Richards.

LexML Brazil Project

elegislation, elegislation systems, information retrieval, Legal identifiers, Legal metadata, Legal ontologies, Legal text processing, Legal XML, Legislative information systems, open source software, search

Oct 15, 2010

This post is divided into three sections. The first introduces the LexML Brazil Project and its unified search portal; the second presents some aspects of semantic interoperability; and the third describes the current work and future direction of the project.

Before going on to the aforementioned subjects, a few words about Brazil and its legislative and legal systems are necessary. Brazil is a country of continental proportions, composed of 27 states and more than five thousand municipalities, or cities, as in Brazil no distinction is made between town and city. As a federative system, each state and municipality has its own legislative chamber. While states and cities follow a unicameral system, the Federation itself has a bicameral system, with the National Congress divided into a Chamber of Deputies and the Federal Senate. These legislatures generate a great number of laws, or normative acts. The abundance of normative acts is very significant, considering that, in contrast with Common Law systems, Brazil’s legal system, based on the Civil Law, is characterized by the predominance of normative acts.

According to Edilenice Passos, “the proliferation of normative acts, of higher or lower hierarchy, eventually causes total chaos, for this big mass of juridical documents hampers the work of lawyers, of researchers, and of the very citizens, who are ruled by Brazilian laws.” Edilenice Passos also cites Arnoldo Wald, who, in 1969, was already alerting Brazilians that “the true legislative labyrinth created as a result of an inflation of statutes passed in recent years has turned the ruling Brazilian law into a patchwork, in which the mere legislative updating becomes a daily torture for a lawyer and a judge who are searching for the rules applicable to a specific subject, from among acts, supplementary acts, institutional acts, decree-laws, and other normative acts.”

Almost all Brazilian legal and legislative information is available through the Internet. However, this information is distributed among several thousand sites, each containing documents produced by a specific government institution. Thus, the relationships between acts of different institutions are not available explicitly, making it very hard to understand this “legal patchwork.”

Nowadays, much time is lost looking for this information, filtering the results of search engines. As Roy Tennant says, “Librarians like to search; everyone else likes to find,” and further adds: “People generally want to find everything they can on a topic, ranked by relevance and displayed in ways that make it easy to narrow in on their goal.”

Born to address these issues, LexML Brazil is an information network that aims to organize Brazil’s legislative and legal information. The project is an initiative of the “Comunidade TI Controle” (IT Control Community) and is being implemented by the Brazilian Federal Senate, through PRODASEN (the Senate’s special secretariat for information systems) and Interlegis (a virtual community of Brazilian legislatures).

LexML Brazil’s first product is the Legislative and Legal Information Portal, which opened on June 30, 2009, indexing 1.28 million documents. In September 2010, its index ranged through more than 1.5 million documents. By indexing the metadata collected from several institutions using the OAI-PMH protocol, the portal unifies access to a variety of legislative and legal information sources, which is a step toward the goal of guaranteeing Brazilians’ constitutional right of access to information.
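
To make the harvesting step more concrete, here is a minimal sketch of how an OAI-PMH request for metadata records might be built. The base URL is a placeholder and not a LexML endpoint. The verb and parameter names (ListRecords, metadataPrefix, from) come from the OAI-PMH specification, and oai_dc is the Dublin Core format that every OAI-PMH repository must support; which metadata format the network participants actually expose is not stated in this post.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for one data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # 'from' restricts the harvest to records changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical provider endpoint, used only for illustration.
url = build_list_records_url("https://example.org/oai", from_date="2010-09-01")
print(url)
```

A harvester would issue one such request per participating institution and index the returned records, which is what allows a single portal to search across sources it does not host.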

LexML Portal

The LexML Portal home page layout is very simple and is similar to Google’s main page. At this screen, it is possible to restrict the search to Legislation, Jurisprudence, or Bills.

The search results page allows the user to refine the search by using filters, according to his or her information requirements. Five filters are available: location, issuing authority, document type, date, and acronyms.

The detail page provides links to the official publication version of each document, and to other publications available in information systems of network participants, which, in this particular case, are: National Press, Presidency, Chamber of Deputies, and Federal Senate. General information about the document is available by clicking one of “Mais Detalhes (More details)” links, which directs the Web browser to the corresponding network participant’s metadata page. A service providing automatic identification of textual references can be activated by clicking the “Linker” label.

Semantic Interoperability

While systems interoperability and syntactic issues can be managed with the established standards of representation, codification, and exchange (XML, METS, Unicode, OAI-PMH, etc.), structural and semantic interoperability demands the adoption of a reference model that allows the integration of several models and the use of a unified terminology for indexing different sources of information. According to Patel et al., the general purpose of semantic interoperability is “to support complex and advanced context-sensitive query processing over heterogeneous information resources.” Lack of semantic interoperability then generates the “information silos” problem, characterized by the lack of information integration and consequent inability to process complex queries.

The next section presents the design choices made by the LexML Brazil Project to address issues related to semantic interoperability using Ranganathan’s “stratification planes” classification system, featuring: an idea plane, a verbal plane, and a notational plane.

Idea Plane

The idea plane is composed of the abstract entities of a domain, independently of how they are nominated or identified.

The metadata standards that propose to address interoperability issues do so either for a specific, restricted domain or for heterogeneous domains. Specialized metadata standards (MARC, EAD, MODS, etc.) allow different sources of information about specific domains (bibliographical or archival information) to be integrated and searched in an advanced form. On the other hand, the Dublin Core standard is one of the few that try to integrate arbitrarily heterogeneous sources using a minimum set of elements and qualifiers. Its characteristic simplicity enables easy adoption by multiple actors, but also hinders query processing, preventing the use of the rich chain of relationships among entities. The lack of generality or expressiveness of these standards precludes their use for achieving semantic interoperability of heterogeneous sources of legislative and legal information in Brazil.

An alternative is to use formal ontologies instead of metadata standards. According to Martin Doerr, “recently, more and more projects and theoreticians support the use of formal ontologies as common conceptual schema for information integration.” One such ontology, the CIDOC CRM model, was designed to help the integration, mediation, and interchange of heterogeneous cultural heritage information. It was developed in 1994 and has since been approved as the ISO 21127:2006 standard. The CIDOC CRM model is then a natural choice for conceptual schemas of legal and legislative information, if one considers that the text corpus consisting of a nation’s sources of law is a part of the nation’s cultural heritage information.

However, the CIDOC CRM “document” concept lacks the detail needed to describe the relationships among the several information abstraction levels: work, expression, manifestation, and item. That requirement is fulfilled by the FRBR_ER entity-relationship model, which was considered as a reference model in earlier phases of the project (“An Adaptation of the FRBR Model to Legal Norms,” João Lima, Proceedings of the V Legislative XML Workshop, Florence, 2005).

The FRBR_OO standard, an ontology created by a working group formed in 2003 by representatives of IFLA (International Federation of Library Associations and Institutions) and ICOM (International Council of Museums) for purposes of harmonizing both models, was adopted by the LexML project because it combines the advantages of both models while addressing their shortcomings. As such, FRBR_OO manifests a great affinity to the LexML domain (“A Time-aware Ontology for Legal Resources,” João Lima et al., Proceedings of the Tenth International ISKO Conference, 2008).

One of the great innovations of the CIDOC CRM model is the information structuring around temporal events, a central concept in the model. This contrasts with most other metadata models, which have resources as the central objects of interest. This innovative approach defines events as entities that connect actors, things (concrete and abstract), places, and time intervals.
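
The event-centred idea can be sketched in a few lines of code. This is only an illustration of the modelling style, not the CIDOC CRM class hierarchy: the Event fields, the timeline helper, and the sample acts and dates are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An event connects an actor, a thing, a place, and a time interval."""
    kind: str
    actor: str
    thing: str
    place: str
    begin: str  # ISO 8601 date, so plain string comparison sorts chronologically
    end: str


def timeline(thing, events):
    """Return the events that involve `thing`, ordered by start date."""
    return sorted((e for e in events if e.thing == thing), key=lambda e: e.begin)


# Fictional sample data, for illustration only.
events = [
    Event("amendment", "National Congress", "Act A", "Brasília", "2001-03-10", "2001-03-10"),
    Event("enactment", "National Congress", "Act A", "Brasília", "1995-05-02", "1995-05-02"),
    Event("enactment", "National Congress", "Act B", "Brasília", "2001-03-10", "2001-03-10"),
]

for e in timeline("Act A", events):
    print(e.begin, e.kind, "by", e.actor)
```

Because the resource is reached through its events, a query about one act naturally returns its history as well as the act itself.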

This particular emphasis could be criticized on the ground that the user is generally interested in a specific resource, such as the text of a law. However, the result of a search for information about a law is much more relevant if it includes an organized list of events related to the resource, along with the resource itself.

The importance of choosing a suitable reference model is easily observable in the present discussion about what particular syntax to use to codify persistent identifiers — urn:lex, LegisLink, Akoma Ntoso, etc. Before reaching the syntax level, such discussions should focus first on the idea plane, where a greater potential for integration exists. A consensus reached at this level would allow great flexibility for the specification of diverse persistent identifier syntaxes.

Verbal Plane

The CIDOC CRM ontology separates the class of types and denominations from other classes. Multiple names, identifiers, and types can be attributed to all entities of the CRM, allowing any domain class to be classified by several taxonomies and be known by multiple names and identifiers.

This approach is used in LexML to represent different terms that identify the same concepts. Six classes form LexML’s uniform resource identifiers: place, authority, type of document, event, type of content, and language. To externalize the LexML vocabularies specification, we recommend, and use, the W3C SKOS (Simple Knowledge Organization System).

Notational Plane

The definition of uniform and persistent identifiers is fundamental for the creation and maintenance of an information chain. Identifiers are already part of the legal domain. For identification purposes, numbers are attributed to rulings, decisions, abridgments, and bills, allowing references by means of textual remissions. In the computational environment, the creation of persistent and uniform identifiers allows not only identification and reference, but also access to documents by means of textual hyperlinks.

Based on the experience of the Italian project Norme in Rete with respect to URN (Uniform Resource Name) identifiers, LexML defines a grammar for the construction of identifiers for legislative and legal documents in Brazil. As an example, the name “urn:lex:br:federal:lei:1993-06-21;8666” identifies, in a persistent and unique way, the “Federal Act No. 8666, of June 21, 1993.” If all information systems agree with respect to the identifiers, it is possible to share descriptive metadata, as well as information about semantic relationships, such as regulation, amendment, abrogation, etc.
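
As a rough illustration, the example name above can be split into its parts with a few lines of code. This sketch handles only names shaped like that one example; the full LexML grammar is richer, and the field names used here (place, authority, doc_type, date, number) are our reading of the example rather than terms taken from the specification.

```python
def parse_lex_urn(urn):
    """Split a simple urn:lex name into labelled parts (illustrative only)."""
    parts = urn.split(":")
    if len(parts) != 6 or parts[0] != "urn" or parts[1] != "lex":
        raise ValueError(f"not a simple urn:lex name: {urn!r}")
    _, _, place, authority, doc_type, descriptor = parts
    # In the example, the last part holds a date and a number separated by ';'.
    date, sep, number = descriptor.partition(";")
    if not sep:
        raise ValueError(f"expected 'date;number' at the end of {urn!r}")
    return {
        "place": place,
        "authority": authority,
        "doc_type": doc_type,
        "date": date,
        "number": number,
    }


parsed = parse_lex_urn("urn:lex:br:federal:lei:1993-06-21;8666")
print(parsed)
```

Two systems that both store this name for the same act can exchange metadata about it without first agreeing on titles, spellings, or local database keys.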

The Linker service, accessible through the LexML Portal (see, e.g., Act 11.705 without linker and Act 11.705 with linker), creates hyperlinks automatically through a dynamic textual analysis that identifies textual remissions of [i.e., citations to] normative documents. These hyperlinks can be used to navigate through textual remissions.
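
The general technique can be sketched as follows. This is not the Linker's implementation: the regular expression covers a single citation pattern, it assumes that every “Lei” it finds is a federal act, and the function name find_citations is invented for the example.

```python
import re

# Portuguese month names mapped to month numbers.
MONTHS = {
    "janeiro": 1, "fevereiro": 2, "março": 3, "abril": 4,
    "maio": 5, "junho": 6, "julho": 7, "agosto": 8,
    "setembro": 9, "outubro": 10, "novembro": 11, "dezembro": 12,
}

# Matches citations such as "Lei nº 8.666, de 21 de junho de 1993".
CITATION = re.compile(
    r"Lei\s+nº\s*([\d.]+),\s+de\s+(\d{1,2})\s+de\s+(\w+)\s+de\s+(\d{4})"
)


def find_citations(text):
    """Return (citation text, urn) pairs for each citation recognised in text."""
    found = []
    for match in CITATION.finditer(text):
        number, day, month_name, year = match.groups()
        month = MONTHS.get(month_name.lower())
        if month is None:
            continue  # not a month name, so not a citation we understand
        # Assumption: the cited act is federal; a real linker must work this out.
        urn = (
            f"urn:lex:br:federal:lei:{year}-{month:02d}-{int(day):02d};"
            f"{number.replace('.', '')}"
        )
        found.append((match.group(0), urn))
    return found


sample = "Nos termos da Lei nº 8.666, de 21 de junho de 1993, o contrato é nulo."
print(find_citations(sample))
```

Each pair gives the span of text to turn into a hyperlink and the persistent identifier it should point to.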

Future Directions

LexML 1.0 consists of the Search Portal, the Resolution Service, the Persistent Identifier, and the Linker Service. The next version, LexML 2.0, will go further: it will involve the development of open source tools for managing the complete text of documents encoded according to the LexML Brazil XML Schema, which was derived from the schemas of the Akoma Ntoso Project.

The complete management of document texts in a structured form has been a goal of the project since its inception. As early as 2000, the Federal Constitution Portal was implemented following this idea. This portal allows the user to see all the versions of the constitutional text through a timeline, with the option to see the list of historical changes [see, e.g., art. 12] and with the ability to navigate bi-directional links [for example, in art. 154, click on the blue arrows].

During the development of that portal, taking into account the various forms of XML used to encode normative texts in many countries, and especially the experience of the Italian project Norme in Rete, a decision was made to make a unified portal and a persistent identifier a priority of the LexML project. Presently, our efforts to build open source tools for management of document texts are being renewed. One of these tools, a LexML Document Editor, will enable the authoring of legal texts as if using a word processor, but producing a structured document at the end. Another tool is the Compiler, which will semi-automatically generate modified versions of documents that have been updated by other legal acts. The Consolidator will help to simplify the display of legal information — and users’ experience of the legal system — through the consolidation of several related normative acts into a single act. The Comparator will be used to display the differences between versions of a document. The last tool, the Publisher, will be used to render XML content in different formats, such as html, PDF, PDF-A, EPUB, etc., with the ability to choose different views of the same text, such as the original text, the updated text as of a specific date, etc.
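
To give a sense of what comparing two versions of a text involves, here is a minimal sketch using Python's standard difflib module. It works on plain text rather than on LexML XML, the sample articles are invented, and nothing here describes how the planned Comparator will actually be built.

```python
import difflib


def compare_versions(old_text, new_text, old_label="original", new_label="updated"):
    """Return unified-diff lines showing what changed between two versions."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Fictional article text, for illustration only.
v1 = "Art. 1 The fee is 10.\nArt. 2 This act takes effect on publication."
v2 = "Art. 1 The fee is 20.\nArt. 2 This act takes effect on publication."

diff = compare_versions(v1, v2)
print("\n".join(diff))
```

Lines starting with a minus sign belong only to the older version, and lines starting with a plus sign belong only to the newer one; a structured XML source would allow the same comparison article by article.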

Last but not least, the Information Management Committee, which is a community of practice composed of librarians, archivists, and information analysts of several institutions of the three Brazilian governmental branches, interested in the management of legal and legislative information, is responsible for the definition of the priority and long range planning of the LexML Brazil Project.

[Editor’s Note: For documentation, schemas, and controlled vocabularies respecting LexML Brazil, please see the LexML Brazil Project Website. For more information on these issues, please see the following VoxPopuLII posts: John Sheridan on Legislation.gov.uk, Ivan Mokanov on CANLII’s innovative legal citation system, Joe Carmel on LegisLink, and Robb Shecter on OregonLaws.org.]

The LexML Brazil core team, from left to right: João Lima (joaolima at senado.gov.br) is the leader of The LexML Project. His Information Science Ph.D. thesis details many of the concepts presented here; João Holanda (jholanda at senado.gov.br) holds a BSc in History from UnB; João Rafael (jrafael at senado.gov.br) holds a MSc in Computer Science from UFMG and a BSc in Computer Science from UnB; Marcos Fragomeni (fragomeni at senado.gov.br) holds a BSc in Computer Science from UnB.


Semantic Enhancement of Legal Information… Are We Up for the Challenge?

Cross-language legal information retrieval, information retrieval, knowledge management, Legal knowledge representation, Legal ontologies, Legal semantic web, Multilingual legal information retrieval, Semantic Web and law

Feb 15, 2010

The organization and formalization of legal information for computer processing in order to support decision-making or enhance information search, retrieval and knowledge management is not recent, and neither is the need to represent legal knowledge in a machine-readable form. Nevertheless, since the first ideas of computerization of the law in the late 1940s, the appearance of the first legal information systems in the 1950s, and the first legal expert systems in the 1970s, claims, such as Hafner’s, that “searching a large database is an important and time-consuming part of legal work,” which drove the development of legal information systems during the 80s, have not yet been left behind.

Similar claims may be found nowadays. On the one hand, the amount of unstructured (or poorly structured) legal information and documents made available by governments, free access initiatives, blawgs, and portals on the Web will probably keep growing as the Web expands. On the other, the increasing quantity of legal data managed by legal publishing companies, law firms, and government agencies, together with the high quality requirements applicable to legal information/knowledge search, discovery, and management (e.g., access and privacy issues, copyright, etc.), has renewed the need to develop and implement better content management tools and methods.

Information overload, however important, is not the only concern for the future of legal knowledge management; other and growing demands are increasing the complexity of the requirements that legal information management systems and, in consequence, legal knowledge representation must face in the future. Multilingual search and retrieval of legal information to enable, for example, integrated search between the legislation of several European countries; enhanced laypersons’ understanding of and access to e-government and e-administration sites or online dispute resolution capabilities (e.g., BATNA determination); the regulatory basis and capabilities of electronic institutions or normative and multi-agent systems (MAS); and multimedia, privacy or digital rights management systems, are just some examples of these demands.

How may we enable legal information interoperability? How may we foster legal knowledge usability and reuse between information and knowledge systems? How may we go beyond the mere linking of legal documents or the use of keywords or Boolean operators for legal information search? How may we formalize legal concepts and procedures in a machine-understandable form?

In short, how may we handle the complexity of legal knowledge to enhance legal information search and retrieval or knowledge management, taking into account the structure and dynamic character of legal knowledge, its relation with common sense concepts, the distinct theoretical perspectives, the flavor and influence of legal practice in its evolution, and jurisdictional and linguistic differences?

These are challenging tasks, for which different solutions and lines of research have been proposed. Here, I would like to draw your attention to the development of semantic solutions and applications and the construction of formal structures for representing legal concepts in order to make human-machine communication and understanding possible.

Semantic metadata

Nowadays, in the search and retrieval area, we still perform most legal searches in online or application databases using keywords (that we believe to be contained in the document that we are searching for), maybe together with a combination of Boolean operators, or supported with a set of predefined categories (metadata regarding, for example, date, type of court, etc.), a list of pre-established topics, thesauri (e.g., EUROVOC), or a synonym-enhanced search.

These searches rely mainly on syntactic matching, and — with the exception of searches enhanced with categories, synonyms, or thesauri — they will return only documents that contain the exact term searched for. To perform more complex searches, to go beyond the term, we require the search engine to understand the semantic level of legal documents; a shared understanding of the domain of knowledge becomes necessary.
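
The difference between exact matching and a synonym-enhanced search can be shown with a toy example. The documents and the one-entry thesaurus are invented, and the word-level matching is far cruder than what a real legal search engine does.

```python
def exact_search(term, documents):
    """Return ids of documents that contain the exact term as a word."""
    term = term.lower()
    return [doc_id for doc_id, text in documents.items()
            if term in text.lower().split()]


def expanded_search(term, documents, thesaurus):
    """Return ids of documents that contain the term or any listed synonym."""
    terms = {term.lower()} | thesaurus.get(term.lower(), set())
    return [doc_id for doc_id, text in documents.items()
            if terms & set(text.lower().split())]


# Invented sample data, for illustration only.
documents = {
    "d1": "the lawyer filed an appeal",
    "d2": "the attorney filed an appeal",
    "d3": "the court dismissed the claim",
}
thesaurus = {"lawyer": {"attorney", "counsel"}}

print(exact_search("lawyer", documents))
print(expanded_search("lawyer", documents, thesaurus))
```

The exact search misses the second document even though it is about the same thing. The thesaurus recovers it, but only because someone recorded that the two words are equivalent, which is the kind of shared understanding the rest of this post is concerned with.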

Although the quest for the representation of legal concepts is not new, these efforts have recently been driven by the success of the World Wide Web (WWW) and, especially, by the later development of the Semantic Web. Sir Tim Berners-Lee described it as an extension of the Web “in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

Thus, the Semantic Web (including Linked Data efforts or the Web of Data) is envisaged as an extension of the current Web, which now also comprises collaborative tools and social networks (the Social Web or Web 2.0). The Semantic Web is sometimes also referred to as Web 3.0, although there is no widespread agreement on this matter, as different visions exist regarding the enhancement and evolution of the current Web.

Towards that shift, new languages and tools (ontologies) were needed to allow semantics to be added to the current Web, as the development of the Semantic Web is based on the formal representation of meaning in order to share with computers the flexibility, intuition, and capabilities of the conceptual structures of human natural languages. In the subfield of computer science and information science known as Knowledge Representation, the term “ontology” refers to a consensual and reusable vocabulary of identified concepts and their relationships regarding some phenomena of the world, which is made explicit in a machine-readable language. Ontologies may be regarded as advanced taxonomical structures, where concepts formalized as classes (e.g., “Actor”) are defined with axioms, enriched with the description of attributes or constraints (for example, “cardinality”), and linked to other classes through properties (e.g., “possesses” or “is_possessed_by”).
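
A toy sketch may help to show how classes, a class hierarchy, and properties fit together. It uses plain Python dictionaries rather than an ontology language such as OWL, and apart from “Actor” and “possesses,” every class and helper name in it is invented for the illustration.

```python
# Class hierarchy: each class points to its parent (None marks a top-level class).
SUBCLASS_OF = {
    "Actor": None,
    "Judge": "Actor",
    "Lawyer": "Actor",
    "Document": None,
}

# A property links a domain class to a range class and may name its inverse.
PROPERTIES = {
    "possesses": {"domain": "Actor", "range": "Document", "inverse": "is_possessed_by"},
    "is_possessed_by": {"domain": "Document", "range": "Actor", "inverse": "possesses"},
}


def is_a(cls, ancestor):
    """True if cls is the ancestor class or one of its (indirect) subclasses."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def can_apply(prop, subject_cls, object_cls):
    """Check a statement against the property's declared domain and range."""
    definition = PROPERTIES[prop]
    return is_a(subject_cls, definition["domain"]) and is_a(object_cls, definition["range"])


print(can_apply("possesses", "Judge", "Document"))
print(can_apply("possesses", "Document", "Judge"))
```

The first check succeeds because a Judge is an Actor, so the property declared for the general class also holds for the specific one. That inheritance is what distinguishes an ontology from a flat list of terms.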

The task of developing interoperable technologies (ontology languages, guidelines, software, and tools) has been taken up by the World Wide Web Consortium (W3C). These technologies were arranged in the Semantic Web Stack according to increasing levels of complexity (like a layer cake), in the sense that higher layers depend on lower layers (and the latter are inherited from the original Web). The languages include XML (eXtensible Markup Language), a general-purpose markup language usually used to add structure to documents, and the so-called ontology languages: RDF (Resource Description Framework), OWL, and OWL 2 (Web Ontology Language). Recently, a specification to support the conversion of existing thesauri, taxonomies, or subject headings into RDF has been released (the SKOS, Simple Knowledge Organization System, standard).

Although there are different views in the literature regarding the scope of the definition or main characteristics of ontologies, the use of ontologies is seen as the key to implementing semantics for human-machine communication. Many ontologies have been built for different purposes and knowledge domains, for example:

OpenCyc,
SUMO,
PROTON,
DOLCE,
the FRBRoo model (used in the above code and graph examples),
the RDF representation of Dublin Core,
the Gene Ontology,
the Wine Ontology, and
the SemanticBible.

Although most domains are of interest for ontology modeling, the legal domain offers a perfect area for conceptual modeling and knowledge representation to be used in different types of intelligent applications and legal reasoning systems, not only due to its complexity as a knowledge intensive domain, but also because of the large amount of data that it generates. The use of semantically-enabled technologies for legal knowledge management could provide legal professionals and citizens with better access to legal information; enhance the storage, search, and retrieval of legal information; make possible advanced knowledge management systems; enable human-computer interaction; and even satisfy some hopes respecting automated reasoning and argumentation.

Regarding the incorporation of legal knowledge into the Web or into IT applications, or the more complex realization of the Legal Semantic Web, several directions have been taken, such as the development of XML standards for legal documentation and drafting (including Akoma Ntoso, LexML, CEN Metalex, and Norme in Rete), and the construction of legal ontologies.

Ontologizing legal knowledge

During the last decade, research on the use of legal ontologies as a technique to represent legal knowledge has increased and, as a consequence, a very interesting debate about their capacity to represent legal concepts and their relation to the different existing legal theories has arisen. It has even been suggested that ontologies could be the “missing link” between legal theory and Artificial Intelligence.

The literature suggests that legal ontologies may be distinguished by the levels of abstraction of the ideas they represent, the key distinction being between core and domain levels. Legal core ontologies model general concepts which are believed to be central for the understanding of law and may be used in all legal domains. In the past, ontologies of this type were mainly built upon insights provided by legal theory and largely influenced by normativism and legal positivism, especially by the works of Hart and Kelsen. Thus, initial legal ontology development efforts in Europe were influenced by hopes and trends in research on legal expert systems based on syllogistic approaches to legal interpretation.

More recent contributions at that level include the LRI-Core Ontology, the DOLCE+CLO (Core Legal Ontology), and the Ontology of Fundamental Legal Concepts (the basis for the LKIF-Core Ontology). Such ontologies usually include references to the concepts of Norm, Legal Act, and Legal Person, and may contain the formalization of deontic operators (e.g., Prohibition, Obligation, and Permission).

Domain ontologies, on the other hand, are directed towards the representation of conceptual knowledge regarding specific areas of the law or domains of practice, and are built with particular applications in mind, especially those that enable communication (shared vocabularies), or enhance indexing, search, and retrieval of legal information. Currently, most legal ontologies being developed are domain-specific ontologies, and some areas of legal knowledge have been heavily targeted, notably the representation of intellectual property rights respecting digital rights management (IPROnto Ontology, the Copyright Ontology, the Ontology of Licences, and the ALIS IP Ontology), and consumer-related legal issues (the Customer Complaint Ontology (or CContology), and the Consumer Protection Ontology). Many other well-documented ontologies have also been developed for purposes of the detection of financial fraud and other crimes; the representation of alternative dispute resolution methods, cases, judicial proceedings, and argumentation frameworks; and the multilingual retrieval of European law, among others. (See, for example, the proceedings of the JURIX and ICAIL conferences for further references.)

A socio-legal approach to legal ontology development

Thus, there are many approaches to the development of legal ontologies. Nevertheless, in the current legal ontology literature there are few explicit accounts or insights into the methods researchers use to elicit legal knowledge, and the accounts that are available reflect a lack of consensus as to the most appropriate methodology. For example, some accounts focus solely on the use of legal text mining and statistical analysis, in which ontologies are built by means of machine learning from legal texts; while others concentrate on the analysis of legal theories and related materials. Moreover, legal ontology researchers disagree about the role that legal experts should play in ontology validation.