US Law Code

c.c. BY-SA 3.0. wikipedia.org

If you think that law isn’t written for lawyers, try reading some.  It can even start looking normal after a while (say, about the length of time it takes to get through a law degree).  But research on the main street impact of legal language suggests that for most people, the law is likely to be either incomprehensible or very hard to read.

This problem is the focus of a research project that a team of us at ANU and Cornell LII has been pursuing over the past months (Eric McCreath (Australian National University, Research School of Computer Science), Wayne Weibel (Cornell University Law School, Legal Information Institute), Nic Ceynowa (LII), Sara Frug (LII), Tom Bruce (LII) and myself (ANU)).  With the generous help of thousands of LII users, as part of a citizen science project, we’ve been collecting data on the readability of law as well as demographic data about the users of law.

If you are concerned about access to law, and many are, the current situation is not really good enough.  Whether you tend to ‘human rights’, ‘democratic values’, ‘economic efficiency’, ‘rule of law’ or are just wanting to make sure your hapless minions follow your every command, you’ll be able to think of a good reason why the law should be more accessible (readable) than it is.

Of course the problem has been around for a very long time, and plain language is a standing goal of many legislative drafting offices.  Reform efforts have been underway since the Middle Ages.  Certainly legal language has improved considerably, particularly as a result of 19th and 20th century reforms with that goal in mind.  Still, the law can’t be said to be readily accessible to the general public, in the sense of its readability.

What has changed that makes the problem more urgent today is that the general public can now at least get to the law.  That’s the revolution that’s been achieved by online publishers of the law, including the Free Access to Law Movement and official and commercial law publishers.  As the UK’s First Parliamentary Counsel observed last year:

Legislation affects us all. And increasingly, legislation is being searched for, read and used by a broad range of people. It is no longer confined to professional libraries; websites like legislation.gov.uk have made it accessible to everyone. So the digital age has made it easier for people to find the law of the land; but once they have found it, they may be baffled. The law is regarded by its users as intricate and intimidating.

In 2010, the Plain Writing Act was adopted by the US Congress with the aim of improving government writing. Sad to say, the Act itself is no model of plain language. Section, sub-section and paragraph roll on, line after line, provision after convoluted provision. In substance they say not much more than: write clearly so that the public can understand and use what you write.  Didn’t anyone see the irony?  Then again, reality check, most legislation is never read by the people who vote to make it law. Just to make sure the drawbridge was well and truly up, if you read through to the fine print at the end there is an important rider.  What happens if no one can understand what the law is supposed to mean? Well, nothing a judge can do about it.  Great aspiration, but …

A sea change could be on the way, though. The Good Law initiative is one great example of efforts to address the complexity and readability of legislation. What is significant is that the way we think about legal rules is changing.  Official publishers of the law are beginning to talk about the law as if it’s data.  The UK National Archives Office has even published an API, or Application Programming Interface (basically a ‘how to’ for developers who want to use the “data”). So now we’re thinking of law as data.  And we’re going to unleash computer scientists on it, to do whatever their imaginations can come up with. Bommarito and Katz’s work on the legal code as a mathematical network is a great example of the virtually infinite possibilities.

Our own research uses the potential of computational technologies in another way. Online legal sites are not just ‘documents’.  They are places where people are actively interacting with the law. We used crowd-sourcing to engage with this audience, asking them to rate law on readability characteristics as well as exploring the demographics of who uses the law. Our aim was to develop a labelled dataset that could be used as input to machine learning. “Labelled data” is machine learning gold: hard to get, but if you can get it, you can use it to make predictions about what human judges would say. In our case we are trying to predict whether a legal sentence will be readable or not.

In the process we learned quite a bit about the audience using the law, and about which law they use. As we scoured the Google Analytics data, it became obvious that the law is not equally read. We may all be equal before the law, but the law is not equal before us. Just 37 sections of the US Code account for almost 10% of the page visits to US Code pages (there are about 65,000). So a tiny fraction of the Code is being read all the time.  On the other hand there are huge swathes of the Code that hardly ever see the light of a back illuminated screen. This is not trivial news. Computer scientists love lists. Prioritised lists get their own special lectures for first year CS students, and here we have a prioritised list. You want to know what law is at the top of your priority list? The users will tell you. If you’re concerned with cleaning up the law code or making it easier to understand, there’s useful stuff here.

Ranking of sections by frequency of readership (on a logarithmic scale)
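
To make the idea of a readership-ranked priority list concrete, here is a minimal sketch in Python. It assumes you already have a mapping from section identifier to page visits; the function name `sections_covering` and the counts below are invented for illustration and are not the LII analytics data.

```python
def sections_covering(visits, share):
    """Return the most-visited sections that together reach `share` of all visits."""
    total = sum(visits.values())
    # Rank sections from most to least visited.
    ranked = sorted(visits.items(), key=lambda kv: kv[1], reverse=True)
    covered, running = [], 0
    for section, count in ranked:
        if running >= share * total:
            break
        covered.append(section)
        running += count
    return covered


# Invented counts for illustration only; not real analytics data.
example_visits = {"sec_a": 500, "sec_b": 300, "sec_c": 100, "sec_d": 60, "sec_e": 40}
top_half = sections_covering(example_visits, 0.5)
```

Run against real visit counts, the same loop would return the short list of sections that account for any chosen share of readership, which is one way to decide where plain-language effort should go first.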

It will be no surprise that we found that law is harder to read for just about every part of the community than for legal professionals.  What was surprising was that legal professionals (including law students) turn out to be a minority of those interested enough to respond, on the LII site at least.

These were just a few of the demographic insights we were able to draw.

On the machine learning front, we were able to show that machine learning can improve on traditional readability metrics in predicting language difficulty (those metrics have long been regarded as suspect when applied to legal texts anyway). That said, it’s early days and we would like to extend the research we have done so far. There is a lot of potential for future research applying computational techniques to the readability of law.  A co-authored publication further describing the research introduced in this article will be presented at this year’s Law Via the Internet Conference being held at the end of September.
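
For readers who have not met a traditional readability metric, the Flesch Reading Ease score is one well-known example. It is chosen here purely for illustration; this post does not say which metrics the research compared against. The sketch below assumes a very rough syllable counter (runs of vowels), so its scores are approximate, and the function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough heuristic: count runs of vowels; every word gets at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


plain = "The cat sat. The dog ran."
legal = ("Notwithstanding any provision hereinafter enumerated, "
         "the aforementioned obligations remain enforceable.")
plain_score = flesch_reading_ease(plain)
legal_score = flesch_reading_ease(legal)
```

A formula like this sees only sentence length and word length, which is part of why such metrics are treated with suspicion for legal text: a short sentence built from short but technical words can still be hard to understand.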

But while we’re thinking about it, there are other ways to think about ‘access’ to law.  What if instead of writing the law, it was visualized?  You know, like in pictures.  Before you storm off in contempt, note this: research is validating that pictures can improve user experience, for example in the contract space, where what your clients think of your contract can impact on your bottom line.

It’s radical enough unleashing computer scientists on legal rules. What might the law look like if we try thinking like designers?   ‘User experience’ of legal rules? That one didn’t come up in law school.  We’re in some surreally different world at this point. Designers create artefacts for people to use which are optimised for functionality, beauty and other characteristics, not things that are meant to tell people what to do. ‘User experience’ is their kind of thinking.

As readers of Vox Pop will know, the idea of legal design is starting to get traction. Helena Haapio and Stefania Passera’s great article on legal design covers some of the field. An article they jointly published last year points out some of the benefits of visualization. Earlier this year, we worked on a joint paper exploring the feasibility of automating legal visualization. We were able to demonstrate the automation of visualization of clauses, such as a contract term clause, a liquidated damages clause or a payment clause. Visit our proof of concept site, where you can play with visualizing different options.

OK. So perhaps some of the above reads like we’re on the up-slope of the hype curve. But that of course is the fun. For those of us who’ve spent many years in the law, looking at the law from a different professional paradigm can help us see things that didn’t stand out before. It is certainly enjoyable and brings a breath of fresh air to the law.

Michael Curtotti is undertaking a PhD in the Research School of Computer Science at the Australian National University.  His co-authored publications on legal informatics include: A Right to Access Implies a Right to Know: An Open Online Platform for Readability Research; Enhancing the Visualization of Law; and A corpus of Australian contract language: description, profiling and analysis.  He holds a Bachelor of Laws and a Bachelor of Commerce from the University of New South Wales, and a Masters of International Law from the Australian National University.  He works part-time as a legal adviser to the ANU Students Association and the ANU Post-graduate & research students Association, providing free legal services to ANU students.

—————————

Other related posts on VoxPopuLII on this topic include Law in the Last-Mile: The Potential of Mobile Integration into Legal Services by Sean Martin McDonald, Incomprehension Compounded by Mistranslation – The Imperatives of Access to Legal Information in South Africa by Eve Gray, and Accessible Law by Nick Holmes.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

[Editor's Note: We are republishing here, with some corrections, a post by Dr. Núria Casellas that appeared earlier on VoxPopuLII.]

The organization and formalization of legal information for computer processing in order to support decision-making or enhance information search, retrieval and knowledge management is not recent, and neither is the need to represent legal knowledge in a machine-readable form. Nevertheless, since the first ideas of computerization of the law in the late 1940s, the appearance of the first legal information systems in the 1950s, and the first legal expert systems in the 1970s, claims, such as Hafner’s, that “searching a large database is an important and time-consuming part of legal work,” which drove the development of legal information systems during the 80s, have not yet been left behind.

Similar claims may be found nowadays. On the one hand, the amount of unstructured (or poorly structured) legal information and documents made available by governments, free access initiatives, blawgs, and portals on the Web will probably keep growing as the Web expands. On the other, the increasing quantity of legal data managed by legal publishing companies, law firms, and government agencies, together with the high quality requirements applicable to legal information/knowledge search, discovery, and management (e.g., access and privacy issues, copyright, etc.), has renewed the need to develop and implement better content management tools and methods.

Information overload, however important, is not the only concern for the future of legal knowledge management; other and growing demands are increasing the complexity of the requirements that legal information management systems and, in consequence, legal knowledge representation must face in the future. Multilingual search and retrieval of legal information to enable, for example, integrated search between the legislation of several European countries; enhanced laypersons’ understanding of and access to e-government and e-administration sites or online dispute resolution capabilities (e.g., BATNA determination); the regulatory basis and capabilities of electronic institutions or normative and multi-agent systems (MAS); and multimedia, privacy or digital rights management systems, are just some examples of these demands.

How may we enable legal information interoperability? How may we foster legal knowledge usability and reuse between information and knowledge systems? How may we go beyond the mere linking of legal documents or the use of keywords or Boolean operators for legal information search? How may we formalize legal concepts and procedures in a machine-understandable form?

In short, how may we handle the complexity of legal knowledge to enhance legal information search and retrieval or knowledge management, taking into account the structure and dynamic character of legal knowledge, its relation with common sense concepts, the distinct theoretical perspectives, the flavor and influence of legal practice in its evolution, and jurisdictional and linguistic differences?

These are challenging tasks, for which different solutions and lines of research have been proposed. Here, I would like to draw your attention to the development of semantic solutions and applications and the construction of formal structures for representing legal concepts in order to make human-machine communication and understanding possible.

Semantic metadata

For example, in the search and retrieval area, we still perform most legal searches in online or application databases using keywords (that we believe to be contained in the document that we are searching for), maybe together with a combination of Boolean operators, or supported with a set of predefined categories (metadata regarding, for example, date, type of court, etc.), a list of pre-established topics, thesauri (e.g., EuroVoc), or a synonym-enhanced search.

These searches rely mainly on syntactic matching, and — with the exception of searches enhanced with categories, synonyms, or thesauri — they will return only documents that contain the exact term searched for. To perform more complex searches, to go beyond the term, we require the search engine to understand the semantic level of legal documents; a shared understanding of the domain of knowledge becomes necessary.
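
The difference between exact-term matching and a synonym-enhanced search can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the two documents, the synonym table, and the function names are all hypothetical, and real legal search engines are far more elaborate.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(docs, term):
    """Exact-term matching: a document matches only if it contains the term."""
    term = term.lower()
    return [doc_id for doc_id, text in docs.items() if term in tokenize(text)]


def synonym_search(docs, term, synonyms):
    """Expand the query with listed synonyms before matching."""
    term = term.lower()
    expanded = {term} | {s.lower() for s in synonyms.get(term, ())}
    return [doc_id for doc_id, text in docs.items() if expanded & tokenize(text)]


# Hypothetical documents and synonym table, for illustration only.
docs = {
    "d1": "The tenant shall pay rent monthly.",
    "d2": "The lessee must vacate the premises.",
}
synonyms = {"tenant": ["lessee"]}

exact_hits = keyword_search(docs, "tenant")
expanded_hits = synonym_search(docs, "tenant", synonyms)
```

The exact search finds only the document that uses the word "tenant", while the expanded search also finds the one that says "lessee". Even so, the synonym table is only a list of strings; nothing in it tells the machine what a tenant is, which is the gap that semantic approaches aim to close.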

Although the quest for the representation of legal concepts is not new, these efforts have recently been driven by the success of the World Wide Web (WWW) and, especially, by the later development of the Semantic Web. Sir Tim Berners-Lee described it as an extension of the Web “in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

From Web 2.0 to Web 3.0

Thus, the Semantic Web is envisaged as an extension of the current Web, which now comprises collaborative tools and social networks (the Social Web or Web 2.0). The Semantic Web is sometimes also referred to as Web 3.0, although there is no widespread agreement on this matter, as different visions exist regarding the enhancement and evolution of the current Web.

These efforts also include the Web of Data (or Linked Data), which relies on the existence of standard formats (URIs, HTTP and RDF) to allow the access and query of interrelated datasets, which may be granted through a SPARQL endpoint (e.g., Govtrack.us, US census data, etc.). Sharing and connecting data on the Web in compliance with the Linked Data principles enables the exploitation of content from different Web data sources with the development of search, browse, and other mashup applications. (See the Linking Open Data cloud diagram by Cyganiak and Jentzsch below.) [Editor's Note: Legislation.gov.uk also applies Linked Data principles to legal information, as John Sheridan explains in his recent post.]

[Figure: Linking Open Data cloud diagram, by Cyganiak and Jentzsch]

Thus, to allow semantics to be added to the current Web, new languages and tools (ontologies) were needed, as the development of the Semantic Web is based on the formal representation of meaning in order to share with computers the flexibility, intuition, and capabilities of the conceptual structures of human natural languages. In the subfield of computer science and information science known as Knowledge Representation, the term “ontology” refers to a consensual and reusable vocabulary of identified concepts and their relationships regarding some phenomena of the world, which is made explicit in a machine-readable language. Ontologies may be regarded as advanced taxonomical structures, where concepts are formalized as classes and defined with axioms, enriched with the description of attributes or constraints, and properties.

The task of developing interoperable technologies (ontology languages, guidelines, software, and tools) has been taken up by the World Wide Web Consortium (W3C). These technologies were arranged in the Semantic Web Stack according to increasing levels of complexity (like a layer cake). In this stack, higher layers depend on lower layers (and the latter are inherited from the original Web). These languages include XML (eXtensible Markup Language), a simplified subset of SGML usually used to add structure to documents, and the so-called ontology languages: RDF/RDFS (Resource Description Framework/Schema), OWL, and OWL 2 (Web Ontology Language). While the RDF language offers simple descriptive information about the resources on the Web, encoded in sets of triples of subject (a resource), predicate (a property or relation), and object (a resource or a value), RDFS allows the description of classes (sets of resources) and the properties that relate them. OWL offers an even more expressive language to define structured ontologies (e.g., class disjointness, union, or equivalence).
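
The triple model is easier to grasp with a small example. The following Python sketch stores triples as plain tuples and follows `rdfs:subClassOf` links to infer the extra types of a resource. It is only an illustration of the idea, not an RDF library: the `ex:` names are hypothetical, and the helper functions are assumptions made for this sketch.

```python
# Each triple is (subject, predicate, object). The "ex:" vocabulary is invented.
triples = {
    ("ex:Statute", "rdfs:subClassOf", "ex:LegalDocument"),
    ("ex:LegalDocument", "rdfs:subClassOf", "ex:Document"),
    ("ex:ExampleAct", "rdf:type", "ex:Statute"),
    ("ex:ExampleAct", "ex:title", "An Example Act"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


def superclasses(triples, cls):
    """Follow rdfs:subClassOf links transitively."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(triples, s=current, p="rdfs:subClassOf"):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def types_of(triples, resource):
    """Asserted types plus the types implied by the class hierarchy."""
    asserted = {o for _, _, o in match(triples, s=resource, p="rdf:type")}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(triples, cls)
    return asserted | inferred


act_types = types_of(triples, "ex:ExampleAct")
```

Only one type is stated for the example act, yet the class hierarchy lets a program conclude that it is also a legal document and a document. That small step from stated facts to implied ones is the kind of work the ontology languages are meant to support, with OWL adding richer axioms than this sketch attempts.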

Moreover, a specification to support the conversion of existing thesauri, taxonomies or subject headings into RDF triples has recently been published: the SKOS, Simple Knowledge Organization System standard. These specifications may be exploited in Linked Data efforts, such as the New York Times vocabularies. Also, EuroVoc, the multilingual thesaurus for activities of the EU is, for example, now available in this format.
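
As a rough illustration of what such a conversion involves, the Python sketch below turns a tiny thesaurus into SKOS-style triples. The property names (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`) come from the SKOS vocabulary; the input layout, the `ex:` prefix, the sample terms, and the function name are assumptions made for this example, and the output is plain tuples rather than a real RDF serialization.

```python
def thesaurus_to_skos(thesaurus, base="ex:"):
    """Convert {term: {"broader": [...], "alt": [...]}} into SKOS-style triples."""

    def concept(term):
        # Build a simple identifier from the term (hypothetical naming scheme).
        return base + term.replace(" ", "_")

    triples = []
    for term, entry in thesaurus.items():
        subject = concept(term)
        triples.append((subject, "rdf:type", "skos:Concept"))
        triples.append((subject, "skos:prefLabel", term))
        for alt in entry.get("alt", []):
            triples.append((subject, "skos:altLabel", alt))
        for broader in entry.get("broader", []):
            triples.append((subject, "skos:broader", concept(broader)))
    return triples


# Hypothetical two-term thesaurus, for illustration only.
sample = {
    "contract law": {"broader": ["civil law"], "alt": ["law of contract"]},
    "civil law": {},
}
skos_triples = thesaurus_to_skos(sample)
```

Once a thesaurus is expressed this way, its broader and narrower relations can be queried and linked like any other data on the Web.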

Although there are different views in the literature regarding the scope of the definition or main characteristics of ontologies, the use of ontologies is seen as the key to implementing semantics for human-machine communication. Many ontologies have been built for different purposes and knowledge domains, for example:

  • OpenCyc: an open source version of the Cyc general ontology;
  • SUMO: the Suggested Upper Merged Ontology;
  • the upper ontologies PROTON (PROTo Ontology) and DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering);
  • the FRBRoo model (which represents bibliographic information);
  • the RDF representation of Dublin Core;
  • the Gene Ontology;
  • the FOAF (Friend of a Friend) ontology.

Although most domains are of interest for ontology modeling, the legal domain offers a perfect area for conceptual modeling and knowledge representation to be used in different types of intelligent applications and legal reasoning systems, not only due to its complexity as a knowledge intensive domain, but also because of the large amount of data that it generates. The use of semantically-enabled technologies for legal knowledge management could provide legal professionals and citizens with better access to legal information; enhance the storage, search, and retrieval of legal information; make possible advanced knowledge management systems; enable human-computer interaction; and even satisfy some hopes respecting automated reasoning and argumentation.

Regarding the incorporation of legal knowledge into the Web or into IT applications, or the more complex realization of the Legal Semantic Web, several directions have been taken, such as the development of XML standards for legal documentation and drafting (including Akoma Ntoso, LexML, CEN Metalex, and Norme in Rete), and the construction of legal ontologies.

Ontologizing legal knowledge

During the last decade, research on the use of legal ontologies as a technique to represent legal knowledge has increased and, as a consequence, a very interesting debate about their capacity to represent legal concepts and their relation to the different existing legal theories has arisen. It has even been suggested that ontologies could be the “missing link” between legal theory and Artificial Intelligence.

The literature suggests that legal ontologies may be distinguished by the levels of abstraction of the ideas they represent, the key distinction being between core and domain levels. Legal core ontologies model general concepts which are believed to be central for the understanding of law and may be used in all legal domains. In the past, ontologies of this type were mainly built upon insights provided by legal theory and largely influenced by normativism and legal positivism, especially by the works of Hart and Kelsen. Thus, initial legal ontology development efforts in Europe were influenced by hopes and trends in research on legal expert systems based on syllogistic approaches to legal interpretation.

More recent contributions at that level include the LKIF-Core Ontology, the LRI-Core Ontology, the DOLCE+CLO (Core Legal Ontology), and the Ontology of Fundamental Legal Concepts. Such ontologies usually include references to the concepts of Norm, Legal Act, and Legal Person, and may contain the formalization of deontic operators (e.g., Prohibition, Obligation, and Permission).
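
To give a feel for what formalizing deontic operators can mean in practice, here is a minimal Python sketch. It does not reproduce the structure of any of the ontologies named above; the class names, fields, and sample norms are hypothetical. It assumes the common reading in which prohibiting an act is incompatible with both obliging and permitting that same act.

```python
from dataclasses import dataclass
from enum import Enum


class Deontic(Enum):
    OBLIGATION = "obligation"
    PROHIBITION = "prohibition"
    PERMISSION = "permission"


@dataclass(frozen=True)
class Norm:
    operator: Deontic
    addressee: str  # who the norm applies to, e.g. a class of legal persons
    action: str     # what the norm is about


def conflicts(a, b):
    """True if both norms target the same addressee and action and exactly one prohibits it."""
    same_target = (a.addressee, a.action) == (b.addressee, b.action)
    one_prohibits = (a.operator is Deontic.PROHIBITION) != (b.operator is Deontic.PROHIBITION)
    return same_target and one_prohibits


# Hypothetical norms, for illustration only.
must_file = Norm(Deontic.OBLIGATION, "taxpayer", "file a return")
may_file = Norm(Deontic.PERMISSION, "taxpayer", "file a return")
must_not_file = Norm(Deontic.PROHIBITION, "taxpayer", "file a return")
```

With norms represented as data, a program can flag that `must_file` and `must_not_file` conflict while `must_file` and `may_file` do not. A real core ontology would express this with axioms in an ontology language rather than a hand-written function.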

Domain ontologies, on the other hand, are directed towards the representation of conceptual knowledge regarding specific areas of the law or domains of practice, and are built with particular applications in mind, especially those that enable communication (shared vocabularies), or enhance indexing, search, and retrieval of legal information. Currently, most legal ontologies being developed are domain-specific ontologies, and some areas of legal knowledge have been heavily targeted, notably the representation of intellectual property rights respecting digital rights management (IPROnto Ontology, the Copyright Ontology, the Ontology of Licences, and the ALIS IP Ontology), and consumer-related legal issues (the Customer Complaint Ontology (or CContology), and the Consumer Protection Ontology). Many other well-documented ontologies have also been developed for purposes of the detection of financial fraud and other crimes; the representation of alternative dispute resolution methods, privacy compliance, patents, cases (e.g., Legal Case OWL Ontology), judicial proceedings, legal systems, and argumentation frameworks; and the multilingual retrieval of European law, among others. (See, for example, the proceedings of the JURIX and ICAIL conferences for further references.)

A socio-legal approach to legal ontology development

Thus, there are many approaches to the development of legal ontologies. Nevertheless, in the current legal ontology literature there are few explicit accounts or insights into the methods researchers use to elicit legal knowledge, and the accounts that are available reflect a lack of consensus as to the most appropriate methodology. For example, some accounts focus solely on the use of text mining techniques towards ontology learning from legal texts; while others concentrate on the analysis of legal theories and related materials to extract and formalize legal concepts. Moreover, legal ontology researchers disagree about the role that legal experts should play in ontology development and validation.

In this regard, at the Institute of Law and Technology, we are developing a socio-legal approach to the construction of legal conceptual models. This approach stems from our collaboration with firms, government agencies, and nonprofit organizations (and their experts, clients, and other users) for the gathering of either explicit or tacit knowledge according to their needs. This empirically-based methodology may require the modeling of legal knowledge in practice (or professional legal knowledge, PLK), and the acquisition of knowledge through ethnographic and other social science research methods, together with the extraction (and merging) of concepts from a range of different sources (acts, regulations, case law, protocols, technical reports, etc.) and their validation by both legal experts and users.

For example, the Ontology of Professional Judicial Knowledge (OPJK) was developed in collaboration with the Spanish School of the Judiciary to enhance search and retrieval capabilities of a Web-based frequently-asked-question system (IURISERVICE) containing a repository of practical knowledge for Spanish judges in their first appointment. The knowledge was elicited from an ethnographic survey in Spanish First Instance Courts. On the other hand, the Neurona Ontologies, for a data protection compliance application, are based on the knowledge of legal experts and the requirements of enterprise asset management, together with the analysis of privacy and data protection regulations and technical risk management standards.

This approach tries to take into account many of the criticisms that developers of legal knowledge-based systems (LKBS) received during the 1980s and the beginning of the 1990s, including, primarily, the lack of legal knowledge or legal domain understanding of most LKBS development teams at the time. These criticisms were rooted in the widespread use of legal sources (statutes, case law, etc.) directly as the knowledge for the knowledge base, instead of including in the knowledge base the “expert” knowledge of lawyers or law-related professionals.

Further, in order to represent knowledge in practice (PLK), legal ontology engineering could benefit from the use of social science research methods for knowledge elicitation, institutional/organizational analysis (institutional ethnography), as well as close collaboration with legal practitioners, users, experts, and other stakeholders, in order to discover the relevant conceptual models that ought to be represented in the ontologies. Moreover, I understand the participation of these stakeholders in ontology evaluation and validation to be crucial to ensuring consensus about, and the usability of, a given legal ontology.

Challenges and drawbacks

Although the use of ontologies and the implementation of the Semantic Web vision may offer great advantages to information and knowledge management, there are great challenges and problems to be overcome.

First, the problems related to knowledge acquisition techniques and bottlenecks in software engineering are inherent in ontology engineering, and ontology development is quite a time-consuming and complex task. Second, as ontologies are directed mainly towards enabling some communication on the basis of shared conceptualizations, how are we to determine the sharedness of a concept? And how are context-dependencies or (cultural) diversities to be represented? Furthermore, how can we evaluate the content of ontologies?

Current research is focused on overcoming these problems through the establishment of gold standards in concept extraction and ontology learning from texts, and the idea of collaborative development of legal ontologies, although these techniques might be unsuitable for the development of certain types of ontologies. Also, evaluation (validation, verification, and assessment) and quality measurement of ontologies are currently an important topic of research, especially ontology assessment and comparison for reuse purposes.

Regarding ontology reuse, the general belief is that the more abstract (or core) an ontology is, the less it owes to any particular domain and, therefore, the more reusable it becomes across domains and applications. This generates a usability-reusability trade-off that is often difficult to resolve.

Finally, once created, how are these ontologies to evolve? How are ontologies to be maintained and new concepts added to them?

Over and above these issues, more specific discussions are taking place in the legal domain: for example, the discussion of the advantages and drawbacks of adopting an empirically based perspective (bottom-up), and the complexity of establishing clear connections with legal dogmatics or general legal theory approaches (top-down). To what extent are these two different perspectives on legal ontology development incompatible? How might they complement each other? What is their relationship with text-based approaches to legal ontology modeling?

I would suggest that empirically based, socio-legal methods of ontology construction constitute a bottom-up approach that enhances the usability of ontologies, while the general legal theory-based approach to ontology engineering fosters the reusability of ontologies across multiple domains.

The scholarly discussion of legal ontology development also embraces more fundamental issues, among them the capabilities of ontology languages for the representation of legal concepts, the possibilities of incorporating a legal flavor into OWL, and the implications of combining ontology languages with the formalization of rules.

Finally, the potential value to legal ontology of other approaches, areas of expertise, and domains of knowledge construction ought to be explored, for example: pragmatics and sociology of law methodologies, experiences in biomedical ontology engineering, formal ontology approaches, and the relationships between legal ontology and legal epistemology, legal knowledge and common sense or world knowledge, expert and layperson’s knowledge, legal information and Linked Data possibilities, and legal dogmatics and political science (e.g., in e-Government ontologies).

As you may see, the challenges faced by legal ontology engineering are great, and the limitations of legal ontologies are substantial. Nevertheless, the potential of legal ontologies is immense. I believe that law-related professionals and legal experts have a central role to play in the successful development of legal ontologies and legal semantic applications.

[Editor's Note: For many of us, the technical aspects of ontologies and the Semantic Web are unfamiliar. Yet these technologies are increasingly being incorporated into the legal information systems that we use every day, so it's in our interest to learn more about them. For those of us who would like a user-friendly introduction to ontologies and the Semantic Web, here are some suggestions:

Dr. Núria Casellas is a visiting researcher at the Legal Information Institute at Cornell University. She is a researcher at the Institute of Law and Technology and an assistant professor at the UAB Law School (on leave). She has participated in several national and European-funded research projects regarding legal ontologies and legal knowledge management: these concern the acquisition of knowledge in judicial settings (IURISERVICE), modeling privacy compliance regulations (NEURONA), drafting legislation (DALOS), and the Legal Case Study of the Semantically Enabled Knowledge Technologies (SEKT VI Framework project), among others. Co-editor of the IDT Series, she holds a Law Degree from the Universitat Autònoma de Barcelona, a Master’s Degree in Health Care Ethics and Law from the University of Manchester, and a PhD (“Modelling Legal Knowledge through Ontologies. OPJK: the Ontology of Professional Judicial Knowledge”).

VoxPopuLII is edited by Judith Pratt. Editor in Chief is Robert Richards.