Multilingual legal information retrieval

About LII / Get the law / Find a lawyer / Legal Encyclopedia / Help Out

Envitool: Helps companies comply with EHS and CSR

Electronic legal publishing, Legal publishing, legal translation, Multilingual legal information retrieval No Responses »

Mar 302012

The Swedish legal publisher Notisum AB has been on the Swedish market for online legal publishing since 1996. Our Internet-based law book at www.notisum.se is read by more than 50,000 persons per week and our customers range from municipalities and government institutions to Swedish multinationals.

Now we are heading for China, and I would like to share with you some practical experiences from this highly dynamic market and our challenges in trying to conquer it.

The case for a legal monitoring tool, codenamed “EnviTool”

In close co-operation with our customers, we had developed a set of specialized Internet based tools in Sweden for supporting the process of legal compliance and legal information sharing within big organizations. The key driver of these needs was the growing number of certificates according to the international environmental management standard ISO 14001:2004.

ISO 14001 is a worldwide industry standard to help companies to improve their environmental performance through the implementation of an environmental management system. There is much to say about management systems. Continuous improvement is the heart of the matter–it is all about doing the right things right. Establish a plan, do what you planned, check your results and then start all over by correcting your plans. Plan, Do, Check, Act.

According to the standard, you have to identify the relevant environmental legislation for your organization. You need access to those laws and regulations, and you have to keep an updated list. You should also make the information available to the people of your organization.

By providing an online legal register, monitored for changes, with a whole set of information sharing and workflow features, Notisum helps the certified companies to comply with the environmental legislation.

We developed this system step by step. When it came to going outside the borders of the Kingdom of Sweden, we changed the name from Rättsnätet+Miljö to EnviTool.

The case for China

Sweden is a country of very high penetration of the ISO 14001 standard, and the use of the standard is in a mature phase in most organizations. China, on the other hand, is number one in the world, with more than 70,000 certificates issued. The growth is double-digit. So China is the place to be if you have products for this specific customer group. The users of the standard are yet immature in China, so we knew there were some challenges out there.

The market for legal information tools is overall immature in China and legal compliance is not always on top of the manager’s priority lists. However, Notisum took the first steps, starting in 2009, to take on the challenge to make China our second home market. Many challenges, expected and unexpected, were waiting for EnviTool.

Step one – the product

Like many commercial ventures, the EnviTool project was the result of a randomly started chain of events. Our Swedish CEO was playing golf with a professor at KTH, the Royal Institute of Technology in Stockholm. The professor was in charge of a student exchange program between National University of Singapore (NUS) and KTH. We were asked to host an internship for an ambitious computer science student in our company for one academic year.

The internship was successful, our student was doing a great job and we learned a lot about Asia and the Chinese culture. We have now hosted three excellent NUS students from Singapore, all good representatives of their university and their country. And all of them bilingual English and Chinese. That’s when we decided that China would be an interesting market to try. And yes – China is far away from Sweden, it is terribly big and it was really too large a challenge for our company. We wanted to try anyway, with the hope that Singapore could be the bridge for us.

We decided to start a subsidiary in Singapore and so we did. It is easy, by the way. According to the World Bank, Singapore ranks number one in the world in ease of doing business. Coming from Sweden, ranked number 50 in the world in terms of how easily you pay your taxes, I had an almost religious moment when we got a letter of gratitude from the Singaporean tax authorities after paying our taxes. Not so in Sweden, I may add…

With the first NUS intern now as our first employee, we started translating and adapting our internet tool together with our development manager in Sweden. The technological challenges were there, of course. We base our technology on the Microsoft.NET platform, but the support for the simplified Chinese character set was not totally implemented everywhere. Multi-language support was developed, and plenty were the occasions in the beginning when Swedish words popped up unexpectedly. The search function in Chinese is different in EnviTool and the relations between the legislative documents were so different from the Swedish and European law that we had to re-design our database structure.

Step two – the market research

With good help from the Swedish Trade Council in China, we did market research to see if there could be a similar market in China and if our business model could work.

After three journeys and two projects together with the trade council, we decided to give it a try. The EnviTool China project was about to take off. Learning to eat properly with chopsticks was part of the experience. Learning to appreciate the Chinese food was easier although there are some zoological challenges there too, outside the scope of this blog entry.

At this point in time we also employed a Chinese/Swedish project manager with extensive knowledge and experience in the field.

Step three – the content

Translating the tool to Chinese and English was the easy part. When it came to the content, we had to throw out everything from Sweden and put in Chinese legislation and comments. We soon found interesting challenges.

Our first experience of the Chinese legal tradition,which is in many ways different from where we come from, was the search for a standard for citations. In the Swedish databases we had successfully used computer software to automatically find citations, law titles, cross references and other document data. It became clear to us that there were no shortcuts in the Chinese material. We had to input all data manually.

We decided to restrict the information to cover relevant legislation in the EHS (Environmental, Health & Safety) and CSR (Corporate Social Responsibility) field and to concentrate on the national level with some provincial/municipal areas like Beijing and Shanghai. The EHS/CSR users are professionals in their field of work and their industries. They are not lawyers and not very used to legal information systems. EnviTool were developed with EHS/CSR managers in our minds. We wrote the editorial content to suit the needs of our target audience.

We realized that we needed a partner in China to provide fast and timely information. In ChinaLawInfo, established by Peking University in association with the university’s Legal Information Center, we found a great partner. They are the most important legal information provider in China and we saw that Notisum of Sweden and ChinaLawInfo had many similarities in experience and way of working. Yes, we are small and they are big, but that goes for Sweden and China all over. So EnviTool now provides the EHS/CSR laws and regulations from both ChinaLawInfo and government sources. We also have an on-going editorial co-operation in Beijing.

By now we also had good content. The EnviTool Internet service and database, provided from our Singapore company servers, were released in its first version in the fall of 2010.

Step four – market introduction

If company start-up was a short track in Singapore, it was a longer journey in the world’s second biggest economy. After having tried 50 other names, Envitool finally was translated to 安纬同 in Chinese and we got the business permit in August 2011.

We employed the people we needed and found a partner to help us with HR and finance issues. Since then we have started our sales and marketing activities, moving slowly forward. The use of legal information tools served from Singapore is combined with management consulting from our team in Shanghai. We provide training in using the tool and can assist the clients in finding the laws and regulations relevant to their operation.

The second generation of the site is up and running at www.envitool.com and we are proud to have customers from China, the US, Japan and four different European countries.

What we have learned and what we think of the future

To get to know China and the Chinese people is of course one part of the fun. Being a European, you make many mistakes, sometimes because of language, sometimes cultural.

One example of this confusion was when I intervened in the editorial process. In EnviTool we provide bi-lingual Chinese/English short and long comments to laws and regulations. In the Swedish service, which I am more familiar with, the short comment is rendered in italics with the longer comment below in plain text. In the English version of the comments in EnviTool, the short one was not in italics. I complained and our programmer quickly changed this. Shortly thereafter, at a customer meeting, I showed the comments, now in Chinese language version. (I don’t understand a word of Chinese.) Can you imagine Chinese characters in italics? I can tell you, it makes no sense and it looks bad. That was the language mistake. The cultural mistake was managerial. A Swedish employee would have told me how stupid I were, if I came up with such a bad idea. The Asian employee (highly intelligent and highly educated) probably saw the problem and maybe thought “the boss is more stupid than usual, but he is my boss so I have better do what he tells me!”. A lot to learn, many aspects to consider.

To conclude, the start-up was a bit slow because of the red tape but so far, our government contacts have been smooth. We have felt very welcome at the Chinese authorities like the Ministry of Environmental Protection and local governments. In the end, our goals are similar: better environmental and occupational health & safety legal compliance – better environment and better life for the citizens.

We know it will take a long time for us to get the knowledge and experience needed to be a significant player in the Chinese market, and we are prepared to stay there and step by step build our presence. It took many years to build a loyal and substantial customer base in Sweden. It will take even longer in China.

Magnus Svernlöv is the founder and chairman of the Swedish online legal publisher Notisum (www.notisum.se) and its Chinese subsidiary Envitool (www.envitool.cn). He holds an MBA from INSEAD, France, a MScEE degree from Chalmers University of Technology, Sweden and a BA from the School of Business, Ecnomics and Law, University of Gothenburg, Sweden. He welcomes any comment or feedback to ms@notisum.se

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed. The information above should not be considered legal advice. If you require legal representation, please consult a lawyer.

Semantic Enhancement of Legal Information… Are We Up for the Challenge? [Revised Repost]

Cross-language legal information retrieval, information retrieval, knowledge management, Legal knowledge representation, Legal ontologies, Legal semantic web, Linked Data, Linked Data and law, Multilingual legal information retrieval, Semantic Web and law 1 Response »

Jan 182011

[Editor’s Note: We are republishing here, with some corrections, a post by Dr. Núria Casellas that appeared earlier on VoxPopuLII.]

The organization and formalization of legal information for computer processing in order to support decision-making or enhance information search, retrieval and knowledge management is not recent, and neither is the need to represent legal knowledge in a machine-readable form. Nevertheless, since the first ideas of computerization of the law in the late 1940s, the appearance of the first legal information systems in the 1950s, and the first legal expert systems in the 1970s, claims, such as Hafner’s, that “searching a large database is an important and time-consuming part of legal work,” which drove the development of legal information systems during the 80s, have not yet been left behind.

Similar claims may be found nowadays as, on the one hand, the amount of available unstructured (or poorly structured) legal information and documents made available by governments, free access initiatives, blawgs, and portals on the Web will probably keep growing as the Web expands. And, on the other, the increasing quantity of legal data managed by legal publishing companies, law firms, and government agencies, together with the high quality requirements applicable to legal information/knowledge search, discovery, and management (e.g., access and privacy issues, copyright, etc.) have renewed the need to develop and implement better content management tools and methods.

Information overload, however important, is not the only concern for the future of legal knowledge management; other and growing demands are increasing the complexity of the requirements that legal information management systems and, in consequence, legal knowledge representation must face in the future. Multilingual search and retrieval of legal information to enable, for example, integrated search between the legislation of several European countries; enhanced laypersons’ understanding of and access to e-government and e-administration sites or online dispute resolution capabilities (e.g., BATNA determination); the regulatory basis and capabilities of electronic institutions or normative and multi-agent systems (MAS); and multimedia, privacy or digital rights management systems, are just some examples of these demands.

How may we enable legal information interoperability? How may we foster legal knowledge usability and reuse between information and knowledge systems? How may we go beyond the mere linking of legal documents or the use of keywords or Boolean operators for legal information search? How may we formalize legal concepts and procedures in a machine-understandable form?

In short, how may we handle the complexity of legal knowledge to enhance legal information search and retrieval or knowledge management, taking into account the structure and dynamic character of legal knowledge, its relation with common sense concepts, the distinct theoretical perspectives, the flavor and influence of legal practice in its evolution, and jurisdictional and linguistic differences?

These are challenging tasks, for which different solutions and lines of research have been proposed. Here, I would like to draw your attention to the development of semantic solutions and applications and the construction of formal structures for representing legal concepts in order to make human-machine communication and understanding possible.

Semantic metadata

For example, in the search and retrieval area, we still perform nowadays most legal searches in online or application databases using keywords (that we believe to be contained in the document that we are searching for), maybe together with a combination of Boolean operators, or supported with a set of predefined categories (metadata regarding, for example, date, type of court, etc.), a list of pre-established topics, thesauri (e.g., EuroVoc), or a synonym-enhanced search.

These searches rely mainly on syntactic matching, and — with the exception of searches enhanced with categories, synonyms, or thesauri — they will return only documents that contain the exact term searched for. To perform more complex searches, to go beyond the term, we require the search engine to understand the semantic level of legal documents; a shared understanding of the domain of knowledge becomes necessary.

Although the quest for the representation of legal concepts is not new, these efforts have recently been driven by the success of the World Wide Web (WWW) and, especially, by the later development of the Semantic Web. Sir Tim Berners-Lee described it as an extension of the Web “in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

Thus, the Semantic Web is envisaged as an extension of the current Web, which now comprises collaborative tools and social networks (the Social Web or Web 2.0). The Semantic Web is sometimes also referred to as Web 3.0, although there is no widespread agreement on this matter, as different visions exist regarding the enhancement and evolution of the current Web.

These efforts also include the Web of Data (or Linked Data), which relies on the existence of standard formats (URIs, HTTP and RDF) to allow the access and query of interrelated datasets, which may be granted through a SPARQL endpoint (e.g., Govtrack.us, US census data, etc.). Sharing and connecting data on the Web in compliance with the Linked Data principles enables the exploitation of content from different Web data sources with the development of search, browse, and other mashup applications. (See the Linking Open Data cloud diagram by Cyganiak and Jentzsch below.) [Editor’s Note: Legislation.gov.uk also applies Linked Data principles to legal information, as John Sheridan explains in his recent post.]

Thus, to allow semantics to be added to the current Web, new languages and tools (ontologies) were needed, as the development of the Semantic Web is based on the formal representation of meaning in order to share with computers the flexibility, intuition, and capabilities of the conceptual structures of human natural languages. In the subfield of computer science and information science known as Knowledge Representation, the term “ontology” refers to a consensual and reusable vocabulary of identified concepts and their relationships regarding some phenomena of the world, which is made explicit in a machine-readable language. Ontologies may be regarded as advanced taxonomical structures, where concepts are formalized as classes and defined with axioms, enriched with the description of attributes or constraints, and properties.

The task of developing interoperable technologies (ontology languages, guidelines, software, and tools) has been taken up by the World Wide Web Consortium (W3C). These technologies were arranged in the Semantic Web Stack according to increasing levels of complexity (like a layer cake). In this stack, higher layers depend on lower layers (and the latter are inherited from the original Web). These languages include XML (eXtensible Markup Language), a superset of HTML usually used to add structure to documents, and the so-called ontology languages: RDF/RDFS (Resource Description Framework/Schema), OWL, and OWL2 (Ontology Web Language). While the RDF language offers simple descriptive information about the resources on the Web, encoded in sets of triples of subject (a resource), predicate (a property or relation), and object (a resource or a value), RDFS allows the description of sets. OWL offers an even more expressive language to define structured ontologies (e.g. class disjointess, union or equivalence, etc.

Moreover, a specification to support the conversion of existing thesauri, taxonomies or subject headings into RDF triples has recently been published: the SKOS, Simple Knowledge Organization System standard. These specifications may be exploited in Linked Data efforts, such as the New York Times vocabularies. Also, EuroVoc, the multilingual thesaurus for activities of the EU is, for example, now available in this format.

Although there are different views in the literature regarding the scope of the definition or main characteristics of ontologies, the use of ontologies is seen as the key to implementing semantics for human-machine communication. Many ontologies have been built for different purposes and knowledge domains, for example:

OpenCyc: an open source version of the Cyc general ontology;
SUMO: the Suggested Upper Merged Ontology;
the upper ontologies PROTON (PROTo Ontology) and DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering);
the FRBRoo model (which represents bibliographic information);
the RDF representation of Dublin Core;
the Gene Ontology;
the FOAF (Friend of a Friend) ontology.

Although most domains are of interest for ontology modeling, the legal domain offers a perfect area for conceptual modeling and knowledge representation to be used in different types of intelligent applications and legal reasoning systems, not only due to its complexity as a knowledge intensive domain, but also because of the large amount of data that it generates. The use of semantically-enabled technologies for legal knowledge management could provide legal professionals and citizens with better access to legal information; enhance the storage, search, and retrieval of legal information; make possible advanced knowledge management systems; enable human-computer interaction; and even satisfy some hopes respecting automated reasoning and argumentation.

Regarding the incorporation of legal knowledge into the Web or into IT applications, or the more complex realization of the Legal Semantic Web, several directions have been taken, such as the development of XML standards for legal documentation and drafting (including Akoma Ntoso, LexML, CEN Metalex, and Norme in Rete), and the construction of legal ontologies.

Ontologizing legal knowledge

During the last decade, research on the use of legal ontologies as a technique to represent legal knowledge has increased and, as a consequence, a very interesting debate about their capacity to represent legal concepts and their relation to the different existing legal theories has arisen. It has even been suggested that ontologies could be the “missing link” between legal theory and Artificial Intelligence.

The literature suggests that legal ontologies may be distinguished by the levels of abstraction of the ideas they represent, the key distinction being between core and domain levels. Legal core ontologies model general concepts which are believed to be central for the understanding of law and may be used in all legal domains. In the past, ontologies of this type were mainly built upon insights provided by legal theory and largely influenced by normativism and legal positivism, especially by the works of Hart and Kelsen. Thus, initial legal ontology development efforts in Europe were influenced by hopes and trends in research on legal expert systems based on syllogistic approaches to legal interpretation.

More recent contributions at that level include the LKIF-Core Ontology, the LRI-Core Ontology, the DOLCE+CLO (Core Legal Ontology), and the Ontology of Fundamental Legal Concepts. Blue Scene Such ontologies usually include references to the concepts of Norm, Legal Act, and Legal Person, and may contain the formalization of deontic operators (e.g., Prohibition, Obligation, and Permission).

Domain ontologies, on the other hand, are directed towards the representation of conceptual knowledge regarding specific areas of the law or domains of practice, and are built with particular applications in mind, especially those that enable communication (shared vocabularies), or enhance indexing, search, and retrieval of legal information. Currently, most legal ontologies being developed are domain-specific ontologies, and some areas of legal knowledge have been heavily targeted, notably the representation of intellectual property rights respecting digital rights management (IPROnto Ontology, the Copyright Ontology, the Ontology of Licences, and the ALIS IP Ontology), and consumer-related legal issues (the Customer Complaint Ontology (or CContology), and the Consumer Protection Ontology). Many other well-documented ontologies have also been developed for purposes of the detection of financial fraud and other crimes; the representation of alternative dispute resolution methods, privacy compliance, patents, cases (e.g., Legal Case OWL Ontology), judicial proceedings, legal systems, and argumentation frameworks; and the multilingual retrieval of European law, among others. (See, for example, the proceedings of the JURIX and ICAIL conferences for further references.)

A socio-legal approach to legal ontology development

Thus, there are many approaches to the development of legal ontologies. Nevertheless, in the current legal ontology literature there are few explicit accounts or insights into the methods researchers use to elicit legal knowledge, and the accounts that are available reflect a lack of consensus as to the most appropriate methodology. For example, some accounts focus solely on the use of text mining techniques towards ontology learning from legal texts; while others concentrate on the analysis of legal theories and related materials to extract and formalize legal concepts. Moreover, legal ontology researchers disagree about the role that legal experts should play in ontology development and validation.

Orange Scene In this regard, at the Institute of Law and Technology, we are developing a socio-legal approach to the construction of legal conceptual models. This approach stems from our collaboration with firms, government agencies, and nonprofit organizations (and their experts, clients, and other users) for the gathering of either explicit or tacit knowledge according to their needs. This empirically-based methodology may require the modeling of legal knowledge in practice (or professional legal knowledge, PLK), and the acquisition of knowledge through ethnographic and other social science research methods, together with the extraction (and merging) of concepts from a range of different sources (acts, regulations, case law, protocols, technical reports, etc.) and their validation by both legal experts and users.

For example, the Ontology of Professional Judicial Knowledge (OPJK) was developed in collaboration with the Spanish School of the Judicary to enhance search and retrieval capabilities of a Web-based frequentl- asked-question system (IURISERVICE) containing a repository of practical knowledge for Spanish judges in their first appointment. The knowledge was elicited from an ethnographic survey in Spanish First Instance Courts. On the other hand, the Neurona Ontologies, for a data protection compliance application, are based on the knowledge of legal experts and the requirements of enterprise asset management, together with the analysis of privacy and data protection regulations and technical risk management standards.

This approach tries to take into account many of the criticisms that developers of legal knowledge-based systems (LKBS) received during the 1980s and the beginning of the 1990s, including, primarily, the lack of legal knowledge or legal domain understanding of most LKBS development teams at the time. These criticisms were rooted in the widespread use of legal sources (statutes, case law, etc.) directly as the knowledge for the knowledge base, instead of including in the knowledge base the “expert” knowledge of lawyers or law-related professionals.

Further, in order to represent knowledge in practice (PLK), legal ontology engineering could benefit from the use of social science research methods for knowledge elicitation, institutional/organizational analysis (institutional ethnography), as well as close collaboration with legal practitioners, users, experts, and other stakeholders, in order to discover the relevant conceptual models that ought to be represented in the ontologies. Moreover, I understand the participation of these stakeholders in ontology evaluation and validation to be crucial to ensuring consensus about, and the usability of, a given legal ontology.

Challenges and drawbacks

Although the use of ontologies and the implementation of the Semantic Web vision may offer great advantages to information and knowledge management, there are great challenges and problems to be overcome.

First, the problems related to knowledge acquisition techniques and bottlenecks in software engineering are inherent in ontology engineering, and ontology development is quite a time-consuming and complex task. Second, as ontologies are directed mainly towards enabling some communication on the basis of shared conceptualizations, how are we to determine the sharedness of a concept? And how are context-dependencies or (cultural) diversities to be represented? Furthermore, how can we evaluate the content of ontologies?

Current research is focused on overcoming these problems through the establishment of gold standards in concept extraction and ontology learning from texts, and the idea of collaborative development of legal ontologies, although these techniques might be unsuitable for the development of certain types of ontologies. Also, evaluation (validation, verification, and assessment) and quality measurement of ontologies are currently an important topic of research, especially ontology assessment and comparison for reuse purposes.

Regarding ontology reuse, the general belief is that the more abstract (or core) an ontology is, the less it owes to any particular domain and, therefore, the more reusable it becomes across domains and applications. This generates a usability-reusability trade-off that is often difficult to resolve.

Finally, once created, how are these ontologies to evolve? How are ontologies to be maintained and new concepts added to them?

Over and above these issues, in the legal domain there are taking place more particularized discussions: for example, the discussion of the advantages and drawbacks of adopting an empirically based perspective (bottom-up), and the complexity of establishing clear connections with legal dogmatics or general legal theory approaches (top-down). To what extent are these two different perspectives on legal ontology development incompatible? How might they complement each other? What is their relationship with text-based approaches to legal ontology modeling?

I would suggest that empirically based, socio-legal methods of ontology construction constitute a bottom-up approach that enhances the usability of ontologies, while the general legal theory-based approach to ontology engineering fosters the reusability of ontologies across multiple domains.

The scholarly discussion of legal ontology development also embraces more fundamental issues, among them the capabilities of ontology languages for the representation of legal concepts, the possibilities of incorporating a legal flavor into OWL, and the implications of combining ontology languages with the formalization of rules.

Finally, the potential value to legal ontology of other approaches, areas of expertise, and domains of knowledge construction ought to be explored, for example: pragmatics and sociology of law methodologies, experiences in biomedical ontology engineering, formal ontology approaches, and the relationships between legal ontology and legal epistemology, legal knowledge and common sense or world knowledge, expert and layperson’s knowledge, legal information and Linked Data possibilities, and legal dogmatics and political science (e.g., in e-Government ontologies).

As you may see, the challenges faced by legal ontology engineering are great, and the limitations of legal ontologies are substantial. Nevertheless, the potential of legal ontologies is immense. I believe that law-related professionals and legal experts have a central role to play in the successful development of legal ontologies and legal semantic applications.

[Editor’s Note: For many of us, the technical aspects of ontologies and the Semantic Web are unfamiliar. Yet these technologies are increasingly being incorporated into the legal information systems that we use everyday, so it’s in our interest to learn more about them. For those of us who would like a user-friendly introduction to ontologies and the Semantic Web, here are some suggestions:

Tom Gruber, Where the Social Web Meets the Semantic Web (video);
Sandro Hawke, How the Semantic Web Works;
Kevin Hemenway, The Semantic Web: 1-2-3;
Jim Hendler et al., Introduction to the Semantic Web (video);
Ivan Herman, Introduction to the Semantic Web;
Brian Lowe, Introduction to Ontologies: Adding Meaning to Metadata;
Marek Obitko, Introduction to Ontologies and Semantic Web;
Sean B. Palmer, The Semantic Web: An Introduction;
Ioana Robu et al., An Introduction to the Semantic Web for Health Sciences Librarians;
Barry Smith, Ontology: An Introduction: Video: How to Build an Ontology;
University of Manchester, CO-ODE, Tutorial: A Practical Introduction to Ontologies and OWL;
Dr. Adam Z. Wyner, Legal Ontologies Spin a Semantic Web.]

Dr. Núria Casellas is a visiting researcher at the Legal Information Institute at Cornell University. She is a researcher at the Institute of Law and Technology and an assistant professor at the UAB Law School (on leave). She has participated in several national and European-funded research projects regarding legal ontologies and legal knowledge management: these concern the acquisition of knowledge in judicial settings (IURISERVICE), modeling privacy compliance regulations (NEURONA), drafting legislation (DALOS), and the Legal Case Study of the Semantically Enabled Knowledge Technologies (SEKT VI Framework project), among others. Co-editor of the IDT Series, she holds a Law Degree from the Universitat Autònoma de Barcelona, a Master’s Degree in Health Care Ethics and Law from the University of Manchester, and a PhD (“Modelling Legal Knowledge through Ontologies. OPJK: the Ontology of Professional Judicial Knowledge”).

VoxPopuLII is edited by Judith Pratt. Editor in Chief is Robert Richards.

Semantic Enhancement of Legal Information… Are We Up for the Challenge?

Cross-language legal information retrieval, information retrieval, knowledge management, Legal knowledge representation, Legal ontologies, Legal semantic web, Multilingual legal information retrieval, Semantic Web and law 8 Responses »

Feb 152010

The organization and formalization of legal information for computer processing in order to support decision-making or enhance information search, retrieval and knowledge management is not recent, and neither is the need to represent legal knowledge in a machine-readable form. Nevertheless, since the first ideas of computerization of the law in the late 1940s, the appearance of the first legal information systems in the 1950s, and the first legal expert systems in the 1970s, claims, such as Hafner’s, that “searching a large database is an important and time-consuming part of legal work,” which drove the development of legal information systems during the 80s, have not yet been left behind.

Similar claims may be found nowadays as, on the one hand, the amount of available unstructured (or poorly structured) legal information and documents made available by governments, free access initiatives, blawgs, and portals on the Web will probably keep growing as the Web expands. And, on the other, the increasing quantity of legal data managed by legal publishing companies, law firms, and government agencies, together with the high quality requirements applicable to legal information/knowledge search, discovery, and management (e.g., access and privacy issues, copyright, etc.) have renewed the need to develop and implement better content management tools and methods.

Information overload, however important, is not the only concern for the future of legal knowledge management; other and growing demands are increasing the complexity of the requirements that legal information management systems and, in consequence, legal knowledge representation must face in the future. Multilingual search and retrieval of legal information to enable, for example, integrated search between the legislation of several European countries; enhanced laypersons’ understanding of and access to e-government and e-administration sites or online dispute resolution capabilities (e.g., BATNA determination); the regulatory basis and capabilities of electronic institutions or normative and multi-agent systems (MAS); and multimedia, privacy or digital rights management systems, are just some examples of these demands.

How may we enable legal information interoperability? How may we foster legal knowledge usability and reuse between information and knowledge systems? How may we go beyond the mere linking of legal documents or the use of keywords or Boolean operators for legal information search? How may we formalize legal concepts and procedures in a machine-understandable form?

In short, how may we handle the complexity of legal knowledge to enhance legal information search and retrieval or knowledge management, taking into account the structure and dynamic character of legal knowledge, its relation with common sense concepts, the distinct theoretical perspectives, the flavor and influence of legal practice in its evolution, and jurisdictional and linguistic differences?

These are challenging tasks, for which different solutions and lines of research have been proposed. Here, I would like to draw your attention to the development of semantic solutions and applications and the construction of formal structures for representing legal concepts in order to make human-machine communication and understanding possible.

Semantic metadata

Nowadays, in the search and retrieval area, we still perform most legal searches in online or application databases using keywords (that we believe to be contained in the document that we are searching for), maybe together with a combination of Boolean operators, or supported with a set of predefined categories (metadata regarding, for example, date, type of court, etc.), a list of pre-established topics, thesauri (e.g., EUROVOC), or a synonym-enhanced search.

These searches rely mainly on syntactic matching, and — with the exception of searches enhanced with categories, synonyms, or thesauri — they will return only documents that contain the exact term searched for. To perform more complex searches, to go beyond the term, we require the search engine to understand the semantic level of legal documents; a shared understanding of the domain of knowledge becomes necessary.

Although the quest for the representation of legal concepts is not new, these efforts have recently been driven by the success of the World Wide Web (WWW) and, especially, by the later development of the Semantic Web. Sir Tim Berners-Lee described it as an extension of the Web “in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

Thus, the Semantic Web (including Linked Data efforts or the Web of Data) is envisaged as an extension of the current Web, which now also comprises collaborative tools and social networks (the Social Web or Web 2.0). The Semantic Web is sometimes also referred to as Web 3.0, although there is no widespread agreement on this matter, as different visions exist regarding the enhancement and evolution of the current Web.

Towards that shift, new languages and tools (ontologies) were needed to allow semantics to be added to the current Web, as the development of the Semantic Web is based on the formal representation of meaning in order to share with computers the flexibility, intuition, and capabilities of the conceptual structures of human natural languages. In the subfield of computer science and information science known as Knowledge Representation, the term “ontology” refers to a consensual and reusable vocabulary of identified concepts and their relationships regarding some phenomena of the world, which is made explicit in a machine-readable language. Ontologies may be regarded as advanced taxonomical structures, where concepts formalized as classes (e.g., “Actor”) are defined with axioms, enriched with the description of attributes or constraints (for example, “cardinality”), and linked to other classes through properties (e.g., “possesses” or “is_possessed_by”).

The task of developing interoperable technologies (ontology languages, guidelines, software, and tools) has been taken up by the World Wide Web Consortium (W3C). These technologies were arranged in the Semantic Web Stack according to increasing levels of complexity (like a layer cake), in the sense that higher layers depend on lower layers (and the latter are inherited from the original Web). The languages include XML (eXtensible Markup Language), a superset of HTML usually used to add structure to documents, and the so-called ontology languages: RDF (Resource Description Framework), OWL, and OWL2 (Ontology Web Language). Recently, a specification to support the conversion of existing thesauri, taxonomies or subject headings into RDF has been released (the the SKOS, Simple Knowledge Organization System standard).

Although there are different views in the literature regarding the scope of the definition or main characteristics of ontologies, the use of ontologies is seen as the key to implementing semantics for human-machine communication. Many ontologies have been built for different purposes and knowledge domains, for example:

OpenCyc,
SUMO,
PROTON,
DOLCE,
the FRBRoo model (used in the above code and graph examples),
the RDF representation of Dublin Core,
the Gene Ontology,
the Wine Ontology, and
the SemanticBible.

Although most domains are of interest for ontology modeling, the legal domain offers a perfect area for conceptual modeling and knowledge representation to be used in different types of intelligent applications and legal reasoning systems, not only due to its complexity as a knowledge intensive domain, but also because of the large amount of data that it generates. The use of semantically-enabled technologies for legal knowledge management could provide legal professionals and citizens with better access to legal information; enhance the storage, search, and retrieval of legal information; make possible advanced knowledge management systems; enable human-computer interaction; and even satisfy some hopes respecting automated reasoning and argumentation.

Regarding the incorporation of legal knowledge into the Web or into IT applications, or the more complex realization of the Legal Semantic Web, several directions have been taken, such as the development of XML standards for legal documentation and drafting (including Akoma Ntoso, LexML, CEN Metalex, and Norme in Rete), and the construction of legal ontologies.

Ontologizing legal knowledge

During the last decade, research on the use of legal ontologies as a technique to represent legal knowledge has increased and, as a consequence, a very interesting debate about their capacity to represent legal concepts and their relation to the different existing legal theories has arisen. It has even been suggested that ontologies could be the “missing link” between legal theory and Artificial Intelligence.

The literature suggests that legal ontologies may be distinguished by the levels of abstraction of the ideas they represent, the key distinction being between core and domain levels. Legal core ontologies model general concepts which are believed to be central for the understanding of law and may be used in all legal domains. In the past, ontologies of this type were mainly built upon insights provided by legal theory and largely influenced by normativism and legal positivism, especially by the works of Hart and Kelsen. Thus, initial legal ontology development efforts in Europe were influenced by hopes and trends in research on legal expert systems based on syllogistic approaches to legal interpretation.

More recent contributions at that level include the LRI-Core Ontology, the DOLCE+CLO (Core Legal Ontology), and the Ontology of Fundamental Legal Concepts Blue Scene (the basis for the LKIF-Core Ontology). Such ontologies usually include references to the concepts of Norm, Legal Act, and Legal Person, and may contain the formalization of deontic operators (e.g., Prohibition, Obligation, and Permission).

Domain ontologies, on the other hand, are directed towards the representation of conceptual knowledge regarding specific areas of the law or domains of practice, and are built with particular applications in mind, especially those that enable communication (shared vocabularies), or enhance indexing, search, and retrieval of legal information. Currently, most legal ontologies being developed are domain-specific ontologies, and some areas of legal knowledge have been heavily targeted, notably the representation of intellectual property rights respecting digital rights management (IPROnto Ontology, the Copyright Ontology, the Ontology of Licences, and the ALIS IP Ontology), and consumer-related legal issues (the Customer Complaint Ontology (or CContology), and the Consumer Protection Ontology). Many other well-documented ontologies have also been developed for purposes of the detection of financial fraud and other crimes; the representation of alternative dispute resolution methods, cases, judicial proceedings, and argumentation frameworks; and the multilingual retrieval of European law, among others. (See, for example, the proceedings of the JURIX and ICAIL conferences for further references.)

A socio-legal approach to legal ontology development

Thus, there are many approaches to the development of legal ontologies. Nevertheless, in the current legal ontology literature there are few explicit accounts or insights into the methods researchers use to elicit legal knowledge, and the accounts that are available reflect a lack of consensus as to the most appropriate methodology. For example, some accounts focus solely on the use of legal text mining and statistical analysis, in which ontologies are built by means of machine learning from legal texts; while others concentrate on the analysis of legal theories and related materials. Moreover, legal ontology researchers disagree about the role that legal experts should play in ontology validation.