{"id":380,"date":"2011-01-18T23:31:53","date_gmt":"2011-01-19T04:31:53","guid":{"rendered":"http:\/\/blog.law.cornell.edu\/voxpop\/2011\/01\/18\/semantic-enhancement-of-legal-information%e2%80%a6-are-we-up-for-the-challenge-revised-repost\/"},"modified":"2011-01-19T19:38:17","modified_gmt":"2011-01-20T00:38:17","slug":"semantic-enhancement-of-legal-information%e2%80%a6-are-we-up-for-the-challenge-revised-repost","status":"publish","type":"post","link":"https:\/\/blog.law.cornell.edu\/voxpop\/2011\/01\/18\/semantic-enhancement-of-legal-information%e2%80%a6-are-we-up-for-the-challenge-revised-repost\/","title":{"rendered":"Semantic Enhancement of Legal Information\u2026 Are We Up for the Challenge? [Revised Repost]"},"content":{"rendered":"

[Editor's Note: We are republishing here, with some corrections, a post by Dr. Núria Casellas that appeared earlier on VoxPopuLII.]

The organization and formalization of legal information for computer processing, whether to support decision-making or to enhance information search, retrieval, and knowledge management, is not recent, and neither is the need to represent legal knowledge in machine-readable form. Nevertheless, since the first ideas about computerization of the law in the late 1940s, the appearance of the first legal information systems in the 1950s, and the first legal expert systems in the 1970s, claims such as Hafner's that "searching a large database is an important and time-consuming part of legal work," which drove the development of legal information systems during the 1980s, have not yet been left behind.

Similar claims may be found nowadays. On the one hand, the amount of unstructured (or poorly structured) legal information and documents made available by governments, free access initiatives, blawgs, and portals will probably keep growing as the Web expands. On the other hand, the increasing quantity of legal data managed by legal publishing companies, law firms, and government agencies, together with the high quality requirements that apply to legal information and knowledge search, discovery, and management (e.g., access and privacy issues, copyright, etc.), has renewed the need to develop and implement better content management tools and methods.

Information overload, however important, is not the only concern for the future of legal knowledge management; other, growing demands are increasing the complexity of the requirements that legal information management systems, and in consequence legal knowledge representation, must face in the future. Multilingual search and retrieval of legal information, to enable, for example, integrated search across the legislation of several European countries; enhanced laypersons' understanding of, and access to, e-government and e-administration sites or online dispute resolution capabilities (e.g., BATNA determination); the regulatory basis and capabilities of electronic institutions or normative and multi-agent systems (MAS); and multimedia, privacy, or digital rights management systems are just some examples of these demands.

How may we enable legal information interoperability? How may we foster legal knowledge usability and reuse between information and knowledge systems? How may we go beyond the mere linking of legal documents, or the use of keywords or Boolean operators, for legal information search? How may we formalize legal concepts and procedures in a machine-understandable form?

In short, how may we handle the complexity of legal knowledge so as to enhance legal information search and retrieval or knowledge management, taking into account the structure and dynamic character of legal knowledge, its relation to common-sense concepts, the distinct theoretical perspectives on it, the flavor and influence of legal practice in its evolution, and jurisdictional and linguistic differences?

These are challenging tasks, for which different solutions and lines of research have been proposed. Here, I would like to draw your attention to the development of semantic solutions and applications and to the construction of formal structures for representing legal concepts, in order to make human-machine communication and understanding possible.

Semantic metadata

For example, in the area of search and retrieval, we still perform most legal searches in online or application databases using keywords (that we believe to be contained in the document we are looking for), perhaps combined with Boolean operators, or supported by a set of predefined categories (metadata regarding, for example, date or type of court), a list of pre-established topics, thesauri (e.g., EuroVoc), or a synonym-enhanced search.

These searches rely mainly on syntactic matching, and, with the exception of searches enhanced with categories, synonyms, or thesauri, they will return only documents that contain the exact term searched for. To perform more complex searches, to go beyond the term, we require the search engine to understand the semantic level of legal documents; a shared understanding of the domain of knowledge becomes necessary.
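The limitation described above is easy to see in code. The following sketch is my own illustration, not part of the original post: the documents and the tiny synonym table (standing in for a thesaurus) are invented. It shows how purely syntactic keyword matching misses a relevant document, while a thesaurus-style expansion of the query recovers it.

```python
# Toy document collection; the texts are invented for illustration.
documents = {
    "doc1": "The tribunal dismissed the appeal for lack of jurisdiction.",
    "doc2": "The court ordered the parties to exchange witness statements.",
}

# A tiny stand-in for a thesaurus: each term maps to its synonyms.
thesaurus = {"court": {"tribunal"}, "tribunal": {"court"}}

def keyword_search(term: str, docs: dict[str, str]) -> list[str]:
    """Purely syntactic matching: returns only documents containing the exact term."""
    return [doc_id for doc_id, text in docs.items() if term in text.lower()]

def thesaurus_search(term: str, docs: dict[str, str]) -> list[str]:
    """Expand the query with synonyms before matching, as a thesaurus-backed engine would."""
    terms = {term} | thesaurus.get(term, set())
    return [doc_id for doc_id, text in docs.items()
            if any(t in text.lower() for t in terms)]

print(keyword_search("court", documents))    # ['doc2']  -- doc1 is missed
print(thesaurus_search("court", documents))  # ['doc1', 'doc2']
```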

Although the quest for the representation of legal concepts is not new, these efforts have recently been driven by the success of the World Wide Web (WWW) and, especially, by the later development of the Semantic Web. Sir Tim Berners-Lee described it as an extension of the Web "in which information is given well-defined meaning, better enabling computers and people to work in cooperation."

\"From<\/a><\/p>\n

Thus, the Semantic Web is envisaged as an extension of the current Web, which now comprises collaborative tools and social networks (the Social Web, or Web 2.0). The Semantic Web is sometimes also referred to as Web 3.0, although there is no widespread agreement on this point, as different visions exist regarding the enhancement and evolution of the current Web.

These efforts also include the Web of Data (or Linked Data), which relies on standard Web technologies (URIs, HTTP, and RDF) to allow interrelated datasets to be accessed and queried, for instance through a SPARQL endpoint (e.g., Govtrack.us, US census data, etc.). Sharing and connecting data on the Web in compliance with the Linked Data principles enables content from different Web data sources to be exploited in search, browse, and other mashup applications. (See the Linking Open Data cloud diagram by Cyganiak and Jentzsch below.) [Editor's Note: Legislation.gov.uk also applies Linked Data principles to legal information, as John Sheridan explains in his recent post.]

[Figure: Linking Open Data cloud diagram by Cyganiak and Jentzsch]
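To make the idea of querying a SPARQL endpoint concrete, here is a minimal sketch in Python (my own addition, not from the original post). It assumes only the `requests` library and a publicly reachable SPARQL endpoint; the DBpedia endpoint URL is used purely as an illustration and the query is deliberately generic, not tied to any of the legal datasets mentioned above.

```python
import requests

# Illustrative endpoint; any SPARQL 1.1 endpoint that returns JSON results will do.
ENDPOINT = "https://dbpedia.org/sparql"

# A deliberately generic SELECT query; a legal Linked Data source would instead
# filter on the classes and properties of its own vocabulary.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?resource ?label WHERE {
  ?resource rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 5
"""

def run_query(endpoint: str, query: str) -> list[dict]:
    """Send the query over the standard SPARQL Protocol and return the result bindings."""
    response = requests.get(
        endpoint,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["results"]["bindings"]

if __name__ == "__main__":
    for row in run_query(ENDPOINT, QUERY):
        print(row["resource"]["value"], "-", row["label"]["value"])
```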

Thus, to allow semantics to be added to the current Web, new languages and tools (ontologies) were needed, as the development of the Semantic Web is based on the formal representation of meaning, in order to share with computers the flexibility, intuition, and capabilities of the conceptual structures of human natural languages. In the subfield of computer science and information science known as Knowledge Representation, the term "ontology" refers to a consensual and reusable vocabulary of identified concepts and their relationships regarding some phenomena of the world, made explicit in a machine-readable language. Ontologies may be regarded as advanced taxonomical structures, where concepts are formalized as classes, defined with axioms, and enriched with the description of attributes, constraints, and properties.

The task of developing interoperable technologies (ontology languages, guidelines, software, and tools) has been taken up by the World Wide Web Consortium (W3C). These technologies are arranged in the Semantic Web Stack according to increasing levels of complexity (like a layer cake). In this stack, higher layers depend on lower layers (and the lowest layers are inherited from the original Web). These languages include XML (eXtensible Markup Language), a markup language commonly used to add structure to documents, and the so-called ontology languages: RDF/RDFS (Resource Description Framework/Schema), OWL, and OWL2 (Ontology Web Language). While the RDF language offers simple descriptive information about resources on the Web, encoded in sets of triples of subject (a resource), predicate (a property or relation), and object (a resource or a value), RDFS allows the description of sets (classes) of resources and simple hierarchies among them. OWL offers an even more expressive language for defining structured ontologies (e.g., class disjointness, union, equivalence, etc.).
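As a concrete illustration of the triple model, and of the kind of class-level statements RDFS and OWL add on top of it, here is a small Python sketch using the rdflib library. It is my own example rather than anything from the original post, and the `ex:` namespace and class names are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# Hypothetical namespace for a toy legal vocabulary.
EX = Namespace("http://example.org/legal#")

g = Graph()
g.bind("ex", EX)

# RDFS-level statements: classes and a small class hierarchy.
g.add((EX.Person, RDF.type, RDFS.Class))
g.add((EX.Judge, RDF.type, RDFS.Class))
g.add((EX.Judge, RDFS.subClassOf, EX.Person))

# OWL-level statement: two classes declared disjoint.
g.add((EX.NaturalPerson, RDF.type, OWL.Class))
g.add((EX.LegalPerson, RDF.type, OWL.Class))
g.add((EX.NaturalPerson, OWL.disjointWith, EX.LegalPerson))

# Plain RDF triples about an individual: subject, predicate, object.
g.add((EX.judge_smith, RDF.type, EX.Judge))
g.add((EX.judge_smith, RDFS.label, Literal("Judge Smith", lang="en")))

# Serialize the graph in Turtle, a human-readable RDF syntax.
print(g.serialize(format="turtle"))
```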

Moreover, a specification to support the conversion of existing thesauri, taxonomies, or subject headings into RDF triples has recently been published: the SKOS (Simple Knowledge Organization System) standard. These specifications may be exploited in Linked Data efforts, such as the New York Times vocabularies. EuroVoc, the multilingual thesaurus covering the activities of the EU, is also now available in this format.
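To show what such a conversion can look like, here is a short sketch, again my own and built with the same rdflib library as above; the `ex:` namespace and the thesaurus entries are made up. It expresses two thesaurus terms as SKOS concepts, with multilingual preferred labels and a broader/narrower relation between them.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Hypothetical namespace for a toy thesaurus; the terms are invented.
EX = Namespace("http://example.org/thesaurus#")

g = Graph()
g.bind("ex", EX)
g.bind("skos", SKOS)

# A thesaurus entry becomes a skos:Concept with labels in several languages.
g.add((EX.ContractLaw, RDF.type, SKOS.Concept))
g.add((EX.ContractLaw, SKOS.prefLabel, Literal("contract law", lang="en")))
g.add((EX.ContractLaw, SKOS.prefLabel, Literal("derecho contractual", lang="es")))

# Hierarchical thesaurus relations map to skos:broader / skos:narrower.
g.add((EX.CivilLaw, RDF.type, SKOS.Concept))
g.add((EX.ContractLaw, SKOS.broader, EX.CivilLaw))
g.add((EX.CivilLaw, SKOS.narrower, EX.ContractLaw))

print(g.serialize(format="turtle"))
```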

Although there are different views in the literature regarding the scope of the definition or the main characteristics of ontologies, the use of ontologies is seen as the key to implementing semantics for human-machine communication. Many ontologies have been built for different purposes and knowledge domains, for example: