legal research » VoxPopuLII

Legal Research Ontology, Part II

Legal ontologies, legal research

Aug 20, 2015

My blog post last year about developing a legal research ontology was such an optimistic (i.e., naive), linear narrative. This was one of my final notes:

At this point, I am in the beginning stages of taking advantage of all the semantic web has to offer. The ontology’s classes now have subclasses. I am building the relationships between the classes and subclasses and using Protege to bring them all together.

I should have known better.

What I didn’t realize then was that I really didn’t understand anything about the semantic web. While I could use the term in a sentence and reference RDF and OWL and Protege, once you scratched the surface I was lost. Based on Sara Frug’s recommendation during a presentation at CALI Con 2014, I started reading Semantic Web for Dummies.

It has been, and continues to be, slow going. I don’t have a computer science or coding background, and so much of my project feels like trying to teach myself a new language without immersion or much of a guide. But the process of this project has become just as interesting to me as the end product. How are we equipped to teach ourselves anything? At a certain point, you just have to jump in and do something, anything, to get the project moving.

I had already identified the classes:
* Type of research material;
* Type of research problem;
* Source of law;
* Area of law;
* Legal action; and
* Final product.

I knew that each class has subclasses. Yet in my reading, as I learned how ontologies are used to construct relationships between entities, I missed the part where I had to construct those relationships myself. They don’t just magically appear when you enter the terms into Protege.
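Because every subclass link and every relationship has to be asserted explicitly, it can help to see those assertions written out. The sketch below is hypothetical and is not the author's Protege file: it uses plain Python (no RDF library) to emit OWL declarations in Turtle, one of the standard RDF syntaxes for OWL, for the six top-level classes, a few subclasses mentioned in this post, and one relationship. The `ex:` namespace, the class identifiers, and the property name `hasAreaOfLaw` are assumptions made for illustration.

```python
# Hypothetical sketch: write OWL class declarations, subclass axioms, and one
# object property as Turtle text. The ex: namespace and every identifier here
# are illustrative assumptions, not taken from the real ontology.

PREFIXES = """@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/legal-research#> .
"""

TOP_LEVEL_CLASSES = [
    "TypeOfResearchMaterial",
    "TypeOfResearchProblem",
    "SourceOfLaw",
    "AreaOfLaw",
    "LegalAction",
    "FinalProduct",
]

# child -> parent; none of these links exist until someone asserts them
SUBCLASS_OF = {
    "PrimarySource": "TypeOfResearchMaterial",
    "SecondarySource": "TypeOfResearchMaterial",
    "Contracts": "AreaOfLaw",
    "Torts": "AreaOfLaw",
    "Property": "AreaOfLaw",
}


def to_turtle() -> str:
    """Return the declarations above as one Turtle document."""
    lines = [PREFIXES]
    for name in TOP_LEVEL_CLASSES:
        lines.append(f"ex:{name} a owl:Class .")
    for child, parent in SUBCLASS_OF.items():
        lines.append(f"ex:{child} a owl:Class ; rdfs:subClassOf ex:{parent} .")
    # A relationship between classes is a separate, explicit assertion.
    lines.append(
        "ex:hasAreaOfLaw a owl:ObjectProperty ; "
        "rdfs:domain ex:TypeOfResearchProblem ; rdfs:range ex:AreaOfLaw ."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle())
```

Listing the classes alone produces eleven disconnected declarations; only the final `ObjectProperty` line says how two of them relate, which is the step that is easy to miss.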

I’m using Web Protege, an open-source tool developed by the Stanford Center for Biomedical Informatics Research that works with the OWL ontology language.

Ontology engineering is a hot topic these days, and there is a growing body of papers, tutorials, and presentations on OWL and ontology engineering. That’s also part of the problem: there’s a little too much out there. I knew that anything I did with my ontology would happen in Protege, so I decided to start with its extensive user documentation and user support. The Protege user guide takes you through setting up your first ontology with step-by-step illustrations and a few short videos. I also discovered a tutorial on the web titled Pizzas in 10 minutes.

Following the tutorial, you construct a basic ontology of pizza using different toppings and sauces. While it took me longer than 10 minutes to complete, it did give me enough familiarity with constructing relationships to take a stab at it with my ontology and its classes. Here’s what I came up with:

This representation doesn’t list every subclass; e.g., under Type of research material, I listed only primary source, and under Area of law, I listed only contracts, torts, and property. But it gives you an idea of how the classes relate to each other. Something I learned in building the sample pizza ontology in Protege is the importance of creating two kinds of properties: relational properties and modifier properties. The recommendation is to use “has” or “is” as prefixes for property names.¹ You can see in the above diagram how classes relate to each other, as well as how classes are modified by subclasses and individuals.
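The “has”/“is” naming advice usually yields a property and its inverse as a matched pair, along the lines of hasTopping and isToppingOf in the pizza example. The helper below is a hypothetical sketch of that convention in lowerCamelCase; the function name and the sample phrases are assumptions, not part of Protege or the cited guide.

```python
import re


def property_pair(noun):
    """Build a property name and its inverse from a plain-English noun phrase,
    following the "has"/"is" prefix convention (hypothetical helper)."""
    words = re.findall(r"[A-Za-z0-9]+", noun)
    if not words:
        raise ValueError("noun phrase must contain letters or digits")
    # Capitalize the first letter of each word and join them into one stem
    stem = "".join(w[0].upper() + w[1:] for w in words)
    return f"has{stem}", f"is{stem}Of"


# property_pair("area of law") returns ("hasAreaOfLaw", "isAreaOfLawOf")
```

Naming both directions up front makes it straightforward to declare them as inverses later, so a reasoner can fill in whichever direction was not asserted.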

I’m continuing to read Semantic Web for Dummies, and I’m currently focusing on Chapter 8: Speaking the Web Ontology Language. It has all kinds of nifty Venn diagrams and lines of computer code, and I’m working on understanding it all. This line keeps me going: “However, if you’re looking for a system to draw inferences or to interpret the implications of your assertions (for example, to supply a dynamic view of your data), OWL is for you.”²

One of my concerns is that a few of my subclasses belong to more than one class. But the beauty of the semantic web and OWL is that classes and subclasses are dynamic sets: when you run the ontology, individual members can move from one set to another. This means that Case Law can be both a subclass of Source of Law and an instance of Primary Source in the class Type of Research Material.
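One way to see why membership in several classes is not a problem is to compute membership rather than store it. The toy sketch below assumes a small, hypothetical hierarchy and treats Case Law as an individual asserted into two classes from different branches. It then derives every class that individual belongs to by following subclass links. It does not model OWL 2 “punning” (using one name as both a class and an individual), and it is far simpler than a real OWL reasoner.

```python
from collections import deque

# Hypothetical class hierarchy (child -> direct parents); names are illustrative.
SUBCLASS_OF = {
    "PrimarySource": {"TypeOfResearchMaterial"},
    "SecondarySource": {"TypeOfResearchMaterial"},
    "Treatise": {"SecondarySource"},
}


def superclasses(cls):
    """Return cls plus every class reachable by following subclass links."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def inferred_types(asserted_classes):
    """Union of superclasses for every class an individual was asserted into."""
    types = set()
    for cls in asserted_classes:
        types |= superclasses(cls)
    return types


# "Case law" asserted into two classes from different branches of the ontology
case_law_types = inferred_types({"PrimarySource", "SourceOfLaw"})
```

Here `case_law_types` contains PrimarySource, SourceOfLaw, and TypeOfResearchMaterial. The last one was never stated directly but follows from the hierarchy.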

The classes, subclasses, and relationships I have set up so far are simple assertions.³ Two equivalent classes would look like a Venn diagram in which the two sets overlap completely. This helps in dealing with synonyms. You can assert equivalence between individuals as well as classes, but it is better to set up each individual’s relationships with its classes and then let the OWL reasoning system decide whether the individuals are truly interchangeable. This is very helpful when you are combining ontologies. There are more complicated assertions (equivalence, disjointness, and subsumption), and I am working on applying them and building out the ontology.
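As a rough illustration of equivalence and disjointness, the sketch below makes two simplifications. It treats asserted class equivalence (synonyms) as merging sets, using a union-find structure, and it flags any individual asserted into two classes declared disjoint, which an OWL reasoner would report as an inconsistency. The class names, the individual, and the helper names are hypothetical, and the code only checks asserted memberships. Real reasoners also draw on inferred ones.

```python
class EquivalenceSets:
    """Union-find over class names: asserting A equivalent to B merges their sets."""

    def __init__(self):
        self.parent = {}

    def find(self, name):
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def assert_equivalent(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)


def disjointness_violations(memberships, disjoint_pairs, eq):
    """Return (individual, class_a, class_b) for each individual asserted
    into two classes that were declared disjoint."""
    violations = []
    for individual, classes in memberships.items():
        roots = {eq.find(c) for c in classes}
        for a, b in disjoint_pairs:
            if eq.find(a) in roots and eq.find(b) in roots:
                violations.append((individual, a, b))
    return violations


eq = EquivalenceSets()
eq.assert_equivalent("CaseLaw", "JudicialDecisions")  # hypothetical synonyms

memberships = {"RestatementThirdTortsProducts": {"SecondarySource", "PrimarySource"}}
problems = disjointness_violations(memberships, [("PrimarySource", "SecondarySource")], eq)
```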

Next I need to figure out the characteristics of the properties relating the classes, subclasses, and individuals in my ontology (inverse, symmetric, and transitive), along with the class expressions OWL provides: intersection, union, complement, and restriction. As I continue to read (and reread) Semantic Web for Dummies, I am gaining a new appreciation for set theory and description logic. Math always seems to have a way of finding you! I am also continuing to fill in the ontology with terms (using simple assertions), and I need to figure out SPARQL so I can query the ontology. It still feels like one of those one-step-forward, two-steps-back endeavors, but it is interesting.
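The property characteristics (inverse, symmetric, transitive) are what let a reasoner add facts that nobody typed in. The toy forward-chaining loop below is a hypothetical plain-Python illustration and not how Protege's reasoners work internally. It applies those three rules to (subject, predicate, object) triples until nothing new appears. The property names such as `isNarrowerThan` and the example facts are assumptions. Union, intersection, complement, and restrictions are not modeled.

```python
# Toy forward chaining over (subject, predicate, object) triples.
# Property names and example facts are hypothetical illustrations.


def apply_characteristics(triples, inverse_of=None, symmetric=(), transitive=()):
    """Add triples implied by inverse, symmetric, and transitive properties,
    repeating until no new triple appears (a fixpoint)."""
    inverse_of = dict(inverse_of or {})
    for p, q in list(inverse_of.items()):
        inverse_of.setdefault(q, p)  # an inverse pair works in both directions
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p in symmetric:
                new.add((o, p, s))
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new


EXAMPLE_TRIPLES = {
    ("Restatement3dTortsProducts", "hasAreaOfLaw", "ProductsLiability"),
    ("FailureToWarn", "isNarrowerThan", "ProductsLiability"),
    ("ProductsLiability", "isNarrowerThan", "Torts"),
    ("Torts", "isRelatedTo", "Contracts"),
}

inferred = apply_characteristics(
    EXAMPLE_TRIPLES,
    inverse_of={"hasAreaOfLaw": "isAreaOfLawOf"},
    symmetric={"isRelatedTo"},
    transitive={"isNarrowerThan"},
)
```

Among other facts, `inferred` now contains ("FailureToWarn", "isNarrowerThan", "Torts") by transitivity and ("Contracts", "isRelatedTo", "Torts") by symmetry.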

I hope to keep you posted, and I am grateful to the Vox PopuLII blog for having me back to write an update.

Amy Taylor is the Access Services Librarian and Adjunct Professor at American University Washington College of Law. Her main research interests are legal ontologies, organization of legal information and the influence of online legal research on the development of precedent. You can reach her on Twitter @taylor_amy or email: amytaylor@wcl.american.edu.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

—
1 Matthew Horridge, A Practical Guide to Building OWL Ontologies, 20, http://phd.jabenitez.com/wp-content/uploads/2014/03/A-Practical-Guide-To-Building-OWL-Ontologies-Using-Protege-4.pdf (last visited May 19, 2015).

2 Jeffrey Pollock, Semantic Web for Dummies 195 (Wiley 2009).

3 Id. at 200.

Building a Legal Research Ontology

Legal metadata, Legal ontologies, legal research

Mar 19, 2014

It’s hard for me to pin down exactly when I knew I wanted to build a legal research ontology. There was no light bulb moment; or perhaps I should say, there was no anvil falling on my head, Wile E. Coyote style. At the beginning of the fall 2012 semester, our Westlaw representative presented the newest features of Westlaw Next, including the new look of the headnotes in case law results. My first glance at it was jarring. At first I thought it was just the font and the streamlined interface, but after taking a closer look at it, I realized it was also the content.

The outline of the headnotes had been compressed. It was substantively different. Previously, each section of the key number system in your headnote was presented in outline form with indented lines and Roman numerals, and you could click on any of the outline headings. In the new version, only the main key number heading and the section pertaining to the case are visible. While each case result has a Change View link that leads to the classic outline view, I am skeptical of relying on it for a couple of reasons. First, Westlaw has made the new look the default and could at some point do away with the Change View option. Second, the new look becomes the look for each new class of law students. If style were all that the interface communicated, it would not be of much concern. But there is substance. There is function. How do we now communicate this substance? Should we be so dependent on vendors in legal research teaching? Given the paucity of time we have with first-year students, do we have other viable options?

These questions were in the back of my mind when I attended the LVI conference at Cornell later that semester. On the first day of the conference, I settled into the Data Organization and Legal Informatics Track. By the end of the day, two of the presentations I heard, one on concept mapping and another on semantic web technologies using RDF and OWL, opened up a door to a new set of possibilities. One of the notes I scribbled during the conference was “ontology for westlaw problem?” I came back from the conference and began researching ontologies and ontology engineering. (I may have gone a little overboard. At last count, I have over 500 articles and book chapters.)

So what is an ontology exactly? Here’s the definition I’ve cobbled together from my readings and my subsequent translation of those readings into words I can actually understand. (Any conceptual errors are mine.) An ontology is a way to take a set of concepts and organize it in a formalized way (i.e., with standards and naming conventions and a machine-readable structure), using an ontology language that takes advantage of the semantic web. The rest of this blog post will be a more detailed description of this definition.

Before you can use the set of concepts to build the ontology, you have to define them. And when I first started thinking about this project, it was on a much grander scale. It didn’t take me long at all to realize that I could not single-handedly create a comprehensive ontology of U.S. law.

I decided to focus on what we do as legal research instructors. I’ve always thought that one of our primary duties is to show our students the big picture so they can be confident in their abilities to research in unfamiliar situations. Our teaching is complicated by the fact that very few of us have the kind of classroom time we would like, and even if we did, we are teaching concepts that students may not put to use for months afterwards. So I wanted this ontology to be something we could use to convey the big picture, as well as a tool our students could use at their point of need.

I further narrowed the focus of the ontology to what we teach 1Ls in basic legal research. We teach them how to research with primary and secondary sources (Type of Research Materials) in the broad categories of law they learn in their 1L classes (Area of Law). We teach them about the types of law they will encounter (Type of Law). I also wondered if I could find a way to incorporate all the topics we teach them implicitly. Under the surface of black letter research is the knowledge that our students will be spending their summers as summer associates or summer interns. They will need to produce something tangible for a partner or a senior associate or a judge (Final Product). We’re not sending them out to do research as an intellectual exercise. Not only is something tangible expected from them, but they will also need to keep in mind that their work stems from some type of legal action (Legal Action). That legal action might be a breach of contract headed for litigation, or it might be the need to draft a contract between two parties, i.e., it could be litigation or a transaction.

Based on this focus, I had five classes: Type of Research Materials; Area of Law; Type of Law; Final Product; and Legal Action. I was fortunate enough to be able to participate in the Sixth Conference on Legal Information: Scholarship and Teaching (known as “The Boulder Conference”) with a working paper on the ontology. Drawing from his work on legal research instruction, Paul Callister suggested I add another class, Type of Research Problem. I took his advice, and I am grateful to him for his generosity. And now the classes number six.

My next task was coming up with the terms for the ontology: filling it in, so to speak. Some of the terms were almost self-evident. Types of Law include case law, statutory law, and regulations. Areas of Law include torts, civil procedure, property, and contracts. For the others, I turned to the First Decennial Digest, which is out of copyright, so its terms can be used. Most volumes are available digitally from either HathiTrust or LLMC. (The rest are on a shelf in my office.) Some of the terms are outdated, but most legal concepts change gradually over time. I am also grateful to Ed Walters for sharing Fastcase search results with me (completely stripped of any identifying user data and also deduped). Between these two sources, I haven’t yet run out of terms.

Selecting the ontology language was the easiest part of the endeavor. I learned about the Web Ontology Language (OWL) at the LII conference. In my reading, I had also run across the World Wide Web Consortium (W3C) and its standards for OWL (now in two versions, OWL 1 and OWL 2). If you really want to let your inner geek out for a romp, go there and happy fun times will be had.

I also needed a program to build the ontology using the W3C standards and naming conventions. Protege is a free and open-source software program developed and distributed by Stanford University. It comes with extensive user guides. It allows for the creation, sharing and publishing of ontologies, and it uses OWL. And fortunately, a voluminous amount has been written and presented on the topic of ontology engineering, from papers and book chapters to slide decks on sites like SlideShare.

At this point, I am in the beginning stages of taking advantage of all the semantic web has to offer. The ontology’s classes now have subclasses. I am building the relationships between the classes and subclasses, and using Protege to bring them all together. I am also prototyping lesson plans that can take advantage of the ontology. For example, if you write a problem for your students that requires them to research strict tort liability for failure to warn of the danger in the use of a product, you can also use the ontology to bring in the Restatement Third of Torts: Products Liability, as well as secondary sources such as treatises. You can also tie this into whatever final product you want your students to produce: a client letter; a memo to the firm; results of research into punitive damages awards, etc. As long as you have the ontology classes set up, you can add anything to them in order to personalize your research problem.
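The lesson-planning idea can be pictured as a lookup. You choose a term from each class, then pull in whatever the ontology links to those terms. The sketch below is hypothetical and stands in for the ontology with a Python dictionary. The `ResearchProblem` fields, the `build_problem` helper, and the material links are illustrative assumptions, not part of the author's files.

```python
from dataclasses import dataclass, field

# Hypothetical lookup from an area-of-law term to related research materials.
# In a real ontology these links would be object-property assertions.
RELATED_MATERIALS = {
    "Products Liability": [
        "Restatement Third of Torts: Products Liability",
        "treatises on products liability",
    ],
}


@dataclass
class ResearchProblem:
    issue: str
    area_of_law: str
    legal_action: str   # e.g., "Litigation" or "Transaction"
    final_product: str  # e.g., "Client letter" or "Memo to the firm"
    materials: list = field(default_factory=list)


def build_problem(issue, area_of_law, legal_action, final_product):
    """Assemble a problem skeleton, pulling in materials linked to the area of law."""
    materials = list(RELATED_MATERIALS.get(area_of_law, []))
    return ResearchProblem(issue, area_of_law, legal_action, final_product, materials)


problem = build_problem(
    issue="Strict tort liability for failure to warn of danger in product use",
    area_of_law="Products Liability",
    legal_action="Litigation",
    final_product="Memo to the firm",
)
```

Swapping the final product or the legal action changes the assignment without touching the underlying links, which mirrors the point that the classes stay fixed while the problem is personalized.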

I also hope to host the ontology on a website with a section for instructors to share lesson plans and ontology files. The files from Protege use an .owl extension, so they can be shared as easily as a PDF. All you need is a program like Protege to open the file. You could use the file as-is or modify it for any type of legal research problem. I also hope that the complete ontology, consisting of the permutations of legal research, can be available for students to query when they are researching as associates and interns.
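Because an .owl file is plain text, even a short script can peek inside one. The sketch below assumes the file was saved in RDF/XML, a common serialization for .owl files, though Protege can save in other formats too. It uses only Python's standard `xml.etree.ElementTree` to list named classes and their named superclasses. The sample content and the example.org IRIs are placeholders, and anonymous class expressions are skipped rather than interpreted.

```python
import xml.etree.ElementTree as ET

# Namespace IRIs used by the RDF/XML serialization of OWL
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"


def list_classes(owl_xml):
    """Map each named owl:Class to the IRIs of its named superclasses.

    Assumes RDF/XML; anonymous class expressions (restrictions, unions)
    are skipped rather than interpreted."""
    root = ET.fromstring(owl_xml)
    classes = {}
    for cls in root.iter(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")
        if iri is None:
            continue
        parents = classes.setdefault(iri, [])
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            resource = sub.get(f"{{{RDF}}}resource")
            if resource is not None:
                parents.append(resource)
    return classes


# Hypothetical two-class file; the example.org IRIs are placeholders.
SAMPLE_OWL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/legal-research#AreaOfLaw"/>
  <owl:Class rdf:about="http://example.org/legal-research#Torts">
    <rdfs:subClassOf rdf:resource="http://example.org/legal-research#AreaOfLaw"/>
  </owl:Class>
</rdf:RDF>"""

classes = list_classes(SAMPLE_OWL)
```

To read a saved file instead, you could pass its raw bytes, e.g. `list_classes(open("legal-research.owl", "rb").read())`. The file name is hypothetical.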


Future of legal services and the development of legal Knowledge Management

Disruptive legal technology, information retrieval, Innovation in legal technology, knowledge management, Legal knowledge management, legal research, Legal social networks

Aug 30, 2013

The legal profession has long been notoriously averse to change, but now even the legal industry is affected by a new, harsher reality, with widespread changes affecting legal practice and client service. These changes come not merely from the aftermath of the economic downturn, with price pressure and increased demands from clients, but also from technological developments and regulatory changes that provide a breeding ground for new kinds of competition. This post discusses the future of legal services, with a specific focus on how the current changes in the legal market demand a more strategic approach to knowledge management and efficient working processes, and on how technology is becoming increasingly important as a means of developing innovative ways to deliver legal services.

1. CHANGING LEGAL MARKET

For a long time, the legal market was spared some of the business realities that apply to almost all other industries. Law has been something of a protected industry, with lawyers in a unique position as the only legitimate providers with access to legal knowledge and tools, and with no real competition: a “black box” exempt from normal rules of business, such as predictability in cost and time, budget restraints, and value for money. After a firm was selected, the relationship with the client was controlled by the law firm, which decided almost entirely by itself how the service was to be delivered, billing by the hour and dictating cost, pricing, staffing, and strategic direction, with no need to innovate or provide cost-efficient legal services. Jordan Furlong has described this closed market in more detail, along with how the legal marketplace is now changing, in his series “The evolution of the legal services market,” stages 1–5.

But now there are strong drivers for change affecting the legal market and rapidly forcing it out of the “black box” toward a new reality. One such driver is regulatory change in the UK, where the Legal Services Act allows different types of lawyers and non-lawyers to form businesses together, facilitating the development of Alternative Business Structures and external investment in legal service providers. These regulatory changes have opened up the legal market to a new kind of competition from new entrants with disruptive business models. Unlike conventional law firms, these new providers tend to focus more on rethinking legal services. They have developed both different kinds of legal services and new ways of providing them. They use technology to improve the way they connect to clients, offering new and easier ways to conduct legal tasks over the Internet and providing cloud-based customized legal documents and advice with slogans like “No surprise pricing. No hourly fees, no shocking bills.” This market has attracted considerable interest from venture capitalists, for example Google Ventures’ investment in Rocket Lawyer (which recently also acquired competitor LawPivot), Kleiner Perkins and Institutional Venture Partners’ investment in LegalZoom, and investment by Quotidian Ventures and others in Docracy. All this clearly indicates that there is a large market opportunity for new legal solutions that are efficient, technology-driven, and affordable to users. Other interesting new legal service or knowledge providers are VentureDocs, Docstoc, and the Swedish Moretime Growth On Demand. Soon we will probably also see global legal service providers from outside the legal sphere, such as department stores, investment banks, accounting firms, insurance companies, or even Amazon.

Law firms also face a new kind of competition from Legal Process Outsourcing providers (LPOs), where legal work is exported to an outside law firm or legal support services company, often in low-wage markets overseas such as India, but also to new providers within the same country or to new brands established by the law firm itself, such as Herbert Smith’s document review centre in Belfast. The most commonly offered services are document review and legal research, but recently LPOs have started to move up the value chain by providing not only due diligence services but also agreement drafting in M&A transactions. As reported in “LPOs Stealing Deal Work from Law Firms,” alternative legal service providers are beginning to take the bread-and-butter work of large law firms, handling whole mergers and acquisitions, not just the due diligence aspects of deals. Beyond cost savings, LPOs offer advantages such as access to outside talent, 24/7 availability, and the ability to quickly scale operations up or down. According to the international LPO Market Study, general counsel are increasingly bypassing law firms and instructing legal process outsourcing suppliers directly. Currently worth over $1bn (£629m), the LPO market is forecast to double in size in the next two to three years.

Professor Richard Susskind with VQ Founders Ann Björk and Helena Hallgarn at VQ Forum.

A third major driver for change is new client demands. Because of budget restraints, most general counsel face what Professor Richard Susskind calls the “more for less” challenge: clients have more legal issues to handle, but fewer in-house resources and less budget to spend on external advisers. This challenge has forced general counsel to examine alternative solutions, demand discounts and alternative fee arrangements, and ask for predictability and metrics, all of which are demands for added value and efficiency. Now that law firms are no longer the only providers in the legal market, clients have a diverse set of options to choose from for legal advice, and they no longer accept hourly billing for inefficient work. General counsel are reviewing external advisers more closely and are very cost-driven. More and more, they turn to cost-effective solutions like LPOs, or adopt “multi-sourcing,” using different legal service providers for different elements of a legal matter. In short, the client has taken over the driver’s seat from the law firm and is now dictating cost, pricing, staffing, and strategic direction, which were previously in the law firm’s control. Together, these two factors (a decline in overall legal spending and new options for legal services) combine to reduce demand for the services of lawyers.

Susan Hackett on lawyers’ perception

Susan Hackett, in her keynote presentation at VQ Forum 2012, described current legal market developments: shrinking demand and increasing supply, competition from non-legal sources, and a lack of experience to guide us in this rapidly changing reality. Today, many law firms still work as if they can charge whatever they want for the limited services they wish to provide, which makes it difficult to profit from the more efficient and effective service delivery demanded in this competitive marketplace while delivering greater client value. Most law firms base lawyer compensation on lawyer activity instead of client results. Many law firms do not even ask for feedback, but simply assume that they are doing well and don’t need to change: “While 85% of partners think clients love them, only 35% of clients recommend their existing outside counsel to other clients.”

Addressing this “disconnect” between lawyers’ high perception of their own value and what clients actually want their lawyers to do is essential to improving the value of long-term client relationships. Susan Hackett also pointed to decreased client loyalty. The 2012 Altman Weil Chief Legal Officer Survey makes it clear that clients are on the move without concern for loyalty. The study reports that 77% of the participants terminated their relationship with at least one firm in the past year, while only 17% gave their law firms an “A” grade and 87% rated their law firms’ efficiency as “low.”

An interesting new initiative to pinpoint law firms’ inefficiency comes from D. Casey Flaherty, who developed a basic technology competency audit that he administers to his outside counsel to show how a lack of proficiency with common software tools (Word, Excel, Acrobat, etc.) results in an inordinate amount of wasted time that is still billed to clients.

The fourth major driver for change in the legal market is the collaboration trend. Business conditions have changed completely with the so-called “sharing economy” and the new generation of “Millennials,” as defined by, among others, futurist Michael Rogers. Michael Rogers describes the future of the legal industry as an era inspired by the Millennials: people who do not consider themselves limited to meeting others in their neighborhood but instead create relationships with people all around the world, based on interests instead of locality. In this era, new business is gained through referrals and by creating relationships through social technologies, collaboration, and providing information for free. This “freemium” trend has also been noted among legal professionals in the American Bar Association’s Legal Technology Survey Report, where 56% of respondents said they use a free online source for their legal research.

Clients are increasingly aware of the collaboration advantage. More and more legal collaboration portals are available, such as the Association of Corporate Counsel, which provides templates and other legal documents to its members, and Legal OnRamp, a collaboration system for in-house counsel and invited outside lawyers and third-party service providers. Another interesting example is the Pfizer Legal Alliance, a collaboration program for Pfizer’s outside counsel that has them work more closely and collaboratively, both with Pfizer and with each other, under standardized fixed-fee billing arrangements. Richard Susskind also talks about a “collaboration strategy” for law firms, in which clients come together and share the costs of certain types of legal service, as well as collaboration projects between law firms and clients through closed online communities, online legal services, automated drafting, and electronic legal marketplaces. Although some lawyers might find this controversial, such collaboration has already started: six major banks and the law firm Allen & Overy have created a joint online legal risk management tool.

2. NEW STRATEGIC APPROACH TO KNOWLEDGE MANAGEMENT AND EFFICIENT WORKING PROCESSES

Looking at what is happening in the legal market (LPOs, new legal service providers, virtual law firms, and increased collaboration and knowledge sharing within legal networks), you can see that clients are becoming more and more aware of new tools and processes and will start demanding that their lawyers adopt new technology to become more efficient. Law firms, therefore, have to review the value of their services and their use of technology to streamline processes and take better advantage of the firm’s accumulated knowledge, to ensure better service than their competitors.

For the first time in legal history, there is now a true incentive for law firms to deliver results faster, through the right combination of internal and external resources and the better use of IT as a competitive edge.

This means that law firms have to take a new strategic approach to knowledge management as a business development tool, a way of delivering the changes and innovation that will help law firms to survive and thrive in today’s dynamic and uncertain business and professional landscape.

Furthermore, clients no longer depend on their law firms for standardized legal documents, since several sites offer online legal documents, often for free, as do collaboration portals. Law firms have lost control over the legal documents they once considered the “crown jewels” of the firm. It also means that demanding and skilled clients, like in-house counsel, have easy access to more affordable legal resources and are becoming less willing to pay high fees for some of the work done by junior associates.

Law firms, therefore, have to rethink their view of these legal documents and realize that they are only the basis for their legal service and are already easily available to clients. Instead, firms have to look more closely at how to better share knowledge from their experience, better reuse the documents they have developed, standardize more routine work, and analyse their most valuable knowledge in order to leverage it to fully support their clients.

The American Lawyer’s survey recently reported that law firms seem to have realised the need for a more strategic approach to knowledge management (KM), and that firms were pushing for greater efficiency in their internal operations. Nearly half of the 200 responding law firms said they had aligned partner compensation with a willingness to cooperate in new initiatives such as knowledge management. Mary Abraham has discussed the impact of this report in “Guiding Partners to Better Law Firm KM,” including how KM professionals should best take advantage of this windfall: by avoiding the traditional precedent-collection or model-document drafting projects and focusing instead on high-impact KM activities. This means investing in the KM projects that will provide the greatest return on investment for the firm. It also means that legal knowledge management is being transformed, from the previously dominant focus on precedents and knowledge-base building to a focus on problem solving and business development. Legal KM today is very different from legal KM in its early days. Ron Friedmann has provided his insights on this transition in “The Evolution of KM from Content to Tools to True Productivity”:

“In the 1990s, we talked about work product retrieval and precedents. That continued into the new century until we finally realised how hard it is to find work products and to write, maintain, and organize precedents. Moreover, we also realised that content is not enough. We broadened our focus to finding experienced lawyers and finding relevant matters. […] More recently, KM has shifted again. Many KM professionals today focus on legal project management, alternative pricing arrangements and process improvement. In my view, this reflects more a discontinuity or abrupt shift than evolution. Legal KM sees the light: content is not an end. Even software is only a means to an end. The real end, the real goal, perhaps the Holy Grail, is improving lawyer productivity; is solving real problems.”

Michael Koenig also points to how “legal KM has its roots in helping attorneys practice more efficiently and effectively, by drawing on colleagues’ prior work product and through sharing information, expertise, and documents within the firm. Historically, much of this sharing happened without colleagues realizing it—KM was at work behind the scenes finding and organizing resources created by individual attorneys and providing searchable, efficient access to that product to all attorneys.” But when “combining strategic development of template or master resources with document automation, KM can shift attorneys from the ancient practice of search/save as/edit to web-based questionnaires that generate a customized “best practice” final document, at a fraction of the time and cost it would take to start from scratch and without the propensity for errors inherent in editing an older document.”

What is really interesting today is that not only do legal KM professionals see this “Holy Grail of legal KM” that Ron Friedmann refers to, but recent developments in the legal publishing world also show that legal publishers are on the same path, e.g., Rocket Lawyer’s acquisition of LawPivot and Thomson Reuters’ launch of client-centric platforms. Thus both legal KM professionals and legal publishers seem to agree that it is not enough to provide information, work product, or precedents. Instead, the focus is on supporting lawyers in improving the way they work and serve clients, and ultimately on improving how law firms operate as businesses.

In “Is KM a Real Force Multiplier?” Mary Abraham explained how KM needs to improve productivity and problem solving and how “the key to force multiplications is not to settle for incremental improvements but to aim for dramatically improved results”. With such a new focus, the Holy Grail of legal knowledge management appears to be within reach — where the goal of KM is to provide true competitive advantage by developing a combination of tools and content to improve lawyer productivity, solve real problems and make the business more profitable.

By using IT in the right way, firms can substantially improve their ability to find relevant information and make better use of internal knowledge sharing, since lessons learned, best practices, and new ways of solving problems can be shared with and used by all lawyers. Through these methods, firms can achieve substantial efficiency gains and increased profitability. Developments in information technology will enhance the efficiency of legal work not only through standard documents such as templates and checklists, but also through proceduralized processes and automated workflows. Systematization can also extend to the actual drafting of documents through document assembly technology. By implementing automated document production to support standardization, firms will be able to deliver the same quality of legal services and still maintain profit margins regardless of fee structure.
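To make the idea of questionnaire-driven document assembly more concrete, here is a minimal sketch using Python's standard `string.Template`. The master template, the clause wording, and the answer keys (`client_name`, `matter`, `fee_type`, `fee`) are all hypothetical. It illustrates the general technique only and does not describe any particular vendor's product.

```python
from string import Template

# Hypothetical master template for a simple engagement letter.
MASTER = Template(
    "Dear $client_name,\n"
    "We are pleased to act for $client_name in connection with $matter.\n"
    "$fee_clause\n"
)

# Alternative clauses selected by a questionnaire answer (hypothetical wording).
FEE_CLAUSES = {
    "fixed": "Our fee for this matter is a fixed fee of $fee.",
    "hourly": "Our fees will be charged at our standard hourly rates.",
}


def assemble(answers):
    """Build a finished document from questionnaire answers instead of editing an old one."""
    # Pick the clause the answers call for, then fill any placeholders it contains.
    fee_clause = Template(FEE_CLAUSES[answers["fee_type"]]).safe_substitute(answers)
    # Keyword arguments take precedence over the mapping in Template.substitute.
    return MASTER.substitute(answers, fee_clause=fee_clause)


letter = assemble({
    "client_name": "Acme AB",
    "matter": "the sale of its subsidiary",
    "fee_type": "fixed",
    "fee": "SEK 50,000",
})
print(letter)
```

Keeping clause choices in a lookup table separate from the master template mirrors the "template or master resources" idea: lawyers maintain the approved wording once, and each answer set selects from it.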

Richard Susskind predicts this to be the future of legal service: “These systems, which can be used within legal businesses or made available online, are disruptive for lawyers who charge for their time, because they enable documents to be generated in minutes whereas, in the past, they would have taken many hours to craft. The end result is a tailored solution, delivered by an advanced system rather than by a human craftsman. That is the future of legal service.”

3. CONCLUSION

A new approach treats knowledge management as a management issue for the whole business, embraces technology, and standardizes work through document assembly tools and externally purchased basic documents. With such an approach, firms can achieve substantial efficiency gains and increased profitability. Law firms will prosper by finding new legal services to offer their clients. New business opportunities will arise in providing services at fixed prices through new specialized and individualized solutions for clients.

Helena Hallgarn and Ann Björk

Helena Hallgarn and Ann Björk are founders of Virtual Intelligence VQ, a Swedish consultancy firm that combines the practice of law with IT and knowledge management skills. They are two of the most experienced knowledge management professionals in Scandinavia, with backgrounds in legal practice and KM work at the Scandinavian law firms Vinge, Mannheimer Swartling and Gernandt & Danielsson. Their focus is to strategically develop legal KM and drive innovation in the legal profession.

Helena and Ann blog at Legal Innovation Blog and manage the LinkedIn discussion group Legal Innovation. Each year they also arrange VQ Forum, which focuses on the most interesting ongoing discussions worldwide on strategy, leadership, innovation, technology and knowledge management for the legal sector. Helena and Ann can also be found as @VQab on Twitter.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

Next Generation Legal Search - It's Already Here

information retrieval, legal research, search, software innovation, User behavior information

Mar 28, 2013

[Editor’s Note: We are pleased to publish this piece from Qiang Lu and Jack Conrad, both of whom worked with Thomson Reuters R&D on the WestlawNext research team. Jack Conrad continues to work with Thomson Reuters, though currently on loan to the Catalyst Lab at Thomson Reuters Global Resources in Switzerland. Qiang Lu is now based at Kore Federal in the Washington, D.C. area. We read with interest their 2012 paper from the International Conference on Knowledge Engineering and Ontology Development (KEOD), “Bringing order to legal documents: An issue-based recommendation system via cluster association”, and are grateful that they have agreed to offer some system-specific context for their work in this area. Their current contribution represents a practical description of the advances that have been made between the initial and current versions of Westlaw, and what differentiates a contemporary legal search engine from its predecessors. -sd]

In her blog post “Pushing the Envelope: Innovation in Legal Search” (2009) [1], Edinburgh Informatics Ph.D. candidate K. Tamsin Maxwell presents her perspective on the state of legal search at the time. The variations of legal information retrieval (IR) that she reviews, everything from natural language search (e.g., vector space models, Bayesian inference net models, and language models) to NLP and term weighting, refer to techniques that are now 10, 15, even 20 years old. She also refers to West’s release back in 1993 of the first natural language legal search engine, WIN (Westlaw Is Natural) [2]. Adding to this ongoing conversation about legal search, we would like to check back in, a full 20 years after the release of that first natural language legal search engine. Our objective in this post is to provide a useful overview of state-of-the-art legal search today.

What Maxwell’s article could not have predicted, even five years ago, are some of the chief factors that distinguish today’s state-of-the-art search engines from their earlier counterparts. One of the most notable distinctions is that, unlike their predecessors, contemporary search engines (including today’s state-of-the-art legal search engine, WestlawNext) separate the function of document retrieval from that of document ranking. The first function, retrieval, primarily addresses recall, ensuring that all potentially relevant documents are retrieved. The second, ranking, focuses on the ideal ordering of those results, addressing precision at the highest ranks. Search engines of the past, by contrast, effectively treated these two functions as one and the same. So what is the difference? The document retrieval piece may not be dramatically different from what it was when WIN was first released in 1993. What is dramatically different is the evidence considered in the ranking piece, which allows potentially dozens of weighted features to be taken into account and tracked as part of the optimal ranking process.
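The split between a recall-oriented retrieval stage and a precision-oriented ranking stage can be illustrated with a toy sketch. This is not how WestlawNext is implemented. The term-overlap retrieval rule, the feature names (`citations`, `views`), and the weights are all assumptions made for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    # Ranking evidence; the feature names used below are hypothetical.
    features: dict[str, float] = field(default_factory=dict)


def retrieve(query: str, corpus: list[Doc]) -> list[Doc]:
    """Stage 1 (recall): keep every document sharing at least one query term."""
    terms = set(query.lower().split())
    return [d for d in corpus if terms & set(d.text.lower().split())]


def rank(candidates: list[Doc], weights: dict[str, float]) -> list[Doc]:
    """Stage 2 (precision): order candidates by a weighted sum of their features."""
    def score(d: Doc) -> float:
        return sum(w * d.features.get(name, 0.0) for name, w in weights.items())
    return sorted(candidates, key=score, reverse=True)


corpus = [
    Doc("A", "attractive nuisance swimming pool", {"citations": 2.0, "views": 1.0}),
    Doc("B", "attractive nuisance trespassing child", {"citations": 5.0, "views": 3.0}),
    Doc("C", "contract formation offer", {"citations": 9.0, "views": 9.0}),
]
hits = rank(retrieve("attractive nuisance", corpus), {"citations": 0.6, "views": 0.4})
print([d.doc_id for d in hits])  # ['B', 'A']
```

Document C has the strongest features but is never ranked, because the retrieval stage did not select it. The ranking stage only reorders what retrieval has already found.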

Figure 1. The set of evidence (views) that can be used by modern legal search engines.

In traditional search, the principal evidence considered was the main text of the document in question. In traditional legal search, those documents would be cases, briefs, statutes, regulations, law reviews, and other forms of primary and secondary (a.k.a. analytical) legal publications. This textual set of evidence can be termed the document view of the world. Legal search engines like Westlaw can also exploit expert-generated annotations or metadata. These annotations come in the form of synopses written by attorney-editors, points of law (a.k.a. headnotes), and topical classifications that attorney-classifiers assign using a legal taxonomy such as West’s Key Number System [3]. The set of evidence based on such metadata can be termed the annotation view. Furthermore, in a manner loosely analogous to the World Wide Web and its lattice of inter-referencing documents, today’s legal search can exploit both the outbound (cited) and inbound (citing) sources of a document in question. Frequently, these citations operate not merely at the document level but at the sub-document or topic level. Such a set of evidence can be termed the citation network view. More sophisticated engines can examine not only the popularity of a given cited or citing document based on citation frequency, but also the polarity and scope of the arguments those documents make.
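A minimal sketch of the citation network view follows. It assumes a hypothetical list of citing/cited edges, each tagged with a treatment label, and a hypothetical weighting of those labels. The sketch indexes inbound and outbound links and sums a simple polarity score. It works only at the document level, whereas the text notes that real systems can also resolve citations at the topic level.

```python
from collections import defaultdict

# Hypothetical weights for how a citing document treats the document it cites.
POLARITY = {"followed": 1.0, "distinguished": 0.0, "criticized": -1.0}


def build_network(edges):
    """Index outbound (cited) and inbound (citing) links for every document.

    `edges` is a list of (citing_id, cited_id, treatment) triples.
    """
    outbound, inbound = defaultdict(set), defaultdict(set)
    for citing, cited, _treatment in edges:
        outbound[citing].add(cited)
        inbound[cited].add(citing)
    return outbound, inbound


def polarity_score(doc_id, edges):
    """Sum treatment weights over all citations that point at `doc_id`."""
    return sum(POLARITY.get(t, 0.0) for _citing, cited, t in edges if cited == doc_id)


edges = [
    ("X", "A", "followed"),
    ("Y", "A", "followed"),
    ("Z", "A", "criticized"),
    ("X", "B", "distinguished"),
]
outbound, inbound = build_network(edges)
print(len(inbound["A"]), sorted(outbound["X"]), polarity_score("A", edges))  # 3 ['A', 'B'] 1.0
```

Citation frequency (`len(inbound[doc])`) and polarity are separate signals. Document A is popular, but one of its three citations is critical, which lowers its polarity score.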

In addition to the “views” described thus far, a modern search engine can also harness what has come to be known as aggregated user behavior. Individual users and their individual behavior are not considered, but where there is sufficient accumulated evidence, this user view lets the search function factor in document popularity. That is, besides noting that a document was returned in a result set for a certain kind of query, the search provider can also tabulate how often the document was opened for viewing, how often it was printed, and how often it was checked for its legal validity (e.g., through citator services such as KeyCite [4]). (See Figure 1.) This form of marshaling and weighting evidence only scratches the surface, for one can also track evidence linking two documents within the same research session. For example, when one highly relevant document appears in result sets for a given query type, another document may typically appear in the same result sets. In summary, the user view is a rich and powerful additional means of gauging document relevance, based on how professional users interact with legal corpora such as those mentioned above.
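Aggregating this kind of user behavior can be sketched in a few lines. The log format `(session_id, doc_id, action)` and the action names are assumptions. The code never looks at who performed an action. It only counts actions per document and counts pairs of documents touched in the same session.

```python
from collections import Counter
from itertools import combinations

# Actions treated as popularity signals (hypothetical names).
SIGNALS = {"open", "print", "citator_check"}


def popularity(log):
    """Count opens, prints and citator checks per document, ignoring user identity."""
    counts = Counter()
    for _session, doc, action in log:
        if action in SIGNALS:
            counts[doc] += 1
    return counts


def co_occurrence(log):
    """Count how often two documents are used in the same research session."""
    sessions = {}
    for session, doc, _action in log:
        sessions.setdefault(session, set()).add(doc)
    pairs = Counter()
    for docs in sessions.values():
        for a, b in combinations(sorted(docs), 2):
            pairs[(a, b)] += 1
    return pairs


log = [
    ("s1", "A", "open"), ("s1", "B", "open"), ("s1", "A", "print"),
    ("s2", "A", "open"), ("s2", "B", "citator_check"), ("s3", "C", "open"),
]
print(popularity(log))     # Counter({'A': 3, 'B': 2, 'C': 1})
print(co_occurrence(log))  # Counter({('A', 'B'): 2})
```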

It is also worth noting that today’s search engines may factor in a user’s preferences, for example, by knowing what jurisdiction a particular attorney-user practices in, and what kinds of sources that user has historically preferred, over time and across numerous result sets.
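The following is a minimal sketch of how such preferences might adjust scores. The result tuple layout, the fixed jurisdiction `boost`, and the `preferred_types` mapping (for example, a user's historical share of clicks per source type) are all hypothetical. A production engine would more likely learn these effects as ranking features than apply hand-set multipliers.

```python
def personalize(results, user_jurisdiction, preferred_types, boost=1.5):
    """Re-score (doc_id, score, jurisdiction, source_type) tuples for one user.

    `preferred_types` maps a source type to a hypothetical preference in [0, 1].
    """
    rescored = []
    for doc_id, score, jurisdiction, source_type in results:
        if jurisdiction == user_jurisdiction:
            score *= boost                              # favor the user's own jurisdiction
        score *= 1.0 + preferred_types.get(source_type, 0.0)  # favor familiar source types
        rescored.append((doc_id, score))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


results = [("A", 1.0, "NY", "case"), ("B", 1.2, "CA", "case"), ("C", 0.9, "NY", "statute")]
print([doc for doc, _ in personalize(results, "NY", {"case": 0.5})])  # ['A', 'B', 'C']
```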

While the materials or data relied upon in the document view and citation network view are authored by judges, law clerks, legislators, attorneys and law professors, the summary data in the annotation view is produced by attorney-editors. The aggregated user behavior data in the user view, by contrast, is produced by the professional researchers who interact with the retrieval system. Because of this rich and diverse set of views, the power and effectiveness of a modern legal search engine come not only from its underlying technology but also from the collective intelligence of all the domain expertise represented in the generation of its data (documents) and metadata (citations, annotations, popularity and interaction information). Thus, the legal search engine offered by WestlawNext (WLN) represents an optimal blend of advanced artificial intelligence techniques and human expertise [5].

Given this wealth of diverse material representing various forms of relevance information and tractable connections between queries and documents, the ranking function executed by modern legal search engines can be optimized through a series of training rounds that “teach” the machine what forms of evidence make the greatest contribution for certain types of queries and available documents, along with their associated content and metadata. In other words, the re-ranking portion of the machine learns how to weigh the “features” representing this evidence in a manner that will produce the best (i.e., highest precision) ranking of the documents retrieved.
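The sketch below shows the idea of training rounds that learn how to weigh features. It is a minimal pointwise learning-to-rank model: logistic regression fitted by stochastic gradient descent on labeled (features, relevant?) pairs. The three features and the training labels are invented, and the source does not say which learning algorithm WestlawNext uses. This is a generic stand-in, not that system's method.

```python
import math


def train_weights(examples, n_features, epochs=200, lr=0.1):
    """Fit logistic-regression weights so relevant documents (label 1) score higher.

    `examples` is a list of (feature_vector, label) pairs with label 0 or 1.
    """
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in examples:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of relevance
            err = p - y                     # gradient of the log-loss with respect to z
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b


def score(w, b, x):
    return b + sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical features: [text match, normalized citation count, headnote match].
examples = [
    ([0.9, 0.8, 1.0], 1),
    ([0.8, 0.1, 0.0], 0),
    ([0.3, 0.9, 1.0], 1),
    ([0.7, 0.2, 0.0], 0),
]
w, b = train_weights(examples, n_features=3)
print([round(wi, 2) for wi in w])
```

In this toy data, text match alone is a poor signal, so training assigns most of the weight to the citation and headnote features. This is the kind of lesson the text describes the machine learning from evidence.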

Nevertheless, a search engine is still highly influenced by the user queries it has to process. For some legal research questions, an independent set of documents grouped by legal issue would be a tremendous complementary resource for the legal researcher, one at least as effective as trying to assemble the set of relevant documents through a sequence of individual queries. For this reason, WLN offers, alongside search, a complementary feature called “Related Materials,” which is in essence a document recommendation mechanism. These materials are clustered around the primary, secondary, and sometimes tertiary legal issues in the case under consideration.

Legal documents are complex and multi-topical in nature. By detecting the top-level legal issues underlying the original document and delivering recommended documents grouped by those issues, a modern legal search engine can provide comprehensive coverage and thus a more effective research experience [6,7]. Some of the approaches to generating such related material are illustrated below.
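The sketch below shows the general shape of issue-based recommendation. It assumes each document has already been assigned to one or more issue clusters; the Lu and Conrad papers [6,7] address how that is done, and this code does not reproduce their method. Given those assignments, it inverts them into cluster memberships and groups the recommendations for a document under each of its issues. The document IDs and issue labels are hypothetical, and recommendations within an issue are listed alphabetically rather than ranked by relevance.

```python
from collections import defaultdict


def recommend_by_issue(doc_id, doc_issues, limit=3):
    """Group recommended documents for `doc_id` under each issue it is tagged with.

    `doc_issues` maps document ids to the set of issue labels (clusters) assigned to them.
    """
    members = defaultdict(set)  # invert the mapping: issue label -> documents in that cluster
    for other, issues in doc_issues.items():
        for issue in issues:
            members[issue].add(other)
    grouped = {}
    for issue in sorted(doc_issues.get(doc_id, ())):
        grouped[issue] = sorted(members[issue] - {doc_id})[:limit]
    return grouped


doc_issues = {
    "D1": {"Campaign Speech", "Political Committee"},
    "D2": {"Campaign Speech"},
    "D3": {"Political Committee", "Campaign Speech"},
    "D4": {"Attractive Nuisance"},
}
print(recommend_by_issue("D1", doc_issues))
# {'Campaign Speech': ['D2', 'D3'], 'Political Committee': ['D3']}
```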

Take, for example, an attorney who is running a set of queries to identify relevant documents involving “attractive nuisance” for a party that witnessed a child nearly drown in a swimming pool. After a number of attempts using several different key terms in her queries, the attorney selects the “Related Materials” option, which provides access to the spectrum of “attractive nuisance”-related documents. Such sets of issue-based documents can represent a mother lode of relevant materials. In this instance, pursuing this navigational path rather than a query-based one turns out to be a good choice. The query-based approach would take time and would yield only a gradually evolving set of relevant documents. By contrast, harnessing the cluster of documents produced for “attractive nuisance” may turn out to be the most efficient route to total recall with the desired degree of relevance.

To further illustrate the benefit of a modern legal search engine, we will conclude our discussion with an instructive search using WestlawNext, and its subsequent exploration by way of this recommendation resource available through “Related Materials.”

The underlying legal issue in this example is “church support for specific candidates”, and a corresponding query is issued in the search box. Figure 2 provides an illustration of the top cases retrieved.

Figure 2: Search result from WestlawNext

Let’s assume that the user decides to closely examine the first case. By clicking the link to the document, the content of the case is rendered, as in Figure 3. Note that on the right-hand side of the panel, the major legal issues of the case “Canyon Ferry Road Baptist Church … v. Unsworth” have been automatically identified and presented with hierarchically structured labels, such as “Freedom of Speech / State Regulation of Campaign Speech” and “Freedom of Speech / View of Federal Election Campaign Act / Definition of Political Committee,” … By presenting these closely related topics, a user is empowered with the ability to dive deep into the relevant cases and other relevant documents without explicitly crafting any additional or refined queries.

Figure 3: A view of a case and complementary materials from WestlawNext
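The hierarchically structured issue labels shown in Figure 3 use a "Parent / Child / Grandchild" form. A minimal sketch of turning such labels into a navigable tree follows. It assumes only that " / " separates the levels; the nested-dictionary representation is an illustrative choice, not WestlawNext's data model.

```python
def build_label_tree(labels):
    """Nest 'A / B / C' issue labels into a dict-of-dicts tree."""
    tree = {}
    for label in labels:
        node = tree
        for part in (p.strip() for p in label.split("/")):
            node = node.setdefault(part, {})  # create the level if it does not exist yet
    return tree


labels = [
    "Freedom of Speech / State Regulation of Campaign Speech",
    "Freedom of Speech / View of Federal Election Campaign Act / Definition of Political Committee",
]
print(build_label_tree(labels))
```

Both labels end up under a single "Freedom of Speech" node. This is what lets an interface present sibling issues side by side.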

When the user selects one of these topics, a set of recommended cases is rendered under that particular label. Figure 4, for example, shows the related topic view of the case under the label “Freedom of Speech / View of Federal Election Campaign Act / Definition of Political Committee.” The user can repeat this process as needed, starting from any document in the original result set.

Figure 4: Related Topic view of a case

In summary, by utilizing the combination of human expert-generated resources and sophisticated machine-learning algorithms, modern legal search engines bring the legal research experience to an unprecedented and powerful new level. For those seeking the next generation in legal search, it’s no longer on the horizon. It’s already here.

References

[1] K. Tamsin Maxwell, “Pushing the Envelope: Innovation in Legal Search,” in VoxPopuLII, Legal Information Institute, Cornell University Law School, 17 Sept. 2009. http://blog.law.cornell.edu/voxpop/2009/09/17/pushing-the-envelope-innovation-in-legal-search/
[2] Howard Turtle, “Natural Language vs. Boolean Query Evaluation: A Comparison of Retrieval Performance,” In Proceedings of the 17th Annual International ACM-SIGIR Conference on Research & Development in Information Retrieval (SIGIR 1994) (Dublin, Ireland), Springer-Verlag, London, pp. 212-220, 1994.
[3] West’s Key Number System: http://info.legalsolutions.thomsonreuters.com/pdf/wln2/L-374484.pdf
[4] West’s KeyCite Citator Service: http://info.legalsolutions.thomsonreuters.com/pdf/wln2/L-356347.pdf
[5] Peter Jackson and Khalid Al-Kofahi, “Human Expertise and Artificial Intelligence in Legal Search,” in Structuring of Legal Semantics, A. Geist, C. R. Brunschwig, F. Lachmayer, G. Schefbeck Eds., Festschrift ed. for Erich Schweighofer, Editions Weblaw, Bern, pp. 417-427, 2011.
[6] On Cluster definition and population: Qiang Lu, Jack G. Conrad, Khalid Al-Kofahi, William Keenan, “Legal Document Clustering with Built-in Topic Segmentation,” In Proceedings of the 2011 ACM-CIKM Twentieth International Conference on Information and Knowledge Management (CIKM 2011) (Glasgow, Scotland), ACM Press, pp. 383-392, 2011.
[7] On Cluster association with individual documents: Qiang Lu and Jack G. Conrad, “Bringing order to legal documents: An Issue-based Recommendation System via Cluster Association,” In Proceedings of the 4th International Conference on Knowledge Engineering and Ontology Development (KEOD 2012) (Barcelona, Spain), SciTePress DL, pp. 76-88, 2012.

Jack G. Conrad currently serves as Lead Research Scientist with the Catalyst Lab at Thomson Reuters Global Resources in Baar, Switzerland. He was formerly a Senior Research Scientist with the Thomson Reuters Corporate Research & Development department. His research areas fall under a broad spectrum of Information Retrieval, Data Mining and NLP topics. Some of these include e-Discovery, document clustering and deduplication for knowledge management systems. Jack has researched and implemented key components for WestlawNext, West’s next-generation legal search engine, and PeopleMap, a very large scale Public Record aggregation system. Jack completed his graduate studies in Computer Science at the University of Massachusetts–Amherst and in Linguistics at the University of British Columbia–Vancouver.

Qiang Lu was a Senior Research Scientist with the Thomson Reuters Corporate Research & Development department. His research interests include data mining, text mining, information retrieval, and machine learning. He has extensive experience applying NLP technologies to a variety of data sources, such as news, legal, financial, and law enforcement data. Qiang was a key member of the WestlawNext research team. He has a Ph.D. in computer science and engineering from the State University of New York at Buffalo. He is now a managing associate at Kore Federal in the Washington, D.C., area.


Environmentally-Friendly Citations

commercial systems, Legal citation, Legal citations, Legal descriptive metadata, Legal informatics, Legal knowledge representation, Legal metadata, legal research, Standards

Mar 01, 2010

Today in Canada, nearly three quarters of citations to recent case law use the neutral citation, an industry-independent, open identifier assigned by courts to their decisions. When we call something a “game-changer,” most people assume that it was invented by Apple. The neutral citation was not, yet it is definitely a game-changer in the legal publishing business. Here are some thoughts about why.

Cited and Citing Cases

Legal publishing would be much simpler if cases did not cite other cases or all sorts of other legal documents. However, in that event the law would be far less intelligible. The citations between legal documents help establish a coherent body of law. The interpretation of cases and statutes in their surrounding context of citing and cited legal documents is crucial in legal practice. It is often considered prudent to wait until the courts have their say on a freshly enacted statute before relying blindly on it. And no lawyer would bring up a “dynamite” case in court before carefully checking to see how this case has been treated by other case law.

Indeed, all legal publishers try in various ways to exploit the relations between legal documents in order to stand out in the eyes of their customers. Many features of electronic publishing systems are based in some way on the relations between documents: hyperlinking, note-up, lists of cases and statutes considered, related cases, judicial history and treatment, search results ranking based on popularity, etc. The use of citation data has defined legal publishing for many years. Any major change in how things are cited in law will therefore have a great impact on the future of legal publishing.

Originally, citation data was neither intended nor designed to yield itself easily to computer processing, let alone free online publishing.

The Chinese Walls Around Print Report Series

The problem is well known. Print report citations were not designed to function outside the context of the report series they belong to. For example, “301 D.L.R. (4th) 513” does not mean anything to you if you don’t have the Dominion Law Reports nearby. Even if you were lucky enough to have the series in your firm’s library, you could not safely cite a case by, for example, 301 D.L.R. (4th) 513 and expect all your readers to understand what you are saying unless you assume that all your readers have the Dominion Law Reports in their libraries. This issue is amplified by the number of print law reports. In Canada, there are 70 major law report series according to the Queen’s University Law Library. (Although some from the list may have disappeared since the list was last updated, the number of report series remains large.)

To cope with this reality, the legal publishing world came up with citators and, in particular, with their ability to offer and make use of parallel citations. Citators can tell you, among other things, which identifiers (citations) have been assigned to a particular case. For example, the list [2009] 1 S.C.R. 181; 301 D.L.R. (4th) 513; [2009] 2 W.W.R. 385; 183 C.R.R. (2d) 1; 320 Sask. R. 305 means that the case Ravndahl v. Saskatchewan can be identified by any of the citations in the list.
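At its core, a parallel-citation lookup is a mapping from every known citation string to one canonical case key. The minimal sketch below uses the Ravndahl citations from the example above. Using the case name as the canonical key is an assumption made for readability; a real citator would use a stable internal identifier.

```python
def build_citator(parallel_groups):
    """Map every known citation string to a canonical case key."""
    index = {}
    for case_key, citations in parallel_groups.items():
        for cite in citations:
            index[cite] = case_key
    return index


def parallels(cite, index):
    """Return every citation that identifies the same case as `cite`."""
    key = index.get(cite)
    if key is None:
        return []
    return sorted(c for c, k in index.items() if k == key)


groups = {
    "Ravndahl v. Saskatchewan": [
        "[2009] 1 S.C.R. 181", "301 D.L.R. (4th) 513", "[2009] 2 W.W.R. 385",
        "183 C.R.R. (2d) 1", "320 Sask. R. 305",
    ],
}
index = build_citator(groups)
print(index["[2009] 2 W.W.R. 385"])  # Ravndahl v. Saskatchewan
```

A reader holding only the Western Weekly Reports citation can then be sent to any copy of the case the system has, whichever series that copy came from.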

Commercial Electronic Databases Are Not Better

Nobody will dispute the fact that, for all practical purposes, electronic sources are the research tool. Printed reports will wither and gradually disappear. Those that remain because of their official status will be used, not for research, but only as the recognized source of citable law. Many if not most legal researchers will even affirm that this is already the de facto situation.

It is worthwhile to analyze what will happen in that new electronic context. Citations will be based on database identifiers. In Canada, such citations will take the following form: “[1998] O.J. No. 2515 (QL)”. In some ways, such a citation leads back to the same old problem discussed earlier in this post: proprietary citation. And if having to consult a specific book to know what was cited was annoying, citations consisting of commercial database identifiers create a much more serious problem. In the print environment, researchers had to take the time to visit the library to get the cited material; in the digital environment, they must subscribe to the commercial database. In the era of the Internet, any type of proprietary citation could seriously threaten the legal information system.

The Free Law Publisher’s Initial Approach

One way of dealing with Chinese Walls is to live with them, and use one’s wits to figure out what is on the other side.

In 2004, CanLII was striving to be recognized as a legal information product that could successfully serve the everyday research needs of legal professionals. The idea emerged to develop a citator that would use the relations between legal documents to improve hyperlinking and to support a series of other cool features.

Building a citator is an expensive operation. Here is a brief outline of the manual and automated methodology mix that was employed to build Reflex, CanLII’s citator.


1. An editor keys in information about all cases published in a particular report series, for example, the Dominion Law Reports. Such information includes the case name, the docket number, the issuing court, the date of the decision, a very short excerpt from the case, and the report citation.

2. This operation results in many records like the following one:

Record 1
Case name: Ravndahl v. Saskatchewan
Docket: 32225
Date: 2009-01-29
Court: SCC
Excerpt: The appellant lost…
Citation: 301 D.L.R. (4th) 513

3. Another editor keys in the same information about all cases published in another report series, for example, the Western Weekly Reports, producing records like this one:

Record 2
Case name: Ravndahl v. Saskatchewan
Docket: 32225
Date: 2009-01-29
Court: SCC
Excerpt: The appellant lost…
Citation: [2009] 2 W.W.R. 385

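4. Software compares the records and, where their metadata match, links the citations as parallel citations of the same case: 301 D.L.R. (4th) 513 and [2009] 2 W.W.R. 385.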

5. The operation is repeated for 35 report series on an ongoing basis.

Of course, in practice the exercise is much more complicated, as the software has to deal with varying degrees of similarity in the metadata, for example, almost identical case names (R. v. Smith and The Queen v. Smith). The program can also run into other problems, such as missing docket numbers or dates.
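The matching step of this workflow can be sketched as grouping records on a normalized key. The sketch below assumes records shaped like Records 1 and 2 above, stored as Python dictionaries. It normalizes case names crudely so that "R. v. Smith" and "The Queen v. Smith" compare equal, then groups citations on (name, court, date). The normalization rules and the decision to leave the docket out of the key (because some records lack it) are assumptions of this sketch, not a description of how Reflex works.

```python
import re


def normalize_name(name):
    """Crude normalization so 'The Queen v. Smith' and 'R. v. Smith' compare equal."""
    n = name.lower().strip()
    # Treat common Crown styles as the same party (an assumption of this sketch).
    n = re.sub(r"^(the queen|regina|r\.)\s+v\.", "r v.", n)
    n = re.sub(r"[^a-z0-9 ]", "", n)  # drop punctuation
    return re.sub(r"\s+", " ", n)     # collapse repeated whitespace


def match_records(records):
    """Group records from different report series that describe the same decision.

    Docket numbers are left out of the key because some records lack them.
    """
    groups = {}
    for rec in records:
        key = (normalize_name(rec["case_name"]), rec["court"], rec["date"])
        groups.setdefault(key, []).append(rec["citation"])
    return groups


records = [
    {"case_name": "Ravndahl v. Saskatchewan", "docket": "32225", "date": "2009-01-29",
     "court": "SCC", "citation": "301 D.L.R. (4th) 513"},
    {"case_name": "Ravndahl v. Saskatchewan", "docket": "32225", "date": "2009-01-29",
     "court": "SCC", "citation": "[2009] 2 W.W.R. 385"},
]
print(list(match_records(records).values()))
# [['301 D.L.R. (4th) 513', '[2009] 2 W.W.R. 385']]
```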

With this approach, CanLII was able to significantly expand the breadth of hyperlinking within legal documents, along with all the features based on hypertext, such as sorting search results by number of citations and providing lists of related cases. Because Reflex can resolve citations to cases that are unavailable on CanLII, citation counts were also used to identify the most important cases missing from CanLII. In other words, they showed where to start if we wanted to scan cases from paper and publish them on CanLII.
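Using citation counts to prioritize missing cases can be sketched as a simple tally. The sketch assumes citations have already been resolved to case keys, for example with a citator index. The case keys and the `available` set are hypothetical.

```python
from collections import Counter


def scanning_priorities(resolved_citations, available, top=3):
    """Rank cases that are cited but missing from the collection by how often they are cited."""
    counts = Counter(case for case in resolved_citations if case not in available)
    return counts.most_common(top)


cited = ["A", "B", "A", "C", "A", "B"]  # hypothetical resolved case keys
print(scanning_priorities(cited, available={"C"}))  # [('A', 3), ('B', 2)]
```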

As one of the founding members of the free-access-to-law movement, CanLII may have revolutionized the way Canadian law was made accessible, but print was still a ubiquitous part of our publishing routine.

Another Way of Dealing With the Chinese Walls…

… is to simply destroy them. Before CanLII’s arrival on the legal information playground, LexUM, in collaboration with representatives from the judiciary, law librarians, court staff, IT consultants, and several forward-looking individuals from the commercial publishing circles, had set up the Canadian Citation Committee. The CCC is an ad hoc group formed to support the standardization efforts of the Judges Technology Advisory Committee (JTAC) of the Canadian Judicial Council (CJC).

The CCC designed and promotes several documentary standards, among them the neutral citation. The neutral citation was proposed as a unique, industry-independent identifier, assigned to a case by the court. It is formed in a simple way: by the year of the case, an acronym for the issuing court, and a serial number. For example, 2009 SCC 7 designates the case of the Supreme Court of Canada Ravndahl v. Saskatchewan, released in 2009.
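Because a neutral citation has only three parts, it can be parsed and generated mechanically. The minimal sketch below uses a regular expression that assumes a four-digit year, a court acronym that starts with a capital letter and contains only letters, and a serial number. This pattern is an approximation made for illustration; it is not the CCC's formal specification.

```python
import re

# Year, court acronym, serial number, e.g. "2009 SCC 7" (pattern is an assumption).
NEUTRAL = re.compile(r"(?P<year>\d{4}) (?P<court>[A-Z][A-Za-z]*) (?P<number>\d+)")


def parse_neutral(cite):
    """Return (year, court, number) for a neutral citation, or None otherwise."""
    m = NEUTRAL.fullmatch(cite.strip())
    if m is None:
        return None
    return int(m["year"]), m["court"], int(m["number"])


def format_neutral(year, court, number):
    return f"{year} {court} {number}"


print(parse_neutral("2009 SCC 7"))            # (2009, 'SCC', 7)
print(parse_neutral("301 D.L.R. (4th) 513"))  # None: a print report citation
```

By contrast, a report-series citation cannot be interpreted without knowledge of the series. This is the "Chinese wall" problem described earlier.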

Simple, open, Internet-friendly, environmentally-caring, promising: such is the neutral citation.

Who’s on Board?

Courts have gradually been adopting the neutral citation, beginning in 1999 and continuing to the present. The first adopters of the neutral citation were the Superior Courts of British Columbia. Today, all 50 Canadian courts follow the neutral citation standard. The last one to join, just this year in fact, was the Ontario Superior Court of Justice, the toughest jurisdiction in the country (from a judicial administration and legal publishing point of view) because of the complexity of its judicial structure. As a result, all 50,000 cases issued annually by Canadian appellate, superior, and trial courts now bear neutral citations assigned by the courts. To that number, we must add the decisions rendered by at least two dozen administrative tribunals that have also adopted the standard.

Probably a more important question than “Who’s on board?” is: Why are those institutions on board? Before embracing a change, people often need at least one ideological reason and at least one practical reason. On philosophical (and economic) grounds, it certainly made sense for court decisions to be freed from proprietary citation schemes. From a practical point of view, the most convincing argument was the convenience, for the court, of having a unique designation for its own decision at the very moment the reasons for the decision are issued.

Are Lawyers and Judges Following?

If you read the first paragraph of this post carefully, you know that the answer is yes. Lawyers and judges do cite cases using the neutral citation, and they do so much more frequently than one might think.

Let’s bring in some data. On CanLII, a case citation is hyperlinked if it comes from one of the 35 report series covered by CanLII’s citator or if it is a neutral citation. This yields a citation resolution success rate of about 80%, meaning that about 80% of case citations on CanLII are hyperlinked. The rest, many of which are citations to proprietary commercial databases, are not.

In this context, it was tempting to measure the share of links attributable to the neutral citation. In other words, what percentage of case citations contain the neutral citation, either alone or among other parallel citations?
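The measurement can be sketched with a few lines of Python. The sketch assumes a citation is linked if it contains a neutral citation or names a covered report series. `COVERED_SERIES` is a hypothetical three-series stand-in for the 35 series the citator actually covers, and the neutral-citation pattern is an approximation. The sketch computes the resolution rate, the neutral share of links, and the neutral share of all citations for a small sample.

```python
import re

# Neutral citation anywhere in the string: year, court acronym, serial number (approximation).
NEUTRAL = re.compile(r"\b\d{4} [A-Z][A-Za-z]* \d+\b")
# Hypothetical stand-in for the report series covered by the citator.
COVERED_SERIES = ("S.C.R.", "D.L.R.", "W.W.R.")


def has_neutral(cite):
    return NEUTRAL.search(cite) is not None


def is_linked(cite):
    return has_neutral(cite) or any(series in cite for series in COVERED_SERIES)


def citation_stats(citations):
    linked = [c for c in citations if is_linked(c)]
    return {
        "resolution_rate": len(linked) / len(citations),
        "neutral_share_of_links": sum(map(has_neutral, linked)) / len(linked),
        "neutral_share_of_all": sum(map(has_neutral, citations)) / len(citations),
    }


sample = [
    "2009 SCC 7",
    "301 D.L.R. (4th) 513",
    "Ravndahl v. Saskatchewan, 2009 SCC 7, [2009] 1 S.C.R. 181",
    "[1998] O.J. No. 2515 (QL)",
    "[2009] 2 W.W.R. 385",
]
print(citation_stats(sample))
```

Four of the five sample citations are linked. Two of those four contain a neutral citation, so the neutral citation accounts for half of the links but only two fifths of all citations.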

So we examined two sets of citations. The first one contained 40,000 citations of cases released in 2006, 2007 and 2008. The second one included 41,000 citations of cases released in 2007, 2008 and 2009.

The count showed the following. In data set 1 (citations pointing to cases released in 2006, 2007 and 2008), 85% of hyperlinked citations are, or contain, a neutral citation. In data set 2 (citations to cases released in 2007, 2008 and 2009), the neutral citation accounts for 91% of the links.

Data Set #1
40,000 citations
Citing cases released in 2008
Cited cases released in 2006, 2007, 2008
Links based on neutral citations: 85%
Share of all citations that are or contain neutral citations: 68%

Data Set #2
41,000 citations
Citing cases released in 2009
Cited cases released in 2007, 2008, 2009
Links based on neutral citations: 91%
Share of all citations that are or contain neutral citations: 73%
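The two percentages reported for each data set are related. Assuming the overall resolution rate of about 80% mentioned above applies to both data sets, the share of all citations that are or contain a neutral citation is roughly the share of links multiplied by the resolution rate:

```latex
\begin{aligned}
\text{share of all citations} &\approx \text{share of links} \times \text{resolution rate} \\
\text{Data Set \#1:}\quad 0.85 \times 0.80 &= 0.68 \\
\text{Data Set \#2:}\quad 0.91 \times 0.80 &= 0.728 \approx 0.73
\end{aligned}
```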

Needless to say, both the numbers and the progression look exciting. This, of course, is not the last reason we need before sending the print reports sailing into history.

It is just one more.

Ivan Mokanov is Deputy Director of LexUM. He oversees LexUM’s publishing and development activities and supervises various consulting and research projects in Canada and abroad. As a member of LexUM’s Executive Committee, he participates in LexUM’s administration and business development. Ivan is a graduate of Sofia University (B.C.L.) and the University of Montreal (LL.M.), and he is currently enrolled at HEC Montreal (M.B.A.).

Duopolies, web usability, and legal research instruction

digital law, information retrieval, Law librarians, legal research

Nov 19, 2009

It’s been a rocky year for West’s relationship with law librarians.

First, the company declined to participate in this year’s American Association of Law Libraries Price Index for Legal Publications. This led AALL to return West’s sponsorship check for the 2009 AALL Annual Meeting. For attendees, this decision was somewhat academic, as West still occupied a large space in the Exhibitor Hall and once again hosted a well-attended Customer Appreciation Party.

Shortly after the conference, West issued an email promotion to customers that asked:

Are you on a first name basis with the librarian? If so, chances are, you’re spending too much time at the library. What you need is fast, reliable research you can access right in your office.

Many law librarians felt publicly insulted by West, expressing their outrage on listservs, blogs, Twitter, Facebook and anywhere legal information professionals could be found that week.

Most recently, West released a video of University of California, Berkeley professor and law librarian Bob Berring explaining the advantages of “free market” premium legal databases over free legal information websites run by “volunteers”:

It’s not like legal information is going to the Safeway or to buy food. You’re not buying a packaged thing. If you say I need to find statutes about this, or what’s the administrative regulations on that, or have the courts spoken about this, you have to go find it. And just saying it’s all out there — I mean, the ocean is all out there, but you need a map, and you need a compass, and… you need a GPS system now. You need someone to tell you how to get there. That’s why librarians are even more important now, because they’ve got the GPS system. But you have to be working with organized information. The value added by folks like West, where the information is edited as it goes in, and it’s classified, and the hooks are put in — easy hooks for the people who I think are sloppy researchers just playing around on the tops, really sophisticated hooks for the people who take the time to learn how to really use the system and understand it. You just can’t say enough about those kind of things, because to say to the average person, “Well, it’s all out there, the law is all out there,” well, it’s a big bunch of goo.

Adding value to the goo

Unfortunately, the West/Lexis duopoly doesn’t provide consumers with the expected advantages of a free market economy. Neither vendor uses price as a marketing strategy, and both negotiate electronic database contracts with customers rather than charge a flat rate. Considering that West has increased its own annual profit margin to 30% or higher in recent years, while raising the cost of supplements at a rate far exceeding inflation, prices are hardly being driven by free market trends, making a price war seem unlikely. (This doesn’t mean consumers aren’t hopping mad about the price of legal information. They are.)

Instead, at least in the database market, both companies rely on content and features to market their products. Each July at the AALL Annual Meeting, both Lexis and Westlaw use their exhibitor space to educate attendees about whatever new databases and customer conveniences will be rolled out in the coming months.

I often compare these annual feature introductions to the evolution of automobile engines, thanks to a childhood spent watching my father work on the family cars. At first Dad knew every nook and cranny of our vehicles, and there was little he couldn’t repair himself over the course of a few nights. As we traded in cars for newer models, his job became more difficult as engines became more complex. None of the automakers seemed to consider ease of access when adding new parts to an automobile engine. They were simply slapped on top of the existing ones, making it harder to perform simple tasks, like replacing belts or spark plugs.

Lexis and Westlaw also add new components on top of the old ones. To generalize, Lexis tends to add new features in the form of tabs (think “Total Litigator”) while Westlaw adds them in sidebars (think “Results Plus”), to the point where once-clean interfaces are now littered with disparate elements sharing adjacent screen real estate.

Finding fault with filters

In a talk at last year’s Web 2.0 Expo in New York, author Clay Shirky stated that the fundamental information problem is not “information overload,” but “filter failure.” Shirky summarized this position in a recent interview with Yale Law School’s Jason Eiseman:

As I’ve often said, there’s no such thing as information overload. It’s filter failure, right? From the minute we had more books to read than the average literate person could read in a lifetime, which depending on the region you’re talking about happened someplace between the 16th and 19th century, from that moment on we’ve always had information overload. That’s the modern condition. What’s happening, I think, to our sense that we’re suffering acutely from information overload now is that the old professional filters have broken. They’re simply not adequate to contain a world in which anyone can put material out in the public.

Whether or not you agree with Shirky’s assessment, it provides an interesting framework for viewing the Lexis/Westlaw information problem. If the primary legal information within these systems is “a big bunch of goo,” then secondary resources, headnotes, subject-specific organization, and other finding aids are the filters necessary to cope with information overload.

For West’s “Are you on a first name basis with the librarian?” promotion to work, Westlaw has to provide the “fast, reliable research you can access right in your office” that it advertises. Assuming for purposes of this essay that the presence of relevant content isn’t an issue (an assumption with which many will quibble), this means the system’s filters need to provide reliable information quickly.

There’s no question that both West and Lexis provide an abundance of subject-specific organization, particularly for case law. Headnotes, topics, digests, tables of authority, citators and cross-references to secondary resources all go above and beyond what researchers find in most freely available resources. But these add-ons, or filters, are only effective if presented in a usable manner.

For an assignment in one of my legal research classes this semester, I provided a fact pattern and asked students to run a Natural Language search of American Law Reports in Westlaw to find a relevant annotation. In a class of only 19 students, six answered with citations to resources other than ALR, including articles from American Jurisprudence, Am. Jur. Proof of Facts, and Shepard’s Causes of Action. The problem, it turned out, wasn’t that they had searched the wrong database. Every one of them searched ALR correctly, but those six students mistook Westlaw’s Results Plus, placed at the top of a sidebar on the results page, for their actual search results. Filter failure, indeed.

On another assignment, students were expected to find a particular statutory code section using a secondary resource, view the code section, then navigate to the code’s table of contents to browse related sections codified nearby. This proved nearly impossible for most of them, as the code section they accessed loaded in a pop-up window with no sidebar, thus providing no visible link to the table of contents. The problems didn’t stop there. Even once I told them to click the “Maximize” button at the bottom of the pop-up window, which reloads the code section into the main window with a sidebar, upon clicking the TOC link, anyone using Firefox for Windows loaded a blank page. (To resolve this error, you have to right-click on the frame where the TOC should’ve loaded and select “This Frame -> Reload This Frame.”)

While completing another portion of the statutory code assignment in Lexis, nearly half the students in the class became confused because numerous clickable links throughout the system display as plain black text and appear as links only when the user hovers over them. Also, within statutory code sections, the navigation links in the case annotation index routinely loaded an error page rather than jumping to the proper section further down the page.

This doesn’t even address basic usability issues such as broken back button functionality, heavy usage of frames, lack of permanent document URLs (Lexis and Westlaw each have external workarounds for this), and reliance on pop-up windows (something blocked by default on most browsers). In addition, Lexis still doesn’t support users accessing the system with Firefox for Mac.

The wide availability of secondary resources, annotated codes, and other value-added content gives Lexis and Westlaw a clear advantage over free and mid-level legal information services, and that’s why everyone continues to pay their steep prices. But so long as the systems themselves don’t provide usable access, each still suffers from filter failure.

Is there an incentive to improve?

There is evidence that the companies have the expertise to provide a better user experience. West offers two electronic versions of Black’s Law Dictionary (one for desktop computers and one for the iPhone) that are more intuitive to use than the same text in Westlaw. Don’t expect a price break, however. The desktop version of Black’s has a list price of $99, while the iPhone version costs $49.99. By comparison, the print version of Black’s Standard Ninth Edition, which likely has substantially higher production costs than the electronic equivalents, carries a list price of $75. iPhone users thus pay about $25 less than the print price, while desktop users pay $24 more. Worse still, both electronic versions, as well as the content in Westlaw, contain the text of the outdated Eighth Edition.

Lexis also has an iPhone app, and it’s a free download that requires an existing Lexis password to function. Substantially simplified from its traditional web interface, the user experience is clean and easy to understand. Yet while one can retrieve both primary and secondary documents, as well as Shepardize documents, none of the documents in this interface contain links, only plain citations that must be copied and pasted into the search form to be retrieved.

Of course, the bigger problem with these progressive moves is that they don’t address any of the existing problems in the web interfaces for either product. No one is redesigning the engine, so to speak. These are simply variations of the now traditional roll-out of new features and functionality on top of existing ones that still have the same significant issues.

This is the problem with a duopoly. There aren’t enough producers in the market to exert significant pressure on either to improve usability. Consumer power is also limited because multi-year contracts prevent easy product substitution, and there’s only one true product substitute available. The producers dictate the competition, and thus far they have dictated a content competition (“The Tabs and Sidebars War”), rather than a usability one — or even a price one.

There are events on the horizon that could impact this stalemate. Bloomberg continues to develop its own legal research product, allegedly designed to be a Westlaw/Lexis competitor. Perhaps this third producer will see value in using price or usability to gain market share. Lewis & Clark law student (and VoxPopuLII author) Robb Shecter recently introduced OregonLaws.org, a free repository of Oregon law that currently features the entire Oregon Revised Statutes and a legal glossary. The site’s simple, logical navigation reflects current web usability norms more accurately than either Lexis or Westlaw, and for a “micro-fee” users can bookmark code sections for quick access and save unlimited “human readable” research trails. And, of course, Google Scholar just added “Legal opinions and journals.” It’s far too early to know if it will become a true player in legal information, but Google always has the potential to be a game changer with anything it does.

What can legal research instructors DO?

Despite the presence of these interesting new projects, consumers can’t expect a quick usability turnaround from Lexis and Westlaw, nor the sudden presence of a competitor with the same depth and breadth of content. History doesn’t support such an expectation, leaving legal research instructors in a precarious position.

Many schools leave Lexis/Westlaw training solely in the hands of the companies’ representatives. While a company rep will be knowledgeable about the system, he will also paint the product in the best possible light for the company, glossing over usability issues and emphasizing new features. After all, law students are future customers, so this instruction is part of a long-term sales pitch.

In order to provide a balanced picture of these systems, legal research instructors need to provide their own Lexis and Westlaw training. This can either be in place of or in addition to what’s provided by company reps, but students need to hear the voice of an experienced researcher who doesn’t rely on either company for a paycheck. Some may see this as an implied institutional endorsement of the high-priced systems, but the reality is most students will end up working with one or both of these systems on a daily basis after graduation. Ignoring this would be an educational disservice. Any sense of endorsement can be addressed through thorough coverage of the usability limitations and a short education on the price realities. Instructors can also discuss the availability of lower priced databases for lawyers who simply want access to primary legal materials.

If the market is going to change, it won’t be because Lexis and Westlaw spontaneously decide to improve products that generate significant profits already. Until then, legal researchers need to be better educated on the limitations of these systems so that their work product isn’t compromised by over-reliance on a duopoly disguised as a free market.

Tom Boone is a reference librarian and adjunct professor at Loyola Law School in Los Angeles. He’s also webmaster and a contributing editor for Henderson Valley Eggs, a “themed information collective” website covering law library issues.

VoxPopuLII is edited by Judith Pratt.

The Recipe for Better Legal Information Services

comparative, digital law, legal research

Aug 12, 2009

A new style of legal research

An attorney/author in Baltimore is writing an article about state bans on teachers’ religious clothing. She finds one of the tersely written statutes online. The website then runs a query of its own and tells her about a useful statute she wasn’t aware of: one setting out the permitted disciplinary actions. When she views it, the site makes the connection clear by showing her where the second statute references the original. This new information makes her article’s thesis stronger.

Meanwhile, 2,800 miles away in Oregon, a law student is researching the relationship between the civil and criminal state codes. Browsing a research site, he notices a pattern of civil laws making use of the criminal code, often to enact civil punishments or enable adverse actions. He then engages the website in an interactive text-based dialog, modifying his queries as he considers the previous results. He finally arrives at an interesting discovery: the offenses with the least additional civil burdens are white collar crimes.
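The post doesn’t describe how the second scenario’s analysis is implemented, but its core can be approximated in a few lines of Python. This is a minimal sketch, not OregonLaws.org code. It assumes cross-references between sections have already been extracted as (referrer, referent) pairs, and it treats chapters 161 through 169 as the criminal code purely as an illustrative assumption. The “civil burden” of an offense is approximated, hypothetically, as the number of distinct non-criminal sections that cite it; the section numbers in the example are made up.

```python
from collections.abc import Iterable

# Assumption for illustration: chapters 161 through 169 hold the criminal code.
CRIMINAL_CHAPTERS = range(161, 170)


def chapter_of(section_number: str) -> int:
    """Return the numeric chapter of a section number, e.g. "419A.004" -> 419."""
    prefix = section_number.split(".")[0]
    return int("".join(ch for ch in prefix if ch.isdigit()))


def is_criminal(section_number: str) -> bool:
    return chapter_of(section_number) in CRIMINAL_CHAPTERS


def civil_burden_counts(references: Iterable[tuple[str, str]],
                        criminal_sections: Iterable[str]) -> list[tuple[str, int]]:
    """Count the distinct non-criminal sections that cite each criminal section.

    Results are sorted so the least-referenced offenses come first.
    """
    citing: dict[str, set[str]] = {section: set() for section in criminal_sections}
    for referrer, referent in references:
        if referent in citing and not is_criminal(referrer):
            citing[referent].add(referrer)
    return sorted(((sec, len(civil)) for sec, civil in citing.items()),
                  key=lambda pair: (pair[1], pair[0]))


# Hypothetical reference pairs: (citing section, cited section).
refs = [
    ("300.010", "166.001"),
    ("400.010", "166.001"),
    ("163.001", "166.001"),  # criminal-to-criminal, so not a civil burden
    ("300.010", "165.001"),
    ("300.010", "166.001"),  # duplicate citation, counted once
]
print(civil_burden_counts(refs, ["164.001", "165.001", "166.001"]))
# → [('164.001', 0), ('165.001', 1), ('166.001', 2)]
```

An interactive dialog like the one described would then amount to re-running queries of this kind with different filters as the researcher refines the question.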

A new kind of research system

A new field of computer-assisted legal research is emerging: one that encompasses research in both the academic and the practical “legal research” senses. The two scenarios above both took place earlier this year, enabled by the OregonLaws.org research system that I created and which typifies these new developments.

Interestingly, this kind of work is very recent; it’s distinct from previous uses of computers for researching the law and assisting with legal work. In the past, techniques drawn from computer science have been most often applied to areas such as document management, court administration, and inter-jurisdiction communication. Working to improve administrative systems’ efficiency, people have approached these problem domains through the development of common document formats and methods of data interchange.

The new trend, in contrast, looks in the opposite direction: divergently tackling new problems as opposed to convergently working towards a few focused goals. This organic type of development is occurring because programming and computer science research is vastly cheaper—and much more fun—than it has ever been in the past. Here are a couple of examples of this new trend:

“Computer Programming and the Law”

Law professor Paul Ohm recently wrote a proposal for a new “interdisciplinary research agenda” which he calls “Computer Programming and the Law.” (The law review article is itself also a functioning computer program, written in the literate programming style.) He envisions “researcher-programmers,” enabled by the steadily declining cost of computer-aided research, using computers in revolutionary ways for empirical legal scholarship. He illustrates four new methods for this kind of research: developing computer programs to “gather, create, visualize, and mine data” that can be found in diverse and far-flung sources.

“Computational Legal Studies”

Grad students Daniel Katz and Michael Bommarito (researcher-programmers, as Paul Ohm would call them) created the Computational Legal Studies Blog in March 2009. The website is a growing collection of visualizations applied to diverse legal and policy issues. The site is part showcase for the authors’ own work and part catalog of the current work of others.

OregonLaws.org

I started the OregonLaws.org project because I wanted faster and easier access to the 2007 Oregon Revised Statutes (ORS) and other primary and secondary sources. I had a couple of very statute-heavy courses (Wills & Trusts, and Criminal Law), and I frequently needed to find an ORS section quickly. But as I got further into the development, I realized that it could become a platform for experimenting with computational analysis of legal information, similar to the work being done on the Computational Legal Studies Blog.

I developed the system using pretty much the steps that Paul Ohm discussed:

* Gathering data: I downloaded and cleaned up the ORS source documents, converting them from MS Word/HTML to plain text;
* Creating: I parsed the texts, creating a database model reflecting the taxonomy of the ORS: Volumes, Titles, Chapters, etc.;
* Creating: I created higher-level database entities based on insights into the documents, for example by explicitly modeling textual references between sections as reference objects, each capturing a relationship between a referrer and a referent; and
* Mining and visualizing: Finally, I’ve begun making web-based views of these newly found objects and relationships.
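The reference-object step above can be sketched in Python. This is a minimal illustration, not the actual OregonLaws.org code: it assumes the sections are already plain text keyed by section number, and that cross-references appear in the form “ORS 342.650”. The class and function names are hypothetical, and the sample section numbers and texts are made up.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumption: cross-references look like "ORS 342.650" (chapter, period, section).
CITATION = re.compile(r"\bORS\s+(\d{1,3}[A-Z]?\.\d{3,4})")


@dataclass(frozen=True)
class Reference:
    """A textual reference from one section (referrer) to another (referent)."""
    referrer: str
    referent: str


def extract_references(sections: dict[str, str]) -> list[Reference]:
    """Scan each section's plain text and record every other section it cites."""
    found: dict[Reference, None] = {}  # keeps insertion order and drops duplicates
    for number, text in sections.items():
        for cited in CITATION.findall(text):
            if cited != number:  # ignore a section citing itself
                found[Reference(referrer=number, referent=cited)] = None
    return list(found)


def build_backreference_index(refs: list[Reference]) -> dict[str, list[str]]:
    """Map each cited section to the sorted list of sections that cite it."""
    index: dict[str, set[str]] = defaultdict(set)
    for ref in refs:
        index[ref.referent].add(ref.referrer)
    return {referent: sorted(referrers) for referent, referrers in index.items()}


# Hypothetical sample data, not actual ORS text.
sections = {
    "100.010": "No teacher shall wear religious dress while teaching.",
    "100.020": "A teacher who violates ORS 100.010 may be disciplined under ORS 100.030.",
    "100.030": "The permitted disciplinary actions are listed in this section.",
}
refs = extract_references(sections)
index = build_backreference_index(refs)
print(index["100.010"])  # sections that cite 100.010
# → ['100.020']
```

The back-reference index is what lets a site answer the question in the Baltimore scenario (“which statutes point at the one I’m reading?”) with a single lookup instead of a fresh full-text search.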

The object database is the key to intelligent research

Taking the time to go through the steps listed above makes powerful new features possible. Below are some additions to the features described in the introductory scenarios:

We can search smarter. In a previous VoxPopuLII post, Julie Jones advocates dropping our usual search methods and applying techniques like subject-based indexing (à la Factiva) to legal content.

This is straightforward to implement with an object model. The Oregon Legislature created the ORS with a conceptual structure similar to that of most states: the actual content is found in Sections. These are grouped into Chapters, which are in turn grouped into Titles. I was impressed by the organization and the architecture I was discovering, insights that are obscured by the ways statutes are traditionally presented.

And so I sought out ways to make use of the legislature’s efforts whenever it made sense. In the case of search results, the Title organization and naming were extremely useful. Each Section returned by the search engine “knows” what Chapter and Title it belongs to. A small piece of code can then calculate what Titles are represented in the results, and how frequently. The resulting bar graph doubles as an easy way for users to specify filtering by “subject area”. The screenshot above shows a search for forest.
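The Title-frequency calculation described above can be sketched as follows. This is a minimal illustration, assuming each search hit is a Section object that reaches its Title through its Chapter; the class names, the text-bar rendering (the real site draws a graphical bar chart), and the sample Titles are all hypothetical. Turning a bar into a filter link is not shown.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    number: int
    name: str


@dataclass(frozen=True)
class Chapter:
    number: str
    title: Title


@dataclass(frozen=True)
class Section:
    number: str
    chapter: Chapter

    @property
    def title(self) -> Title:
        # Each Section "knows" its Title through its Chapter.
        return self.chapter.title


def title_histogram(results: list[Section]) -> list[tuple[Title, int]]:
    """Count how many search hits fall under each Title, most frequent first."""
    return Counter(section.title for section in results).most_common()


def render_bars(histogram: list[tuple[Title, int]], width: int = 20) -> list[str]:
    """Render the histogram as text bars scaled to the most frequent Title."""
    if not histogram:
        return []
    top = histogram[0][1]
    lines = []
    for title, count in histogram:
        bar = "#" * max(1, round(width * count / top))
        lines.append(f"Title {title.number:>2} {title.name:<12} {bar} {count}")
    return lines


# Hypothetical Titles and search hits, for illustration only.
alpha, beta = Title(1, "Alpha"), Title(2, "Beta")
hits = [
    Section("10.010", Chapter("10", alpha)),
    Section("10.020", Chapter("10", alpha)),
    Section("20.010", Chapter("20", beta)),
]
for line in render_bars(title_histogram(hits), width=10):
    print(line)
```

Because the grouping comes from the legislature’s own Titles rather than from keywords, the same counts serve both as a summary of the results and as a ready-made set of subject filters.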

The ORS’s framework of Volumes, Titles, and Chapters was essentially a tag cloud waiting to be discovered.

We can get better authentication. In another VoxPopuLII post, John Joergensen discussed the need for authentication of digital resources. One aspect of this is showing the user the chain of custody from the original source to the current presentation. His ideas about using digital signatures are excellent: they point to a scenario in which an electronic document’s legitimacy can be verified with complete assurance.

We can get a good start towards this goal by explicitly modeling content sources. A source is given attributes for everything we’d want to know to create a citation: date last accessed, URL, and so on. Every content object in the database is linked to one of these source objects. Now, every time we display a document, we can create properly formatted citations to the original sources.
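Source modeling of this kind might look like the following minimal sketch. The attribute names, the Document class, and the citation format are assumptions for illustration only; the format is deliberately simplified and is not Bluebook-compliant. The URL is a placeholder.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    """Everything needed to cite where a piece of content came from."""
    publisher: str
    work: str
    url: str
    accessed: date  # date the source was last accessed


@dataclass
class Document:
    """A content object; every one is linked to exactly one Source."""
    heading: str
    body: str
    source: Source

    def citation(self) -> str:
        # Simplified citation format for illustration; not Bluebook-compliant.
        s = self.source
        visited = f"{s.accessed:%B} {s.accessed.day}, {s.accessed.year}"
        return (f"{self.heading}, {s.work} ({s.publisher}), "
                f"available at {s.url} (last visited {visited})")


# Hypothetical source record shared by many documents.
ors = Source("Oregon Legislature", "Oregon Revised Statutes (2007)",
             "https://example.org/ors", date(2009, 8, 12))
doc = Document("ORS 100.010", "...", ors)
print(doc.citation())
```

Since many documents share one Source object, correcting an access date or URL in one place updates every citation generated from it.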

The gather/create/mine/visualize and object-based approaches open up so many new possibilities that they can’t all be discussed in one short article. It sometimes seems that each new step taken enables previously unforeseen features. A few of these others are new documents created by re-sorting and aggregating content, web service APIs, and extra annotations that enhance clarity. I believe that in the end, the biggest accomplishment of projects like this will be to raise our expectations for electronic legal research services, increase their quality, and lower their cost.

Robb Shecter is a software engineer and third-year law student at Lewis & Clark Law School in Portland, Oregon. He is Managing Editor for the Animal Law Review, plays jazz bass, and has published articles in Linux Journal, Dr. Dobb’s Journal, and Java Report.

VoxPopuLII is edited by Judith Pratt.
