To take the words of Walt Whitman, when it comes to improving legal information retrieval (IR), lawyers, legal librarians and informaticians are all to some extent, “Both in and out of the game, and watching and wondering at it”. The reason is that each group holds only a piece of the solution to the puzzle, and as pointed out in an earlier post, they’re talking past each other.
In addition, there appears to be a conservative contingent in each group that actively hinders the kind of cooperation that could generate real progress: lawyers do not always take up technical advances when they are made available, which discourages further research; legal librarians cling to indexing when all modern search technologies use free-text search; and informaticians are frequently impatient with, and misunderstand, the needs of legal professionals.
What’s holding progress back?
At root, what has held legal IR back may be the lack of cross-training of professionals in law and informatics, although I’m impressed with the open-mindedness I observe at law and artificial intelligence conferences, and there are some who are breaking out of their comfort zone and neat disciplinary boundaries to address issues in legal informatics.
I recently came back from a visit to the National Institute of Informatics in Japan where I met Ken Satoh, a logician who, late in his professional career, has just graduated from law school. This is not just hard work. I believe it takes a great deal of character for a seasoned academic to maintain students’ respect when they are his seniors in his second discipline. But the end result is worth it: a lab with an exemplary balance of lawyers and computer scientists, experience and enthusiasm, pulling side-by-side.
Still, I occasionally get the feeling we’re all hoping for some sort of miracle to deliver us from the current predicament posed by the explosion of legal information. Legal professionals hope to be saved by technical wizardry, and informaticians like myself are waiting for data provision, methodical legal feedback on system requirements and performance, and in some cases research funding. In other words, we all want someone other than ourselves to get the ball rolling.
The need to evaluate
Take, for example, the lack of large corpora for study, which is one of the biggest stumbling blocks in informatics. Both IR and natural language processing (NLP) currently thrive on experimentation with vast amounts of data, which are used in statistical processing. More data means better statistical estimates and fewer ‘guesses’ at the relevant probabilities. Even commercial legal case retrieval systems, which give the appearance of being Boolean, use statistics, and have done so for around 15 years. (They are based on inference networks that simulate Boolean retrieval with weighted indexing by relaxing the rigid conditional probability estimates for the Boolean operators ‘and’, ‘or’ and ‘not’. In this way, document ranking comes to depend on the number of query constraints met.)
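To make the idea concrete, here is a toy sketch of that relaxation — not any vendor’s actual system, and the beliefs are invented numbers: term matches become probabilities, AND becomes a product and OR a noisy-OR, so documents rank by how many query constraints they satisfy rather than being included or excluded outright.

```python
def term_belief(term, doc_terms):
    """Crude per-term belief: high if the term is present, a small
    default probability if not (a hard Boolean would use 1.0 / 0.0)."""
    return 0.9 if term in doc_terms else 0.1

def soft_and(beliefs):
    # Soft AND: product of beliefs, so missing one constraint
    # lowers the score instead of zeroing it.
    score = 1.0
    for b in beliefs:
        score *= b
    return score

def soft_or(beliefs):
    # Soft OR (noisy-OR): one minus the chance that every
    # piece of evidence fails.
    miss = 1.0
    for b in beliefs:
        miss *= (1.0 - b)
    return 1.0 - miss

doc_full = {"contract", "breach", "damages"}  # meets both constraints
doc_part = {"contract", "damages"}            # meets only one

query = ["contract", "breach"]  # conceptually "contract AND breach"
print(soft_and([term_belief(t, doc_full) for t in query]))  # ~0.81
print(soft_and([term_belief(t, doc_part) for t in query]))  # ~0.09
```

The partial match still receives a nonzero score, which is exactly how the ranking comes to reflect the number of constraints met.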
The problem is that to evaluate new techniques in IR (and thus improve the field), you need not only a corpus of documents to search but also a sample of legal queries and a list of all the relevant documents in response to those queries that exist in your corpus, perhaps even with some indication of how relevant they are. This is not easy to come by. In theory a lot of case data is publicly available, but accumulating and cleaning legal text downloaded from the internet, making it amenable to search, is nothing short of tortuous. Then relevance judgments must be given by legal professionals, which is difficult given that we are talking about a community of people who charge their time by the hour.
Of course, the cooperation of commercial search providers, who own both the data and training sets with relevance judgments, would make everyone’s life much easier, but for obvious commercial reasons they keep their data to themselves.
To see the productive effects of a good data set we need only look at the research boom now occurring in e-discovery (discovery of electronic evidence, or DESI). In 2006 the TREC Legal Track, including a large evaluation corpus, was established in response to the number of trials requiring e-discovery: 75% of Fortune 500 company trials involve it, and more than 90% of company information is now stored electronically. This has generated so much interest that an annual DESI workshop has been held since 2007.
Qualitative evaluation of IR performance by legal professionals is an alternative to the quantitative evaluation usually applied in informatics. The development of new ways to visualize and browse results seems particularly well suited to this approach, where we want to know whether users perceive new interfaces to be genuine improvements. Considering the history of legal IR, qualitative evaluation may be as important as the traditional IR evaluation metrics of precision and recall. (Precision is the proportion of retrieved items that are relevant; recall is the proportion of all relevant items in the collection that are retrieved.) However, it should not be the sole basis for evaluation.
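For concreteness, the two metrics can be computed directly. The document identifiers below are invented for illustration:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of all relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall

# 4 documents retrieved, 5 relevant documents in the collection,
# 2 of the retrieved ones are actually relevant:
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d1", "d3", "d5", "d6", "d7"])
print(p, r)  # 0.5 0.4
```

Note that a system can score well on one metric while doing badly on the other, which is why neither suffices alone.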
A well-known study by Blair and Maron makes this point plain. The authors showed that expert legal researchers retrieve less than 20% of relevant documents when they believe they have found over 75%. In other words, even experts can be very poor judges of retrieval performance.
Context in legal retrieval
Setting this aside, where do we go from here? Dan Dabney argued at the American Association of Law Libraries (AALL) 2005 Annual Meeting that free-text search decontextualizes information, and he is right. One thing to notice about current methods in open-domain IR, including vector space models, probabilistic models and language models, is that the only context they take into account is proximate terms (phrases). At heart, they treat all terms as independent.
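A minimal bag-of-words cosine similarity (my own illustration, not drawn from any particular system) shows what that independence assumption costs: two queries with opposite meanings are indistinguishable once word order is discarded.

```python
from collections import Counter
import math

def cosine(text_a, text_b):
    """Cosine similarity between two texts represented as
    bags of words (term counts), ignoring word order."""
    ca, cb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b)

# Opposite legal situations, identical bags of words:
print(cosine("tenant sues landlord", "landlord sues tenant"))  # ~1.0
```

To such a model the two queries are the same document, which is precisely the decontextualization Dabney describes.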
However, it’s risky to conclude what was reported from the same meeting: “Using indexes improves accuracy, eliminates false positive results, and leads to completion in ways that full-text searching simply cannot.” I would be interested to know if this continues to be a general perception amongst legal librarians despite a more recent emphasis on innovating with technologies that don’t encroach upon the sacred ground of indexing. Perhaps there’s a misconception that capitalizing on full-text search methods would necessarily replace the use of index terms. This isn’t the case; inference networks used in commercial legal IR are not applied in the open domain, and one of their advantages is that they can incorporate any number of types of evidence.
Specifically, index numbers, terms, phrases, citations, topics and any other desired information are treated as representation nodes in a directed acyclic graph (the network). This graph is used to estimate the probability of a user’s information need being met given a document.
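As a rough illustration of combining several evidence types, here is a weighted-sum belief operator; the node names, belief values and weights below are all invented, not taken from any real system.

```python
# Hypothetical evidence beliefs for one document. In a real inference
# network these would be estimated from term statistics, index
# assignments and citation links rather than hand-set.
doc_evidence = {
    "text:negligence": 0.8,      # belief from a free-text term match
    "index:duty-of-care": 0.9,   # belief from an index-number match
    "cite:leading-case": 0.5,    # belief from a citation match
}

def weighted_sum(beliefs_and_weights):
    """Combine evidence beliefs into one score; the weights express
    how much each evidence type is trusted."""
    total = sum(w for _, w in beliefs_and_weights)
    return sum(b * w for b, w in beliefs_and_weights) / total

score = weighted_sum([
    (doc_evidence["text:negligence"], 2.0),
    (doc_evidence["index:duty-of-care"], 3.0),
    (doc_evidence["cite:leading-case"], 1.0),
])
print(score)  # ~0.8
```

The point is architectural: nothing in the combination step cares whether a node came from free text or from an index, which is why the two need not compete.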
For the time being lawyers, unaware of the technology under the hood, default to using inference networks in a way that is familiar, via a search interface that easily incorporates index terms and looks like a Boolean search. (Inference nets are not Boolean, but they can be made to behave in the same way.) While Boolean search does tend to be more precise than other methods, the more data there is to search, the less well the system performs. Further, it’s not all about precision. Recall of relevant documents is also important, and this can be a weak point for Boolean retrieval. Eliminating false positives is no accolade when true positives are eliminated at the same time.
Since the current predicament is an explosion of data, arguing for indexing by contrasting it with full-text retrieval without considering how they might work together seems counterproductive.
Perhaps instead we should be looking at revamping legal reliance on a Boolean-style interface so that we can make better use of full-text search. This will be difficult. Lawyers who are charged, and charge, per search, must be able to demonstrate the value of each search to clients; they can’t afford the iterative nature of what characterizes open domain browsing. Further, if the intelligence is inside the retrieval system, rather than held by legal researchers in the form of knowledge about how to put complex queries together, how are search costs justified? Although Boolean queries are no longer well-adapted, at least value is easy to demonstrate. A push towards free-text search by either legal professionals or commercial search providers will demand a rethink of billing structures.
Given our current systems, are there incremental ways we can improve results from full-text search? Query expansion is a natural consideration and incidentally overlaps with much of the technology underlying graphical means of data exploration such as word clouds and wonderwheels; the difference is that query expansion goes on behind the scenes, whereas in graphical methods the user is allowed to control the process. Query expansion helps the user find terms they hadn’t thought of, but this doesn’t help with the decontextualization problem identified by Dabney; it simply adds more independent terms or phrases.
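A minimal sketch of the behind-the-scenes variant is pseudo-relevance feedback: terms frequent in the top-ranked results are added to the query automatically. The documents and terms here are invented, and a real system would weight terms and remove stopwords rather than count naively.

```python
from collections import Counter

def expand_query(query, top_docs, k=2):
    """Naive pseudo-relevance feedback: add the k most frequent
    terms from the top-ranked documents that are not already in
    the query (no stopword removal or term weighting)."""
    counts = Counter()
    for doc in top_docs:
        for term in doc.lower().split():
            if term not in query:
                counts[term] += 1
    return list(query) + [t for t, _ in counts.most_common(k)]

expanded = expand_query(
    ["landlord", "repair"],
    ["landlord duty to repair premises",
     "implied warranty of habitability premises"])
print(expanded)  # "premises" is added -- a term the user never typed
```

This finds terms the user hadn’t thought of, but as noted above, the added terms are still treated independently, so it does not solve the decontextualization problem.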
In order to contextualize information we can marry search using text terms with index numbers, as is already done. Even better would be to do some linguistic analysis of a query to really narrow down the situations in which we want terms to appear. In this way we might get at questions such as “What happened in a case?” or “Why did it happen?” rather than just “What is this document about?”.
Language processing and IR
Use of linguistic information in IR isn’t a novel idea. In the 1980s, IR researchers started to think about incorporating NLP as an intrinsic part of retrieval. Many of the early approaches attempted to use syntactic information for improving term indexing or weighting. For example, Fagan improved performance by applying syntactic rules to extract similar phrases from queries and documents and then using them for direct matching, but it was held that this was comparable to a less complex, and therefore preferable, statistical approach to language analysis. In fact, Fagan’s work demonstrated early on what is now generally accepted: statistical methods that do not assume any knowledge of word meaning or linguistic role are surprisingly (some would say depressingly) hard to beat for retrieval performance.
Since then there have been a number of attempts to incorporate NLP in IR, but depending on the task involved, there can be a lack of highly accurate methods for automatic linguistic analysis of text that are also robust enough to handle unexpected and complex language constructions. (There are exceptions; part-of-speech tagging, for example, is highly accurate.) The result is that gains in retrieval performance are often offset by errors in the linguistic analysis, producing a minimal positive, or even a negative, impact on overall performance. This can make NLP techniques not worth the price of the additional computational overhead in time and data storage.
However, just because the task isn’t easy doesn’t mean we should give up. Researchers, including myself, are looking afresh at ways to incorporate NLP into IR. This is being encouraged by organizations such as the NII Test Collection for IR Systems Project (NTCIR), which from 2003 to 2006 amassed excellent test and training data for patent retrieval, with corpora in Japanese and English and queries in five languages. Their focus has recently shifted towards NLP tasks associated with retrieval, such as patent data mining and classification. Further, their corpora enable the study of cross-language retrieval issues that become important in e-discovery, since only a minority of a global corporation’s electronic information will be in English.
We stand on the threshold of what will be a period of rapid innovation in legal search driven by the integration of existing knowledge bases with emerging full-text processing technologies. Let’s explore the options.
K. Tamsin Maxwell is a PhD candidate in informatics at the University of Edinburgh, focusing on information retrieval in law. She has a MSc in cognitive science and natural language engineering, and has given guest talks at the University of Bologna, UMass Amherst, and NAIST in Japan. Her areas of interest include text processing, machine learning, information retrieval, data mining and information extraction. Her passion outside academia is Arabic dance.
VoxPopuLII is edited by Judith Pratt.