
Artisanal Algorithms

Down here in Durham, NC, we have artisanal everything: bread, cheese, pizza, peanut butter, and of course coffee, coffee, and more coffee. It’s great—fantastic food and coffee, that is, and there is no doubt some psychological kick from knowing that it’s been made carefully by skilled craftspeople for my enjoyment. The old ways are better, at least until they’re co-opted by major multinational corporations.

Artisanal Cheese. Source: Wikimedia Commons

Aside from making you either hungry or jealous, or perhaps both, why am I talking about fancy foodstuffs on a blog about legal information? It’s because I’d like to argue that algorithms are not computerized, unknowable, mysterious things—they are produced by people, often painstakingly, with a great deal of care. Food metaphors abound, helpfully I think. Algorithms are the “special sauce” of many online research services. They are sets of instructions to be followed and completed, leading to a final product, just like a recipe. Above all, they are the stuff of life for the research systems of the near future.

Human Mediation Never Went Away

When we talk about algorithms in the research community, we are generally talking about search or information retrieval (IR) algorithms. A recent and fascinating VoxPopuLII post by Qiang Lu and Jack Conrad, “Next Generation Legal Search – It’s Already Here,” discusses how these algorithms have become more complicated by considering factors beyond document-based, topical relevance. But I’d like to step back for a moment and head into the past for a bit to talk about the beginnings of search, and the framework that we have viewed it within for the past half-century.

Many early information-retrieval systems worked like this: a researcher would come to you, the information professional, with an information need, that vague and negotiable idea which you would try to reduce to a single question or set of questions. With your understanding of Boolean search techniques and your knowledge of how the document corpus you were searching was indexed, you would then craft a search for the computer to run. Several hours later, when the search was finished, you would be presented with a list of results, sometimes ranked in order of relevance and limited in size because of a lack of computing power. Presumably you would then share these results with the researcher, or perhaps just turn over the relevant documents and send him on his way. In the academic literature, this was called “delegated search,” and it formed the background for the most influential information retrieval studies and research projects for many years—the Cranfield Experiments. See also “On the History of Evaluation in IR” by Stephen Robertson (2008).
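To make the mechanics of a delegated Boolean search concrete, here is a minimal sketch of an inverted index with AND, OR, and NOT operators. The corpus, the function names, and the whitespace tokenizer are all illustrative assumptions; early systems indexed documents in their own ways, and nothing here is drawn from any particular one.

```python
from collections import defaultdict

def build_index(corpus):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Documents that contain every one of the terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *terms):
    """Documents that contain at least one of the terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.union(*sets) if sets else set()

def boolean_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())

# Hypothetical three-document corpus, keyed by document id.
corpus = {
    1: "negligence duty of care",
    2: "contract breach damages",
    3: "negligence damages causation",
}
index = build_index(corpus)
hits = boolean_and(index, "negligence", "damages")
```

The searcher, not the machine, decides which terms to combine and how, which is exactly where the information professional's judgment entered the process.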

In this system, literally everything (the document corpus, the index, the query, and the results) was mediated. There was a medium, a middle-man. The dream was to some day dis-intermediate, which does not mean to exhume the body of the dead news industry. (I feel entitled to this terrible joke as a former journalist… please forgive me.) When the World Wide Web and its ever-expanding document corpus came on the scene, many thought that search engines (huge algorithms, basically) would remove any barrier between the searcher and the information she sought. This is “end-user” search, and as algorithms improved, so too would the system, without requiring the searcher to possess any special skills. The searcher would plug a query, any query, into the search box, and the algorithm would present a ranked list of results, high on both recall and precision. Now, the lack of human attention, evidenced by the fact that few people ever look below result 3 on the list, became the limiting factor, instead of the lack of computing power.
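For readers unfamiliar with the two measures, precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The sketch below computes both, plus precision over only the top few results, since that is where attention tends to stop. The document ids and the relevance judgments are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

def precision_at_k(ranked, relevant, k):
    """Precision computed over only the first k results of a ranked list."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)

# Hypothetical ranked result list and hypothetical relevance judgments.
ranked = ["d1", "d7", "d3", "d9", "d4"]
relevant = {"d1", "d3", "d4", "d8"}

precision, recall = precision_recall(ranked, relevant)  # 3 of 5 retrieved, 3 of 4 relevant
top3 = precision_at_k(ranked, relevant, 3)              # 2 of the top 3 are relevant
```

Note that both measures depend on someone having judged which documents are relevant in the first place, which is itself a human decision.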

A search for delegated search


The only problem with this is that search engines did not remove the middle-man—they became the middle-man. Why? Because everything, whether we like it or not, is editorial, especially in reference or information retrieval. Everything, every decision, every step in the algorithm, everything everywhere, involves choice. Search engines, then, are never neutral. They embody the priorities of the people who created them and, as search logs are analyzed and incorporated, of the people who use them. It is in these senses that algorithms are inherently human.

Empowering the Searcher by Failing Consistently

In the context of legal research, then, it makes sense to consider algorithms as secondary sources. Law librarians and legal research instructors can explain the advantages of controlled vocabularies like the Topic and Key Number System®, of annotated statutes, and of citators. In several legal research textbooks, full-text keyword searching is anathema because, I suppose, no one knows what happens directly after you type the words into the box and click search. It seems frightening. We are leaping without looking, trusting our searches to some kind of computer voodoo magic.

This makes sense: search algorithms are often highly guarded secrets, even if what they select for (timeliness, popularity, and dwell time, to name a few) is made known. They are opaque. They apparently do not behave reliably, at least in some cases. But can’t the same be said for non-algorithmic information tools, too? Do we really know which types of factors figure into the highly vaunted editorial judgment of professionals?

To take the examples listed above: yes, we know what the Topics and Key Numbers are, but do we really know them well enough to explain why they work the way they do, and what biases are baked in from over a century of growth and change? Without greater transparency, I can’t tell you.

How about annotated statutes: who knows how many of the cases cited on online platforms are holdovers from the soon-to-be print publications of yesteryear? In selecting those cases, surely the editors had to choose to omit some, or perhaps many, because of space constraints. How, then, did the editors determine which cases were most on-point in interpreting a given statutory section, that is, which were most relevant? What algorithms are being used today to rank the list of annotations? Again, without greater transparency, I can’t tell you.

And when it comes to citators, why is there so much discrepancy between a case’s classification and which later-citing cases are presented as evidence of this classification? There have been several recent studies, like this one and this one, looking into the issue, but more research is certainly needed.

Finally, research in many fields is telling us that human judgments of relevance are highly subjective in the first place. At least one court has said that algorithmic predictive coding is better at finding relevant documents during pretrial e-discovery than humans are.

Where are the relevant documents? Source: CC BY 2.0, flickr user gosheshe

I am not presenting these examples to discredit subjectivity in the creation of information tools. What I am saying is that the dichotomy between editorial and algorithmic, between human and machine, is largely a false one. Both are subjective. But why is this important?

Search algorithms, when they are made transparent to researchers, librarians, and software developers (i.e. they are “open source”), do have at least one distinct advantage over other forms of secondary sources—when they fail, they fail consistently. After the fact or even in close to real-time, it’s possible to re-program the algorithm when it is not behaving as expected.

Another advantage to thinking of algorithms as just another secondary source is that, demystified, they can become a less privileged (or, depending on your point of view, less demonized) part of the research process. The assumption that the magic box will do all of the work for you is just as dangerous as the assumption that the magic box will do nothing for you. Teaching about search algorithms allows for an understanding of them, especially if the search algorithms are clear about which editorial judgments have been prioritized.

Beyond Search, Or How I Learned to Stop Worrying and Love Automated Research Tools

As an employee at Fastcase, Inc. this past summer, I had the opportunity to work on several innovative uses of algorithms in legal research, most notably on the new automated citation-analysis tool Bad Law Bot. Bad Law Bot, at least in its current iteration, works by searching the case law corpus for significant signals—words, phrases, or citations to legal documents—and, based on criteria selected in advance, determines whether a case has been given negative treatment in subsequent cases. The tool is certainly automated, but the algorithm is artisanal—it was massaged and kneaded by caring craftsmen to deliver a premium product. The results it delivered were also tested meticulously to find out where the algorithm had failed. And then the process started over again.
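To give a rough sense of what signal-based detection could look like, here is a minimal sketch. It is not Bad Law Bot's actual method: the signal list, the function name, and the character window are all hypothetical, chosen only to illustrate the general idea of looking for pre-selected phrases near a citation.

```python
import re

# Hypothetical signal phrases; a production tool would rely on a far larger, tested list.
NEGATIVE_SIGNALS = [
    r"overruled\b",
    r"abrogated\b",
    r"superseded by statute",
    r"reversed\b",
]

def find_negative_treatment(citing_text, cited_name, window=120):
    """Return the signal phrases found within `window` characters of a mention of cited_name."""
    found = []
    for mention in re.finditer(re.escape(cited_name), citing_text):
        start = max(0, mention.start() - window)
        end = mention.end() + window
        context = citing_text[start:end].lower()
        for pattern in NEGATIVE_SIGNALS:
            hit = re.search(pattern, context)
            if hit and hit.group(0) not in found:
                found.append(hit.group(0))
    return found

# Hypothetical citing opinions.
negative = "We conclude that Smith v. Jones was wrongly decided, and it is hereby overruled."
neutral = "As explained in Smith v. Jones, the duty of care is clear."

signals = find_negative_treatment(negative, "Smith v. Jones")
no_signals = find_negative_treatment(neutral, "Smith v. Jones")
```

A sketch this simple fails in predictable ways: a sentence saying a case was “not overruled” would still trigger the signal. That is the kind of consistent failure that testing can surface and that a revised rule can then correct.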

This is just one example of what I think the future of much general legal research will look like—smart algorithms built and tested by people, taking advantage of near unlimited storage space and ever-increasing computing power to process huge datasets extremely fast. Secondary sources, at least the ones organizing, classifying, and grouping primary law, will no longer be static things. Rather, they will change quickly when new documents are available or new uses for those documents are dreamed up. It will take hard work and a realistic set of expectations to do it well.

Computer assisted legal research cannot be about merely returning ranked lists of relevant results, even as today’s algorithms get better and better at producing these lists. Search must be only one component of a holistic research experience in which the searcher consults many tools which, used together, are greater than the sum of their parts. Many of those tools will be built by information professionals and software engineers using algorithms, and will be capable of being updated and changed as the corpus and user need changes.

It’s time that we stop thinking of algorithms as alien, or other, or too complicated, or scary. Instead, we should think of them as familiar and human, as sets of instructions hand-crafted to help us solve problems with research tools that we have not yet been able to solve, or that we did not know were problems in the first place.

Aaron Kirschenfeld is currently pursuing a dual J.D. / M.S.I.S. at the University of North Carolina at Chapel Hill. His main research interests are legal research instruction, the philosophy and aesthetics of legal citation analysis, and privacy law. You can reach him on Twitter @kirschsubjudice.

His views do not represent those of his part-time employer, Fastcase, Inc. Also, he has never hand-crafted an algorithm, let alone a wheel of cheese, but appreciates the work of those who do immensely.


VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

The first thing we do, let’s kill all the lawyers.
– Henry VI, Pt. 2, Act 4, sc. 2.

This line, delivered by Dick the Butcher (turned revolutionary) in Shakespeare’s Henry VI, is often performed tongue-in-cheek by actors to elicit an expected laugh from the audience. The essence of the line, however, is no joke, and relates to destabilizing the rule of law by removing its agents — those who promote and enforce the law. What no one could predict, including Shakespeare himself, is the horrific precision with which such a deed could be carried out.

The 1994 Genocide in Rwanda showed this horror and more, with upwards of one million killed in the span of three months. The effect on the legal system was particularly devastating: lawyers and the justice sector were targeted, and prosecutors and judges were killed at the outset.

Rwanda’s Justice Sector Development
Since 1994, Rwanda has done a remarkable job rebuilding its society, establishing security, curbing corruption, and creating one of the fastest growing economies in sub-Saharan Africa.

Law Library at the Ministry of Justice, Kigali, Rwanda.


One of the biggest areas of development in Rwanda, and in other areas of the world, has been strengthening justice sector institutions and strengthening the rule of law. In transitional states, especially those developing systems of democratic governance, the creation of online, reliable, and accessible legal information systems is a critical component of good governance. Rwanda’s efforts and opportunities for development in this area are noted below.

From 2010-2011, I played a very small part in this development when I served as a law clerk and legal advisor to then-Chief Justice Aloysie Cyanzayire of the Supreme Court of Rwanda. Working with a USAID-funded project, I was also able to participate in legal education reform and in the development of an online database of laws, the Rwanda Legal Information Portal (Rwanda LIP). In the summer of 2013 I returned to Rwanda, with the support of the American Association of Law Libraries, to visit its law libraries and understand the role of law libraries in legal institutions and in society overall. After I learned that the Rwanda LIP was no longer updated (it is now offline entirely), investigating Rwanda’s online legal presence became a secondary research goal for the trip. The discovery also highlighted the importance of legal information systems and their role in justice sector reform.

Part of this justice sector reform related to changes in Rwanda’s legal system. Once a Belgian colony, Rwanda inherited a civil law system at independence and codified much of the Belgian civil code; today the main body of laws comes from enactments of Parliament. Rwanda’s judicial system, rebuilt after the 1994 Genocide, is made up of four levels of courts: District Courts, Provincial Courts, High Courts, and the Supreme Court. With its civil law roots, courts in Rwanda were largely unconcerned with precedent. As Rwanda became a member of the East African Community in 2007 (and adopted English as an official language), the judiciary started a transition to a hybrid common law system, considering how to assign precedential value to court decisions. With this ongoing transition in Rwanda’s legal system, an online legal information system has become a significant need for legal and civil society.


One of four computer labs, called the “digital library” at Kigali Independent University, with more than 400 computer workstations available for student use.

Online Legal Information Systems
In order to establish the rule of law in a democratic system, citizens must have access, at the very minimum, to laws of a government. To make this access meaningful, a searchable database of laws should be created to allow users of legal information to find laws based on their particular information need. For this reason alone it is important for governments in transitional states to make a commitment to developing online legal information systems.

John Palfrey aptly noted: “In most countries, primary legal information is broadly accessible in one format or another, but it is rarely made accessible online in a stable and reliable format.” This is basically the case in Rwanda. Every law library and university library, and even the Kigali Public Library, has paper copies of the Official Journal, which contains the official laws of Rwanda. Today, however, the only place to find current laws online is the Prime Minister’s webpage, where PDF copies of the Official Gazette are published. The website (Kinyarwanda for “law”) was frequently used by lawyers and members of the justice sector to search Rwanda’s laws, and allowed the general public not only to access laws but also to run a full-text search for keywords. This site, however, was not updated after 2011, and is now completely offline. As a result, there is no online source for searching Rwanda’s laws.

Law Library at the Parliament of the Republic of Rwanda in Kigali.


Rwanda is using its growing information infrastructure, however, to create other online quasi-legal information databases. For instance, the Rwanda Development Board created an online portal for businesses to access information on “investment related procedures” in Rwanda. The government is also allowing online registration of businesses, streamlining the process and making it more accessible. These developments are consistent with Rwanda’s reforms in the area of economic development, and with its recent ranking in the top 30% globally for ease of doing business, and 3rd best in sub-Saharan Africa. While economic reform has driven these changes, justice sector reform has not yet yielded the same results for online legal information systems.


Service counter at the University Library at Kigali Independent University in Rwanda. Students aren’t allowed to browse the library stacks.

Rwanda’s Legal Information Culture
Despite the limited online access to laws, there is a high value placed on legal information in Rwanda. Every legal institution has a law library and a dedicated library staff member (although most don’t have formal education in librarianship or information management). Moreover, members of the justice sector, from staff members to Permanent Secretaries and Ministers, believe libraries and access to legal information are of critical importance. A common theme in Rwanda’s law libraries, however, is the lack of funding. Some libraries have not invested in library materials in years, and have relied solely on donations to add items to their collections. It is not altogether surprising, then, that the Rwanda LIP remained un-funded, and is now completely defunct as an online legal information system. One source close to the Rwanda LIP project indicated that funding has been sought from Parliament, but so far without success.

The Rwanda LIP is perhaps a victim of how it came to be, that is, through donor-funded development. Creating sustainable online databases requires a government commitment of financial support. Just as before it, the Rwanda LIP was created through a donor-funded initiative, and at its conclusion the LIP’s source of funding also ended. For any donor-funded development initiative, sustainability is a key concern, and significant government collaboration is necessary for initiatives to remain after donor-funded projects end. This is especially true of legal information systems, and is perhaps the cause of the Rwanda LIP’s demise. While created in partnership with the Government of Rwanda, the project failed to secure an adequate commitment for continued funding at its outset.

Sustainability issues are not unique to Rwanda’s experience with online legal information systems. The availability of financial resources is one of the key challenges to creating a sustainable online database of laws. Working with developing countries in Africa, SAFLII found that sustainability issues come from “shortages of resources, skills and technical services.” While donor-funded projects have serious limitations, others experiencing the sustainability challenge have suggested databases supported by private enterprise, “offering free content as well as value-added services for sale.” What is certain is that long-term sustainability remains one of the biggest challenges for online legal information systems.

View of the Kigali Public Library in Kigali, Rwanda.


Print to Digital Transition and Overcoming the Digital Divide
In addition to sustainability, the transition from print to digital poses its own complications, and has emerged as a major issue in law libraries, even at the most established institutions. The challenge is especially acute in developing and transitional states, where access to the internet can be a significant obstacle. This problem, known as the “digital divide,” has been described as something that “disproportionately disenfranchises certain segments of society and runs counter to the notion that inclusiveness and opportunity build strong communities and countries.” The divide is wider in developing and transitional states because there is far less wealth and technological infrastructure for internet connectivity, and a greater disparity in access between and among communities.

Of all countries in the process of developing online legal information systems, however, Rwanda is perhaps the best suited to succeed. With high-speed fibre-optic internet cables recently installed throughout the small East African country, Rwanda has one of the best internet penetration rates in the developing world. So, while Rwanda’s law libraries (and other libraries) throughout the country have print copies of laws, there may be a legitimate opportunity to give a large number of citizens online access. For example, the Kigali Public Library, the flagship institution of the Rwanda Library Services, houses print copies of the laws of Rwanda but also has an internet cafe giving free access to online resources. Kigali Independent University has an “Internet Library” with more than 500 computers for student use. Rwanda’s law libraries are also open and accessible to the public, some of which have computers for use by the public as well. Other libraries, including the law library at the National University of Rwanda, have increasing access to online resources to serve their users.

In Rwanda, a new access to information law (Official Gazette No. 10 of 11.03.2013) makes online legal information even more critical in the developing state, and Rwanda’s current efforts can serve as an example for the importance of modernizing online legal information. The access to information law imposes a positive obligation on the Government of Rwanda, and some private companies working under government contracts, to disclose a broad range of information to the public and press. It has been stated that the law “meets standards of best practice in terms of scope and application” for freedom of information laws. Despite the law’s conditions to withhold information under Article 4, the significant shift in policy and the law’s broad range of information available are very positive signs. This and similar laws across the developing world have created a need for the improvement of existing legal information systems, or the creation of new systems to adequately make available essential legal information. A critical component to the implementation of this law, therefore, is a reliable and sustainable online legal information system.

A view of the volcanoes in the Northern Province of Rwanda.


Lessons Learned from Rwanda’s Experience
While and the Rwanda LIP are no longer online, institutions within the justice sector of Rwanda are currently working on solutions. In the meantime, there is no meaningful way to search Rwanda’s laws online. It is possible that a stronger financial commitment at the outset of the Rwanda LIP would have solved this. In the future, long-term sustainability should be one of the primary qualifications for creating an online system.

In the meantime, there are other ways of expanding Rwanda’s access to online legal information through databases of foreign law and secondary sources. Talking with law librarians in Rwanda, I learned that there is little, if any, research instruction being delivered from law libraries. Even in the few libraries with subscription electronic databases, users aren’t necessarily being directed to relevant legal resources. Furthermore, law librarians generally collect, catalog and retrieve legal materials for users, rather than directing users to relevant sources. Users of legal information in Rwanda (and elsewhere) would be well served by being exposed to other online sources of legal information. Sites like the LII, WorldLII, and the Directory of Open Access Journals offer access to a wealth of free online primary and secondary materials that could be useful to researchers. Creating research guides and offering research instruction in these areas costs very little, and opens up countless resources that could be valuable to users of legal information in Rwanda, and elsewhere. Those working in justice sector development should investigate this possibility, in conjunction with creating online legal information systems of domestic laws.

Directional sign outside the Law Faculty at the Independent Institute of Lay Adventist of Kigali.


Finally, the majority of those working as librarians in Rwanda’s law libraries have no formal instruction in library or information science. Nonetheless, it is remarkable that those with little or no formal training are competent librarians. Formal training or not, qualified librarians generally do not have the opportunity to offer research training to users of legal information. Treating law librarians as professionals would open up many opportunities to increase the capacity of users of legal information, and the online resources available.


Brian Anderson is a Reference Librarian and Assistant Professor at the Taggart Law Library at Ohio Northern University. His research involves the use of law libraries and legal information systems to support the rule of law in developing and transitional states. In September 2013 Brian presented two papers at the 2013 Law Via the Internet conference related to this topic; one related to civil society organizations and the use of the internet to strengthen the rule of law, and another about starting online legal information systems from scratch.



AT4AM – Authoring Tool for Amendments – is a web editor provided to Members of the European Parliament (MEPs) that has greatly improved the drafting of amendments at the European Parliament since its introduction in 2010.

The tool, developed by the Directorate for Innovation and Technological Support of the European Parliament (DG ITEC), has replaced a system based on a collection of macros developed in MS Word and specific ad hoc templates.

Why move to a web editor?

The need to replace a traditional desktop authoring tool came from the increasing complexity of layout rules combined with a need to automate several processes of the authoring/checking/translation/distribution chain.

In fact, drafters not only faced complex rules and had to search among hundreds of templates in order to get the right one, but the drafting chain for all amendments relied on layout to transmit information down the different processes. Bold / Italic notation or specific tags were used to transmit specific information on the meaning of the text between the services in charge of subsequent revision and translation.

Over the years, an editor that was initially conceived mainly to support the printing of documents was often used to convey information in an unsuitable manner. During the drafting activity, documents transmitted between different services included a mix of content and layout, where the layout sometimes carried information about the business process that should have been transmitted by other means.

Moreover, encapsulating in one single file all the amendments drafted in 23 languages was a severe limitation for subsequent revisions and translations carried out by linguistic sectors. Experts in charge of legal and linguistic revision of drafted amendments, who need to work in parallel on one document grouping multilingual amendments, were severely hampered in their work.

All the needs listed above justified the EP undertaking a new project to improve the drafting of amendments. The concept was soon extended to the drafting, revision, translation and distribution of the entire legislative content in the European Parliament, and after some months the eParliament Programme was initiated to cover all projects of the parliamentary XML-based drafting chain.

It was clear from the beginning that, in order to provide an advanced web editor, the original proposal to be amended had to be converted into a structured format. After an extensive search, the Akoma Ntoso XML format was chosen, because it is the format that best covers the requirements for drafting legislation. Currently it is possible to export amendments produced via AT4AM in Akoma Ntoso. It is planned to apply the Akoma Ntoso schema to the entire legislative chain within the eParliament Programme. This will enable the EP to publish legislative texts in open data format.

What distinguishes the approach taken by the EP from that of other legislative actors who handle XML documents is the fact that the EP decided to use XML to feed the legislative chain rather than just converting existing documents into XML for distribution. This aspect is fundamental because requirements are much stricter when the result of XML conversion is used as the first step of the legislative chain. In fact, the proposal coming from the European Commission is first converted into XML and then loaded into AT4AM. Because the tool relies on the XML content, it is important to guarantee a valid structure and coherence between the language versions. The same articles, paragraphs, points and subpoints must appear at the correct position in all the 23 language versions of the same text.
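The coherence requirement between language versions can be illustrated with a small structural comparison. The sketch below is only an illustration under stated assumptions: the tag names, the `id` attribute, and the absence of namespaces are simplified placeholders rather than the actual Akoma Ntoso schema, and the function names are hypothetical, not part of AT4AM.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

# Simplified structural tags (assumed for illustration, not the real schema).
STRUCTURAL_TAGS = {"article", "paragraph", "point", "subpoint"}

def skeleton(xml_text):
    """Return the structural outline of a document as (depth, tag, id) tuples."""
    root = ET.fromstring(xml_text)
    outline = []

    def walk(node, depth):
        if node.tag in STRUCTURAL_TAGS:
            outline.append((depth, node.tag, node.get("id")))
        for child in node:
            walk(child, depth + 1)

    walk(root, 0)
    return outline

def structure_mismatches(xml_a, xml_b):
    """List the positions at which two language versions differ in structure."""
    pairs = zip_longest(skeleton(xml_a), skeleton(xml_b))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]

# Hypothetical English and French versions; the French one lacks a paragraph.
english = (
    "<doc><article id='a1'>"
    "<paragraph id='a1p1'>Text</paragraph>"
    "<paragraph id='a1p2'>More text</paragraph>"
    "</article></doc>"
)
french = (
    "<doc><article id='a1'>"
    "<paragraph id='a1p1'>Texte</paragraph>"
    "</article></doc>"
)

problems = structure_mismatches(english, french)
```

Only the structure is compared, never the text itself, so a check of this kind can run before any translator or reviser looks at the content.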

What is the situation now?

After two years of intensive usage, Members of the European Parliament have drafted 285,000 amendments via AT4AM. The tool is also used daily by the staff of the secretariat in charge of receiving tabled amendments, checking linguistic and legal accuracy and producing voting lists. Today more than 2300 users access the system regularly, and no one wants to go back to the traditional methods of drafting. Why?

Because it is much simpler and faster to draft and manage amendments via an editor that takes care of everything, thus allowing drafters to concentrate on their essential activity: modifying the text.

Soon after the introduction of AT4AM, the secretariat’s staff who manage drafted amendments breathed a sigh of relief, because errors like wrong position references, which were the cause of major headaches, no longer occurred.

What is better than a tool that guides drafters through the amending activity by adding all the surrounding information and taking care of all the metadata necessary for subsequent treatment, while letting the drafter focus on the text amendments and produce well-formatted output with track changes?
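As a rough illustration of how change marking can be automated once the drafter has simply edited the text, the sketch below compares an original and an amended sentence word by word using Python's standard `difflib` module. This is not AT4AM's implementation; the function name and the plain-text markers (asterisks for inserted words, bracketed dashes for deleted ones) are assumptions standing in for the bold italic layout.

```python
import difflib

def mark_changes(original, amended):
    """Word-level comparison: wrap inserted words in *...* and deleted words in [-...-]."""
    old_words, new_words = original.split(), amended.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(old_words[i1:i2])
        if op in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if op in ("insert", "replace"):
            out.append("*" + " ".join(new_words[j1:j2]) + "*")
    return " ".join(out)

# Hypothetical proposal text and amendment.
inserted = mark_changes(
    "Member States shall adopt measures",
    "Member States shall adopt appropriate measures",
)
replaced = mark_changes("within 30 days", "within 60 days")
```

Here `inserted` reads `Member States shall adopt *appropriate* measures`, and `replaced` reads `within [-30-] *60* days`. The drafter types only the new wording; the marking is derived from the comparison.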

After some months of usage, it was clear that not only the time to draft, check and translate amendments was drastically reduced, but also the quality of amendments increased.

The slogan that best describes the strength of this XML editor is: “You are always just two clicks away from tabling an amendment!”



Web editor versus desktop editor: is it an acceptable compromise?

One of the criticisms that users often raise against web editors is that they are limited when compared with a traditional rich desktop editor. The experience at the European Parliament has demonstrated that what users lose in terms of editing features is more than compensated by the gains of getting a tool specifically designed to support drafting activity. Moreover, recent technologies enable programmers to develop rich web WYSIWYG (What You See Is What You Get) editors that include many of the traditional features plus new functions specific to a “networking” tool.

What’s next?

The experience of the EP was so positive and so well received by other Parliaments that in May 2012, at the opening of the international workshop “Identifying benefits deriving from the adoption of XML-based chains for drafting legislation”, Vice President Wieland announced the launch of a new project aimed at providing an open source version of the AT4AM code.

On 19 March 2013, in a video conference with the United Nations Department for General Assembly and Conference Management from New York, the UN/DESA’s Africa i-Parliaments Action Plan from Nairobi and the Senate of Italy from Rome, Vice President Wieland announced the availability of AT4AM for All, which is the name given to this open source version, for any parliament and institution interested in taking advantage of this well-oiled IT tool that has made the life of MEPs much easier.

The code has been released under the EUPL (European Union Public Licence), an open source licence provided by the European Commission that is compatible with major open source licences like GNU GPLv2, with the advantage of being available in the 22 official languages of the European Union.

AT4AM for All is provided with all the important features of the amendment tool used in the European Parliament and can manage all types of legislative content provided in the XML format Akoma Ntoso. This XML standard, developed through the UN/DESA’s initiative Africa i-Parliaments Action Plan, is currently undergoing certification at OASIS, a non-profit consortium that drives the development, convergence and adoption of open standards for the global information society. Those who are interested may have a look at the committee in charge of the certification: LegalDocumentML.

Currently the Documentation Division, Department for General Assembly and Conference Management of the United Nations is evaluating the software for possible integration in their tools to manage UN resolutions.

The ambition of the EP is that other Parliaments with fewer resources may take advantage of this development to improve their legislative drafting chain. Moreover, the adoption of such tools allows a Parliament to move towards an XML-based legislative chain. The distribution of legislative content in open document formats like XML allows other parties to process the legislation produced efficiently.

Thanks to the efforts of the European Parliament, any parliament in the world is now able to use the advanced features of AT4AM to support the drafting of amendments. AT4AM will serve as a useful tool for all those interested in moving towards open data solutions and more democratic transparency in the legislative process.

At the AT4AM for All website it is possible to check the status of the work and run a sample editor with several document types. Any Parliament interested can go to the repository and download the code.

Claudio Fabiani is Project Manager at the Directorate-General for Innovation and Technological Support of the European Parliament. After several years in the private sector as an IT consultant, he started his career as a civil servant at the European Commission in 2001, where he has managed several IT developments. Since 2008 he has been responsible for the AT4AM project, and more recently he has managed the implementation of AT4AM for All, the open source version.



VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.


Maybe it’s a bit late for a summer reading list, or maybe you’re just now starting to pack for your vacation, deep in a Goodreads list that you don’t ever expect to dig your way out of. Well, let us add to your troubles with a handful of books your editors are currently enjoying.

A Clearing in the Forest: Law, Life, and Mind, by Steven L. Winter. A 2001 cognitive science argument for studying and developing law. Perhaps a little heavy for poolside, one of your editors finds it perfect for multi-day midwestern summer rainstorms, along with a pot of tea. Review by Lawrence Solan in the Brooklyn Law Review, as part of a symposium.

Digital Disconnect: How Capitalism is Turning the Internet Against Democracy, by Robert W. McChesney.

“In Digital Disconnect, Robert McChesney offers a groundbreaking critique of the Internet, urging us to reclaim the democratizing potential of the digital revolution while we still can.”

This is currently playing on my work commute.

The Cognitive Style of PowerPoint: Pitching Out Corrupts Within, by Edward Tufte. Worth re-reading every so often, especially heading into conference/teaching seasons.

Delete: The Virtue of Forgetting in a Digital Age, by VoxPopuLII contributor Viktor Mayer-Schonberger. Winner of the 2010 Marshall McLuhan Award for Outstanding Book in Media ecology, Media Ecology Association; Winner of the 2010 Don K. Price Award for Best Book in Science and Technology Politics, Section on Science, Technology, and Environmental Politics (STEP) by the American Political Science Association. Review at the Times Higher Education.

Piracy: The Intellectual Property Wars from Gutenberg to Gates, by Adrian Johns (2010). A historian’s view of Intellectual Property — or, this has all happened before. Reviews at the Washington Post and the Electronic Frontier Foundation. From the latter, “Radio arose in the shadow of a patent thicket, became the province of tinkers, and posed a puzzle for a government worried that “experimenters” would ruin things by mis-adjusting their sets and flooding the ether with howling oscillation. Many will immediately recognize the parallels to modern controversies about iPhone “jailbreaking,” user innovation, and the future of the Internet.”

The Master Switch: The Rise and Fall of Information Empires, by Tim Wu (2010). A history of communications technologies, and the cyclical (or not) trends of their openness, and a theory on the fate of the Internet. Nice reviews on Ars Technica and The Guardian.

Too Big to Know: Rethinking Knowledge Now That the Facts Aren’t the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room, by David Weinberger (author of the Cluetrain Manifesto). For more, check out this excerpt by Weinberger in The Atlantic and

You are not so smart, by David McRaney. Examines the myth of being intelligent — a very refreshing read for the summer. A review of the book can be found at Brainpickings, which by the way is an excellent blog and definitely worth a look.

On a rainy day you can always check out the BBC series “QI” with a new take on what we think we know but don’t know. Hosted by Stephen Fry. Comedians share their intelligence with witty humour and you will learn a thing or two along the way. The TV show has also led to a few books, e.g. Qi: the Book of General Ignorance (Q1), by John Lloyd


Sparing the cheesy beach reads, here’s a fiction set that you may find interesting.

The Ware Tetralogy: Ware #1-4, by Rudy Rucker (currently $6.99 for the four-pack)

Rucker’s four Ware novels–Software (1982), Wetware (1988), Freeware (1997), and Realware (2000)–form an extraordinary cyberweird future history with the heft of an epic fantasy novel and the speed of a quantum processor. Still exuberantly fresh despite their age, they primarily follow two characters (and their descendants): Cobb Anderson, who instigated the first robot revolution and is offered immortality by his grateful “children,” and stoner Sta-Hi Mooney, who (against his impaired better judgment) becomes an important figure in robot-human relations. Over several generations, humans, robots, and society evolve, but even weird drugs and the wisdom gathered from interstellar signals won’t stop them from making the same old mistakes in new ways. Rucker is both witty and serious as he combines hard science and sociology with unrelentingly sharp observations of all self-replicating beings. — Publisher’s Weekly

Happy reading! We’ll return mid-August with a feature on AT4AM.



In March, Mike Lissner wrote for this blog about the troubling state of access to case law – noting with dismay that most of the US corpus is not publicly available. While a few states make official cases available, most still do not, and neither does the federal government. At Ravel Law we’re building a new legal research platform and, like Mike, we’ve spent substantial time troubleshooting access to law issues. Here, we will provide some more detail about how official case law is created and share our recommendations for making it more available and usable. We focus in particular on FDsys – the federal judiciary’s effort in this space – but the ideas apply broadly.

The Problem

If you ask a typical federal court clerk, such as our friend Rose, about the provenance of case opinions you will only learn half the story. Rose can tell you that after she and her judge finish an opinion it gets sent to a permanent court staffer. After that the story that Rose knows basically ends. The opinion at this stage is in its “slip” opinion state, and only some time later will Rose see the “official” version – which will have a citation number, copy edits, and perhaps other alterations. Yet, it is only this new “official” version that may be cited in court. For Mike Lissner, for Ravel, and for many others, the crux of the access challenge lies in steps beyond Rose’s domain, beyond the individual court’s in fact – when a slip becomes an official opinion.

For years the federal government has outsourced the creation of official opinions, relying on Westlaw and Lexis to create and publish them. These publishers are handed slip opinions by court staff, provide some editing, assign citations and release official versions through their systems. As a result, access to case law has been de facto privatized, and restricted.


Of late, however, courts are making some strides to change the nature of this system. The federal judiciary’s primary effort in this regard is FDsys (and also see the 9th Circuit’s recent moves). But FDsys’s present course gives reason to worry that its goals have been too narrowly conceived to achieve serious benefit. This discourages the program’s natural supporters and endangers its chances of success.

We certainly count ourselves amongst FDsys’s strongest supporters, and we applaud the Judicial Conference for its quick work so far. And, as friends of the program, we want to offer feedback about how it might address the substantial skepticism it faces from those in the legal community who want the program to succeed but fear for its ultimate success and usability.

Our understanding is that FDsys’s primary goal is to provide free public access to court opinions. Its strategy for doing so (as inexpensively and as seamlessly as possible) seems to be to fully implement the platform at all federal courts before adding more functionality. This last point is especially critical. Because FDsys only offers slip opinions, which can’t be cited in court, its current usefulness for legal professionals is quite limited; even if every court used FDsys it would only be of marginal value. As a result, the legal community lacks incentive to lend its full, powerful, support to the effort. This support would be valuable in getting courts to adopt the system and in providing technology that could further reduce costs and help to overcome implementation hurdles.

Setting Achievable Goals

We believe that there are several key goals FDsys can accomplish, and that by doing so it will win meaningful support from the legal community and increase its end value and usage. With loftier goals (some modest, others ambitious), FDsys would truly become a world-class opinion publishing system. The following are the goals we suggest, along with metrics that could be used to assess them.



1. Comprehensive Access to Opinions – Does every federal court release every published and unpublished opinion?
  – Are the electronic records comprehensive in their historic reach?
2. Opinions that can be Cited in Court – Are the official versions of cases provided, not just the slip opinions?
  – And/or, can the version released by FDsys be cited in court?
3. Vendor-Neutral Citations – Are the opinions provided with a vendor-neutral citation (using, e.g., paragraph numbers)?
4. Opinions in File Formats that Enable Innovation – Are opinions provided in both human and machine-readable formats?
5. Opinions Marked with Meta-Data – Is a machine-readable language such as XML used to tag information like case date, title, citation, etc?
  – Is additional markup of information such as sectional breaks, concurrences, etc. provided?
6. Bulk Access to Opinions – Are cases accessible via bulk access methods such as FTP or an API?


The first three goals are the basic building blocks necessary to achieve meaningful open-access to the law. As Professor Martin of Cornell Law and others have chronicled, the open-access community has converged around these goals in recent years, and several states (such as Oklahoma) have successfully implemented them with very positive results.

Goals 3-6 involve the electronic format and storage medium used, and are steps that would be low-cost enablers of massive innovation. If one intention of the FDsys project is to support the development of new legal technologies, the data should be made accessible in ways that allow efficient computer processing. Word documents and PDFs do not accomplish this. PDFs, for example, are a fine format for archival storage and human reading, but computers don’t easily read them and converting PDFs into more usable forms is expensive and imperfect.

In contrast, publishing cases at the outset in a machine-readable format is easy and comes at virtually no additional cost. It can be done in addition to publishing in PDF. Courts and the GPO already have electronic versions of cases and with a few mouse clicks could store them in a format that would inspire innovation rather than hamper it. The legal technology community stands ready to assist with advice and development work on all of these issues.
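To make goals 3 through 5 a little more concrete, here is a minimal sketch in Python of what a machine-readable opinion with metadata tags and numbered paragraphs might look like. Everything named in it is our own assumption for illustration: the element names (`opinion`, `meta`, `body`, `p`), the `n` attribute, and the sample case are hypothetical and are not an FDsys, GPO, or court schema.

```python
import xml.etree.ElementTree as ET


def opinion_to_xml(title, court, date, citation, paragraphs):
    """Build a small XML document for one opinion.

    All element and attribute names are hypothetical, chosen only for
    illustration; they do not come from FDsys or any court.
    """
    root = ET.Element("opinion")

    # Metadata block: the kind of information goal 5 asks to have tagged.
    meta = ET.SubElement(root, "meta")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "court").text = court
    ET.SubElement(meta, "date").text = date
    ET.SubElement(meta, "citation").text = citation

    # Numbered paragraphs: the basis of a vendor-neutral pinpoint cite.
    body = ET.SubElement(root, "body")
    for number, text in enumerate(paragraphs, start=1):
        para = ET.SubElement(body, "p", n=str(number))
        para.text = text

    return ET.tostring(root, encoding="unicode")


def read_paragraph(xml_text, number):
    """Return the text of paragraph `number`, or None if it is absent."""
    root = ET.fromstring(xml_text)
    node = root.find(f"./body/p[@n='{number}']")
    return None if node is None else node.text


# Made-up sample data, not a real case.
xml_text = opinion_to_xml(
    title="Doe v. Roe",
    court="Example Court",
    date="2013-06-01",
    citation="2013 EX 1",
    paragraphs=["First paragraph.", "Second paragraph."],
)
print(read_paragraph(xml_text, 2))
```

With a file like this, a program can pull out the case date or jump to a cited paragraph directly, which is the step that is expensive and imperfect when the starting point is a PDF.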

We believe that FDsys is a commendable step toward comprehensive public access to law, and toward enabling innovation in the legal space. Left to its current trajectory, however, it is certain to fall short of its potential. With some changes now, the program could be a home run for the entire legal community, ensuring that clerks like Rose can rest assured that the law as interpreted by her judge is accessible to everyone.


Daniel Lewis and Nik Reed are graduates of Stanford Law School and the co-founders of Ravel Law, a legal search, analytics, and collaboration platform. In 2012, Ravel spun out of a Stanford University Law School, Computer Science Department, and Design School collaborative research effort focused on legal citation networks and information design. The Ravel team includes software engineers and data scientists from Stanford, MIT, and Georgia Tech. You can follow them on Twitter @ravellaw


For decades, words have been lawyers’ tools of trade. Today, we should no longer let tradition force us to think inside the text-only box. Apart from words, there are other means available.

It is no longer enough (if it ever was) to offer more information or to enhance access alone: the real challenge is the understandability of the content. We might have access to information, but still be unable to decode it or realize its importance. It is already painfully clear that the general public does not understand legalese, and that communication is becoming more and more visual and rapid. There is a growing literature about style and typography for legal documents and contracts, yet the use of visual and non-textual elements has been so far omitted for the most part. Perhaps images do not seem “official”, “legal”, or trustworthy enough for all.

Last year, in Sean McGrath’s post on Digital Law, we were alerted to what lawyers need to learn from accountants. In this post, we present another profession as our role model, one with a considerably shorter history than that of accountants: information designers.

Focus on users and good communication

Lawyers are communication professionals, even though we do not tend to think about ourselves in these terms. Most of us give advice and produce content and documents to deliver a specific message. In many cases a document — such as a piece of legislation or a contract — in itself is not the goal; its successful implementation is. Implementation, in turn, means adoption and action, often a change of behavior, on the part of the intended individuals and organizations.

Law school does not teach us how to enhance the effectiveness of our message. While many lawyers are known to be good communicators, most have had to learn the hard way. It is easy to forget that our colleagues, members of the legal community, are not the only users of our work. When it comes to other users of our content and documents, we can benefit from starting to think about 1) who these users are, 2) what they want or need to know, 3) what they want to achieve, 4) in which situation, and 5) how we can make our content and documents as clear, engaging and accessible as possible.

These questions are deeply rooted in the discipline of information design. The work of information designers is about organizing and displaying information in a way that maximizes its clarity and understandability. It focuses on the needs of the users and the context in which they need to find and apply information. When the content is complex, readers need to grasp both the big picture and the details and often switch between these two views. This is where visualization — here understood as adding graphs, icons, tables, charts and images to supplement text — enters the picture. Visualization can help in navigating text, opening up its meaning and reinforcing its message, even in the field of law. And information design is not about visualization only: it is also about many other useful things such as language, readability, typography, layout, color coding, and white space.

Want to see examples? Look no further!


Figure 1: Excerpt from Vendor Power! – a visual guide to the rights and duties for street vendors in New York City. © 2009 The Center for Urban Pedagogy.

A convincing example of visualizing legal rules is “Vendor Power!”, a work carried out by a collaboration of the Center for Urban Pedagogy, the designer Candy Chang, and the advocacy organization the Street Vendor Project. After noting that the “rulebook [of legal code] is intimidating and hard to understand by anyone, let alone someone whose first language isn’t English”, the project prepared Vendor Power!, a visual Street Vendor Guide that makes city regulations accessible and understandable (Figure 1). The Guide presents key information using short sentences in five languages along with diagrams illustrating vendors’ rights and the rules that are most commonly violated.

In the UK, the TDL London team turned recent changes in the rules related to obtaining a UK motorcycle licence into an interactive diagram that helps its viewers understand which motorcycles they are entitled to ride and how to go about obtaining a motorcycle licence. In Canada in 2000, recognizing the need for new ways to improve public access to the law, the Government commissioned a White Paper proposing a new format for legislation. The author, communication designer David Berman, introduced graphic design methods and the concept of using diagrams to help describe laws. While creating a flowchart diagram, Berman’s team revealed inconsistencies not accounted for in the legislation, suggesting that if visualization was used in the drafting process, the resulting legislation could be improved. One of the authors (the designer) can confirm this “logical auditing” power of visualization, as similar information gaps were promptly revealed by visualizing through flowcharts the Finnish General Terms of Public Procurement in Service Contracts, during the PRO2ACT research project.

Not only have designers applied their talent to legal information; some lawyers, like Susanne Hoogwater of Legal Visuals and Olivia Zarcate of Imagidroit, and future lawyers, like Margaret Hagan of Open Law Lab, have turned into designers themselves, with some remarkable results that you can find on their websites.

Legal visualization may deal with data, information, or knowledge. While the former two require software tools and coding expertise in order to generate images that represent complex data structures (an example is the work of Oliver Bieh-Zimmert who visualized the network of paragraphs and the structure of the German Civil Code), knowledge visualization tends to use a more ‘handcrafted’ approach, similar to how graphic designers rather than programmers work. The authors of this post have relied on the latter when enhancing contract usability and user experience through visualization, utilizing simple yet effective visualizations such as “metro maps” (Figure 2), timelines, flowcharts, icons and graphs. More examples of the work, carried out in the FIMECC research program User Experience & Usability in Complex Systems (UXUS), are available here, while our most recent paper, Transforming Contracts from Legal Rules to User-centered Communication Tools, published in 2013 Volume I Issue III of Communication Design Quarterly Review, discusses how greatly visualization can contribute to the user-centeredness of contracts.


Figure 2. Example of a “metro map” that explains the process of availability testing, as described in an agreement on the purchase of industrial machinery and equipment. © 2012 Aalto University. Author: Stefania Passera.

When teaching cross-border contract law to business managers and students, one of the authors (the lawyer) has also experimented with graphic facilitation and real-time visualization, with the aim of curing contract phobia, changing attitudes, and making contracts’ invisible (implied) terms visible. Examples of images by Annika Varjonen of Visual Impact are available here and, dating back from 1997, here.

The Wolfram Demonstrations Project illustrates a library of visual and interactive demonstrations, including one contributed by Seth Chandler on the Battle of Forms that describes the not-uncommon situation where one company makes an offer using a pre-printed form containing its standard terms, and the other party responds with its own form and set of standard terms. The demonstration allows users to choose various details of the case, with the output showing the most likely finding as to whether a contract exists and the terms of that contract, together with the arguments that can be advanced in support of that finding.

In the digital world, Creative Commons licenses use simple, recognizable icons which can be clicked on to reveal a plain-language version of the relevant text. If additional information is required, the full text is also available and just one click away. The information is layered: there is what the authors call the traditional Legal Code (the “lawyer readable” version), the Commons Deed (the “human readable” version, acting as a user-friendly interface to the Legal Code), and the “machine readable” version of the license. A compilation made by Pär Lannerö in the context of the Common Terms project reveals a number of projects that have looked into the simplification of online terms, conditions and policies. An experiment involving icons was carried out by Aza Raskin for Mozilla. The set of Privacy Icons developed by Raskin can be used by websites to clarify the ways in which users of the website are agreeing to allow their personal data to be used (Figure 3).


Figure 3. Examples of icons used for the rapid communication of complex content on the Web: Mozilla Privacy Icons by Aza Raskin. Source: Image released under a Creative Commons licence CC BY-NC 2.0.

In Australia, Michael Curtotti and Eric McCreath have worked on enhancing the online visualization of legislation, and work is currently in progress on the development of software-based tools for reading and writing law. This work has grown out of experience in contract drafting and the drafters’ needs for practical software tools. Already in 2001, in their ACCA Docket article Doing deals with flowcharts, Henry W. (Hank) Jones and Michael Oswald recognized this need and discussed the technology tools available to help lawyers and others to use flowcharts to clarify contractual information. They showed examples of how the logic of contract structure, the actors involved, and clauses such as contract duration and indemnification can be visualized, as well as explaining why this should be done.

In the United States, the State Decoded (State codes, for humans) is a platform that develops new ways to display state codes, court decisions, and information from legislative tracking services. With typography, embedded definitions of legal terms and other means, this project aims to make the law more easily understandable. The first two state sites, Virginia and Florida, are currently being tested.

Recently, visual elements have even made their way into court decisions: In Sweden, a 2009 judgment of the Court of Appeal for Western Sweden includes two timeline images showing the chain of events that is crucial to understanding the facts of the case. This judgment won the Plain Swedish Crystal 2010, a plain language award. In the United States, an Opinion by Judge Richard Posner of the Chicago-based 7th U.S. Circuit Court of Appeals uses the ostrich metaphor to criticize lawyers who ignore court precedent. Two photos are included in this opinion: one of an ostrich with its head buried in the sand, another of a man in a suit with his head buried in the sand.

Want to learn more and explore? Read this – or join one of our Design Jams!

In Central Europe, the visualization of legal information has developed into a research field in its own right. In German-speaking countries, the terms legal visualization (Rechtsvisualisierung), visual legal communication, visual law and multisensory law have been used to describe this growing field of research and practice. The pioneer, Colette R. Brunschwig, defended her doctoral thesis on the topic in 2001, and has since published widely on related topics. She is the leader of the Multisensory Law & Visual Law Community at beck-community.

In his doctoral research related to legal risks in the context of contracts at the Faculty of Law in the University of Oslo, Tobias Mahler used icons and diagrams illustrating legal risk and developed a graphical modeling language. In a case study he conducted, a group of lawyers, managers, and engineers were asked to use the method to analyze the risks connected with a contract proposal. The results showed that the diagrams were perceived as very helpful in communicating risk.

At the Nordic Conference of Law and IT, “Internationalisation of law in the digital information society” in Stockholm in November 2012, visualization of law was one of the three main topics. The proceedings, which include visual law related papers by Colette R. Brunschwig, Tobias Mahler and Helena Haapio, will be published in the forthcoming Nordic Yearbook of Legal Informatics (Svantesson & Greenstein, eds., Ex Tuto Publishing 2013).

Furthermore, the use of visualizations has been studied, for example, in the context of improving comprehension of jury instructions and in facilitating the making of complex decisions connected with dispute resolution. Visualization has also been observed in the role of a persuasion tool in a variety of settings, from the courtroom to the boardroom. After Richard Sherwin debuted Visual Persuasion in the Law at New York Law School and launched the Visual Persuasion Project website, it has become easier for law schools to teach their students about visual evidence and visual advocacy. It is no longer unusual for law teachers or students to use flowcharts and decision trees, and the list goes on. A Google search will reveal the growing number of such applications in law.

If instead of reading you prefer learning by doing, there are some great opportunities later this year. The Simplification Centre and the University of the Aegean will run an international summer/autumn Course on Information Design 30 September to 4 October 2013 in Syros, Greece. Provided that there is enough interest, we plan to arrange 1) special sessions on merging contract/legal design with information design and visualization; and 2) a Legal Design Jam, modeled on hackathons, with a small committed group of interested people, including legal and other practitioners and graphic designers, aiming at giving an extreme visual makeover to a chosen text or document (piece of legislation, contract, license, terms and conditions, …). If you are interested, please contact the organizers at info(at)

On 8 October 2013, the International Association for Contract & Commercial Management (IACCM) will hold its Academic Forum in Phoenix, Arizona. The conference topics include legal visualization as it relates to commercial and contract management. If you are interested in submitting a proposal for a presentation or a paper, there is still time to do so: the deadline is 1 July 2013. Please see the Call for Papers for details. If we can find a host and a group of committed professionals, scholars and graphic designers, we are also planning to put together a Design Jam right before or after the IACCM Americas Forum at a US location to be agreed. The candidate document for redesign is still to be decided, so please send us your suggestions! If you are interested in hosting or participating, please contact either of us at the email address below to express interest, ask questions, or give suggestions.

What does the future hold?

We see these steps as just the beginning. Once the visual turn has begun, we do not think it can be stopped; the benefits are just too many. As lawyers, we have a lot to learn and we could do our job better in so many respects if we indeed started to get into the mode of thinking and acting like a designer and not just like a lawyer. This applies not only to purely legal information, but everything else we produce: contracts, memos, corporate governance materials, policies, manuals, employee handbooks, and guidance.

Legal information tends to be complex, and information design(ers) can help us make it easier to understand and act upon. The goal is accomplishing the writer’s goals by meeting the readers’ needs. We can start to radically transform legal information following the footsteps of Rob Waller’s team at the Simplification Centre by applying What makes a good document to legal documents.

With new tools and services being developed, it will become easier to convey our content and documents in more usable and more engaging ways. As the work progresses and new tools and apps appear, we are likely to see a major change in the legal industry. Meanwhile, let us know your views and ideas and what you are doing or interested in doing with visuals.

Helena Haapio is International Contract Counsel for Lexpert Ltd based in Helsinki, Finland. Before founding Lexpert she served for several years as in-house legal counsel. She holds a Diploma in Legal Studies (University of Cambridge) and an LL.M. (Turku). She does research on proactive contracting, user-centered contract design and visualization as means to enhance companies’ ease of doing business and to simplify contracting processes and documents as part of her Ph.D. at the University of Vaasa, where she teaches strategic business law. She also acts as arbitrator. Her recent books include A Short Guide to Contract Risk (Gower 2013) and Proactive Law for Managers (Gower 2011), co-authored with Professor George Siedel. Through visualization, she seeks to revolutionize the way contracts and the law are communicated, taught, and perceived. Helena can be contacted at Helena.Haapio(at)

Stefania Passera is a researcher in MIND Research Group, a multidisciplinary research team at Aalto University School of Science, Helsinki, Finland. She holds an MA in graphic design (Aalto University School of Art, Design and Architecture), and has been doing research on the usability and user experience of information visualizations in contracts as part of her Ph.D. The leitmotiv of her work is to explore how design and designers can contribute to new multidisciplinary endeavors and what value their way of thinking and doing brings to the mix. Stefania has been collaborating with private and public organizations in Finland on the development of user-centered visual contract documents since 2011.
Stefania can be contacted at stefania.passera(at)


VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.

Take a look at your bundle of tags on Delicious. Would you ever believe you’re going to change the law with a handful of them?

You’re going to change the way you research the law. The way you apply it. The way you teach it and, in doing so, shape the minds of future lawyers.

Do you think I’m going too far? Maybe.

But don’t overlook the way taxonomies have changed the law and shaped lawyers’ minds so far. Taxonomies? Yeah, taxonomies.

We, the lawyers, have used taxonomies extensively through the years; Civil lawyers in particular have proven especially prone to them. We’ve used taxonomies for three reasons: to help legal research, to help memorization and teaching, and to apply the law.


Taxonomies help legal research.

First, taxonomies help us retrieve what we’ve stored (rules and case law).

Are you looking for a rule about a sales contract? Dive deep into the “Obligations” category and the corresponding book (Recht der Schuldverhältnisse, Obbligazioni, Des contrats ou des obligations conventionnelles en général, you name it).

If you are a Common Lawyer, and ignore the perverse pleasure of browsing through Civil Code taxonomy, you’ll probably know Westlaw’s classification and its key numbering system. It has much more concrete categories and therefore much longer lists than the Civilians’ classification.
Legal taxonomies are there to help users find the content they’re looking for.

However, taxonomies sometimes don’t reflect the way the users reason; when this happens, you just won’t find what you’re looking for.

The problem with legal taxonomies.

If you are a German lawyer, you’ll probably be searching the “Obligations” book for rules concerning marriage; indeed in the German lawyer’s frame of mind, marriage is a peculiar form of contract. But if you are Italian, like I am, then you will most probably start looking in the “Persons” book; marriage rules are simply there, and we have been taught that marriage is not a contract but an agreement with no economic content (we have been trained to overlook the patrimonial shade in deference to the sentimental one).

So if I, the Italian, look for rules about marriage in the German civil code, I won’t find anything in the “Persons” book.
In other words, taxonomies work when they’re used by someone who reasons like the creator or, as happens with lawyers inside a certain legal system, when users are trained to use the same taxonomy, and lawyers are trained at length.

But let’s take my friend Tim; he doesn’t have a legal education. He’s navigating Westlaw’s key number system looking for some relevant case law on car crashes. By chance he knows he should look below “torts,” but where? Is this injury and damage from act (k439)? Is this injury to a person in general (k425)? Is this injury to property or right of property in general (k429)? Wait, should he look below “crimes” (he is unclear on the distinction between torts and crimes)? And so on. Do these questions sound silly to you, the lawyers? Consider this: the titles we mentioned give no hint of the content, unless you already know what’s in there.

Because Law, complex as it is, needs a map. Lawyers have been trained to use the map. But what about non-lawyers?

In other words, the problems with legal taxonomies occur when the creators and the users don’t share the same frame of mind. And this is most likely to happen when the creators of the taxonomy are lawyers and the users are not lawyers.
Daniel Dabney wrote something similar some time ago. Let’s imagine that I buy a dog, take the little pooch home and find out that it’s mangy. Let’s imagine I’m the aggressively unsatisfied kind of customer and want to sue the seller, but know nothing about law. I go to the library, and what will I look for? Rules on dog sales? A book on dog law? I’m lucky, there actually is one: “Dog Law”, a book that gathers all laws regarding dogs and dog owners.
But of course, that’s just luck, and if I had to browse through the legal categories in Westlaw’s index, I would never have found anything regarding “dogs”. I would never find the word “dog”, which is nonetheless the first word a person without legal training would think of. A savvy lawyer would look for rules regarding sales and warranties: general categories I may not know of (or think of) if I’m not a lawyer. If I’m not a lawyer I may not know that “the sale of arguably defective dogs are to be governed by the same rules that apply to other arguably defective items, like leaky fountain pens”. Dogs are like pens for a lawyer, but they are just dogs for a dog owner, so a dog owner will look for rules about dogs, not rules about sales and warranties (or at least he would look for the sale of dogs). And dog law, a user-aimed, object-oriented category, would probably fit his needs.

Observation #1: To make legal content available to everyone we must change the information architecture through which legal information is presented.

Will folksonomies do a better job?
Let’s come to folksonomies now. Here, the mismatch between creators (lawyers) and users’ way of reasoning is less likely to occur. The very same users decide which category to create and what to put into it. Moreover, more tags can overlap; that is, the same object can be tagged more than once. This allows the user to consider the same object from different perspectives. Take Delicious. If you search for “Intellectual property” on the Delicious search engine, you find a page about Copyright definition on Wikipedia. It was tagged mainly with “copyright.” But many users also tagged it with “wikipedia,” “law” and “intellectual-property” and even “art”. Maybe it was the non-lawyers out there who found it more useful to tag it with the “law” tag (a lawyer’s tag would have been more specific); maybe it was the lawyers who massively tagged it with “art” (there are a few “art” tags in their libraries). Or was it the other way around? The thing is, it’s up to users to decide where to classify it.

People also tag laws on Delicious using different labels that may or may not be related to law, because Delicious is a general-use website. But instead, let’s take a crowdsourced legal content website like Docracy. Here, people upload and tag their contracts, so it’s only legal content, and they tag them using only legal categories.

On Docracy, I found a whole category of documents dedicated to Terms of Service. Terms of Service is not a traditional legal category (like torts, property, and contracts), but it was a particularly useful category for Docracy users.

Docracy: WordPress Terms of Service are tagged with “TOS” but also with “Website”.

If I browse some more, I see that the WordPress TOS are also tagged with “website.” Right, it makes sense; that is, if I’m a web designer looking for the legal stuff I need to know before deploying my website. If I start looking just from “website,” I’ll find TOS, but also “contract of works for web design” or “standard agreements for design services” from AIGA.

You got it? What legal folksonomies bring us is:

  1. User-centered categories
  2. Flexible categorization systems. Many items can be tagged more than once and so be put into different categories. Legal stuff can be retrieved through different routes but also considered under different lights.
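
The two properties above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document titles and the "TOS" and "website" tags come from the Docracy example in the text, while the `TagIndex` class, its method names, and the extra "contract" tag are hypothetical and do not describe how Docracy or any other service is built.

```python
from collections import defaultdict


class TagIndex:
    """Toy folksonomy: users attach free-form tags, and an item may carry many."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)
        self._tags_by_item = defaultdict(set)

    def tag(self, item, *tags):
        for t in tags:
            key = t.strip().lower()  # normalise so "TOS" and "tos" match
            self._items_by_tag[key].add(item)
            self._tags_by_item[item].add(key)

    def find(self, tag):
        """Return every item filed under a tag, sorted for stable output."""
        return sorted(self._items_by_tag.get(tag.strip().lower(), set()))

    def tags_of(self, item):
        """Return every tag attached to an item (the 'different lights')."""
        return sorted(self._tags_by_item.get(item, set()))


index = TagIndex()
index.tag("WordPress Terms of Service", "TOS", "website")
# The "contract" tag below is an assumption made for the example.
index.tag("AIGA standard agreement for design services", "website", "contract")

print(index.find("website"))
print(index.tags_of("WordPress Terms of Service"))
```

The same document is reachable through more than one route ("TOS" or "website"), which is the flexibility the list describes.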

Will this enhance findability? I think it will, especially if the users are non-lawyers. And services that target the low-end of the legal market usually target non-lawyers.

Alright, I know what you’re thinking. You’re thinking, oh no, yet another naive folksonomy supporter! And then you say: “Folksonomy structures are too flat to constitute something useful for legal research!” and “Law is too specific a sector, with highly technical vocabulary and structure. Users without legal training would just tag wrongly”.

Let me quickly address these issues.

Objection 1: Folksonomies are too flat to constitute something useful for legal research

Let’s start from a premise: we have no studies on legal folksonomies yet. Docracy is not a full folksonomy yet (users can tag, but tags are predetermined by administrators). But we do have examples of folksonomies tout court, so my argument moves analogically from them. Folksonomies do work. Take the Library of Congress Flickr project. Like an old grandmother, the Library gathered thousands of pictures that no one ever had the time to review and categorize. So the pictures were uploaded to Flickr and left for the users to tag and comment on. They did it en masse, mostly by using descriptive or topical (non-subjective) tags that were useful for retrieval. If folksonomies work for pictures (Flickr), books (Goodreads), questions and answers (Quora), and basically everything else (Delicious), why shouldn’t they work for law? Given that premise, let’s move to the first objection: folksonomies are flat. Wrong. As folksonomies evolve, we find out that they can have two, three and even more levels of categories. Take a look at the Quora hierarchy.

That’s not flat. Look, there are at least four levels in the screenshot: Classical Musicians & Composers > Pianists > Jazz Pianists > Ray Charles > What’d I Say. Right, Jazz pianists are not classical musicians: but mistakes do occur and the good thing about folksonomies is that users can freely correct them.

Second point: findability doesn’t depend only on hierarchies. You can browse the folksonomy’s categories but you can also use free text search to dig into it.  In this case, users’ tags are metadata and so findability is enhanced because the search engine retrieves what users have tagged–not what admins have tagged.


Objection 2: Non-legal people will use the wrong tags

Uhm, yes, you’re right. They will tag a criminal law document with “tort” and a tort case involving a car accident with “car crash”. And so? Who cares? What if the majority of users find it useful? We forget too often that law is a social phenomenon, not a tool for technicians. And language is a social phenomenon too. If users consistently tag a legal document with the “wrong” tag X instead of the “right” tag Y, it means that they usually name that legal document with X. So most of them, when looking for that document, will look for X. And they’ll retrieve it, and be happy with that.

Of course, legal-savvy people would like to search by typical legal words (like, maybe, “chattel”?) or by using the legal categories they know so well. Do we want to compromise? The fact is, in a system where there is only user-generated content, it goes without saying that a traditional top-down taxonomy would not work. But if we imagine a system where content is not user-generated, like a legal or case law database, that could happen. There could be, for instance, a mixed taxonomy-folksonomy system where the taxonomy is built with traditional legal terms and schemes, whereas the folksonomy is built by the users, who are free to tag. Search, in the end, can be done by browsing the taxonomy, by browsing the folksonomy, or by means of a search engine that fishes for content by relying both on metadata chosen by system administrators and on metadata chosen by the users who tagged the content.
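
To make the mixed system a little more concrete, here is a minimal Python sketch of the three routes just described. Everything in it is assumed for illustration: the `Document` class, the function names, the taxonomy paths and the sample tags are hypothetical, and matching is by exact term only.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    taxonomy_path: tuple                          # assigned by editors
    user_tags: set = field(default_factory=set)   # added freely by users


def browse_taxonomy(docs, prefix):
    """Documents whose editor-assigned path starts with the given prefix."""
    return [d for d in docs if d.taxonomy_path[:len(prefix)] == tuple(prefix)]


def browse_folksonomy(docs, tag):
    """Documents that users have filed under the given tag."""
    return [d for d in docs if tag.lower() in d.user_tags]


def search(docs, term):
    """Exact-term lookup that consults both kinds of metadata."""
    term = term.lower()
    hits = []
    for d in docs:
        editor_terms = {part.lower() for part in d.taxonomy_path}
        if term in editor_terms or term in d.user_tags:
            hits.append(d)
    return hits


docs = [
    Document("Collision on a highway", ("Torts", "Negligence"), {"car crash"}),
    Document("Sale of a defective dog", ("Contracts", "Sales", "Warranties"),
             {"dog", "pets"}),
]

print([d.title for d in search(docs, "car crash")])   # found via a user tag
print([d.title for d in search(docs, "warranties")])  # found via the taxonomy
```

The lawyer who types “warranties” and the dog owner who types “dog” both reach the same document, each through the vocabulary they already use.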

This may seem like an imaginary system, but it’s happening already. Amazon uses traditional categories and leaves the users free to tag. The BBC website followed a similar pattern, moving from a full taxonomy system to a hybrid taxonomy-folksonomy one. Resilience, resilience, as Andrea Resmini and Luca Rosati put it in their seminal book on information architecture. Folksonomies and taxonomies can coexist. But this is not what this article is about, so sorry for the digression and let’s move to the first prediction.

Prediction #1: Folksonomies will provide the right information architecture for non-legal users.

Taxonomies and folksonomies help legal teaching.

Secondly, taxonomies help us memorize rules and case law. Put all the things in a box and group them on the basis of a common feature, and you’ll easily remember where they are. For this reason, taxonomies have played a major role in legal teaching. I’ll tell you a little story. Civil lawyers know very well the story of Gaius, the ancient Roman jurist who created a successful taxonomy for his law handbook, the Institutiones. His taxonomy was threefold: all law can be divided into persons, things, and actions. Five centuries later (five centuries!) Emperor Justinian transferred the very same taxonomy into his own Institutiones, a handbook aimed at youth “craving for legal knowledge” (cupidae legum iuventuti). Why? Because it worked! How powerful, both the slogan and the taxonomy! Indeed, more than 1000 years later, we find it again, with a few changes, in the German, French, Italian, and Spanish Civil Codes, and in a whole bunch of nutshells that explain private law following the taxonomy of the Codes.

And now, consider what the taxonomies have done to lawyers’ minds.

Taxonomies have shaped their way of considering facts. Think. Put something into a category and you will lose all the other points of view on the same thing. The category shapes and limits our way to look at that particular thing.

Have you ever noticed how civil lawyers and common lawyers have a totally different way of looking at facts? Common lawyers see and take into account the details. Civil lawyers overlook them because the taxonomy they use has told them to do so.

In Rylands vs Fletcher (a UK tort case) some water escapes from a reservoir and floods a mine nearby. The owner of the reservoir could not possibly foresee the event and prevent it. However, the House of Lords states that the owner of the mine has the right to recover damages, even if there is no negligence. (“The person who for his own purpose brings on his lands and collects and keeps there anything likely to do mischief, if it escapes, must keep it in at his peril, and if he does not do so, is prima facie answerable for all the damage which is the natural consequence of its escape.”)

In Read vs Lyons, however, an employee gets injured during an explosion occurring in the ammunition factory where she is employed. The rule set in Rylands couldn’t be applied, as, according to the House of Lords, the case was very different; there is no escape.

On the contrary, for a Civil lawyer the decision would have been the same in both cases. For instance, under the Italian Civil Code (but the French and German Codes are not substantially different on this point), one would apply the general rule that grants compensation for damages caused by “dangerous activities” and requires no proof of negligence from the plaintiff (art. 2050 of the Civil Code), no matter what causes the danger (a big reservoir of water, an ammunition factory, whatever else).

Observation #2: Taxonomies are useful for legal teaching and they shape lawyers’ minds.

Folksonomies for legal teaching?

Okay, and what about folksonomies? What if the way people tag legal concepts makes its way into legal teaching?

Take Docracy’s TOS category: have you ever thought about a course on TOS?

Another website, another example: Rocket Lawyer. Its categorization is not based on folksonomy, however; it’s purposely built around a user’s needs, which have been tested over the years, so in a way the taxonomy of the website comes from its users. One category is “identity theft”, which should be quite popular if it is prompted on the first page. What about teaching a course on identity theft? That would merge some material traditionally taught in privacy law, criminal law, and torts courses. Some course areas would overlap, which is good for memorization. Think again of the example of “Dog Law” by Dabney. What about a course about Dog Law, collecting material that refers to dogs across traditional legal categories?

Also, the same topic would be considered from different points of view.

What if students were trained in the flexibility of categories described above? They wouldn’t get trapped into a single way of seeing things. If folksonomies account for different levels of abstraction, students would be trained to consider details. Not only that, they would develop a very flexible frame of mind.

Prediction #2: legal folksonomies in legal teaching would keep lawyers’ minds flexible.


Taxonomies and folksonomies SHAPE the law.

Third, taxonomies make the law apply differently. Think about it. They are the very highways that allow the law to travel down to us. And here it comes, the real revolutionary potential of legal folksonomies, if we were to make them work.

Let’s start from taxonomies, with a couple of examples.

Civil lawyers are taught that Public and Private Law are two distinctive areas of law, to which different rules apply. In common law, the distinction is not that clear-cut. In Rigby vs Chief Constable of Northamptonshire  (a tort case from UK case law) the police—in an attempt to catch a criminal—damage a private shop by accidentally firing a canister of gas and setting the shop ablaze. The Queen’s Bench Division establishes that the police are liable under the tort of negligence only because the plaintiff manages to prove the police’s fault; they apply a private law category to a public body.
How would the same case have been decided under, say, French law? As the division between public and private law is stricter, the category of liability without fault, which is traditionally used when damages are caused by public bodies, would apply. The State would have to indemnify the damage, no matter if there was negligence.

Remember Rylands vs Fletcher and Read vs Lyons? The presence of escape/no escape was determinant, because the English taxonomy is very concrete. Civil lawyers work with taxonomies that have fewer, larger, and more abstract categories. If you cause damages by performing a risky activity, even if conducted without fault, you have to repay them. Period. Abstract taxonomy sweeps out any concrete detail. I think that Robert Berring had something like this in mind (although he referred to legal research) when he said that “classification defines the world of thinkable thoughts”. Or, as Dabney puts it, “thoughts that aren’t represented in the system had become unthinkable”.
So taxonomies make the law apply differently. In the former case, by setting a boundary between the public-private spheres; in the latter by creating a different framework for the application of more abstract or more detailed rules.


You don’t get it? All right, it’s tough, but do you have two minutes more? Let’s take this example by Dabney. The Key Number System’s taxonomy distinguishes between navigable and non-navigable waters (in the screenshot: waters and water courses). There’s a reason for that: lands under navigable waters presumptively belong to the state, because “private ownership of the land under navigable waters would (…) compromise the use of those waters for navigation and commerce”. So there are two categories because different laws apply to each. But now look at this screenshot.

Find anything strange? Yes: avulsion rules are “doubled”; they are contained in both categories. But they are the very same: rules concerning avulsion don’t change if the water is navigable or not (check the definition of avulsion if you, like me, don’t remember what it is). Dabney: “In this context, (…) there is no difference in the legal rules that are applied that depend on whether or not the water is navigable. Navigability has an effect on a wide range of issues concerning waters, but not on the accretion/avulsion issue. Here, the organization of the system needlessly separates cases from each other on the basis of an irrelevant criterion”. And you think, OK, but as long as we are aware of this error and know the rules concerning avulsion are the same, it’s no biggie. Right, but in the future?

“If searchers, over time, find cases involving navigable waters in one place and non-navigable waters in another, there might develop two distinct bodies of law.” Got it? Dabney foresees it. The way we categorize the law would shape the way we apply it.

Observation #3: Different taxonomies entail different ways to apply the law.

So, what if we substitute taxonomies with folksonomies?

And what if they had the power to shape the way judges, legal scholars, lawmakers and legal operators think?

Legal folksonomies are just starting out, and what I envisage is yet to come. Which makes this article kind of a visionary one, I admit.

However, what Docracy is teaching us is that users—I didn’t say lawyers, but users—are generating decent legal content. Would you have bet your two cents on this, say, five years ago?
What if users started generating new legal categories (legal folksonomies)?

Berring wrote something really visionary more than ten years ago in his beautiful “Legal Research and the World of Thinkable Thoughts”. He couldn’t have had folksonomies in mind, and still, wouldn’t you think he referred to them when writing: “There is simply too much stuff to sort through. No one can write a comprehensive treatise any more, and no one can read all of the new cases. Machines are sorting for us. We need a new set of thinkable thoughts. We need a new Blackstone. We need someone, or more likely a group of someones, who can reconceptualize the structure of legal information.”?

Prediction #3: Legal folksonomies will make the law apply differently.

Let’s wait and see. Let the users tag. Where this tagging is going to take us is unpredictable, yes, but if you look at where taxonomies have taken us for all these years, you may find a clue.

I have a gut feeling that folksonomies are going to change the way we search, teach, and apply the law.




Serena Manzoli is a legal architect and the founder at WildLawyer, a design agency for law firms. She has been a Euro bureaucrat, a cadet, an in-house counsel, a bored lawyer. She holds an LLM from University of Bologna. She blogs at Lawyers are boring.  Twitter: SquareLaw

[Editor’s Note: We are pleased to publish this piece from Qiang Lu and Jack Conrad, both of whom worked with Thomson Reuters R&D on the WestlawNext research team. Jack Conrad continues to work with Thomson Reuters, though currently on loan to the Catalyst Lab at Thomson Reuters Global Resources in Switzerland. Qiang Lu is now based at Kore Federal in the Washington, D.C. area. We read with interest their 2012 paper from the International Conference on Knowledge Engineering and Ontology Development (KEOD), “Bringing order to legal documents: An issue-based recommendation system via cluster association”, and are grateful that they have agreed to offer some system-specific context for their work in this area. Their current contribution represents a practical description of the advances that have been made between the initial and current versions of Westlaw, and what differentiates a contemporary legal search engine from its predecessors.  -sd]

In her blog on “Pushing the Envelope: Innovation in Legal Search” (2009) [1], Edinburgh Informatics Ph.D. candidate K. Tamsin Maxwell presents her perspective of the state of legal search at the time. The variations of legal information retrieval (IR) that she reviews − everything from natural language search (e.g., vector space models, Bayesian inference net models, and language models) to NLP and term weighting − refer to techniques that are now 10, 15, even 20 years old. She also refers to the release of the first natural language legal search engine by West back in 1993−WIN (Westlaw Is Natural) [2]. Adding to this on-going conversation about legal search, we would like to check back in, a full 20 years after the release of that first natural language legal search engine. The objective we hope to achieve in this posting is to provide a useful overview of state-of-the-art legal search today.

What Maxwell’s article could not have predicted, even five years ago, are some of the chief factors that distinguish state-of-the-art search engines today from their earlier counterparts. One of the most notable distinctions is that unlike their predecessors, contemporary search engines, including today’s state-of-the-art legal search engine, WestlawNext, separate the function of document retrieval from document ranking. Whereas the first retrieval function primarily addresses recall, ensuring that all potentially relevant documents are retrieved, the second and ensuing function focuses on the ideal ranking of those results, addressing precision at the highest ranks. By contrast, search engines of the past effectively treated these two search functions as one and the same. So what is the difference? Whereas the document retrieval piece may not be dramatically different from what it was when WIN was first released in 1993, what is dramatically different lies in the evidence that is considered in the ranking piece, which allows potentially dozens of weighted features to be taken into account and tracked as part of the optimal ranking process.
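
The split between retrieval and ranking can be illustrated with a deliberately small Python sketch. It is not a description of how WestlawNext works: the corpus, the two features (`text_overlap` and `times_cited`) and their weights are all invented for the example, and a real engine would use far richer evidence.

```python
def retrieve(corpus, query):
    """Stage 1 (recall): keep every document sharing at least one query term."""
    terms = set(query.lower().split())
    return [doc for doc in corpus if terms & set(doc["text"].lower().split())]


def rank(candidates, query, weights):
    """Stage 2 (precision): order candidates by a weighted sum of features."""
    terms = set(query.lower().split())

    def score(doc):
        overlap = len(terms & set(doc["text"].lower().split()))
        return (weights["text_overlap"] * overlap
                + weights["times_cited"] * doc["times_cited"])

    return sorted(candidates, key=score, reverse=True)


# Hypothetical mini-corpus; the citation counts are made up.
corpus = [
    {"id": "A", "text": "church support for candidates", "times_cited": 2},
    {"id": "B", "text": "church property tax dispute", "times_cited": 40},
    {"id": "C", "text": "maritime salvage claim", "times_cited": 90},
]
weights = {"text_overlap": 1.0, "times_cited": 0.01}

query = "church support for specific candidates"
candidates = retrieve(corpus, query)
ranked = rank(candidates, query, weights)
print([doc["id"] for doc in ranked])
```

Document C never reaches the ranking stage, however often it is cited, because the retrieval stage filtered it out. Changing the weights changes only the order of the retrieved candidates, not the set of candidates.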

Figure 1. The set of evidence (views) that can be used by modern legal search engines.

In traditional search, the principal evidence considered was the main text of the document in question. In the case of traditional legal search, those documents would be cases, briefs, statutes, regulations, law reviews and other forms of primary and secondary (a.k.a. analytical) legal publications. This textual set of evidence can be termed the document view of the world. In the case of legal search engines like Westlaw, there also exists the ability to exploit expert-generated annotations or metadata. These annotations come in the form of attorney-editor generated synopses, points of law (a.k.a. headnotes), and attorney-classifier assigned topical classifications that rely on a legal taxonomy such as West’s Key Number System [3]. The set of evidence based on such metadata can be termed the annotation view. Furthermore, in a manner loosely analogous to today’s World Wide Web and the lattice of inter-referencing documents that reside there, today’s legal search can also exploit the multiplicity of both out-bound (cited) sources and in-bound (citing) sources with respect to a document in question, and, frequently, the granularity of these citations is not merely at a document-level but at the sub-document or topic level. Such a set of evidence can be termed the citation network view. More sophisticated engines can examine not only the popularity of a given cited or citing document based on the citation frequency, but also the polarity and scope of the arguments they make.
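
As a rough illustration of the citation network view, the following Python sketch builds in-bound and out-bound citation sets and counts citations per topic. The citation records are hypothetical, and the sketch covers only frequency and topic-level granularity; the polarity of a citation (whether it follows or criticizes the cited source) is not modeled here.

```python
from collections import defaultdict

# Hypothetical citation records: (citing document, cited document, point of law)
citations = [
    ("Case C", "Case A", "strict liability"),
    ("Case D", "Case A", "strict liability"),
    ("Case D", "Case B", "negligence"),
    ("Brief E", "Case A", "escape of dangerous things"),
]

outbound = defaultdict(set)   # document -> documents it cites
inbound = defaultdict(set)    # document -> documents citing it
by_topic = defaultdict(lambda: defaultdict(int))  # cited doc -> topic -> count

for citing, cited, topic in citations:
    outbound[citing].add(cited)
    inbound[cited].add(citing)
    by_topic[cited][topic] += 1

print(len(inbound["Case A"]))     # how many documents cite Case A
print(dict(by_topic["Case A"]))   # which points of law it is cited for
```

Keeping the topic alongside each citation is what lets a ranking function prefer a case that is heavily cited for the issue at hand over one that is heavily cited for something else.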

In addition to the “views” described thus far, a modern search engine can also harness what has come to be known as aggregated user behavior. While individual users and their individual behavior are not considered, in instances where there is sufficient accumulated evidence, the search function can consider document popularity thanks to a user view. That is to say, in addition to a document being returned in a result set for a certain kind of query, the search provider can also tabulate how often a given document was opened for viewing, how often it was printed, or how often it was checked for its legal validity (e.g., through citator services such as KeyCite [4]). (See Figure 1) This form of marshaling and weighting of evidence only scratches the surface, for one can also track evidence between two documents within the same research session, e.g., noting that when one highly relevant document appears in result sets for a given query-type, another document typically appears in the same result sets. In summary, such a user view represents a rich and powerful additional means of leveraging document relevance as indicated through professional user interactions with legal corpora such as those mentioned above.
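
A minimal Python sketch of such a user view, assuming anonymized session logs in a made-up format, might tally interactions per document and count how often two documents turn up in the same research session. The action names and log layout are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, anonymized session logs: each session lists (document, action).
sessions = [
    [("Doc 1", "open"), ("Doc 2", "open"), ("Doc 1", "print")],
    [("Doc 1", "open"), ("Doc 2", "citator_check")],
    [("Doc 3", "open")],
]

popularity = Counter()   # total interactions per document, across all users
co_viewed = Counter()    # how often two documents occur in the same session

for session in sessions:
    for doc, _action in session:
        popularity[doc] += 1
    docs_in_session = sorted({doc for doc, _ in session})
    for pair in combinations(docs_in_session, 2):
        co_viewed[pair] += 1

print(popularity.most_common(1))
print(co_viewed[("Doc 1", "Doc 2")])
```

Only aggregate counts are kept; nothing in the two counters identifies an individual user, which matches the point that individual behavior is not considered.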

It is also worth noting that today’s search engines may factor in a user’s preferences, for example, by knowing what jurisdiction a particular attorney-user practices in, and what kinds of sources that user has historically preferred, over time and across numerous result sets.

While the materials or data relied upon in the document view and citation network view are authored by judges, law clerks, legislators, attorneys and law professors, the summary data present in the annotation view is produced by attorney-editors. By contrast, the aggregated user behavior data represented in the user view is produced by the professional researchers who interact with the retrieval system. The result of this rich and diverse set of views is that the power and effectiveness of a modern legal search engine comes not only from its underlying technology but also from the collective intelligence of all of the domain expertise represented in the generation of its data (documents) and metadata (citations, annotations, popularity and interaction information). Thus, the legal search engine offered by WestlawNext (WLN) represents an optimal blend of advanced artificial intelligence techniques and human expertise [5].

Given this wealth of diverse material representing various forms of relevance information and tractable connections between queries and documents, the ranking function executed by modern legal search engines can be optimized through a series of training rounds that “teach” the machine what forms of evidence make the greatest contribution for certain types of queries and available documents, along with their associated content and metadata. In other words, the re-ranking portion of the machine learns how to weigh the “features” representing this evidence in a manner that will produce the best (i.e., highest precision) ranking of the documents retrieved.
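
The text does not say which learning method is used, so the following Python sketch substitutes a very simple one, a perceptron-style pairwise update, purely to show what “learning how to weigh features” can mean. The feature order, the training pairs and the function names are all hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(pairs, n_features, epochs=10, lr=1.0):
    """Perceptron-style training: whenever the preferred document does not
    outscore the other one, nudge the weights toward the preferred features."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


# Hypothetical feature order: [text match, citation evidence, user popularity]
# Each pair is (features of the more relevant doc, features of the less relevant doc).
pairs = [
    ([0.9, 0.2, 0.1], [0.3, 0.3, 0.1]),
    ([0.8, 0.1, 0.7], [0.6, 0.1, 0.2]),
    ([0.4, 0.9, 0.6], [0.5, 0.1, 0.3]),
]

weights = train_pairwise(pairs, n_features=3)
for better, worse in pairs:
    print(dot(weights, better) > dot(weights, worse))
```

Each training round only asks one question of a pair of documents: does the one judged more relevant score higher? The weights that survive are those that answer yes for the training pairs.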

Nevertheless, a search engine is still highly influenced by the user queries it has to process, and for some legal research questions, an independent set of documents grouped by legal issue would be a tremendous complementary resource for the legal researcher, one at least as effective as trying to assemble the set of relevant documents through a sequence of individual queries. For this reason, WLN offers in parallel a complement to search entitled “Related Materials” which in essence is a document recommendation mechanism. These materials are clustered around the primary, secondary and sometimes tertiary legal issues in the case under consideration.

Legal documents are complex and multi-topical in nature. By detecting the top-level legal issues underlying the original document and delivering recommended documents grouped according to these issues, a modern legal search engine can provide a more effective research experience to a user when providing such comprehensive coverage [6,7]. Illustrations of some of the approaches to generating such related material are discussed below.
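
The grouping step can be pictured with a short Python sketch. It assumes the legal issues of each document have already been detected and written as hierarchical labels; the labels are borrowed from the example later in this post, while the case names and the `related_materials` function are hypothetical.

```python
from collections import defaultdict

# Issues assumed to have been detected in the document being read.
source_issues = [
    "Freedom of Speech / State Regulation of Campaign Speech",
    "Freedom of Speech / View of Federal Election Campaign Act / Definition of Political Committee",
]

# Hypothetical candidate documents with their own detected issues.
candidate_docs = {
    "Case 1": ["Freedom of Speech / State Regulation of Campaign Speech"],
    "Case 2": [
        "Freedom of Speech / View of Federal Election Campaign Act / Definition of Political Committee",
        "Freedom of Speech / State Regulation of Campaign Speech",
    ],
    "Case 3": ["Attractive Nuisance / Swimming Pools"],
}


def related_materials(source_issues, candidate_docs):
    """Group candidate documents under each issue they share with the source."""
    groups = defaultdict(list)
    for doc, issues in sorted(candidate_docs.items()):
        for issue in issues:
            if issue in source_issues:
                groups[issue].append(doc)
    return dict(groups)


groups = related_materials(source_issues, candidate_docs)
for issue, docs in groups.items():
    print(issue, "->", docs)
```

A multi-topical document such as “Case 2” appears under every issue it shares with the source, so the researcher can follow one issue at a time without writing a new query.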

Take, for example, an attorney who is running a set of queries that seeks to identify a group of relevant documents involving “attractive nuisance” for a party that witnessed a child nearly drown in a swimming pool. After a number of attempts using several different key terms in her queries, the attorney selects the “Related Materials” option that subsequently provides access to the spectrum of “attractive nuisance”-related documents. Such sets of issue-based documents can represent a mother lode of relevant materials. In this instance, pursuing this navigational path rather than a query-based one turns out to be a good choice. Indeed, the query-based approach could take time and would lead to a gradually evolving set of relevant documents. By contrast, harnessing the cluster of documents produced for “attractive nuisance” may turn out to be the most efficient approach to total recall and the desired degree of relevance.

To further illustrate the benefit of a modern legal search engine, we will conclude our discussion with an instructive search using WestlawNext, and its subsequent exploration by way of this recommendation resource available through “Related Materials.”

The underlying legal issue in this example is “church support for specific candidates”, and a corresponding query is issued in the search box. Figure 2 provides an illustration of the top cases retrieved.


Figure 2: Search result from WestlawNext

Let’s assume that the user decides to closely examine the first case. By clicking the link to the document, the user brings up the content of the case, as in Figure 3. Note that on the right-hand side of the panel, the major legal issues of the case “Canyon Ferry Road Baptist Church … v. Unsworth” have been automatically identified and presented with hierarchically structured labels, such as “Freedom of Speech / State Regulation of Campaign Speech” and “Freedom of Speech / View of Federal Election Campaign Act / Definition of Political Committee.” Presenting these closely related topics lets the user dive deep into the relevant cases and other relevant documents without explicitly crafting any additional or refined queries.


Figure 3: A view of a case and complementary materials from WestlawNext

When the user selects one of these topics, a set of recommended cases is rendered under that particular label. Figure 4, for example, shows the related topic view of the case under the label of “Freedom of Speech / View of Federal Election Campaign Act / Definition of Political Committee.” Note that this process can be repeated based on the particular needs of a user, starting with a document in the original results set.


Figure 4: Related Topic view of a case

In summary, by utilizing the combination of human expert-generated resources and sophisticated machine-learning algorithms, modern legal search engines bring the legal research experience to an unprecedented and powerful new level. For those seeking the next generation in legal search, it’s no longer on the horizon. It’s already here.


[1] K. Tamsin Maxwell, “Pushing the Envelope: Innovation in Legal Search,” in VoxPopuLII, Legal Information Institute, Cornell University Law School, 17 Sept. 2009.
[2] Howard Turtle, “Natural Language vs. Boolean Query Evaluation: A Comparison of Retrieval Performance,” in Proceedings of the 17th Annual International ACM-SIGIR Conference on Research & Development in Information Retrieval (SIGIR 1994) (Dublin, Ireland), Springer-Verlag, London, pp. 212-220, 1994.
[3] West’s Key Number System:
[4] West’s KeyCite Citator Service:
[5] Peter Jackson and Khalid Al-Kofahi, “Human Expertise and Artificial Intelligence in Legal Search,” in Structuring of Legal Semantics, A. Geist, C. R. Brunschwig, F. Lachmayer, G. Schefbeck Eds., Festschrift ed. for Erich Schweighofer, Editions Weblaw, Bern, pp. 417-427, 2011.
[6] On cluster definition and population: Qiang Lu, Jack G. Conrad, Khalid Al-Kofahi, William Keenan, “Legal Document Clustering with Built-in Topic Segmentation,” in Proceedings of the 2011 ACM-CIKM Twentieth International Conference on Information and Knowledge Management (CIKM 2011) (Glasgow, Scotland), ACM Press, pp. 383-392, 2011.
[7] On cluster association with individual documents: Qiang Lu and Jack G. Conrad, “Bringing order to legal documents: An Issue-based Recommendation System via Cluster Association,” in Proceedings of the 4th International Conference on Knowledge Engineering and Ontology Development (KEOD 2012) (Barcelona, Spain), SciTePress DL, pp. 76-88, 2012.

Jack G. Conrad currently serves as Lead Research Scientist with the Catalyst Lab at Thomson Reuters Global Resources in Baar, Switzerland. He was formerly a Senior Research Scientist with the Thomson Reuters Corporate Research & Development department. His research areas fall under a broad spectrum of Information Retrieval, Data Mining and NLP topics. Some of these include e-Discovery, document clustering and deduplication for knowledge management systems. Jack has researched and implemented key components for WestlawNext, West’s next-generation legal search engine, and PeopleMap, a very large scale Public Record aggregation system. Jack completed his graduate studies in Computer Science at the University of Massachusetts–Amherst and in Linguistics at the University of British Columbia–Vancouver.

Qiang Lu was a Senior Research Scientist with the Thomson Reuters Corporate Research & Development department. His research interests include data mining, text mining, information retrieval, and machine learning. He has extensive experience applying NLP technologies to a variety of data sources, such as news, legal, financial, and law enforcement data. Qiang was a key member of the WestlawNext research team. He has a Ph.D. in computer science and engineering from the State University of New York at Buffalo. He is now a managing associate at Kore Federal in the Washington, D.C. area.

VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.