On December 5th, LII engineers Nic Ceynowa and Sylvia Kwakye, Ph.D., looked on in pride as the Cornell University Master of Engineering students they’d supervised presented a trio of fall projects to LII and Cornell Law Library staff.

Entity Linking

Mutahir Kazmi and Shraddha Vartak pulled together, enhanced, and scaled a group of applications that link entities in the Code of Federal Regulations. Entity linking is a set of techniques that detect references to things in the world (such as people, places, animals, and pharmaceuticals) and link them to data sources that provide more information about them. The team analyzed the entities and the corpus to determine which entities required disambiguation, distinguished the entities to mark before defined-term markup from those to mark after it, and used Apache Spark to speed up the overall application by 60%.
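To make the idea of entity linking concrete, here is a minimal sketch in Python. It is not the team's application: it assumes a tiny hand-made lookup table (the `GAZETTEER` dictionary and its example.org URLs are hypothetical), matches names with a regular expression, and skips both disambiguation and Apache Spark entirely.

```python
import re

# Hypothetical lookup table: surface form (lowercase) -> link to an external data source.
GAZETTEER = {
    "aspirin": "https://example.org/drug/aspirin",
    "bald eagle": "https://example.org/species/bald-eagle",
    "lake erie": "https://example.org/place/lake-erie",
}


def link_entities(text, gazetteer=GAZETTEER):
    """Return (start, end, surface, target) for every gazetteer name found in text."""
    # Try longer names first so a multi-word name wins over any shorter name it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sample = "No person shall take a bald eagle near Lake Erie."
links = link_entities(sample)
for start, end, surface, target in links:
    print(f"{surface!r} at {start}-{end} -> {target}")
```

A real pipeline would also have to decide which of several candidate entities a name refers to, which is the disambiguation step described above.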

[Screenshot: screen-shot-2016-12-14-at-2-19-32-pm]

US Code Definition Improvement

Khaleel Khaleel, Pracheth Javali, Ria Mirchandani, and Yashaswini Papanna took on the task of adapting our CFR definition extraction and markup software to meet the unique requirements of the US Code. In addition to learning the hierarchical structure and identifier schemes within the US Code corpus, the project involved discovering and extracting definition patterns that had not previously been identified; parsing multiple defined terms, word roots, and abbreviations from individual definitions; and correctly detecting the boundaries of the definitions.

Before:

[Screenshot: screen-shot-2016-12-14-at-9-44-33-am]

And after:

[Screenshot: screen-shot-2016-12-14-at-1-35-07-pm]
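For readers curious what pattern-based definition extraction can look like, the following Python sketch handles a single, simplified pattern of the form: The term "X" means ... It is an illustration only, not the students' software. The regular expression, the function name `extract_definitions`, the straight-quote assumption, and the rule that a definition ends at the first period or semicolon are all assumptions made for this example; real boundary detection is considerably harder.

```python
import re

# Assumed pattern: 'The term "X" means ...' or 'The terms "X" and "Y" mean ...'.
# The body is cut at the first period or semicolon, a deliberately crude boundary rule.
DEFINITION = re.compile(
    r'[Tt]he terms? '
    r'(?P<terms>"[^"]+"(?:(?:, and |, or |, | and | or )"[^"]+")*) '
    r'(?:means?|includes?) '
    r'(?P<body>[^.;]+)'
)


def extract_definitions(text):
    """Return one dict per definition found: defined terms, body text, and character span."""
    results = []
    for m in DEFINITION.finditer(text):
        # A single definition may introduce several defined terms.
        terms = re.findall(r'"([^"]+)"', m.group("terms"))
        results.append(
            {"terms": terms, "body": m.group("body").strip(), "span": m.span()}
        )
    return results


sample = (
    'The term "vessel" includes every description of watercraft. '
    'The terms "State" and "United States" mean the several States; '
    'other provisions follow.'
)
defs = extract_definitions(sample)
for d in defs:
    print(d["terms"], "->", d["body"])
```

Word roots and abbreviations, which the project also parsed, are left out of this sketch.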

Search Prototype

Anusha Morappanavar, Deekshith Belchapadda, and Monisha Pavagada Chandrashekar built a prototype of the semantic search application using Elasticsearch and Flask. In addition to learning how to work with Elasticsearch, they had to learn the hierarchical structure of the US Code and CFR, understand how cross-references work within legal corpora, and make use of additional metadata such as the definitions and linked entities the other groups had been working on. Their work will support a search application that distinguishes three kinds of matches: those in which the search term is defined, those in which it appears in the full text, and those in which it appears in the definition of a term that appears within the full text of a document.
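The three kinds of matches can be illustrated with a small in-memory sketch in Python. It uses neither Elasticsearch nor Flask, and it is not the prototype's code: the documents, the `DEFINITIONS` table, the match labels, and the `classify` function are hypothetical, and only single-word terms are handled.

```python
import re

# Hypothetical definitions known to the corpus: defined term -> definition text.
DEFINITIONS = {
    "vessel": "every description of watercraft used as transportation on water",
}

# Hypothetical documents: full text plus the terms each document defines.
DOCS = {
    "sec-1": {"text": "The term vessel is defined for this chapter.", "defines": ["vessel"]},
    "sec-2": {"text": "Each vessel shall be inspected annually.", "defines": []},
    "sec-3": {"text": "Watercraft rentals require a permit.", "defines": []},
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(query):
    """Map each matching document id to the kinds of match it has for a one-word query."""
    q = query.lower()
    hits = {}
    for doc_id, doc in DOCS.items():
        words = tokens(doc["text"])
        kinds = []
        if q in (term.lower() for term in doc["defines"]):
            kinds.append("defined_here")
        if q in words:
            kinds.append("full_text")
        # The query occurs in the definition of some term that this document uses.
        if any(term in words and q in tokens(body) for term, body in DEFINITIONS.items()):
            kinds.append("in_definition_of_used_term")
        if kinds:
            hits[doc_id] = kinds
    return hits


print(classify("vessel"))
print(classify("watercraft"))
```

In this toy corpus, a search for "watercraft" reaches sec-1 and sec-2 only through the definition of "vessel", while sec-3 matches on its full text.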

[Screenshot: screen-shot-2016-12-14-at-2-24-06-pm]

We’ll be rolling out the features supported by this semester’s M.Eng. projects, starting with entity linking in January.