One afternoon in late March, nine Master of Engineering students crowded around a table in the atrium of Duffield Hall, Cornell’s nanoscale science and engineering building. They were about to show their work at BOOM (Bits On Our Minds), the University’s annual “science fair” for computer science and engineering students. This semester they had been working on a complicated set of interrelated software engineering projects with LII, applying their training in information retrieval, machine learning, and natural language processing to automate the process of producing a topic model for federal regulations. Topic models are an advanced application of machine learning, used to discover, automatically, the subject matter contained in large, undifferentiated collections of text.
Right then, though, they urgently needed to address a more mundane set of engineering challenges: finding an electrical outlet that would minimize the chances of tripping an interested professor, correcting a perversely twitchy virtual machine display, and, most importantly, affixing a 24″x36″ poster to a 22″x28″ display board.
For the next two hours, the students presented their work and fielded questions from computer scientists, engineers from other fields, and members of the general public (who had come to BOOM mostly to see the robots). Also presenting at BOOM were LII student collaborators Geoffrey Goh (presenting his work on the “Visualizing the Law of Fracking” project) and Jai Bhatt.
We’ve talked a lot in the past about what LII gets out of working with students, but what do students get out of working with LII?
Topic-modeling team member Eva Sharma, who came to Cornell from SAP Labs India (where she had worked after undergraduate study at SRM University and MIT) says about the project: “The project and the opportunity that we got to present in BOOM was really exciting. I learnt a lot about topic modeling and the problems that you face when handling big data. I also learnt experimenting with different methods and comparing their results. Normally a course project doesn’t give this much flexibility.” Eva’s CFR topic modeling teammates Shreya Chowdhury and Lisha Murthy both noted the application of – and extension beyond – their coursework. Says Lisha: “I learned that many of the tools and techniques we use for CS domain problems are applicable to Law corpora, so some things were not entirely new.”
The topic modeling project also provided an opportunity for M.Eng. students and law students to work collaboratively. Law student Jonathan LaPlante (JD ’15) served as a domain expert on the topic modeling project, helping the visualization team to understand his process for labeling topic models and providing insight into tasks that statistical software might help streamline. Building on the needs unearthed by talking to Jonathan, the students customized a topic model visualizer, adding supplementary visualizations, highlighting proposed “stop-words” (terms that were too general, like “CFR”) in each topic, and attempting to align pre-existing labels with generated topics. Josh Campbell, now at LinkedIn, who built the stop-word recommender and topic-label-mapper, remarked: “after manually labeling a 500 topic model (for another class), I realized how time consuming this process actually is!”
Jonathan, meanwhile, continued his long streak of LII project contributions. During his time working with the LII, he has analyzed government data on regulatory violations, labeled legislative topic models, brought knowledge he gained during a summer associate job to bear on the information science students’ fracking visualization project, and consulted on other projects. Jonathan told us: “the most gratifying part of working on LII projects has been the ability to apply concepts from law school in creative ways that may potentially assist practitioners and others in their interactions with law. In particular, my work this year with machine learning tools presented several practical and currently unavailable uses that can now be realized through the application of technology. For example, categorizing vast swaths of the law, such as a sample of 25,000 cases or all of the statutes of ten states, is something that is impractical using traditional methods of legal inquiry. It was satisfying to apply technological tools to do this, knowing that the team I was working with was one of the first in the field to do so as well as the opportunities it presents. This work also broadened my perspective on the law, especially seeing the repeating patterns in law and the similarities and contrasts in the ways different states and forums approached law.”
Law students also tell us what they learn from writing for the general public on the LII Supreme Court Bulletin. Executive Editor Dan Rosales found working with a retired US Supreme Court Reporter of Decisions to be a highlight of his legal education: “Working and editing material with Frank Wagner undoubtedly made me a better writer.” After sitting for the New York Bar this summer, Dan intends to further hone his craft by working as a judicial law clerk for the New Mexico Supreme Court for the 2015-2016 term.
Projects like these form the scaffolding for features that help others find and understand the law. It can take as much as two years of such work to make something that our audience will find really cool — but the work the students do ultimately carries a huge payoff for us and our audience.
We’re looking forward to more such collaborative endeavors in the coming year, particularly where we can bring law students together with engineering students on projects that make the best use of their respective training. The LII creates one-of-a-kind opportunities for students across the University. These projects are not only a “win-win” for us and the students; the content they generate is also a “win” for everyone who uses open access legal resources.