
In June, we received an email from Nigerian human rights lawyer Jake Effoduh, who was starting a free access to law project, #Law2Go, while on a summer fellowship at the Harvard Library Innovation Lab. In his concept note, he said:

“#Law2Go seeks to leverage on the extraordinary growth in the use of smart phones in Nigeria. By the end of 2017, there will be 18 million smart phone users in Nigeria with 38 million smartphones projected to be in used in Nigeria by 2018 – a growth like no other on the continent. This platform can be utilised to address one of the most crucial problems in Nigeria’s justice sector which is access.”

Two months later, Effoduh sent us links to the Law2Go website and its Android app. What sets Law2Go apart among open access to law websites is its combination of translation and audio. Effoduh hosts a popular radio show in Nigeria and has not only translated the Nigerian Constitution into local languages, but also provided a simple English interpretation and paired each text with an audio recording.

Effoduh has used social media to develop an FAQ with questions ranging from the most general (e.g., “what are human rights?”) to the very specific (e.g., “my land containing minerals, oils, natural gas and the government wants to take it away; do they have a right to?”). The site has also published a number of resources for those seeking legal services, and it provides a contact form for those seeking legal advice.

We’re back from the 2017 Legislative Data and Transparency Conference in Washington, DC, where technologists from the federal government and transparency organizations presented their latest open data work.

In the past year, several government websites have completed initiatives that make their data more accessible and more reusable: from mobile-friendly redesigns of Congress.gov, Govinfo.gov, and GPO.gov, to new repositories of bulk data for download, to initiatives that will support original drafting in formats suitable for publication. We were particularly excited to see LII’s work on the Legislative Data Model being adopted in government information systems, as well as the publication of FDsys metadata in RDF.

I spoke on a panel about data integration, along with my co-panelist, GovTrack founder Josh Tauberer, and our moderator, GPO’s Lisa LaPlant. Each of us is finding new ways to pick up a legal text, learn what we can about it, and connect it to other legal texts and, particularly in LII’s case, real-world objects.

This presentation was the latest installment in our ongoing work to aggregate different data sources and connect them to one another, helping people navigate from what they know to what they don’t know and making it easier for everyone to find and understand the law that affects them.

On May 12, LII engineers Sylvia Kwakye, Ph.D., and Nic Ceynowa hosted a presentation in which the 14 Cornell University Master of Engineering students they’d supervised this spring walked LII and Cornell Law Library staff through their project work on the Docket Wrench application.

LII adopted the Docket Wrench application from the Sunlight Foundation when it closed its software development operation last fall. Originally developed by software engineer Andrew Pendleton in 2012, Docket Wrench is designed to help users explore public participation in the rulemaking process.  It supports exploration by rulemaking docket, agency, commenting company or organization, and the language of the comments themselves. It is a sprawling application with many moving parts, and when LII adopted it, it had not been running for two years.

On the infrastructure team, Mahak Garg served as project manager and, along with Mutahir Kazmi, focused on updating and supporting infrastructure for the application. They updated the software and created a portable version of the application for other teams to use for development.

The search team, Gaurav Keswani, Soorya Pillai, Ayswarya Ravichandran, Sheethal Shreedhara, and Vinayaka Suryanarayana, ensured that data made its way into, and could be correctly retrieved from, the search engine. This work included setting up and maintaining automated testing to ensure that the software would continue to function correctly after each enhancement was made.

The entities team, Shweta Shrivastava, Vikas Nelamangala, and Saarthak Chandra, ensured that the software could detect and extract the names of corporations and organizations submitting comments in the rulemaking process. Because the data on which Docket Wrench originally relied was no longer available, they researched, found a new data source, and altered the software to make use of it. (Special thanks to Jacob Hileman at the Center for Responsive Politics for his help with the Open Secrets API.)
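The post doesn’t spell out how the matching works, but the core step, reconciling the messy organization names found in comments with a canonical list, can be sketched roughly as follows. The organization list, suffix handling, and threshold below are illustrative stand-ins, not the team’s actual data source or code.

```python
from difflib import SequenceMatcher

# Hypothetical reference list of organizations; in practice this would be
# loaded from an external data source rather than hard-coded.
KNOWN_ORGS = ["American Petroleum Institute", "Sierra Club", "National Mining Association"]

def normalize(name: str) -> str:
    """Lowercase and strip common corporate suffixes before comparison."""
    name = name.lower().strip()
    for suffix in (", inc.", " inc.", " llc", " corp.", " corporation"):
        name = name.removesuffix(suffix)
    return name

def match_org(candidate: str, threshold: float = 0.85):
    """Return the best-matching known organization, or None if nothing is close."""
    best_name, best_score = None, 0.0
    for org in KNOWN_ORGS:
        score = SequenceMatcher(None, normalize(candidate), normalize(org)).ratio()
        if score > best_score:
            best_name, best_score = org, score
    return best_name if best_score >= threshold else None

print(match_org("American Petroleum Institute, Inc."))  # American Petroleum Institute
```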

Deekshith Belchappada, Monisha Chandrashekar, and Anusha Morappanavar evaluated alternate techniques for computing document similarity, which enables users to find clusters of similar comments and see which language from a particular comment is unique. And Khaleel R prototyped the use of Apache Spark to detect and mark legal citations and legislation names within the documents.
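For readers curious what “computing document similarity” looks like in practice, here is a minimal sketch of one standard approach, TF-IDF vectors compared by cosine similarity, using scikit-learn. It illustrates the general technique rather than the students’ implementation, and the sample comments and threshold are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy comments standing in for documents pulled from a rulemaking docket.
comments = [
    "I support the proposed rule because it protects consumers.",
    "I support the proposed rule because it protects consumers and small businesses.",
    "This rule would impose unreasonable costs on manufacturers.",
]

# Vectorize and compute pairwise cosine similarity.
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(comments)
similarity = cosine_similarity(matrix)

# Treat pairs above a (tunable) threshold as near-duplicate "form letter" comments.
THRESHOLD = 0.8
for i in range(len(comments)):
    for j in range(i + 1, len(comments)):
        if similarity[i, j] >= THRESHOLD:
            print(f"comments {i} and {j} look like near-duplicates "
                  f"(similarity {similarity[i, j]:.2f})")
```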

So, where is it?

The good news is that after a semester of extremely hard work, “Team Docket” has Docket Wrench up and running again. But we need to ingest a great deal more data and test to make sure that the application can run once we’ve done so. This will take a while. As soon as the students have completed their final project submission, though, we’ll be starting a private beta in which our collaborators can nominate dockets, explore the service, and propose features. Please join us!

In the fall of 2015, we wrote about a traffic spike that had occurred during one of the Republican primary debates. Traffic on the night of Sept. 16, 2015 peaked between 9 and 10pm, with a total of 47,025 page views during that interval. For that day, traffic totaled 204,905 sessions and 469,680 page views. At the time, the traffic level seemed like a big deal – our server had run out of resources to handle the traffic, and some of the people who had come to the site had to wait to find the content they were looking for – in that instance, the 14th Amendment to the Constitution.

A year later, we found traffic topping those levels on most weekdays. But by that time, we barely noticed. Nic Ceynowa, who runs our systems, had, over the course of the prior year, systematically identified and addressed unnecessary performance-drains across the website. He replaced legacy redirection software with new, more efficient server redirects. He cached dynamic pages that we knew to be serving static data (because we know, for instance, that retired Supreme Court justices issue no new opinions). He throttled access to the most resource-intensive pages (web crawlers had to slow down a bit so that real people doing research could proceed as usual). As a result, he could allow more worker processes to field page requests and we could continue to focus on feature development rather than server load.

Then came the inauguration on January 20th. Presidential memoranda and executive orders inspired many, many members of the general public to read the law for themselves. Traffic hovered around 220,000 sessions per day for the first week. And then the President issued the executive order on immigration. By Sunday January 29th, we had 259,945 sessions – more than we expect on a busy weekday. On January 30th, traffic jumped to 347,393. And then on January 31st traffic peaked at 435,549 sessions – and over 900,000 page views.

The servers were still quiet. Throughout, we were able to continue running some fairly resource-hungry updating processes to keep the CFR current. We’ll admit to having devoted a certain amount of attention to checking in on the real-time analytics to see what people were looking at, but for the most part it was business as usual.

Now, the level of traffic we were talking about was still small compared to the traffic we once fielded when Bush v. Gore  was handed down in 2000 (that day we had steady traffic of about 4000 requests per minute for 24 hours). And Nic is still planning to add clustering to our bag of tricks. But the painstaking work of the last year has given us a lot of breathing room – even when one of our fans gives us a really big internet hug. In the meantime, we’ve settled into the new normal and continue the slow, steady work of making the website go faster when people need it the most.

A bit over a year ago, we released the first iteration of our new version of the eCFR, the Office of the Federal Register’s unofficial compilation of the current text of the Code of Federal Regulations. At the time, we’d been using the text of the official print version of the CFR to generate our electronic version – it was based on GPO’s authenticated copy of the text, but it was woefully out of date because titles are prepared for printing only once a year. During the past year, while retrofitting and improving features like indentation, cross references, and definitions, we maintained the print-CFR in parallel so that readers could, if they chose, refer to the outdated-but-based-on-official-print version.

This week we’re discontinuing the print-CFR. The reason? Updates. As agencies engage in rulemaking activity, they amend, revise, add, and remove sections, subparts, parts, and appendices. During the past year, the Office of the Federal Register has published thousands of such changes to the eCFR. These changes will eventually make their way into the annual print edition of the CFR, but most of the time, the newest rules making the headlines are, at best, many months away from reaching print.

What’s new? Well, among those thousands of changes were a number of places where agencies were adding rules reflecting new electronic workflows. And these additions provide us with an occasion for checking every facet of our own electronic workflows. When the Citizenship and Immigration Service added the Electronic Visa Update System, they collected the existing sections in Part 215 of Title 8 of the CFR into a new Subpart A and added sections 215.21-215.24 under the new Subpart B. So, after adding the new sections, the software had to refresh the table of contents for Part 215 and create the table of contents for 8 CFR Part 215 Subpart A.
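To make that bookkeeping concrete, here is a small sketch of what regenerating a part-level table of contents might look like. The XML element and attribute names are simplified placeholders, not the actual source schema.

```python
from lxml import etree

# A minimal stand-in for the Part 215 source XML; the element and attribute
# names here are illustrative, not the actual eCFR schema.
part_xml = """<PART N="215">
  <SUBPART N="A"><SECTION N="215.1"/><SECTION N="215.2"/></SUBPART>
  <SUBPART N="B"><SECTION N="215.21"/><SECTION N="215.22"/></SUBPART>
</PART>"""

def build_toc(part):
    """Rebuild a part-level table of contents from its subparts and sections."""
    return {
        subpart.get("N"): [s.get("N") for s in subpart.findall("SECTION")]
        for subpart in part.findall("SUBPART")
    }

part = etree.fromstring(part_xml)
print(build_toc(part))
# {'A': ['215.1', '215.2'], 'B': ['215.21', '215.22']}
```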

What you’ll see doesn’t look a whole lot different from what’s been there for the past year, but it will be a lot easier to find new CFR sections, the pages will load more quickly, and we will be able to release new CFR features more quickly.

On December 5th, LII engineers Nic Ceynowa and Sylvia Kwakye, Ph.D., looked on with pride as the Cornell University Master of Engineering students they’d supervised presented a trio of fall projects to LII and Cornell Law Library staff.

Entity Linking

Mutahir Kazmi and Shraddha Vartak pulled together, enhanced, and scaled a group of applications that link entities in the Code of Federal Regulations. Entity linking is a set of techniques that detect references to things in the world (such as people, places, animals, and pharmaceuticals) and link them to data sources that provide more information about them. The team analyzed the entities and the corpus to determine which entities required disambiguation, distinguished entities to mark before and after defined-term markup, and used Apache Spark to speed up the overall application by 60%.
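At its simplest, the detection step amounts to dictionary lookup against the text, with each hit linked out to an external identifier. The lexicon and URLs below are placeholders, and the sketch leaves out the disambiguation and markup-ordering work described above.

```python
import re

# Placeholder lexicon mapping surface forms to external identifiers;
# LII's real sources and URLs differ.
LEXICON = {
    "grizzly bear": "https://example.org/species/ursus-arctos-horribilis",
    "ibuprofen": "https://example.org/drug/ibuprofen",
}

def link_entities(text):
    """Return (surface form, character offset, link) for each lexicon hit."""
    hits = []
    for term, url in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            hits.append((match.group(0), match.start(), url))
    return sorted(hits, key=lambda hit: hit[1])

print(link_entities("The grizzly bear is listed as threatened under this part."))
```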

[Screenshot: linked entities in the CFR]

US Code Definition Improvement

Khaleel Khaleel, Pracheth Javali, Ria Mirchandani, and Yashaswini Papanna took on the task of adapting our CFR definition extraction and markup software to meet the unique requirements of the US Code. In addition to learning the hierarchical structure and identifier schemes within the US Code corpus, the project involved discovering and extracting definition patterns that had not previously been identified; parsing multiple defined terms, word roots, and abbreviations from individual definitions; and correctly detecting the boundaries of the definitions.

Before:

[Screenshot: US Code definition display, before]

And after:

[Screenshot: US Code definition display, after]
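As a rough illustration of the pattern-extraction half of that work, here is a toy sketch that pulls defined terms out of two common phrasings. The real software handles many more patterns, along with word roots, abbreviations, and boundary detection, none of which appears here.

```python
import re

# Two common definition phrasings; the US Code uses many more variants,
# so these regexes are illustrative rather than exhaustive.
PATTERNS = [
    re.compile(r'[Tt]he terms? [“"](?P<term>[^”"]+)[”"] (?:means|includes)'),
    re.compile(r'[“"](?P<term>[^”"]+)[”"] means'),
]

def extract_defined_terms(text):
    """Return the distinct defined terms found in a block of statutory text."""
    terms = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            term = match.group("term")
            if term not in terms:
                terms.append(term)
    return terms

sample = 'The term “motor vehicle” means a vehicle driven by mechanical power.'
print(extract_defined_terms(sample))  # ['motor vehicle']
```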

Search Prototype

Anusha Morappanavar, Deekshith Belchapadda, and Monisha Pavagada Chandrashekar built a prototype of the semantic search application using Elasticsearch and Flask. In addition to learning how to work with Elasticsearch, they had to learn the hierarchical structure of the US Code and CFR, understand how cross-references work within legal corpora, and make use of additional metadata such as the definitions and linked entities the other groups had been working on. Their work will support a search application that distinguishes matches in which the search term is defined, appears in the full text, or appears in a definition of a term that appears within the full text of a document.

[Screenshot: semantic search prototype]
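One way to support that three-way distinction in Elasticsearch is to index definitions and defined terms as separate fields and boost them differently at query time. The field names and weights below are assumptions for illustration; the prototype’s actual mapping and scoring may differ.

```python
import json

def build_query(term):
    """Rank documents that define the term above documents that merely use it,
    and those above documents where it only appears in an associated definition."""
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": [
                    "defined_terms^3",  # the document defines this term
                    "full_text^2",      # the term appears in the document's text
                    "definitions",      # the term appears in a definition the document relies on
                ],
            }
        }
    }

# The body would be handed to the Elasticsearch search API, for example:
# es.search(index="uscode", body=build_query("navigable waters"))
print(json.dumps(build_query("navigable waters"), indent=2))
```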

We’ll be rolling out the features supported by this semester’s M.Eng. projects starting with entity linking in January.

“There is nothing like looking, if you want to find something.”

– J.R.R. Tolkien

There…

This summer, at close to the very last minute, I set out for Cambridge, Massachusetts to pursue a peculiar quest for open access to law. Steering clear of the dragon on its pile of gold, I found some very interesting people in a library doing something in some ways parallel, and in many ways complementary, to what we do at LII.

At the Harvard Law School Library, there’s a group called the Library Innovation Lab, which uses technology to improve preservation and public access to library materials, including digitizing large corpora of legal documents. Its work complements what we do at the LII, and I went there to develop some tools that would help both of us, and others as well.

The LIL summer fellowship program that made this possible brought together a group with wide-ranging interests: substantive projects, from Neel Agrawal’s website on the history of African drumming laws to Muira McCammon’s research on the Guantanamo detainee library; crowdsourced documentation and preservation projects, such as Tiffany Tseng’s Spin and Pix devices, Alexander Nwala’s Local Memory, and Ilya Kreymer’s Webrecorder; and infrastructure projects such as Jay Edwards’s Caselaw Access Project API.

My project involved work on a data model to help developers make connections between siloed collections of text and metadata, one that will, we hope, help future developers automate the process of connecting concepts in online legal corpora (both LIL’s and ours at LII) to enriching data and context from many different sources.

The work involved exploring a somewhat larger-than-usual number of ontologies, structured vocabularies, and topic models. Each, in turn, came with one or more sets of subjects. Some (like EuroVoc and the topic models) came with sizable amounts of machine-readable text; others (like Linked Data for Libraries) came with very little machine-accessible text. As my understanding of both the manageable and the insurmountable challenges associated with each one grew, I developed a far greater appreciation for the intuition that had led me to the project in the first place: there is a lot of useful information locked in these resources, and each has a role to play.
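As one concrete example of what “machine-readable text” means here: a SKOS-encoded vocabulary (EuroVoc, for instance, is distributed in SKOS/RDF) exposes labels that can be harvested with a few lines of rdflib. The file name below is a placeholder for whichever vocabulary dump is at hand.

```python
from rdflib import Graph
from rdflib.namespace import SKOS

# "vocabulary.rdf" stands in for any SKOS-encoded vocabulary dump.
g = Graph()
g.parse("vocabulary.rdf")

# Collect English preferred labels, the machine-readable text we can
# compare against terms found in legal corpora.
labels = {}
for concept, label in g.subject_objects(SKOS.prefLabel):
    if getattr(label, "language", None) in (None, "en"):
        labels.setdefault(str(concept), []).append(str(label))

print(f"{len(labels)} concepts with English labels")
```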

In the process, I drew enormous inspiration from the dedication and creativity of the LIL group, from Paul Deschner’s Haystacks project, which provides a set of filters to create a manageable list of books on any subject or search term, to Brett Johnson’s work supporting the H2O open textbook platform, to Matt Phillips’s exploration of private talking spaces, to the Caselaw Access Project visualizations such as Anastasia Aisman’s topic mapping and  Jack Cushman’s word clouds (supported by operational, programming, and metadata work from Kerri Fleming, Andy Silva, Ben Steinberg, and Steve Chapman). (All of this is thanks to the Harvard Law Library leadership of Jonathan Zittrain, LIL founder Kim Dulin, managing director Adam Ziegler, and library director Jocelyn Kennedy.)

And back again…

Returning home to LII, I’m grateful for the rejuvenating energy that comes from talking to new people, observing how other high-performing groups do their work, and having had dedicated time to bring a complicated idea to fruition. All in all, it was a marvelous summer with marvelous people. But they did keep looking at me as if to ask why I’d brought along thirteen dwarves, and how I managed to vanish any time I put that gold ring on my finger.

One of the great things that happens at the LII is working with the amazing students who come to study at Cornell — and finding out about the projects they’ve been cooking up while we weren’t distracting them by dangling shiny pieces of law before their eyes. This spring, Karthik Venkataramaiah, Vishal Kumkar, Shivananda Pujeri, and Mihir Shah — who previously worked with us on regulatory definition extraction and entity linking — invited us to attend a presentation they were giving at a conference of the American Society for Engineering Education: they had developed an app to assist dementia patients in interacting with their families.

The Remember Me app does a number of useful things — reminds patients to prepare for appointments, take medications, and so forth. But the remarkable idea is the way it would help dementia patients interact with people in their lives.

Here’s how it works: the app is installed on the phones of both the dementia sufferer and their loved ones and caregivers. When someone the patient knows comes into proximity, the app automatically reminds the patient who the person is and how they know them, flashing up pictures designed to place the person in a familiar context and remind the patient of their connection. Because memory is always keyed to specific contexts, this helps patients stay grounded in relating to people they love but whom their disease may prevent them from recalling.

One notable feature of the app is that it was designed not for a class in app development but for one in cloud computing, which means it was built to serve a large number of users. The nature of the app also presented additional requirements: the team noted that “as our project is related to health domain, we need to be more careful with respect to cloud data security.” Further, although the students were software engineers tasked with developing a scalable application, their app reflects a thoughtful approach to developing a user experience that can benefit people with memory and other cognitive impairments. Associate Director Sara Frug says, “among the many teams of talented M.Eng. students with whom we have worked over the years, Karthik, Vishal, Shivananda, and Mihir have shown a rare combination of skill and sophistication in software engineering, product design, and project management. Their app is a remarkable achievement, and we are proud to have seen its earliest stages of development.”

The Remember Me app has been developed as a prototype, with its first launch scheduled for August.

In 2009, the US House of Representatives agreed to a resolution designating March 14 as “Pi Day”.

So I thought I’d see whether I could rustle up a nice example of pi in the Code of Federal Regulations. I did manage to find the π I was looking for, but it turns out that pi also gives us some examples of why we need to disambiguate.

In the process, I also found a number of other “pi”s in the CFR – abbreviations and identifiers that have nothing to do with the mathematical constant.

So the story of “pi” in the CFR is a story of disambiguation. When we read (or our software parses) the original text of a particular regulation, it’s reasonably straightforward to tell which of the many “pi”s it’s discussing. We have capitalization, punctuation, formatting, and structural and contextual clues. Once the search engines work their case-folding, acronym normalization, and other magic, we end up with an awful lot of “pi”s and very little context.
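Those clues are easy enough to act on while the original text is still in hand. Here is a toy sketch of the idea, with patterns invented purely for illustration:

```python
import re

def classify_pi(sentence):
    """Use surface clues (formulas, capitalization) to tell the constant
    apart from other 'pi's before case-folding erases the difference."""
    if re.search(r"\bP\s*=\s*pi\b|3\.14", sentence):
        return "mathematical constant"
    if re.search(r"\bP\.?I\.?\b", sentence):  # capitalized abbreviation
        return "abbreviation"
    if re.search(r"\bpi\b", sentence, re.IGNORECASE):
        return "ambiguous without more context"
    return "no pi here"

print(classify_pi("P = pi (3.14)"))                 # mathematical constant
print(classify_pi("The PI must sign the report."))  # abbreviation
```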

Postscript. The real pi was a bit of a letdown: “P = pi (3.14)” – well, close enough.

In our last post, we mentioned that we gained a lot of efficiency by breaking up the CFR text into Sections and processing each section individually. The cost was losing direct access to the structural information in the full Title documents, which made us lose our bearings a bit. In the source data, each Section is nested within its containing structures or “ancestors” (usually Parts and Chapters). Standard XML tools (modern libraries all support XPath) make it trivial to discover a Section’s ancestry or find all of the other Sections that share an arbitrary ancestor.
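For readers who haven’t leaned on XPath for this, a short lxml sketch shows both operations. The element names are simplified stand-ins for the real source schema.

```python
from lxml import etree

# Toy stand-in for a full Title document; element names are illustrative,
# not the actual source schema.
title_xml = """<TITLE N="40">
  <CHAPTER N="I">
    <PART N="60">
      <SECTION N="60.1"/>
      <SECTION N="60.2"/>
    </PART>
  </CHAPTER>
</TITLE>"""
doc = etree.fromstring(title_xml)

# Discover a section's ancestry...
section = doc.xpath('//SECTION[@N="60.2"]')[0]
ancestry = [f'{el.tag} {el.get("N")}' for el in section.iterancestors()]
print(list(reversed(ancestry)))  # ['TITLE 40', 'CHAPTER I', 'PART 60']

# ...or find every section that shares a given ancestor.
print(doc.xpath('//PART[@N="60"]//SECTION/@N'))  # ['60.1', '60.2']
```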

Once we’d broken up the Titles into Sections, we needed to make sure the software could still accurately identify a containing structure and its descendent Sections.

The first idea was to put a compact notation for the section’s ancestry into each section document. Sylvia added a compact identifier as well as a supplementary “breadcrumb” element to each section. In theory, it would be possible to pull all sections with a particular ancestor and process only those. As it turned out, however, the students found it inefficient to keep opening all of the documents to check whether they had the ancestry in question.

So Sylvia constructed a master table of contents (call it the GPS?). The students’ software could now, using a single additional document, pull all sections belonging to any given ancestor. The purists in the audience will, of course, object that we’re caching the same metadata in multiple locations. They’re right. We sacrificed some elegance in the interest of expedience (we were able to deploy the definitions feature on 47 of 49 CFR titles after a semester); we’ll be reworking the software again this semester and will have an opportunity to consolidate if it makes sense.
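A sketch of the idea, with made-up identifiers: invert each section’s breadcrumb into a single table that maps each ancestor to its sections, then consult that one document instead of opening every section file.

```python
import json
from collections import defaultdict

# Hypothetical per-section records carrying the "breadcrumb" ancestry
# described above; real identifiers differ.
sections = [
    {"id": "40-60.1", "ancestors": ["title-40", "chapter-I", "part-60"]},
    {"id": "40-60.2", "ancestors": ["title-40", "chapter-I", "part-60"]},
    {"id": "40-61.1", "ancestors": ["title-40", "chapter-I", "part-61"]},
]

# Invert the breadcrumbs into a single master table of contents.
toc = defaultdict(list)
for section in sections:
    for ancestor in section["ancestors"]:
        toc[ancestor].append(section["id"])

# One extra document to read, instead of opening every section file.
with open("master_toc.json", "w") as out:
    json.dump(toc, out, indent=2)

print(toc["part-60"])  # ['40-60.1', '40-60.2']
```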