{"id":42,"date":"2015-10-23T13:48:15","date_gmt":"2015-10-23T17:48:15","guid":{"rendered":"http:\/\/blog.law.cornell.edu\/tech\/?p=42"},"modified":"2015-10-23T15:49:07","modified_gmt":"2015-10-23T19:49:07","slug":"a-look-at-indentation","status":"publish","type":"post","link":"https:\/\/blog.law.cornell.edu\/tech\/2015\/10\/23\/a-look-at-indentation\/","title":{"rendered":"A Look at Indentation"},"content":{"rendered":"<p><em>Great was the rejoicing in the south tower of Myron Taylor Hall, headquarters of the LII, when we got notice of the bulk release of the Electronic Code of Federal Regulations (eCFR) in XML format.<\/em><\/p>\n<p><em>What was not to like? The data was as up to date as the CFR could get, the XML was much cleaner than the book version, and it came with a friendly user guide, etc., etc., etc.<\/em><\/p>\n<p><em>It was also different enough from the book XML of the CFR that we could not simply run it through our existing data-enrichment process and serve it to the public as is. So we curbed our enthusiasm long enough to put together a measured plan to redo our code.<\/em><\/p>\n<p><em>We have heard often enough from you, our wonderful readers, that text indentation is one of the most valued features of our data presentation. Thus, it was the first feature we chose to implement.<\/em><\/p>\n<p><em>All this verbosity is the setup for a look at some of the messy, sausage-making details of adding indentation to the eCFR.<\/em><\/p>\n<p><code>***<\/code><\/p>\n<p>If you&#8217;re not familiar with XML (eXtensible Markup Language), it&#8217;s simply a way of marking up data with a predefined, consistent set of descriptive tags that are easy for both humans and machines to read. 
So, when we get XML data from the GPO, it looks something like this&#8230;<\/p>\n<div class=\"line\"><span class=\"text xml\">Snippet 1: XML from Title 1 of the CFR<\/span><\/div>\n<div class=\"line\"><span class=\"text xml\">==============================================<\/span><\/div>\n<div class=\"line\"><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag preprocessor xml\"><span class=\"punctuation definition tag xml\">&lt;?<\/span><span class=\"entity name tag xml\">xml<\/span><span class=\"entity other attribute-name xml\"> version<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>1.0<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"entity other attribute-name xml\"> encoding<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>UTF-8<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"punctuation definition tag xml\">?&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">DLPSTEXTCLASS<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">HEADER<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">FILEDESC<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text 
xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">TITLESTMT<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">TITLE<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\">Title 1: General Provisions<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">TITLE<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">AUTHOR<\/span> <span class=\"entity other attribute-name localname xml\">TYPE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>nameinv<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">AUTHOR<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">TITLESTMT<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag 
xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">PUBLICATIONSTMT<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">PUBLISHER<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">PUBLISHER<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">PUBPLACE<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">PUBPLACE<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">IDNO<\/span> <span class=\"entity other attribute-name localname xml\">TYPE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>title<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\">1<span class=\"meta tag xml\"><span class=\"punctuation definition tag 
xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">IDNO<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag no-content xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">DATE<\/span><span class=\"punctuation definition tag xml\">&gt;<span class=\"meta scope between-tag-pair xml\">&lt;<\/span><\/span>\/<span class=\"entity name tag localname xml\">DATE<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">PUBLICATIONSTMT<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">SERIESSTMT<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">TITLE<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">TITLE<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">SERIESSTMT<\/span><span class=\"punctuation definition tag 
xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">FILEDESC<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">PROFILEDESC<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">TEXTCLASS<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">KEYWORDS<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">KEYWORDS<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">TEXTCLASS<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">PROFILEDESC<\/span><span class=\"punctuation definition tag 
xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">HEADER<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">TEXT<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">BODY<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">ECFRBRWS<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">AMDDATE<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Jan. 
30, 2015<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">AMDDATE<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">DIV1<\/span> <span class=\"entity other attribute-name localname xml\">N<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>1<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"entity other attribute-name localname xml\">NODE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>1:1<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"entity other attribute-name localname xml\">TYPE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>TITLE<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">HEAD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Title 1 &#8211; General Provisions&#8211;Volume 1<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">HEAD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity 
name tag localname xml\">CFRTOC<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">PTHD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Part <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">PTHD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">CHAPTI<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">SUBJECT<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span class=\"entity name tag localname xml\">E<\/span> <span class=\"entity other attribute-name localname xml\">T<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>04<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>chapter i<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">E<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> &#8211; Administrative Committee of the Federal Register <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">SUBJECT<\/span><span class=\"punctuation 
definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">PG<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>1<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">PG<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;\/<\/span><span class=\"entity name tag localname xml\">CHAPTI<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">CHAPTI<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">SUBJECT<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span class=\"entity name tag localname xml\">E<\/span> <span class=\"entity other attribute-name localname xml\">T<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>04<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>chapter ii<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">E<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> &#8211; Office of the Federal Register <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname 
xml\">SUBJECT<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">PG<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>51<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">PG<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;\/<\/span><span class=\"entity name tag localname xml\">CHAPTI<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">CHAPTI<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">SUBJECT<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span class=\"entity name tag localname xml\">E<\/span> <span class=\"entity other attribute-name localname xml\">T<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>04<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>chapter iii<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">E<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> &#8211; Administrative Conference of the United States <span class=\"meta tag xml\"><span class=\"punctuation definition tag 
xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">SUBJECT<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">PG<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>301<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">PG<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;\/<\/span><span class=\"entity name tag localname xml\">CHAPTI<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">CHAPTI<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">SUBJECT<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span class=\"entity name tag localname xml\">E<\/span> <span class=\"entity other attribute-name localname xml\">T<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>04<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>chapter iv<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">E<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> &#8211; Miscellaneous Agencies <span class=\"meta tag xml\"><span 
class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">SUBJECT<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">PG<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>425<\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">PG<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;\/<\/span><span class=\"entity name tag localname xml\">CHAPTI<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;\/<\/span><span class=\"entity name tag localname xml\">CFRTOC<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">DIV3<\/span> <span class=\"entity other attribute-name localname xml\">N<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>I<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"entity other attribute-name localname xml\">NODE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>1:1.0.1<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"entity other attribute-name localname xml\">TYPE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>CHAPTER<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span 
class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">HEAD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> CHAPTER I &#8211; ADMINISTRATIVE COMMITTEE OF THE FEDERAL REGISTER<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">HEAD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">DIV4<\/span> <span class=\"entity other attribute-name localname xml\">N<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>A<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"entity other attribute-name localname xml\">NODE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>1:1.0.1.1<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"entity other attribute-name localname xml\">TYPE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>SUBCHAP<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">HEAD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>SUBCHAPTER A &#8211; 
GENERAL<\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">HEAD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">DIV5<\/span> <span class=\"entity other attribute-name localname xml\">N<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>1<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"entity other attribute-name localname xml\">NODE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>1:1.0.1.1.1<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"entity other attribute-name localname xml\">TYPE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>PART<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">HEAD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>PART 1 &#8211; DEFINITIONS <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">HEAD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition 
tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">AUTH<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">HED<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Authority:<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">HED<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span class=\"entity name tag localname xml\">PSPACE<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>44 U.S.C. 1506; sec. 6, E.O. 10530, 19 FR 2709; 3 CFR, 1954-1958 Comp., p.189.<\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">PSPACE<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;\/<\/span><span class=\"entity name tag localname xml\">AUTH<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">DIV8<\/span> <span class=\"entity other attribute-name localname xml\">N<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>\u00a7 1.1<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"entity other attribute-name localname xml\">NODE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>1:1.0.1.1.1.0.1.1<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span> <span class=\"entity other attribute-name localname xml\">TYPE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&quot;<\/span>SECTION<span class=\"punctuation definition string end xml\">&quot;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">HEAD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>\u00a7 1.1 Definitions.<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">HEAD<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>As used in this chapter, unless the context requires otherwise &#8211; <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Administrative Committee<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">I<\/span><span 
class=\"punctuation definition tag xml\">&gt;<\/span><\/span> means the Administrative Committee of the Federal Register established under section 1506 of title 44, United States Code; <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Agency<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> means each authority, whether or not within or subject to review by another agency, of the United States, other than the Congress, the courts, the District of Columbia, the Commonwealth of Puerto Rico, and the territories and possessions of the United States; <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Document<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag 
localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> includes any Presidential proclamation or Executive order, and any rule, regulation, order, certificate, code of fair competition, license, notice, or similar instrument issued, prescribed, or promulgated by an agency; <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Document having general applicability and legal effect<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> means any document issued under proper authority prescribing a penalty or course of conduct, conferring a right, privilege, authority, or immunity, or imposing an obligation, and relevant or applicable to the general public, members of a class, or persons in a locality, as distinguished from named individuals or organizations; and <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span 
class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Filing<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> means making a document available for public inspection at the Office of the Federal Register during official business hours. A document is filed only after it has been received, processed and assigned a publication date according to the schedule in part 17 of this chapter.<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;&lt;<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Regulation<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> and <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>rule<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">I<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> have the same meaning. 
<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">P<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">CITA<\/span> <span class=\"entity other attribute-name localname xml\">TYPE<\/span>=<span class=\"string quoted double xml\"><span class=\"punctuation definition string begin xml\">&#8220;<\/span>N<span class=\"punctuation definition string end xml\">&#8220;<\/span><\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>[37 FR 23603, Nov. 4, 1972, as amended at 50 FR 12466, Mar. 28, 1985]<\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">CITA<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">DIV8<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">DIV5<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\">&#8230;<\/span><\/div>\n<div class=\"line\"><span class=\"text xml\">&#8230;<\/span><\/div>\n<div class=\"line\"><span class=\"text xml\">&#8230;<\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag 
&lt;\/<\/span><span class=\"entity name tag localname xml\">DIV1">
xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">DIV1<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">ECFRBRWS<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">BODY<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">TEXT<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">DLPSTEXTCLASS<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<p>The text of the regulations is enclosed within tags that provide context for what you&#8217;re looking at, indicate how it should be displayed, or supply additional metadata that may be useful to the enrichment process.<\/p>\n<p>As a first step, we consulted the user guide to see if it had any information on how to indent the text. There was something! On page 13 was this snippet of XML (Figure 1), with the enumeration indicators highlighted.
 The next page suggested how it could be displayed (Figure 2).<\/p>\n<p><a href=\"http:\/\/blog.law.cornell.edu\/tech\/files\/2015\/10\/pg13-xml.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1624\" height=\"1144\" class=\"wp-image-33\" src=\"http:\/\/blog.law.cornell.edu\/tech\/files\/2015\/10\/pg13-xml.png\" alt=\"Figure 1: 5 CFR 151.101 in XML format\" \/><\/a><\/p>\n<p><a href=\"http:\/\/blog.law.cornell.edu\/tech\/files\/2015\/10\/pg14-presentation.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1508\" height=\"1070\" class=\"wp-image-34\" src=\"http:\/\/blog.law.cornell.edu\/tech\/files\/2015\/10\/pg14-presentation.png\" alt=\"Figure 2: Presentation suggested by the Government Publishing Office for 5 CFR 151.101\" \/><\/a><\/p>\n<p>As was obvious to us, and as the user guide itself acknowledged, there was no way to achieve this display from the information in the markup alone. A good place to look for the missing information was the CFR itself.<\/p>\n<p>We found what we were looking for in <a href=\"https:\/\/www.law.cornell.edu\/cfr\/text\/1\/21.11\">Title 1, Section 21.11<\/a>, which describes how CFR enumerators are organized, or, more accurately, how they are supposed to be organized.
 Of particular interest was the hierarchy of paragraphs given by subsection 21.11(h):<\/p>\n<div class=\"line\"><span class=\"text plain\"><span class=\"meta paragraph text\">(h) Paragraphs, which are designated as follows:<\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text plain\"><span class=\"meta paragraph text\">level 1 (a), (b), (c), etc.<\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text plain\"><span class=\"meta paragraph text\">level 2 (1), (2), (3), etc.<\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text plain\"><span class=\"meta paragraph text\">level 3 (i), (ii), (iii), etc.<\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text plain\"><span class=\"meta paragraph text\">level 4 (A), (B), (C), etc.<\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text plain\"><span class=\"meta paragraph text\">level 5 (1), (2), (3), etc.<\/span><\/span><\/div>\n<div class=\"line\"><span class=\"text plain\"><span class=\"meta paragraph text\">level 6 (i), (ii), (iii), etc.<\/span><\/span><\/div>\n<p>In our first iteration of indentation, we added attributes to each paragraph defining its depth of indentation, corresponding to the 6 levels above. <a href=\"https:\/\/www.law.cornell.edu\/cfr\/text\/5\/151.101\">Section 151.101 of Title 5<\/a>, the example in the user guide pages above, looked lovely. But (you knew it would not be that simple, right?) this implementation worked for only about 60% of a random sample of sections we tested it on.<\/p>\n<p>Where the algorithm did not work, the main reason for failure was the presence of multiple enumerators within a single paragraph.
 In other words, each enumerator should begin its own paragraph, but not every paragraph was marked up that way.<\/p>\n<div class=\"line\"><span class=\"text xml\">Snippet 2: XML from 9 CFR 2.1<\/span><\/div>\n<div class=\"line\"><span class=\"text xml\">==============================================<\/span><\/div>\n<div class=\"line\"><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">p<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>(a)(1) Any person operating or intending to operate as a dealer, exhibitor, or operator of an auction sale, except persons who are exempted from the licensing requirements under paragraph (a)(3) of this section, must have a valid license. A person must be 18 years of age or older to obtain a license. A person seeking a license shall apply on a form which will be furnished by the AC Regional Director in the State in which that person operates or intends to operate. The applicant shall provide the information requested on the application form, including a valid mailing address through which the licensee or applicant can be reached at all times, and a valid premises address where animals, animal facilities, equipment, and records may be inspected for compliance. The applicant shall file the completed application form with the AC Regional Director. <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">p<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<p>In the snippet above, the paragraph begins with two enumerators. Because our algorithm assumed one enumerator per paragraph, it found (a) but not (1).
 We fixed that in the second iteration.<\/p>\n<p>In our third iteration, we went after enumerators buried inside the paragraph text (see Snippet 3 below) by creating a category for these previously untagged enumerators. We named them <em>nested paragraphs<\/em> and tagged them accordingly.<\/p>\n<div class=\"line\"><span class=\"text xml\">Snippet 3: XML from 8 CFR 103.3<\/span><\/div>\n<div class=\"line\"><span class=\"text xml\">==============================================<\/span><\/div>\n<div class=\"line\"><\/div>\n<div class=\"line\"><span class=\"text xml\"><span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">p<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>(a) <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">i<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Denials and appeals<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">i<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> &#8211; (1) <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">i<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>General<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">i<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> &#8211; (i) <span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;<\/span><span class=\"entity name tag localname xml\">i<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span>Denial of application or petition.<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">i<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span> When a Service officer denies an application or petition filed under \u00a7 103.2 of this part, the officer shall explain in writing the specific reasons for denial. If Form I-292 (a denial form including notification of the right of appeal) is used to notify the applicant or petitioner, the duplicate of Form I-292 constitutes the denial order.<span class=\"meta tag xml\"><span class=\"punctuation definition tag xml\">&lt;\/<\/span><span class=\"entity name tag localname xml\">p<\/span><span class=\"punctuation definition tag xml\">&gt;<\/span><\/span><\/span><\/div>\n<p>In Snippet 3, the paragraph contains three enumerators: (a), (1), and (i). We&#8217;ve developed a library of patterns that our algorithm uses to find them all. In Title 26 alone, we find and tag 13,563 nested paragraphs!<\/p>\n<p>So we now have a pretty nice indentation feature that, while not yet finished, is already an improvement over what we could previously do for the CFR. See <a href=\"https:\/\/www.law.cornell.edu\/cfr\/text\/8\/103.3\">8 CFR 103.3(a)(1)(iii)(A)<\/a> and its corresponding <a href=\"https:\/\/www.law.cornell.edu\/cfr\/text\/8\/103.3?qt-cfr_tabs=1#qt-cfr_tabs\">eCFR version<\/a> for an example.<\/p>\n<p>We&#8217;re putting indentation on the back burner for now, but there is more to come. For instance, we know from extensive study of the markup that there are actually 8 levels of nesting, not 6. We also have to provide special handling for sections that do not follow the numbering scheme in <a href=\"https:\/\/www.law.cornell.edu\/cfr\/text\/1\/21.11\">1 CFR 21.11<\/a>.<\/p>\n<p>We&#8217;re grateful to our beta testers and readers. If you come across places where our current indentation scheme does not work, please let us know.
 In the interim, we&#8217;ll be devoting some brain cycles to adding cross references and other links to the eCFR.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Great was the rejoicing in the south tower of Myron Taylor Hall, headquarters of the LII, when we got notice of the bulk release of the Electronic Code of Federal Regulations (eCFR) in XML format. What was not to like? The data was as up-to-date as the CFR could get, the XML was much cleaner <a href='https:\/\/blog.law.cornell.edu\/tech\/2015\/10\/23\/a-look-at-indentation\/'>[&#8230;]<\/a><\/p>\n","protected":false},"author":128,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[322],"tags":[],"class_list":["post-42","post","type-post","status-publish","format-standard","hentry","category-cfr"],"_links":{"self":[{"href":"https:\/\/blog.law.cornell.edu\/tech\/wp-json\/wp\/v2\/posts\/42","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.law.cornell.edu\/tech\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.law.cornell.edu\/tech\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/tech\/wp-json\/wp\/v2\/users\/128"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/tech\/wp-json\/wp\/v2\/comments?post=42"}],"version-history":[{"count":12,"href":"https:\/\/blog.law.cornell.edu\/tech\/wp-json\/wp\/v2\/posts\/42\/revisions"}],"predecessor-version":[{"id":63,"href":"https:\/\/blog.law.cornell.edu\/tech\/wp-json\/wp\/v2\/posts\/42\/revisions\/63"}],"wp:attachment":[{"href":"https:\/\/blog.law.cornell.edu\/tech\/wp-json\/wp\/v2\/media?parent=42
"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/tech\/wp-json\/wp\/v2\/categories?post=42"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/tech\/wp-json\/wp\/v2\/tags?post=42"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}