{"id":1342,"date":"2011-10-02T09:38:12","date_gmt":"2011-10-02T14:38:12","guid":{"rendered":"http:\/\/blog.law.cornell.edu\/voxpop\/?p=1342"},"modified":"2011-10-02T09:38:12","modified_gmt":"2011-10-02T14:38:12","slug":"csl-metadata-and-legal-information-that-just-works","status":"publish","type":"post","link":"https:\/\/blog.law.cornell.edu\/voxpop\/2011\/10\/02\/csl-metadata-and-legal-information-that-just-works\/","title":{"rendered":"CSL, Metadata, and Legal Information that Just Works"},"content":{"rendered":"


In the wake of a decisive victory at the Battle of Sekigahara in 1600, Tokugawa Ieyasu treated rival Japanese warlords to a simple but effective instrument of control, pioneered in the preceding Era of the Warring States. The Daimyo, as the defeated clan heads were known, retained control of their respective domains, but were required to reside in the newly established seat of government at Edo (now Tokyo) in alternate years. They were free to return home in the off-years, but only by leaving their princesses and heirs behind in the walled gardens of the capital, as a token of the enduring bond of friendship and mutual admiration that united the Shogun and his sometimes grudging subordinates.<\/p>\n

The processions of competing Daimyo moving to and from the seat of real power soon became a measure of status, and the cost of these semi-annual journeys would eventually consume fully half of each Daimyo\u2019s disposable income. This contributed greatly to the prosperity of communities stationed along the wayside, where tradesmen, innkeepers, chefs, entertainers, and the occasional thief shared in revenue extracted from the peasants in the Daimyo\u2019s fiefdom back home. A cynic might say that the practice of san-kin-k\u014dtai<\/em> (\u53c2\u52e4\u4ea4\u4ee3) was little more than an elaborate system of hostage-taking, but in its way it was very good for business \u2014 at least if you did not have the misfortune to be a peasant.<\/p>\n


Japan later shed the hobbles of feudal regulation, of course, and the population are now free to move about as they please; but for Daimyo read content<\/em>, and for the Daimyo\u2019s princesses and progeny read metadata<\/em>, and you have a description of a familiar Internet business model. Too familiar, perhaps, as most of us now rely on content supplied through walled gardens<\/a> for much of our research work.<\/p>\n

Just as the freedom of individuals is improved by lifting restraints on travel, so the flow of content is more meaningful when accompanied by the descriptive metadata that is its natural companion. As observed by others in this space (most recently here<\/a> and here<\/a>), there are barriers today to the free flow of legal information. As will be outlined below, hamstrung metadata is, unfortunately, one of them. This information \u2014 mundane details like the date, court, and party names of a legal decision, and the volume, journal, page or identifier used to locate it \u2014 is curiously hard for machines<\/em> to find in the pages issued by any of the leading commercial services in the 40-year-old online legal information industry.<\/p>\n

More than any fundamental difference in the materials themselves, captive metadata accounts for the striking gap that has emerged between the research tools available in law and in other disciplines. Driven by the needs of researchers in the sciences and the humanities, personal research platforms that thrive on metadata are now widely available: to make them servants of the law, they want only to be fed.<\/p>\n

One element of this alternative infrastructure that depends on rich metadata provision is the Citation Style Language (CSL<\/a>), which is the proper subject of this essay. The next three sections provide a short introduction to CSL, followed by a few observations on the state of legal metadata provision on today’s legal Internet. The essay concludes with a comment on some of the lights that seem to be flickering into view at the end of this particular tunnel, and on the prospective benefits of at last bringing the law within reach of a modern research support ecosystem.<\/p>\n

About CSL<\/h1>\n

\"\"<\/p>\n

The Citation Style Language is an XML vocabulary for accurately describing citation and bibliography formats. Given the breath of life<\/a> by the original Zotero<\/a> citation formatter, CSL is now entering its eighth year of development, can boast two full production implementations, and drives citation formatting in at least six major bibliographic or text processing projects, with total user numbers in the millions.<\/p>\n

The illustration to the right provides a simplified view of CSL processing flow. In greater detail it works like this:<\/p>\n

    \n
  1. A running copy of the processor is cast (“instantiated”) using the rules specified in a particular CSL style file<\/a>.<\/li>\n
  2. The calling application sends fine-grained item metadata to the processor.<\/li>\n
  3. The processor registers data it receives, for the purpose of tracking the document context of each item.<\/li>\n
  4. The calling application sends a request for a citation or a bibliography listing. In the former case, the call will supply information about document state (note numbers and the like), and additional details specific to the request (such as a pinpoint page number).<\/li>\n
  5. The processor analyses the request, calculates any auto-generated item variables, and applies any disambiguation rules defined in the style to assure that item references are unique.<\/li>\n
  6. The processor returns the citation or bibliography listing as a serialized string in the language (such as English or French) and the markup format (such as XHTML or RTF) that it has been configured to deliver.<\/li>\n<\/ol>\n
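The six steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not citeproc-js or any other real CSL processor: the class ToyProcessor, its method names, the field names, and the dictionary used as a style are all hypothetical, the style is reduced to a list of variables joined by a delimiter, and the disambiguation pass in step 5 is left out entirely.

```python
class ToyProcessor:
    '''Hypothetical, much-simplified stand-in for a CSL processor.'''

    def __init__(self, style):
        # Step 1: instantiate the processor with the rules of one style.
        self.style = style
        self.items = {}   # registered metadata, keyed by item id
        self.cited = []   # item ids in order of first citation

    def register_item(self, item):
        # Steps 2 and 3: receive item metadata and register it.
        self.items[item['id']] = item

    def _format(self, item):
        # Render the style variables that have a value, in style order.
        parts = [str(item[v]) for v in self.style['variables'] if item.get(v)]
        return self.style['delimiter'].join(parts)

    def make_citation(self, item_id, locator=None):
        # Steps 4 and 5: handle a citation request and track document state.
        if item_id not in self.cited:
            self.cited.append(item_id)
        text = self._format(self.items[item_id])
        if locator is not None:
            text += ', ' + str(locator)
        # Step 6: return the result as a plain string.
        return text

    def make_bibliography(self):
        # One entry per cited item, in order of first citation.
        return [self._format(self.items[i]) for i in self.cited]


style = {'variables': ['title', 'year', 'volume', 'reporter', 'page'], 'delimiter': ' '}
processor = ToyProcessor(style)
processor.register_item({'id': 'jones', 'title': 'Jones v Wright', 'year': '[1991]',
                         'volume': 3, 'reporter': 'All ER', 'page': 88})
citation = processor.make_citation('jones', locator=90)
bibliography = processor.make_bibliography()
```

The point of the sketch is the division of labour: the calling application owns the metadata and the document state, while the processor owns the formatting rules and hands back finished strings.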

    The upshot of all this swirling machinery is that generic metadata<\/em> can be used to generate citations in arbitrary formats<\/em>. In operation, this means that an article originally written according to, say, the Oxford Standard for Citation of Legal Authorities (OSCOLA<\/a>) can be reformatted on the fly to conform to the requirements of, say, the McGill Guide<\/a>, or perhaps the Australian Guide to Legal Citation<\/a> (PDF) or the ALWD Manual<\/a>. This functionality is used daily by researchers in most fields worldwide, and there is no reason the law should be an exception.<\/p>\n
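As a rough illustration of that idea, the Python sketch below renders one metadata record through two layouts. Both layouts are made up for the example and neither reproduces OSCOLA, the McGill Guide, or any other real citation guide; the names TOY_STYLES and render and the field names are likewise hypothetical.

```python
# Generic metadata for one case; the field names are illustrative only.
case = {'title': 'Jones v Wright', 'year': 1991, 'volume': 3,
        'reporter': 'All ER', 'page': 88}

# Two made-up layouts, expressed as plain format templates.
TOY_STYLES = {
    'bracketed-year': '{title} [{year}] {volume} {reporter} {page}',
    'trailing-year': '{title}, {volume} {reporter} {page} ({year})',
}


def render(item, style_name):
    # The metadata never changes; only the template applied to it does.
    return TOY_STYLES[style_name].format(**item)


rendered = {name: render(case, name) for name in TOY_STYLES}
```

Switching a whole document from one style to another then amounts to re-running the same records through a different template, which is what a CSL processor does on a much larger scale.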

    The automated generation of citations is just one benefit of this processing flow; it also enables the embedding of cited metadata directly in the source document (for sharing between collaborators), and it allows links to referenced resources to be attached at the point of production (for ease of referencing after publication). Hints of resistance from some quarters<\/a> notwithstanding, such tools clearly promise to save law professors, law students, lawyers, court clerks, judges, and others who must do legal drafting a tremendous amount of time.<\/p>\n

    Formatting citations<\/h1>\n

    There are a few commonly encountered wrinkles in legal data and citation styles that CSL and the citeproc-js<\/span> formatter have been carefully designed to address. To give readers a glimpse of this work, a few basic elements of the language are laid out below. We’ll begin with the following sample citation in the OSCOLA style: Jones & others v Wright<\/em> [1991] 3 All ER 88.<\/p>\n

    The bare case name can be produced with the following construct:<\/p>\n

    <text variable=\"title\" font-style=\"italic\" strip-periods=\"true\"\/><\/span><\/pre>\n

    (Note the use of font-style=\"italic\"<\/span> to render the variable content in italic type, and of the strip-periods=\"true\"<\/span> attribute, which will be discussed below.)<\/p>\n

    The year element can be produced with the following code:<\/p>\n

    <date variable=\"issued\" form=\"text\" date-parts=\"year\" prefix=\"[\" suffix=\"]\"\/><\/span><\/pre>\n

    (Note the use of prefix<\/span> and suffix<\/span>.)<\/p>\n
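The effect of the date element can be approximated in Python as follows. This is a sketch only: render_year is a hypothetical helper, not part of any CSL implementation, and it assumes the date is stored under the named variable as a date-parts list of year, month, and day. It does reflect one real property of CSL affixes, namely that a prefix or suffix is emitted only when the element itself produces output.

```python
def render_year(item, variable='issued', prefix='', suffix=''):
    '''Return the year of a date variable wrapped in affixes, or an empty string.'''
    date = item.get(variable)
    if not date or not date.get('date-parts'):
        # No date on the item: emit nothing, not even the brackets.
        return ''
    year = date['date-parts'][0][0]
    return prefix + str(year) + suffix


year_text = render_year({'issued': {'date-parts': [[1991, 7, 4]]}},
                        prefix='[', suffix=']')
missing = render_year({'title': 'an item with no date'}, prefix='[', suffix=']')
```

Here year_text is the bracketed year and missing is empty, so an undated item does not leave a stray pair of brackets in the citation.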

    To build the full cite, we join these and other elements together by wrapping them in a group<\/span> element and setting a single space as the delimiter. In the example below, we also define this construct as a macro, so that it can easily be reused in multiple contexts in the style:<\/p>\n

    <macro name=\"oscola-case\"><\/span>\r\n    <group delimiter=\" \"><\/span>\r\n        <text variable=\"title\" font-style=\"italic\" strip-periods=\"true\"\/>\r\n        <date variable=\"issued\"  form=\"text\" date-parts=\"year\"\r\n              prefix=\"[\" suffix=\"]\"\/>\r\n        <number variable=\"issue\"\/><\/span>\r\n        <text variable=\"container-title\"\/><\/span>\r\n        <text variable=\"page-first\"\/><\/span>\r\n    <\/group><\/span>\r\n<\/macro><\/span><\/pre>\n
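The joining behaviour of the group element can be pictured with a short Python sketch. The helper render_group is hypothetical and models only one aspect of a group, which is that the delimiter is placed between child elements that actually produced output; it is not a full account of how a CSL processor evaluates groups.

```python
def render_group(rendered_parts, delimiter=' '):
    '''Join the non-empty outputs of child elements with the delimiter.'''
    return delimiter.join(part for part in rendered_parts if part)


# Outputs of the five child elements, with one of them empty.
full_cite = render_group(['Jones & others v Wright', '[1991]', '3', 'All ER', '88'])
gappy_cite = render_group(['Jones & others v Wright', '[1991]', '', 'All ER', '88'])
```

Because empty outputs are skipped, a missing value such as the volume number does not leave a doubled space in the finished cite.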

    If we want to use this cite form for English legal cases only, we can wrap it in a condition:<\/p>\n

    <choose><\/span>\r\n    <if type=\"legal_case\" jurisdiction=\"gb\" match=\"all\"><\/span>\r\n        <text macro=\"oscola-case\"\/>\r\n    <\/if><\/span>\r\n<\/choose><\/span><\/pre>\n

    (Note the type<\/span>, jurisdiction<\/span> and match<\/span> attributes, and the use of a text<\/span> node with a macro<\/span> attribute to call our macro.)<\/p>\n
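The logic of the condition can be sketched in Python. The function condition_matches is a hypothetical helper, and comparing field values by exact equality is an assumption made for brevity; a real processor may test jurisdictions in a more refined way. The three values accepted for match (all, any, none) follow the CSL specification.

```python
def condition_matches(item, tests, match='all'):
    '''Evaluate field tests against an item, combined according to match.'''
    results = [item.get(field) == expected for field, expected in tests.items()]
    if match == 'all':
        return all(results)
    if match == 'any':
        return any(results)
    if match == 'none':
        return not any(results)
    raise ValueError('unsupported match value: ' + match)


tests = {'type': 'legal_case', 'jurisdiction': 'gb'}
english_case = {'type': 'legal_case', 'jurisdiction': 'gb'}
us_case = {'type': 'legal_case', 'jurisdiction': 'us'}

use_oscola_for_english = condition_matches(english_case, tests)
use_oscola_for_us = condition_matches(us_case, tests)
loose_match_for_us = condition_matches(us_case, tests, match='any')
```

With match set to all, only the English case selects the macro; relaxing it to any would also let the US case through, because the type test alone succeeds.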

    With the code above, we will obtain something close to our target cite format if we arrange for the calling application to feed the processor JSON input like the following:<\/p>\n

    {\r\n    \"container-title\": \"All England Law Reports\",\r\n    \"date\": {\r\n        \"date-parts\": [[\"1991\"]]\r\n    },\r\n    \"issue\": \"3\",\r\n    \"page\": \"88\",\r\n    \"title\": \"Jones & others v. Wright\"\r\n}<\/pre>\n

    Looking carefully at this input, we can see that there are some small discrepancies in the metadata:<\/p>\n