{"id":33,"date":"2012-06-11T13:53:54","date_gmt":"2012-06-11T18:53:54","guid":{"rendered":"http:\/\/blog.law.cornell.edu\/metasausage\/?p=33"},"modified":"2012-06-12T07:06:48","modified_gmt":"2012-06-12T12:06:48","slug":"identifiers-part-3","status":"publish","type":"post","link":"https:\/\/blog.law.cornell.edu\/metasausage\/2012\/06\/11\/identifiers-part-3\/","title":{"rendered":"Identifiers, part 3"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" style=\"float: left;\" src=\"http:\/\/blog.law.cornell.edu\/metasausage\/files\/2012\/05\/identifier-e1336301865368.jpg\" alt=\"\" width=\"125\" height=\"125\" \/>[<em>This is part 3 of a three-part post on identifiers. Here are parts <a href=\"http:\/\/blog.law.cornell.edu\/metasausage\/2012\/05\/07\/identifiers-part-1\/\">1<\/a> and <a href=\"http:\/\/blog.law.cornell.edu\/metasausage\/2012\/05\/15\/identifiers-part-2\/\">2<\/a><\/em>]<\/p>\n<h2 style=\"font-weight: normal; font-size: medium; font-family: Times; display: inline !important;\" dir=\"ltr\"><strong><span style=\"font-family: Arial; vertical-align: baseline; white-space: pre-wrap; font-size: large;\">How well does current practice measure up?<\/span><\/strong><\/h2>\n<p><span id=\"internal-source-marker_0.8899018629454076\"><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">To judge by the examples presented so far, current practice in legislative identifiers for US materials might best be described as \u201ccoping\u201d, and specifically \u201ccoping in a way that was largely designed to deal with the problems of print\u201d. Current practice presents a welter of &#8220;identifiers&#8221;, monikers, names, and titles, all believed by those who create and use them to be sufficiently rigorous to qualify as identifiers whether they are or not. 
\u00a0It might be useful to divide these into four categories:<\/span><\/span><\/p>\n<ul style=\"font-weight: normal; font-size: medium; font-family: Times; margin-top: 0pt; margin-bottom: 0pt;\">\n<li style=\"list-style-type: disc; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">Well-understood<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\"> monikers, issued in predetermined ways as part of the legislative process by known actors. \u00a0Their administrative stability may well be the product of statutory requirement or of requirements embedded in House or Senate rules. Many of these will also correspond to definite stages in the legislative process. Examples would include House and Senate bill and resolution numbers.<\/span><\/li>\n<li style=\"list-style-type: disc; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"vertical-align: baseline; white-space: pre-wrap;\">Monikers <\/span><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">arising from need<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\"> and possibly semi-formalized, or possibly \u201cbent\u201d versions of monikers created for a purpose other than that they end up serving. \u00a0\u00a0Monikers of this kind are widely relied-on, \u00a0but nobody is really responsible for them. \u00a0Some end up being embedded in retrieval systems because they\u2019re all there is. 
\u00a0A variety of such approaches are on display in the world of House committee prints.<\/span><\/li>\n<li style=\"list-style-type: disc; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"vertical-align: baseline; white-space: pre-wrap;\">Monikers <\/span><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">imposed after the fact<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\"> in an effort to systematize things or otherwise compensate for any deficiencies of monikers issued at earlier stages of the process. \u00a0Certainly internal database identifiers would fit this description; so would most official citations.<\/span><\/li>\n<li style=\"list-style-type: disc; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"vertical-align: baseline; white-space: pre-wrap;\">A grab-bag of other monikers. These might be created within government (as with GPO\u2019s SuDoc numbers), or outside government altogether (as with accession numbers or other schemes that identify historical papers held in other libraries). 
\u00a0Here, a good model would provide a set of properties enabling others to relate their schemes to ours.<\/span><\/li>\n<\/ul>\n<h2 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><span style=\"font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Identifiers in a Linked Data context<\/span><\/h2>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">John Sheridan (of legislation.gov.uk) has <a href=\"http:\/\/liicr.nl\/N4FN8o\">written eloquently about the use of legislative Linked Data<\/a> to support the development of \u201caccountable systems\u201d.\u00a0The key idea is that exposing legislative data using Linked Data techniques has particular informational and economic value when that data <\/span><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">defines real-world objects<\/span><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\"> for legal purposes. 
\u00a0If we turn our attention from statutes to regulations, that value becomes even more obvious.<\/span><\/p>\n<h3 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><strong><span style=\"font-family: Arial; vertical-align: baseline; white-space: pre-wrap; font-size: large;\">Valuable features of Linked Data approaches to legislative information<\/span><\/strong><\/h3>\n<h4 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><span style=\"font-size: 16px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Ability to reference real-world objects<\/span><\/h4>\n<h4 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><span style=\"font-size: 16px; font-family: Arial; font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">\u201c<\/span><span style=\"font-size: 15px; font-family: Arial; font-weight: normal; font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">On the Semantic Web, URIs identify not just Web documents, but also real-world objects like people and cars, and even abstract ideas and non-existing things like a mythical unicorn. We call these real-world objects or things.\u201d &#8212; <a href=\"http:\/\/www.w3.org\/TR\/cooluris\/#semweb\">Tim Berners-Lee<\/a><\/span><\/h4>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">There are no unicorns in the United States Code. Nevertheless, legislative data describes and references many, many things. \u00a0More, it provides fundamental definitions of how those things are seen by Federal law. 
\u00a0It is valuable to be able to expose such definitions &#8212; and other fundamental information &#8212; in a way that allows it to be related to other collections of information for consumption by a global audience.<\/span><\/p>\n<h4 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><span style=\"font-size: 16px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Avoiding cumbersome standards-building processes<\/span><\/h4>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">In <a href=\"http:\/\/www.jenitennison.com\/blog\/node\/140\">a particularly insightful blog post that discusses the advantages of the Linked Data methods<\/a> used in building legislation.gov.uk, Jeni Tennison points out the ability that RDF and Linked Data standards have to solve a longstanding problem in government information systems: the social problem of standard-setting and coordination:<\/span><\/p>\n<p style=\"font-weight: normal; font-size: medium; font-family: Times; margin-left: 36pt; margin-top: 0pt; margin-bottom: 0pt;\" dir=\"ltr\"><span style=\"font-size: 15px; font-family: Arial; font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">RDF has this balance between allowing individuals and organisations complete freedom in how they describe their information and the opportunity to share and reuse parts of vocabularies in a mix-and-match way. This is so important in a government context because (with all due respect to civil servants) we really want to avoid a situation where we have to get lots of civil servants from multiple agencies into the same room to come up with the single government-approved way of describing a school. 
We can all imagine how long that would take.<\/span><\/p>\n<p style=\"font-weight: normal; font-size: medium; font-family: Times; margin-left: 36pt; margin-top: 0pt; margin-bottom: 0pt;\" dir=\"ltr\"><span style=\"font-size: 15px; font-family: Arial; font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">The other thing about RDF that really helps here is that it\u2019s easy to align vocabularies if you want to, post-hoc.<\/span><a href=\"http:\/\/www.w3.org\/TR\/rdf-schema\/\"><span style=\"font-size: 15px; font-family: Arial; color: #000099; font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">RDFS<\/span><\/a><span style=\"font-size: 15px; font-family: Arial; font-style: italic; vertical-align: baseline; white-space: pre-wrap;\"> and<\/span><a href=\"http:\/\/www.w3.org\/TR\/owl-overview\/\"><span style=\"font-size: 15px; font-family: Arial; color: #000099; font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">OWL<\/span><\/a><span style=\"font-size: 15px; font-family: Arial; font-style: italic; vertical-align: baseline; white-space: pre-wrap;\"> define properties that you can use to assert that this property is really the same as that property, or that anything with a value for this property has the same value for that other property. This lowers the risk for organisations who are starting to publish using RDF, because it means that if a new vocabulary comes along they can opportunistically match their existing vocabulary with the new one. 
It enables organisations to tweak existing vocabularies to suit their purposes, by creating specialised versions of established properties.<\/span><\/p>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">While Tennison\u2019s remarks here concentrate on vocabularies, a similar point can be made about identifier schemes; it is easy to relate multiple legacy identifiers to a \u201cgold standard\u201d.<\/span><\/p>\n<h4 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><strong><span style=\"font-size: 16px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Layering and API-building<\/span><\/strong><\/h4>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Well-designed, URI-based identifier schemes create APIs for the underlying data. \u00a0At the moment, the leading example for legislative information is the scheme used by legislation.gov.uk, described in summary at <\/span><a style=\"font-weight: normal; font-size: medium; font-family: Times;\" href=\"http:\/\/data.gov.uk\/blog\/legislationgovuk-api\"><span style=\"font-size: 15px; font-family: Arial; color: #000099; vertical-align: baseline; white-space: pre-wrap;\">http:\/\/data.gov.uk\/blog\/legislationgovuk-api<\/span><\/a><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\"> \u00a0and in detail in a collection of developer documentation linked from that page. \u00a0Because a URI is resolvable, functioning as a sort of retrieval hook, it is also the basis of a well-organized scheme for accessing different facets of the underlying information. 
\u00a0<\/span><a style=\"font-weight: normal; font-size: medium; font-family: Times;\" href=\"http:\/\/legislation.gov.uk\/\"><span style=\"font-size: 15px; font-family: Arial; color: #000099; vertical-align: baseline; white-space: pre-wrap;\">legislation.gov.uk<\/span><\/a><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\"> \u00a0uses a three-layer system to distinguish the abstract identity of a piece of legislation from its current online expression as a document and from a variety of format-specific representations. \u00a0<\/span><\/p>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">That is an inspiring approach, but we would want to extend it to encompass point-in-time as well as point-in-process identification (such as being able to retrieve all of the codified fragments of a piece of legislation as codified, using its original bill number, popular name, or what-have-you). \u00a0At the moment, <\/span><a style=\"font-weight: normal; font-size: medium; font-family: Times;\" href=\"http:\/\/legislation.gov.uk\/\"><span style=\"font-size: 15px; font-family: Arial; color: #000099; vertical-align: baseline; white-space: pre-wrap;\">legislation.gov.uk<\/span><\/a><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\"> does this only via search, but the recently announced Dutch statutory collection at <\/span><a style=\"font-weight: normal; font-size: medium; font-family: Times;\" href=\"http:\/\/doc.metalex.eu\/\"><span style=\"font-size: 15px; font-family: Arial; color: #000099; vertical-align: baseline; white-space: pre-wrap;\">http:\/\/doc.metalex.eu\/<\/span><\/a><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\"> does support some point-in-time features. 
\u00a0\u00a0It is worth pointing out that the American system presents greater challenges than either of these, \u00a0because of our more chaotic legislative drafting practices, the complexity of the legislative process itself, and our approach to amendment and codification.<\/span><\/p>\n<h3 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><strong><span style=\"font-family: Arial; vertical-align: baseline; white-space: pre-wrap; font-size: large;\">Identifier challenges arising from Linked Data (and Web exposure generally)<\/span><\/strong><\/h3>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">The idea that we would publish legislative information using Linked Data approaches has obvious granularity implications (see above), but there are others that may prove more difficult. \u00a0Here we discuss three: \u00a0uniqueness over wider scope, resolvability, and the practical needs of \u201cidentifier manufacturing\u201d:<\/span><\/p>\n<h4 style=\"font-size: medium; font-family: Times; display: inline !important;\" dir=\"ltr\"><span style=\"font-size: 16px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Uniqueness over wider scope<\/span><\/h4>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Many of the identifiers developed in the closed silo of the world of legal citation could be reused as URIs in a linked data context, exposing them to use and reuse in environments outside the world where legal citation has developed. \u00a0In the open world, identifiers need to carry their context with them, rather than have that context assumed or dependent on bespoke processes for resolution or access. \u00a0\u00a0For the most part, citation of judicial opinions survives wide exposure in fair style. 
\u00a0Other identifiers used for government documents do not cope as well. \u00a0\u00a0Above, we mentioned bill numbers as being limited in chronological scope; other identifiers (particularly those that rely heavily on document titles or dates as the sole means of distinction from other documents in the same corpus) may not fare well either.<\/span><\/p>\n<h4 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><span style=\"font-size: 16px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Resolvability<\/span><\/h4>\n<h4 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><strong id=\"internal-source-marker_0.8899018629454076\" style=\"font-weight: normal;\"><span style=\"font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">The differences between URNs (Uniform Resource Names) and URLs (Uniform Resource Locators, the URIs based on the HTTP protocol) are significant. \u00a0Wikipedia notes that the URNs are similar to personal names, the URLs to street addresses&#8211;the former rely on resolution services to function. \u00a0In many cases, URNs can provide the basis for URLs, with resolution built into the http address, but in the world we\u2019re now working in, URNs must be seen as insufficient for creating linked open data.<\/span><\/strong><\/h4>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">In reality, they have different goals. \u00a0URIs provide resolvability &#8212; that is, the ability to actually find your way to an information resource, \u00a0or to information about a real-world thing that is not on the web. \u00a0As Jeni Tennison remarks in her blog, they do that at the expense of creating a certain amount of ambiguity. 
\u00a0Well-designed URN schemes, on the other hand, can be unambiguous in what they name, particularly if they are designed to be part of a global document identification scheme from the beginning, as they are in <a href=\"http:\/\/tools.ietf.org\/html\/draft-spinosa-urn-lex-06\">the emerging URN:Lex specification<\/a> . \u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">For our purposes, we probably want to think primarily in terms of URIs, but (as with legacy identifier schemes) there will be advantages to creating sensible linkages between our system, which emphasizes reliability, and others that emphasize a lack of ambiguity and coordination with other datasets. \u00a0<\/span><\/p>\n<h4 style=\"font-size: medium; font-family: Times; display: inline !important;\" dir=\"ltr\"><span style=\"font-size: 16px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Things not on the Web<\/span><\/h4>\n<p><strong id=\"internal-source-marker_0.8899018629454076\" style=\"font-weight: normal;\">\u00a0<\/strong><\/p>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Legislation is created by real people and it acts on real things. \u00a0It is incredibly valuable to be able to relate legislative documents to those things. \u00a0The challenge lies, as it always has, \u00a0in eliminating ambiguity about which object we are talking about. \u00a0A newer and more subtle need is the need to distinguish references to the real-world object itself from references to representations of the object on the web. \u00a0The problems of distinguishing one John Smith from another are already well understood in the library community. \u00a0URIs present a new set of challenges. 
\u00a0For instance, we might want to think about how we are to correctly interpret a URI that might refer to John Smith, the off-web object that is the person himself, and a URI that refers to the Wikipedia entry that is (possibly one of many) on-web representations of John Smith. \u00a0This presents <a href=\"http:\/\/www.jenitennison.com\/blog\/node\/159\">a variety of technical challenges that are still being resolved<\/a>.\u00a0<\/span><\/p>\n<h4 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><span style=\"font-size: 16px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Practical manufacturing and assignment of Web-oriented identifiers<\/span><\/h4>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Thinking about the highly-granular approach needed to make legislative data usefully recombinant &#8212; as suggested in the section on fragmentation and recombination above &#8212; quickly leads to practical questions about where all those granular identifiers will come from. The problem becomes more acute when we begin to think about retrofitting such schemes to large bodies of legacy information. \u00a0For these among other reasons, the ability to manufacture and assign high-quality identifiers by automated means has become the Philosopher\u2019s Stone of digital legal publishers. \u00a0It is not that easy to do. \u00a0<\/span><\/p>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">The reasons are many, and some arise from design goals that may not be shared by everyone, or from misperceptions about the data. \u00a0For example, it\u2019s reasonable to assume that a sequence of accession numbers represents a chronological sequence of some kind, but as we\u2019ve already seen, that\u2019s not always the case. \u00a0Legacy practices complicate this. 
\u00a0For example, it would be interesting to see how the sequence of Supreme Court cases for which we have an exact chronological record (via file datestamping associated with electronic transmission) corresponds to their sequence as officially published in printed volumes. \u00a0It may well be that sequence in print has been dictated as much by page-layout considerations as by chronology. \u00a0It might well be that two organizations assigning sequential identifiers to the same corpus retrospectively would come up with a different sequence.<\/span><\/p>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Those are the problems we encounter in an identifier scheme that is, theoretically, content-independent. \u00a0Content-dependent schemes can be even more challenging. \u00a0Automatic creation of identifiers typically rests on the automated extraction of one or more document features that can be concatenated to make a unique identifier of wide scope. \u00a0There are some document collections where that may be difficult or impossible, either because there is no combination of extractable document features that will result in a unique identifier, or because legacy practices have somehow obliterated necessary information, or because it is not easy to extract the relevant features by automated means. \u00a0We imagine that retroconversion of House Committee prints would present exactly this challenge. \u00a0<\/span><\/p>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">At the same time, it is worth remembering that the technologies available for extracting document features are improving dramatically, suggesting that a layered, incremental approach might be rewarded in the future. 
\u00a0While the idea of \u201cgraceful degradation\u201d seems at first blush to be less applicable to identifiers than to other forms of metadata, it is possible to think about the problem a little differently in the context of corpus retroconversion. \u00a0That is a complicated discussion, but it seems possible that the use of provisional, accession-based identifiers within a system of properties and relationships designed to accommodate incomplete knowledge about the document might yield good results.<\/span><\/p>\n<h2 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><span style=\"font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">A final note on economics<\/span><\/h2>\n<p><span style=\"font-weight: normal; font-size: 15px; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">Identifiers have special value in an information domain where authority is as important as it is for legal information. \u00a0In the event of disputes, parties need to be able to definitively identify a dispositive, authoritative version of a statute, regulation, or other legal document. \u00a0There is, then, a temptation toward a soft monopoly in identifiers: the idea that there should be a definitive, authoritative copy somewhere leads to the idea of a definitive, authoritative identifier administered by a single organization. Very often, challenges of scale and scope have dictated that that be a commercial publisher. \u00a0Such a scheme was followed for many years in the citation of judicial opinions, resulting in an effective monopoly for one publisher. \u00a0That is proving remarkably difficult and expensive to undo, even though it has had serious cost implications and other detrimental effects on the legal profession and the public. 
\u00a0Care is needed to ensure that the soft, natural monopoly that arises from the creation of authoritative documents by authoritative sources does not harden into real impediments to the free flow of public information, as it did in the case of judicial opinions.<\/span><\/p>\n<h2 style=\"font-weight: normal; font-size: medium; font-family: Times;\" dir=\"ltr\"><span style=\"font-family: Arial; vertical-align: baseline; white-space: pre-wrap;\">What we recommend<\/span><\/h2>\n<p><span style=\"font-weight: normal;\">This is not a complete set of general recommendations &#8212; really more a series of guideposts or suggestions, to be altered and tempered by institutional realities:<\/span><\/p>\n<ul style=\"font-weight: normal; font-size: medium; font-family: Times; margin-top: 0pt; margin-bottom: 0pt;\">\n<li style=\"list-style-type: disc; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"vertical-align: baseline; white-space: pre-wrap;\">At the most fundamental level, <\/span><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">everything should have an identifier<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\">. It should be available for use by the public. 
For example, Congressional committee reports appear not to have any identifiers, but it would be reasonable to assume that some system is in use in the background, at least for their publication by GPO.<\/span><\/li>\n<li style=\"list-style-type: disc; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"vertical-align: baseline; white-space: pre-wrap;\">Many legacy identifier systems will need to be <\/span><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">extended \u00a0or modified to create a gold standard system<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\">, probably issued by a third party and not by the document creators themselves. \u00a0That is especially the case because there is nobody in a position to compel good practice by document creators over the long term. \u00a0Such a gold-standard will need to be:<\/span>\n<ul style=\"margin-top: 0pt; margin-bottom: 0pt;\">\n<li style=\"list-style-type: circle; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">Unambiguous<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\">. For example, existing bill and resolution numbers would need to be extended by, eg., a date of introduction.<\/span><\/li>\n<li style=\"list-style-type: circle; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">Designed to resist tampering<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\">. When things are numbered and labelled, there is a temptation to alter numbers and labels to serve short-term interests. 
\u00a0The reservation of \u201cimportant\u201d bill numbers under House procedural rules is an example; another (from the executive branch) is the long-standing practice of manipulating RIN numbers to color assessments of agency activity.<\/span><\/li>\n<li style=\"list-style-type: circle; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">Clear as to the separation <\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\">of titling, dating, and identification functions. \u00a0Presidential documents provide a good example of something currently needing improvement in this respect.<\/span><\/li>\n<li style=\"list-style-type: circle; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"vertical-align: baseline; white-space: pre-wrap;\">Taking advantage of <\/span><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">carefully designed relationships<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\"> among identifiers to allow the retention of well-understood legacy monikers for foreground use, while making use of a well-structured \u201cgold standard\u201d from the beginning. \u00a0Those relationships should enable automated linkage that will allow retrieval across multiple, related identifier systems.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"list-style-type: disc; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"vertical-align: baseline; white-space: pre-wrap;\">Where possible, <\/span><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">retain useful semantics<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\"> in identifiers as a way of increasing access and reducing errors. 
\u00a0It is possible that different audiences will require different semantics, making this unlikely to happen in the background, but it should be possible to retain this functionality in the foreground.<\/span><\/li>\n<li style=\"list-style-type: disc; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">Maintain granularity at the level of common citation and crossreferencing practice<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\">, but with a distinction between identifiers and labels. \u00a0<\/span><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">Identifiers<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\"> should be assigned at the whole-document level, with the notion of \u201cwhole document\u201d determined on a corpus-by-corpus basis. \u00a0<\/span><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">Labels<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\"> may be assigned to subdocuments (eg., a section of a bill) for purposes of navigation and retrieval. \u00a0This is similar in function and purpose to the distinction between HREF and NAME attributes in HTML anchor tags.<\/span><\/li>\n<li style=\"list-style-type: disc; font-size: 15px; font-family: Arial; vertical-align: baseline;\"><span style=\"font-style: italic; vertical-align: baseline; white-space: pre-wrap;\">Use a layered approach<\/span><span style=\"vertical-align: baseline; white-space: pre-wrap;\">. \u00a0In our view, it is important not to hold future systems hostage to what is practicable in legacy document collections. \u00a0In general, it will be much harder to implement good practices over documents that were not \u201cborn digital\u201d. 
\u00a0That is not a good reason to water down our prospective approach, but it is a good reason to design systems that degrade gracefully when it becomes difficult or impossible to deal with older collections. That is particularly true at a time when the technologies for extracting metadata from legacy documents are improving dramatically, suggesting that a layered, incremental approach might produce great gains in the future.<\/span><\/li>\n<\/ul>\n<p><em>We conclude, as always, with a <a href=\"http:\/\/www.youtube.com\/watch?v=R0mylMh__Sc\">musical selection<\/a> or <a href=\"http:\/\/www.youtube.com\/watch?v=eYOC4d9YN34\">two<\/a>. \u00a0Next time, some stuff about people and organizations as we find them in the legislative world.<\/em><\/p>\n<!-- AddThis Advanced Settings generic via filter on the_content --><!-- AddThis Share Buttons generic via filter on the_content -->","protected":false},"excerpt":{"rendered":"<p>[This is part 3 of a three-part post on identifiers. Here are parts 1 and 2] How well does current practice measure up? 
To judge by the examples presented so far, current practice in legislative identifiers for US materials might best be described as \u201ccoping\u201d, and specifically \u201ccoping in a way that was largely designed <a href='https:\/\/blog.law.cornell.edu\/metasausage\/2012\/06\/11\/identifiers-part-3\/'>[&#8230;]<\/a><!-- AddThis Advanced Settings generic via filter on get_the_excerpt --><!-- AddThis Share Buttons generic via filter on get_the_excerpt --><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-33","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blog.law.cornell.edu\/metasausage\/wp-json\/wp\/v2\/posts\/33","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.law.cornell.edu\/metasausage\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.law.cornell.edu\/metasausage\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/metasausage\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/metasausage\/wp-json\/wp\/v2\/comments?post=33"}],"version-history":[{"count":8,"href":"https:\/\/blog.law.cornell.edu\/metasausage\/wp-json\/wp\/v2\/posts\/33\/revisions"}],"predecessor-version":[{"id":43,"href":"https:\/\/blog.law.cornell.edu\/metasausage\/wp-json\/wp\/v2\/posts\/33\/revisions\/43"}],"wp:attachment":[{"href":"https:\/\/blog.law.cornell.edu\/metasausage\/wp-json\/wp\/v2\/media?parent=33"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/metasausage\/wp-json\/wp\/v2\/categories?post=33"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.law.cornell.edu\/metasausage\/wp-json\/wp\/v2\/tags?post=33"}],"curies":[{"name":"wp","href":"https:\/
\/api.w.org\/{rel}","templated":true}]}}