These days, there’s no need to settle on a single answer to the question of what standard to reference in describing people and organizations. The current environment — where the speed of change can be daunting — demands strategies that start with descriptive properties that meet local needs as expressed in use cases. Taking the further step of mapping these known needs to a variety of existing standards best provides both local flexibility and interoperability with the wider world.
In the world of Web standards, most thinking about how to describe people and organizations begins with the FOAF vocabulary (http://xmlns.com/foaf/spec/), developed in 2000 as ‘Friend of a Friend’ and now used extensively on the web to describe people, groups, and organizations. FOAF is an RDF-based specification, and as such is poised to grow in importance as the ideas behind Linked Data gain traction and more widespread implementation. FOAF is quite simple on its face, but as an RDF vocabulary it is easily extended to meet needs for richer and more complex information. FOAF is now in a stable state, and its developers have recently entered into an agreement with the Dublin Core Metadata Initiative (DCMI) to provide improved persistence and sustainability for the website and the standard.
More recent standards efforts are emerging and deserve attention as well. Several that address the building of descriptions for people and organizations are in working draft at the World Wide Web Consortium (W3C). Although still in draft status, they offer several promising alternative methods of description. Because organizations in these standards are declared as subclasses of foaf:Agent, the close association with the FOAF standard is built in.
What may be most useful about FOAF, and about the more recent standards that seek to extend it, is both its simple, unambiguous method of identifying people and groups and its recommendations for minting URIs for significant information about the person or group identified.
But despite its wide adoption, basic FOAF has some limitations that weigh on any assessment of its capacity to describe the diversity of people and organizations involved in the legislative process. Critically, FOAF lacks a method for assigning a temporal dimension to roles, memberships, or affiliations. That temporal dimension is essential to any model describing the relationships among legislation, legislators, and legislative groups or organizations, whether retrospectively or prospectively. The emerging W3C standard for modeling governmental organizational structures (which includes the descriptions of people and organizations mentioned above) contemplates extensions to FOAF designed to address this limitation. Another emerging standard, the Society of American Archivists’ EAC-CPF, also includes provisions for temporal metadata and seems to take a very broad view of what it models, making it a standard worth watching.
Thinking about affiliations gives a good feel for the process of working with standards: it takes a certain amount of careful thought and some trimming to fit. As an illustration, consider a member of Congress and her history as a member of a congressional committee. It’s not unusual for a member to serve on a committee for a while, become its chairperson, become ranking minority member after a change in the majority party, become chairperson once again, and finally settle down as a regular member of the committee. One might model this as a series of memberships in the committee, each with a different flavor, or as a single membership with changing roles. The figure at the right illustrates that history. The illustration at the top represents the “serial-membership” approach used in the W3C standard: each membership also represents a specific role within the committee and has its own duration, so the total timespan of an individual’s committee service can be found only by calculation or inference. The bottom illustration represents roles and membership independently. It is a little clunky, in that it assigns durations to roles and to overall membership separately, in a way that might be considered duplicative. Nevertheless, we prefer it: it does not require predecessor/successor relationships to link the individual role-memberships into a continuous membership span, nor does it require the slightly contrived idea of a “plain-vanilla” membership.
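To make the contrast between the two approaches concrete, here is a minimal Python sketch. The class and function names (`RoleMembership`, `Membership`, `RoleAssignment`, `total_span`, `roles_within_membership`) are hypothetical and are not terms from the W3C draft; the member, committee, and dates are invented for illustration. In the serial approach the overall span has to be computed from the role-memberships, including the plain-vanilla "member" pieces; in the independent approach it is stated directly, at the cost of role dates that must be kept consistent with it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical classes for illustration only; not terms from the W3C draft.

@dataclass
class RoleMembership:
    """Serial approach: each membership bundles one role with its own duration."""
    member: str
    committee: str
    role: str
    start: date
    end: date

@dataclass
class RoleAssignment:
    """Independent approach: a role held during part of a membership."""
    role: str
    start: date
    end: date

@dataclass
class Membership:
    """Independent approach: one membership span, with roles recorded separately."""
    member: str
    committee: str
    start: date
    end: date
    roles: list[RoleAssignment] = field(default_factory=list)

def total_span(pieces: list[RoleMembership]) -> tuple[date, date]:
    """Serial approach: infer the overall span from the pieces, checking that
    each piece picks up (by the next day) where its predecessor left off."""
    ordered = sorted(pieces, key=lambda p: p.start)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start > prev.end + timedelta(days=1):
            raise ValueError(f"gap between {prev.role!r} and {nxt.role!r}")
    return ordered[0].start, max(p.end for p in ordered)

def roles_within_membership(m: Membership) -> bool:
    """Independent approach: the span is stated directly, but every role's
    dates duplicate part of it and must stay consistent with it."""
    return all(m.start <= r.start and r.end <= m.end for r in m.roles)

# Invented dates for one hypothetical member's service on one committee.
rep, cmte = "Rep. A", "Committee X"
serial = [
    RoleMembership(rep, cmte, "member", date(2001, 1, 3), date(2005, 1, 2)),
    RoleMembership(rep, cmte, "chair", date(2005, 1, 3), date(2007, 1, 2)),
    RoleMembership(rep, cmte, "ranking minority member", date(2007, 1, 3), date(2011, 1, 2)),
    RoleMembership(rep, cmte, "chair", date(2011, 1, 3), date(2013, 1, 2)),
    RoleMembership(rep, cmte, "member", date(2013, 1, 3), date(2017, 1, 2)),
]
# No plain-vanilla "member" role is needed here: membership itself carries the span.
independent = Membership(rep, cmte, date(2001, 1, 3), date(2017, 1, 2), roles=[
    RoleAssignment("chair", date(2005, 1, 3), date(2007, 1, 2)),
    RoleAssignment("ranking minority member", date(2007, 1, 3), date(2011, 1, 2)),
    RoleAssignment("chair", date(2011, 1, 3), date(2013, 1, 2)),
])

print(total_span(serial))
print((independent.start, independent.end))
print(roles_within_membership(independent))
```

The gap check in `total_span` assumes day-level precision and treats a piece that starts the day after its predecessor ends as continuous; a real model would need to settle that convention explicitly.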
We think that modelers are often tempted to choose standards prematurely, taking a kind of Chinese-menu approach to modeling that can be overly influenced by the appeal of one-stop shopping. Our preference has been to model as closely to the data as we can. Once we have a model that is faithful to the data, and no sooner, we can start to think about which of its components should be taken from existing models. In that way we avoid representation problems and also some hidden “gotchas”, such as nonsensical or inappropriate default values and relationships that almost work. The same can be said of structure and hierarchy among objects: it is best to start modeling in a way that is very flat and very close to the data, and only once that is completed to gather things into subclasses and superclasses, subproperties, and so on.
Standards encountered in libraries
One question that always arises in discussing standards like FOAF in a library context is how they relate to the MARC model, which dominates most library discussions of how to describe people and organizations. Traditionally, libraries have used MARC name authority records as the basis for uniquely identifying people and organizations, providing text strings for both identification and display. Similar functionality has been attempted with the recent additions to the Library of Congress’s Metadata Authority Description Schema (MADS). MADS was originally developed as an XML expression of traditional MARC authority data. Now, with the arrival of a public draft standard, focus is shifting toward an RDF expression that provides a path for migrating MADS data into the Semantic Web environment.
MADS, like its parent USMARC Authority Format, focuses on preferred names for people and organizations, including variants, rather than on describing the person or organization more fully. As such it provides a useful ‘hook’ into library data referencing the person or organization, but is not really designed to accommodate the broader uses required for this project.
There is also a question about where this new RDF pathway for MADS might go, given the traditional boundaries of the MARC name authority world. In that tradition, names are added to the distributed file on the basis of ‘literary warrant’, which requires an object of descriptive interest by or about the person or organization that is a candidate for inclusion. That is not a particularly useful basis for describing legislators, hearing witnesses, or others who have not written books or been the subject of them. Control of names and name variants will surely be necessary in the new web environment, and the extensive data and experience that libraries bring to the problems inherent in name changes will be essential, but not sufficient, for more broadly scoped projects like this one.
Groups vs. Organizations
Legislatures create myriad documents that must be identified and related to one another. For each of those documents, there are people and organizations fulfilling a variety of roles in the events the documents narrate, the creation of the documents themselves, the endorsement of their conclusions, or the implementation of whatever those documents describe. Those people and organizations include not only legislators and the various committees and other sub-organizations of the legislature, but also the executive branch, which, primarily through the President, exercises the final steps in the legislative process as well as bearing responsibility for implementation. Finally, there are other parties, often outside government, who are involved in the legislative process as hearing witnesses or authors of committee prints, and whose identities and organizational affiliations are essential to full description and interpretation. These latter present a particularly strong case for linked-data approaches, as they are unlikely to have any sort of formal description constructed for them by the legislature. The Congressional Biographical Dictionary is an excellent resource, but it is a dictionary of Congresspeople, not of all those who appear in Congressional documents. A dictionary of the latter would be impossible for any single entity to construct and maintain, but the task can be divided and conquered in concert with resources like the New York Times linked-data publishing effort, DBpedia, Freebase, and so on.
When discussing organizations, it is sometimes useful to distinguish between more and less formal groupings. In the FOAF specification, that distinction is conceptualized in the categories “group” and “organisation”. Generally, FOAF imagines that an “organisation” is a more formalized entity with fairly well-defined memberships and descriptions, whereas a “group” is a more informal concept, intended to capture collections of agents where a strict specification of membership may be problematic or impossible. In practice, the distinction tends to be a very blurry one, and seems to be a sort of summary calculation done on a number of dimensions:
- the temporal stability of the group itself, for example “people eating dinner at Tom’s house”, as opposed to “the House Judiciary Committee”;
- the temporal stability of the group’s membership, which may be relatively fixed or constantly churning (“the Supreme Court” versus “the people waiting in the anteroom”);
- the existence of institutional trappings such as office locations, meeting rooms and websites;
- the level of “institutionalization” or “officialness”. In the case of government institutions in any branch, that may often rest on some legal authority that establishes the group and describes its scope of operations (as with the Federal courts). It may also take the form of a single, very narrow capability (as when an agency is said to have “gift authority”). Finally, it may also be established through tradition. For example, the Congressional Black Caucus has existed for over 40 years and occupies offices in the Longworth House Office Building, but has no formal existence in law.
Because that distinction is so blurry, we have chosen to treat all organizations similarly, using common properties that allow users to determine how official the organization is by ‘following their noses’. The accumulation of statement-level data about any of the dimensions listed above (or others, for that matter) serves as evidence. Thus, users of the model are free to draw their own conclusions about the “officialness” of any collection of people, although a statutory or constitutional mandate might well be interpreted as dispositive.
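The following minimal Python sketch shows what treating all organizations alike might look like, assuming a flat structure in which every grouping is a single `Organization` carrying a list of (property, value) statements. The property names, the example.org URIs, and the `looks_official` policy with its threshold are all hypothetical. The point of the sketch is that judging "officialness" lives in the consumer's code rather than in the model, so another user could weigh the same evidence differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical property names and example.org URIs, for illustration only.

@dataclass
class Organization:
    """One class for every grouping: no up-front split between 'group' and
    'organisation'. Evidence accumulates as (property, value) statements."""
    uri: str
    label: str
    statements: list[tuple[str, str]] = field(default_factory=list)

    def values(self, prop: str) -> list[str]:
        """All values recorded for one property (empty list if none)."""
        return [v for p, v in self.statements if p == prop]

def looks_official(org: Organization, threshold: int = 2) -> bool:
    """One possible consumer policy, not part of the model: a statutory or
    constitutional mandate is treated as dispositive; otherwise count how many
    kinds of institutional trappings are recorded."""
    if org.values("statutoryMandate") or org.values("constitutionalMandate"):
        return True
    trappings = ("officeLocation", "meetingRoom", "website", "establishedBy")
    return sum(1 for p in trappings if org.values(p)) >= threshold

supreme_court = Organization(
    "http://example.org/org/supreme-court", "the Supreme Court",
    [("constitutionalMandate", "U.S. Constitution, Article III")],
)
cbc = Organization(
    "http://example.org/org/cbc", "Congressional Black Caucus",
    [("officeLocation", "Longworth House Office Building"),
     ("establishedBy", "tradition")],
)
anteroom = Organization(
    "http://example.org/group/anteroom", "the people waiting in the anteroom",
)

for org in (supreme_court, cbc, anteroom):
    print(org.label, looks_official(org))
```

A user who regards only a legal mandate as decisive could drop the trappings count entirely; nothing in the `Organization` records themselves would need to change.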