[ Author note: I’d like to start by saying thanks to all of our pals at Justia, and especially Tim Stanley and Nick Moline, for their help with all that is described in the following, and to the denizens of law-lib and contest winners Scott Vanderlin and David Curle in particular for help with testing images.]
Some weeks back, our friends at Justia.com very generously arranged for us to have access to Google Glass. We weren’t at all sure how Glass fits into the legal-information world. We still aren’t, and that’s what motivates this post. But Glass is very, very cool, and it or something very like it will be transformative.
Because it was very, very cool, we wanted to develop an app for it. Like all garage-bound experimenters, we looked to see what we had lying around that might be made to work with it. It turned out that Wayne Weibel had done something very smart a while back (and that is not meant to imply that there is anything remarkable about Wayne doing smart things; he does that all the time). We have a tool called Citationer that extracts citations from documents, and it has a Tesseract-based OCR component that we use with image-based PDFs. Turns out that it’ll work with any image format, and Wayne had built that capacity out a bit, in the expectation that people would want to use it with document images sent from phones. And, as it turns out, from Glass.
So Wayne and Sara and I took two days out of the office to see if we could whack something together that would let you take a picture with Glass and send it off to a server-based application that would send you back a link or links to anything cited in the image. The result, almost entirely Wayne’s work, was an app called “Signtater”. It works well with documents and some signage, and it raises a lot of questions.
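For the curious, here is a minimal sketch of the step that turns OCR'd text into links. It is not Citationer's actual code: the two regular expressions, the function name `find_citations`, and the `example.org` URL scheme are all assumptions made for illustration, and real citation formats are far more varied than these two patterns cover.

```python
import re

# Hypothetical link base; the real service's URL scheme is not described in this post.
BASE = "https://example.org"

# Deliberately simple patterns for "36 CFR 261.50" / "36 C.F.R. § 261.50"
# and "18 USC 1001" / "18 U.S.C. § 1001". Real citations vary much more.
CFR_RE = re.compile(
    r"\b(\d+)\s*C\.?\s?F\.?\s?R\.?\s*(?:§+\s*|part\s+)?(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)
USC_RE = re.compile(
    r"\b(\d+)\s*U\.?\s?S\.?\s?C\.?\s*(?:§+\s*)?(\d+[a-z]?)",
    re.IGNORECASE,
)


def find_citations(text):
    """Return (normalized citation, link) pairs in order of appearance."""
    found = []
    for m in CFR_RE.finditer(text):
        title, section = m.group(1), m.group(2)
        found.append((m.start(), f"{title} CFR {section}",
                      f"{BASE}/cfr/{title}/{section}"))
    for m in USC_RE.finditer(text):
        title, section = m.group(1), m.group(2)
        found.append((m.start(), f"{title} USC {section}",
                      f"{BASE}/usc/{title}/{section}"))
    found.sort()  # order by position in the OCR text
    return [(cite, url) for _, cite, url in found]


if __name__ == "__main__":
    sample = "Violators subject to fine under 18 U.S.C. 1001 and 36 CFR 261.50."
    for cite, url in find_citations(sample):
        print(cite, "->", url)
```

The hard part, as the rest of this post explains, is getting clean text out of the image in the first place; once you have it, matching is comparatively easy.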
But let’s talk about its limitations, first. I had a rush of brains to the head and realized that we needed things other than documents to test with. There is a lot of signage out there with US Code and CFR citations on it. I’ve collected a few pictures of such things myself, and a quick look at Google Images and Flickr suggested that others might have done likewise. We ran a little contest in which we enlisted law librarians to help us collect some of these images in Pinterest. [ parenthetical note: if I had this to do over again, I wouldn’t use Pinterest, because the apparatus for random participation by multiple individuals is terrible. I’d either ask people to tweet them with a particular hashtag, or collect them in Flickr.] A lot of people helped out, and we collected a lot of very good images of signs with citations, some of which appear in this post.
What we didn’t know was that we’d stumbled into a very hard problem in computer vision (“hard” in the “over one hundred research papers written last year on tiny aspects of the problem” sense of “hard”). Extracting text from natural scenes, as it’s known, is something that many people would like to be able to do. But it presents a lot of challenges, and is not very far advanced. We were able to improve Signtater’s performance somewhat with some simple image preprocessing in ImageMagick, and I’d like to do more when time permits. But for right now Signtater’s capabilities are limited to documents and signage that can be made to look like documents in the image: that is, mostly dark lettering on white backgrounds in an area that mostly fills the screen. We were disappointed by the lack of range… but Signtater is still really, really cool.
But then we started thinking
Some of the images were problematic for other reasons. Lots had incomplete context, like the one here that is missing its CFR Title number. It’s an example of what Robin Wendler has called the “on a horse” problem. Robin once worked on metadata for a collection of images of Ulysses S. Grant. The original cataloger had written descriptions, every one of which assumed that you knew that the context was a collection of photos of Ulysses S. Grant. So you’d find descriptions like “On a horse”, implying “Ulysses S. Grant on a horse”. Such context-dependent descriptions and identifiers are the bane of Linked Data people. And before you start making fun of the hapless Park Service employee who spends his whole life in Title 36 and assumes that everyone else does too, stop and think about Congressional bill numbers, which are no different, or the nest of snakes called SuDoc numbers, which changes with changes in government structure.
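To see why the “on a horse” problem bites software, consider this rough sketch of a matcher that has to decide what to do with a CFR citation that arrives without its title. Nothing here is Signtater’s real logic: the pattern, the function name `resolve_cfr`, and the idea of passing in a `default_title` from outside context are all assumptions for the sake of illustration.

```python
import re

# Matches a CFR citation with or without a leading title number,
# e.g. "36 CFR 261.50" or just "CFR 261.50". Intentionally simplistic.
CFR_RE = re.compile(
    r"\b(?:(\d+)\s+)?C\.?F\.?R\.?\s*(?:§+\s*)?(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def resolve_cfr(text, default_title=None):
    """Return (citation or None, section, how) for each CFR reference.

    `how` is "explicit" when the sign gave the title, "assumed" when we
    filled it in from default_title, and "unresolved" when we could not.
    """
    results = []
    for m in CFR_RE.finditer(text):
        section = m.group(2)
        if m.group(1):
            results.append((f"{m.group(1)} CFR {section}", section, "explicit"))
        elif default_title is not None:
            # The title comes from outside context (a hypothetical caller hint),
            # so flag it rather than pretend the sign said it.
            results.append((f"{default_title} CFR {section}", section, "assumed"))
        else:
            results.append((None, section, "unresolved"))
    return results


if __name__ == "__main__":
    sign = "Area closed per CFR 261.50(a)"
    print(resolve_cfr(sign))                    # title unknown
    print(resolve_cfr(sign, default_title=36))  # caller supplies context
```

The point of the `how` flag is that a guessed title is a different kind of fact from a stated one, and anything downstream ought to be able to tell the difference.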
At a much more fundamental level, what do any of these signs even mean? Until we came along with Signtater (which, by the way, is really, really cool), there was almost zero possibility that anyone would actually dereference any of these public-notice citations to find out what the law actually said. Well, OK, I guess I should be more generous: you don’t need anything as really, really cool as Signtater; you could do the same thing from a phone. But before 5 or 6 years ago there was almost no chance that anyone would ever look to see what the law actually said. I’m intrigued by the idea of a backpacker in a national park lugging around the CFR so she could check on such things, but I’d say there’s little chance that ever happened.
So what’s really being said when somebody puts a citation on a sign in a public park? One of two things, really:
a) We claim to have the authority to do this, and here it is, so sit down and shut up, or
b) We live so much in our own world that we don’t even realize that most people don’t live there with us.
In the first, uncharitable explanation, the citation is just something official-looking that is put there to tell you that officials are official and are telling you something that you are officially required to obey. It says what, but not in any language that anyone other than the author and a few others operating in the same narrow context are likely to understand. As it happens, 36 CFR 261.50 (a) and (b) are, more or less, simple assignments of authority that don’t specify particular behaviors. You could argue, and I would agree, that some things like “501(c)(3)” and “S corporation” are sufficiently part of the culture that they can, in fact, be dereferenced by most of the affected populations, but that ain’t the case most of the time. That raises the question of how much authority we should grant to any reference that can’t be followed and read by those who are supposed to obey it.
In the second, more charitable interpretation, the author is simply someone who lives and breathes Title 36, has done so for much of his working life, and has committed the somewhat more pardonable sin of assuming that everybody else does too. Of course it’s Title 36 — we’re in a park, silly.
You might find such behaviors bad in varying degrees depending on what you think about the motivations of those involved. A cynic would see the goal of a) as intimidation, where another might see b) as having its roots in something akin to the behavior of absent-minded professors. Fact is, technical specialists of many kinds are vulnerable to the interpretation of b)-based behaviors as a)-based misbehaviors. That’s why I’ve always found lawyers and law professors who complain about the poor explanatory skills of computer technicians to be so ridiculous. Talk about holding a mirror up to nature. But I digress.
Wait, where are we?
There’s a still more intriguing question here. The signs are an attempt to associate some part of the law with a particular place. Glass has geolocation capability, and could do all that through an application that understood something I guess you’d call the law of where I’m standing right now. But what is that, really? We’d need to know a lot more in order to find out, even though theoretically a lot of the data is retrievable. Apart from simple (or sometimes complex) questions of jurisdiction, places are regulated in many ways for many purposes. A person standing in a Weis supermarket somewhere near Altoona, PA might want:
The local zoning ordinances, signage regulations, or whatever law relates to any commercial establishment (follow the link and see how many local ordinances of whatever kind are in fact location-dependent).
A Supreme Court case dealing with free speech. (Interestingly, three of the first four paragraphs of the majority opinion in this case amount to drawing a map.)
That suggests that before a legal-information retrieval application asks “Where are you?” it should ask, “Who are you?”. And for those purposes I might be different people at different times.
But leaving that aside, all the new location-aware devices raise the same question: “What’s the law of where I’m standing, right now?” And while that is not a question that information-retrieval systems can respond to directly, at least not without some further context, it is one that is helpful in thinking about what the design issues for such systems really are.
Karl Llewellyn once said, “Each concrete fact of the case arranges itself, I say, as the representative of a much wider abstract category of facts, and it is not in itself but as a member of the category that you attribute significance to it.” That this is a problem for legal information retrieval has been pointed out by many, most notably Dan Dabney in his piece on the “Universe of Thinkable Thoughts” (curiously, Dabney does use at least one location-dependent example, saying that there are no laws in any jurisdiction forbidding cruelty to fountain pens. This claim is not substantiated by research). The point, though, is that any use of geolocation in information retrieval needs to be accompanied by a lot of contextual information about the asker and about the problem; location is differentially material. Finding that context, and determining whether location is material, is a particular problem for non-lawyers and it is one where we might give them a good bit of help.
But, as I was saying, Signtater is really, really cool.
Merry Christmas, all.