[teknoids] Elmer's cases

Tom Bruce tom.bruce at cornell.edu
Tue Feb 26 06:44:23 EST 2008

John, Elmer, et al:

We've been a little busy here and haven't hacked at these much.  But let me suggest a couple of things that we'd do ourselves, if only we had time, thus furthering the fine legal-academic tradition of Upholding the Work Ethic For Others:

1) There would be great utility in putting the extracted metadata into an accessible MySQL database; you'd probably want to register users just to hold down the nonsense.  I'd throw in all the metadata plus the original URL (folks wanting pieces of that could parse it).  An old trick of ours is to use the MD5 hash of the full URL as the primary key, though these URLs are probably short enough not to pose a problem (we ran into limitations with the Circuit Court stuff we were doing and so developed the hash method).  Throwing in John's first-para extraction would be handy too, for zippy delivery of indicative-indexing information.
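A rough sketch of the hashed-key idea, for anyone who wants something to start from.  It is in Python and uses the standard sqlite3 module as a stand-in for MySQL so that it runs without a server; the table name, the column names, and the helper functions are all made up for illustration and are not anything we actually run.

```python
import hashlib
import sqlite3


def url_key(url: str) -> str:
    """Return the 32-character hex MD5 digest of the full URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) cases table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cases (
               url_md5    CHAR(32) PRIMARY KEY,  -- fixed-length key
               url        TEXT NOT NULL,         -- original URL, kept whole
               parties    TEXT,
               docket     TEXT,
               decided    TEXT,
               citation   TEXT,
               first_para TEXT
           )"""
    )
    return conn


def store_case(conn, url, parties, docket, decided, citation, first_para):
    """Insert one case, or overwrite the row already stored for this URL."""
    conn.execute(
        "INSERT OR REPLACE INTO cases VALUES (?, ?, ?, ?, ?, ?, ?)",
        (url_key(url), url, parties, docket, decided, citation, first_para),
    )
    conn.commit()


def find_case(conn, url):
    """Look a case up by URL; returns (parties, citation) or None."""
    return conn.execute(
        "SELECT parties, citation FROM cases WHERE url_md5 = ?",
        (url_key(url),),
    ).fetchone()
```

The point of the hash is only that the key has a fixed length no matter how long the URL gets; the full URL still sits in its own column for anyone who wants to parse it.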

2) I'm curious about what could be done with off-the-rack summarizers, maybe in conjunction with John's para extraction or maybe run on the full opinion, though that seems laborious.  Come to think of it, I wonder what you could do by summarizing a little more than the first paragraph.  A lot of opinions have a restatement of the facts and posture that precedes a first structural division indicating the beginning of the analysis.  This stuff is probably too diverse to permit universal use of that approach. But you could try it and if finding the start-of-analysis milestone failed, just do the first para.
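The try-it-and-fall-back logic might look something like the following Python sketch.  It assumes, purely for illustration, that paragraphs are separated by blank lines and that a structural division shows up as a paragraph consisting of nothing but a Roman numeral or a single capital letter (for example "I" or "II."); real opinions will need a more forgiving pattern, and the function name is hypothetical.

```python
import re

# Assumption: a division marker is a paragraph that is only a Roman numeral
# (I, II, IV, ...) or a single capital letter, optionally followed by a period.
DIVISION = re.compile(r"^\s*(?:[IVX]+|[A-Z])\.?\s*$")


def summary_source(opinion: str) -> str:
    """Return the text to hand to a summarizer.

    If a division marker is found after the first paragraph, return
    everything that precedes it (the restatement of facts and posture).
    Otherwise fall back to the first paragraph alone.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", opinion) if p.strip()]
    if not paragraphs:
        return ""
    for i, para in enumerate(paragraphs):
        # Skip index 0: a marker there has nothing in front of it to keep.
        if i > 0 and DIVISION.match(para):
            return "\n\n".join(paragraphs[:i])
    return paragraphs[0]
```

Whatever comes back from summary_source() would then go to the summarizer in place of the full opinion.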

For my own amusement, I ran the background section of the First Circuit opinion in Lotus v. Borland through the online summarizer at http://search.iiit.net/~jags/summarizer/index.cgi, limited to six sentences, and got

992) ("Borland I"); Lotus Dev.
 1992) ("Borland II"); Lotus Dev.
 1993) ("Borland III"); Lotus Dev.
 Because so many variations were possible, the district court concluded that the Lotus developers' choice and arrangement of command terms, reflected in the Lotus menu command hierarchy, constituted copyrightable expression.
[14] Immediately following the district court's summary judgment decision, Borland removed the Lotus Emulation Interface from its products.
 The district court also rejected Borland's affirmative defenses of waiver, laches, estoppel, and fair use.

...which is actually not bad.  Running the first two sections of the recent Supreme Court opinion in Ali v. BOP gives this:

” Petitioner contends that this clause applies only to law enforcement officers enforcing customs or excise laws, and thus does not affect the waiver of sovereign immunity for his property claim against officers of the Federal Bureau of Prisons (BOP).
 We conclude that the broad phrase “any other law enforcement officer” covers all law enforcement officers.
I    Petitioner Abdus-Shahid M. S. Ali was a federal prisoner at the United States Penitentiary in Atlanta, Georgia, from 2001 to 2003.
 778, 779–780 (2006) (per curiam).
 See 204 Fed.
, at 779–780.

also not entirely terrible.  But I would (and can) seek some expert advice on summarizers, if anyone's interested.


----- "John P. Joergensen" <jjoerg at camden.rutgers.edu> wrote:
> First off, Elmer rocks for swish-e'ing the federal cases.
> I have been working on these myself, but I started with parsing the 
> metadata.  So far, I have just done the F2d, but I am impressed.  There
> was a bit of hacking to do with some non-conforming documents, but in
> the end, out of ~468,791 documents (again, just F2d), there was only a
> 1% error rate with lifting out parties, docket, date, citation, and an
> attempt at the 1st paragraph of text.  I haven't checked too closely 
> yet, but I suspect that most of those failures were table cases.  I'll
> post a link later this week with something useable.
> In any case, my real purpose in posting is that I think this new 
> resource should be fully developed, and I think the people who attend
> CALI are the ones to do it best.
> How about a Session at CALI where we can all share what we have been 
> able to do individually, what we can do to coordinate and standardize
> our efforts, and generally make the federal reporter a truly reliable
> and competitive resource?
> Elmer, anyone else interested?
> John
> -- 
> _________________________________________
> John P. Joergensen
> Librarian II
> Rutgers University School of Law - Camden
> jjoerg at camden.rutgers.edu
> ________________________________________

Thomas R. Bruce (tom.bruce at cornell.edu)
Director, Legal Information Institute
Cornell Law School

"The best way to protect the young 
from books is, first, to make sure that 
they shall be so dry as to offer no 
temptation; and, second, to store them 
in such a way that no one can find them 
without several years' training"
                           -- FM Cornford
