Optimality Theory: the Future of the Justice System?

by Anthony Meyer
Linguistics News Correspondent

Small claims court is no joke. Just ask the interior muralist who desperately needs the return of her security deposit to buy supplies for her next project. Or the couple whose prize toy poodle was impregnated by the neighbor’s Cane Corso. Or the wretch who chipped a tooth when he bit into a burrito that had a rock in it. And yet for no one is small claims court less amusing than for the judge who presides over it.

In the fall of 2014, Murray T. Nevelson was serving his second term as a small claims judge, or magisterial district judge, as they are known in Pennsylvania. Now, magisterial district judges are elected to terms of six years. Judge Nevelson was then not even halfway through his current term. This was a particularly low point in Judge Nevelson’s legal career. “I was stuck in the doldrums,” he told me in a recent interview. “My sails had gone limp.”

Judge Nevelson presided over one of the three magisterial district courts located in Harlow County, Pennsylvania. In 2014, more than 6,000 new small-claims cases were filed in his district court alone, which is about 22 new cases each business day. “And more unwanted poodle pregnancies than I would care to hear about in ten lifetimes,” Judge Nevelson said, exhibiting a judge’s knack for putting things into perspective.

But everything changed for the judge in a single evening not long after Thanksgiving, 2014. He was sitting in his office, resting his head on papers strewn across the surface of his desk, when his nephew, Elliot Nevelson, barged in, laptop in hand. Elliot had something astounding to show his uncle. He wanted to demonstrate a computer program that he had been working on, a program he had named “OptimalJustice.” It would prove to be the judge’s salvation.

Elliot Nevelson was then a linguistics major at Dartford College. He started working on OptimalJustice as a project for a seminar on computational linguistics. Elliot’s inspiration for OptimalJustice was a linguistic theory called Optimality Theory (OT).

It is widely accepted in linguistics, particularly in phonology, that every surface form is associated with an underlying form. For example, the underlying /teIp+z/ ‘tapes’ surfaces as [teIps]. According to OT, the “optimal” surface form is selected by a set of constraints in a sort of “survival of the fittest” fashion. That is, an optimal form such as [teIps] emerges only after the constraints have eliminated all other potential candidates, candidates like *[teIpz] (the asterisk meaning something like “the following is prohibited”). The crucial constraint in this case is *Obs[-voi]Sib[+voi], which is to say, “A voiceless obstruent (Obs[-voi]) cannot be immediately followed by a voiced sibilant (Sib[+voi]).” Crucially, this constraint must outrank the constraint FAITHFULNESS, which prohibits any alteration to the underlying form. Thus, the ranking of the constraints is important.
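The selection of [teIps] over *[teIpz] can be sketched in a few lines of Python. This is only a minimal illustration, not anything taken from OptimalJustice: the segment sets, the function names, and the way FAITHFULNESS counts changed segments are all assumptions, and the morpheme boundary in /teIp+z/ is dropped for simplicity. Each constraint returns a number of violation marks, and candidates are compared on the highest-ranked constraint first.

```python
# Toy segment classes (assumed); a real phonology would use full feature matrices.
VOICELESS_OBSTRUENTS = set("ptkfs")
VOICED_SIBILANTS = set("z")

def no_voiceless_obs_voiced_sib(underlying, candidate):
    """*Obs[-voi]Sib[+voi]: one mark per voiceless obstruent directly followed by a voiced sibilant."""
    pairs = zip(candidate, candidate[1:])
    return sum(1 for a, b in pairs if a in VOICELESS_OBSTRUENTS and b in VOICED_SIBILANTS)

def faithfulness(underlying, candidate):
    """FAITHFULNESS (simplified): one mark per segment that differs from the underlying form."""
    changed = sum(1 for u, c in zip(underlying, candidate) if u != c)
    return changed + abs(len(underlying) - len(candidate))

def optimal(underlying, candidates, ranking):
    """Return the candidate with the smallest violation profile, read in ranked order."""
    return min(candidates, key=lambda cand: tuple(con(underlying, cand) for con in ranking))

UNDERLYING = "teIpz"
CANDIDATES = ["teIpz", "teIps"]

# *Obs[-voi]Sib[+voi] >> FAITHFULNESS
winner = optimal(UNDERLYING, CANDIDATES, [no_voiceless_obs_voiced_sib, faithfulness])
# FAITHFULNESS >> *Obs[-voi]Sib[+voi]
reversed_winner = optimal(UNDERLYING, CANDIDATES, [faithfulness, no_voiceless_obs_voiced_sib])
print(winner, reversed_winner)
```

With the ranking given in the text the devoiced candidate wins; with the two constraints swapped, the fully faithful candidate would win instead, which is why the ranking matters.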

Now, in order to create OptimalJustice, Elliot had to come up with non-linguistic constraints, in particular constraints relevant to small claims court (or magisterial district court). But how does one come up with constraints? Should he just make them up–conjure them from thin air? He decided to try to extract them automatically from text. He gathered digitized records of his uncle’s court decisions. He also–well, how shall we put this–procured access to the judge’s personal diary, a massive Microsoft Word document. The diary entries supplemented the court records with information of a more personal nature.

Elliot used natural language processing tools, such as a part-of-speech tagger and a syntactic parser, to extract the constraints, ending up with nearly 2000. Some examples are the following:

*POOR:PAY: Anyone who is poor must not pay any money to the opposing side. One violation mark for each $150 such a person pays.
*DEFENDANT: Defendants are banned. One violation mark if the party in question is the defendant.
*PLAINTIFF: Plaintiffs are banned. One violation mark if the party in question is the plaintiff.
*BURRITO: Burritos are banned. One violation mark per burrito.
*NO-LIPGLOSS: (“Don’t wipe off that lip gloss!”) Not to wear lip gloss is prohibited. One violation mark for a complete absence of lip gloss.
*LIPGLOSS: (“Wipe off that lip gloss!”) To wear lip gloss is prohibited. One violation mark for presence of lip gloss.
*TATTOOS: (“Hide your tats!”) Tattoos are banned. One violation mark per tattoo.
*NO-TATTOOS: (“Don’t hide your tats!”) The absence of tattoos is banned. One violation mark for a complete lack of tattoos.
*NO-LIPGLOSS&*NO-TATTOOS: (“Lip gloss and tats go great together!”) The simultaneous absence of lip gloss and tattoos is prohibited. The violation of the conjoined constraint incurs a single violation mark (not two). In what follows, we shall sometimes abbreviate this constraint as *NLG&*NT.
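
To make the violation counting concrete, here is a rough sketch of how a few of these constraints might be written as functions. Nothing here comes from Elliot’s actual code: the Party record, its field names, and the decision to round partial $150 increments down are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Party:
    """Hypothetical record of one party in a case (all field names are assumed)."""
    role: str          # "plaintiff" or "defendant"
    is_poor: bool
    amount_paid: int   # dollars paid to the opposing side
    burritos: int
    lip_gloss: bool
    tattoos: int

def poor_pay(p):
    """*POOR:PAY: one mark per $150 paid by a poor party (partial increments ignored, an assumption)."""
    return p.amount_paid // 150 if p.is_poor else 0

def defendant(p):
    """*DEFENDANT: one mark if the party is the defendant."""
    return 1 if p.role == "defendant" else 0

def burrito(p):
    """*BURRITO: one mark per burrito."""
    return p.burritos

def no_lipgloss(p):
    """*NO-LIPGLOSS: one mark for a complete absence of lip gloss."""
    return 0 if p.lip_gloss else 1

def no_tattoos(p):
    """*NO-TATTOOS: one mark for a complete lack of tattoos."""
    return 1 if p.tattoos == 0 else 0

def nlg_and_nt(p):
    """*NLG&*NT: a single mark (not two) when both atomic constraints are violated."""
    return 1 if no_lipgloss(p) and no_tattoos(p) else 0

example = Party(role="defendant", is_poor=True, amount_paid=450,
                burritos=1, lip_gloss=False, tattoos=0)
marks = {f.__name__: f(example)
         for f in (poor_pay, defendant, burrito, no_lipgloss, no_tattoos, nlg_and_nt)}
print(marks)
```

The conjoined constraint is written as its own function precisely because it contributes one mark of its own rather than the sum of its parts.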

Elliot was kind enough to sit down with me and explain these constraints. However, he was careful to point out that there are in fact thousands of constraints, and that it is the interaction of many constraints that yields the subtlest and most interesting effects. Note that the asterisk in the above constraints is a kind of negation. Also, one keeps track of individual violations in order to break ties if necessary.

The constraint *POOR:PAY serves to militate against other constraints that might work to make a poor person pay a burdensome amount of money. It outranks *DEFENDANT, for instance. *PLAINTIFF also outranks *DEFENDANT, which, according to Elliot, represents the plaintiff’s burden of proof, although he allows that it could stem from his uncle’s sour attitude toward plaintiffs, whom he sees as instigators and the source of much of his misery. *BURRITO is one of Elliot’s personal favorites. “Burritos are always bad news in small claims court,” he said.

We see the influence of the judge’s diary in the constraints pertaining to lip gloss and tattoos. “My uncle seems to have a thing for lip gloss,” Elliot observed with a grimace when we turned to these constraints. “The most notable constraint in this group is *NO-LIPGLOSS&*NO-TATTOOS [i.e., *NLG&*NT], which is actually a complex constraint, namely, the conjunction of the atomic constraints *NO-LIPGLOSS and *NO-TATTOOS.”

But still more interesting is the ranking of the constraints in this group, which is detailed below in (1-3). Note that the symbol “>>” means “outranks.”


Subranking A in (1) can be paraphrased as “Wear lip gloss!” and subranking B in (2) as “Hide your tattoos!” Now, in (3) the conjoined constraint *NLG&*NT outranks B. (3) can thus be paraphrased as “Don’t hide your tattoos if you’re wearing lip gloss!” I asked Elliot what he thought of all this. He sighed and said, “My uncle is a complicated man.”

The above rankings constitute a tiny sample of OptimalJustice’s globally optimal constraint ranking, a ranking of nearly 2000 constraints. The globally optimal ranking is the one that most accurately models the judge’s decision-making process, i.e., the one that most consistently replicates the judge’s past decisions. Elliot used a machine learning algorithm to find the optimal ranking from among the innumerable possible rankings. Once the constraint ranking was computed, OptimalJustice was essentially ready to go. Elliot took it to Judge Nevelson’s office that very evening–that fateful evening not long after Thanksgiving, 2014.
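
The idea of picking the ranking that best replicates past decisions can be illustrated with a deliberately naive sketch. The article does not say which learning algorithm Elliot used, so the code below substitutes a plain exhaustive search over all orderings, which is feasible only for a handful of constraints and certainly not for 2000. The two “past cases,” their candidate outcomes, and their violation profiles are invented for the example.

```python
from itertools import permutations

def pick_outcome(candidates, ranking):
    """Return the outcome whose violation marks are smallest, read in ranked order."""
    return min(candidates,
               key=lambda name: tuple(candidates[name].get(c, 0) for c in ranking))

def best_ranking(constraints, cases):
    """Brute-force search for the ordering that reproduces the most recorded decisions."""
    def score(ranking):
        return sum(pick_outcome(cands, ranking) == decided for cands, decided in cases)
    return max(permutations(constraints), key=score)

CONSTRAINTS = ["*POOR:PAY", "*DEFENDANT", "*BURRITO"]

# Each case: (candidate outcomes with invented violation marks, the judge's actual decision).
CASES = [
    ({"award": {"*POOR:PAY": 1}, "dismiss": {"*DEFENDANT": 1}}, "dismiss"),
    ({"award": {"*BURRITO": 1}, "dismiss": {"*DEFENDANT": 1}}, "award"),
]

learned = best_ranking(CONSTRAINTS, CASES)
print(" >> ".join(learned))
```

On this toy data only one ordering reproduces both decisions, and it places *POOR:PAY above *DEFENDANT, in line with the ranking Elliot describes. A realistic system would need a far more efficient learner, since the number of possible rankings grows factorially with the number of constraints.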

The judge was blown away. “It was me, but better,” he said. “I was amazed.” Throughout the remainder of the judge’s term, OptimalJustice allowed him to zone out for most of the day. “I no longer had to think about poodles, burritos, or anything else that I didn’t want to think about.” He still had to appear in the courtroom, but OptimalJustice was with him at all times to do his thinking for him.

Elliot set up microphones to record the sound of the courtroom proceedings. Judge Nevelson himself captured the requisite visual data, using his smartphone to take photographs of both the plaintiff and the defendant. Elliot incorporated into OptimalJustice an image processing program capable of recognizing lip gloss with 97 percent accuracy.

With the help of his nephew’s program, Judge Nevelson sailed through the rest of that second term. He is now happily retired. I asked him whether there were ever any complaints pertaining to his using OptimalJustice. “None to my knowledge,” he said. “I don’t think anyone ever caught on. They may have found my cell-phone photography a little strange at first. But then again, maybe not. Nowadays people are always taking pictures of each other. No, if anything, OptimalJustice was an improvement. It was a more consistent version of me. And there’s something about consistency that just resonates with folks.”

The younger Nevelson aced his computational linguistics seminar. OptimalJustice was a big hit. His professor’s only criticism was that OptimalJustice’s domain was too narrow, as it was basically a Judge Murray T. Nevelson automaton. But according to Elliot, this can easily be remedied by expanding OptimalJustice’s training corpus, i.e., by training it on decisions from more judges. “There is boundless room for improvement, but it will never be perfect,” said Elliot. “One of my uncle’s favorite sayings is, ‘Everything is imperfect, but the law is really imperfect.’” While perfect justice may indeed be unattainable, Elliot Nevelson’s ingenious work may just have put optimal justice within reach.


Happy April Fool’s Day from your LINGUIST List team! Don’t forget to visit us at our Fund Drive homepage to help us reach our goal!
