Day: August 23, 2013

Research Colloquium 2013: “Formal Languages & the Chomsky-Schützenberger Hierarchy” by Zac Smith (Cornell University)

Former LINGUIST List student editor Zac Smith, currently a Ph.D. student at Cornell University, visited and presented a talk entitled “Formal Languages & the Chomsky-Schützenberger Hierarchy.” He summarized the four main classes of formal grammar systems, as defined in “Three Models for the description of language” (Chomsky 1956): unrestricted, context-sensitive, context-free, and regular, unrestricted being the most complex and regular being the least complex in regards to rewrite rules. In formal grammar systems, rewrite rules are responsible for generating a set of disparate symbols into a set of cohesive strings in any given language. According to Smith, languages are classified by “which types of rewrite rules are minimally required to produce all of its possible strings.”

The focus of Zac’s talk was how this hierarchy can be applied by linguists studying Natural Language Processing and Computational Linguistics, by using computational models and computing grammars such as Finite State Automata (FSA) and Pushdown Automata (PDA) – automata being algorithmic computers that map grammars into a computational system and recognize language strings. The more complex the rewrite rules of the grammar, the more complex the computational model must be. For example, using a FSA is sufficient to describe a Regular grammar, but is insufficient in describing Context-free languages. A PDA is sufficient in describing a Context-free language, because it allows for mapping center-embedding and recursion (e.g. relative clauses), but it is insufficient in mapping the more complex Context-sensitive and Unrestricted grammars. Formal grammars are not only limited to the field of syntax, but can be also used in many fields of linguistics – for instance, phonological theories, such as optimality theory, rule-ordering, and phonotactic constraints. Other computing grammars, such as lexicalized tree-adjoining grammars (LTAGs) and other supertagging methods, are faster and more accurate at parsing sets of strings. These computational methods are extremely useful in studying natural language phenomena processing, and implementing these grammars help to advancing linguistic studies in general in this increasingly technological age.