Month: August 2013

Research Colloquium 2013: Discussion of Non-Human Primates and Human Language

The LINGUIST List has been privileged to host several up-and-coming linguists as interns, and their diverse backgrounds have allowed for a range of entertaining and insightful presentations. On July 18, 2013, Emily Remirez, a student at Rice University, demonstrated that language and linguistics may extend beyond the boundaries of the human experience.

Emily’s presentation, “Non-Human Primates and Human Language,” set the stage for an intriguing journey through varying research and hypotheses from the field of cognitive linguistics. For the purposes of her presentation, Emily explored the following three tenets of cognitive linguistics: the human mind does not have an autonomous linguistic faculty, grammar is conceptual, and language arises from use and social interaction. Examples of these fundamental principles have been explored through the scientific study of language acquisition in non-human primates.

Specific examples of language in the context of non-human primates are limited due to the physical constraints of primate anatomy. Primate vocal tracts are not conducive to the production of consonants, and their vocalizations are limited to a select number of sounds. Furthermore, non-human primates are quadrupedal, which makes consistent communication through sign language impractical. A breakthrough in determining the cognitive capacity for language in non-human primates came from testing with lexigrams. Lexigrams are pictographs that represent abstract concepts or refer to external environmental constituents. Traditionally, lexigrams are represented on a keyboard limited to 256 characters.

The use of lexigrams to determine non-human primate cognition began with a bonobo named Matata. Matata learned several lexigram signs and was successful in communicating with researchers through this system. Matata’s adopted son Kanzi also learned the lexigram system. Kanzi’s case is unique in that he actively decided to learn lexigrams independently of Matata or the researchers’ requests. Kanzi now understands and uses over 3,000 lexigram items.

Sue Savage-Rumbaugh with Kanzi in 2003.

The ability to understand human language is not limited to non-human primates. A border collie named Betsy is purported to understand 300 English words. Alex the African Gray Parrot understands 150 English words, and Ake the Bottlenose Dolphin understands at least 50 hand gestures. From the example of Kanzi and other animals that are able to understand human language, there is reason to infer that several linguistic phenomena may be at work in the minds of these creatures. Agent-patient relations and pronoun antecedents have been strongly observed in Kanzi’s use of lexigrams.

In conclusion, the presence of language is assumed to exist on a gradient scale in the animal world. Kanzi presents a strong case that language, broadly defined, exists in non-human primates and other closely related species. Emily hypothesizes that non-human primates’ use of language as interpreted through the canon of cognitive linguistics is perhaps best explained by borrowing the fuzzy boundaries and gradient membership of prototype theory. The results of animal language study are inconclusive and tenuous at best, but with time, resources, and scientific inquiry, much has been gathered regarding the classification of non-human communication systems.

Updated on: 9/3/2013

Research Colloquium 2013: “Formal Languages & the Chomsky-Schützenberger Hierarchy” by Zac Smith (Cornell University)

Former LINGUIST List student editor Zac Smith, currently a Ph.D. student at Cornell University, visited and presented a talk entitled “Formal Languages & the Chomsky-Schützenberger Hierarchy.” He summarized the four main classes of formal grammar systems, as defined in “Three Models for the Description of Language” (Chomsky 1956): unrestricted, context-sensitive, context-free, and regular, with unrestricted being the most complex and regular the least complex with regard to rewrite rules. In formal grammar systems, rewrite rules generate the well-formed strings of a given language from a set of symbols. According to Smith, languages are classified by “which types of rewrite rules are minimally required to produce all of its possible strings.”
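To make the idea of rewrite rules concrete, here is a minimal Python sketch (not taken from Zac’s talk) that expands a start symbol with a small set of rules and collects the strings it can produce. The function name `generate`, the toy grammars, and the length cap are all illustrative assumptions; the first grammar is regular (right-linear), while the second needs context-free rules because it nests material inside itself.

```python
from collections import deque

def generate(rules, start, max_len):
    """Expand rewrite rules breadth-first and return every terminal
    string with at most max_len symbols.

    rules maps a single-character nonterminal to its possible right-hand
    sides. Assumption for this sketch: every rule that reintroduces a
    nonterminal also adds at least one terminal, so the search terminates.
    """
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal, if any.
        idx = next((i for i, sym in enumerate(form) if sym in rules), None)
        if idx is None:
            results.add(form)  # only terminals left: a finished string
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for sym in new if sym not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=lambda s: (len(s), s))

# Regular (right-linear) grammar: S -> aS | b
regular = {"S": ["aS", "b"]}
# Context-free grammar: S -> aSb | (empty string)
context_free = {"S": ["aSb", ""]}

print(generate(regular, "S", 3))
print(generate(context_free, "S", 4))
```

The regular grammar only ever grows a string at its right edge, whereas the context-free grammar wraps matching symbols around a center, which is the kind of dependency a regular grammar cannot express.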

The focus of Zac’s talk was how this hierarchy can be applied by linguists working in Natural Language Processing and Computational Linguistics, using computational models such as Finite State Automata (FSA) and Pushdown Automata (PDA). Automata are abstract machines that map grammars into a computational system and recognize the strings of a language. The more complex the rewrite rules of the grammar, the more complex the computational model must be. For example, an FSA is sufficient to describe a Regular grammar, but is insufficient to describe Context-free languages. A PDA is sufficient to describe a Context-free language, because it allows for mapping center-embedding and recursion (e.g. relative clauses), but it is insufficient for the more complex Context-sensitive and Unrestricted grammars. Formal grammars are not limited to the field of syntax; they can also be used in many other areas of linguistics, for instance in phonological theories involving optimality theory, rule-ordering, and phonotactic constraints. Other computing grammars, such as lexicalized tree-adjoining grammars (LTAGs) and other supertagging methods, are faster and more accurate at parsing sets of strings. These computational methods are extremely useful in studying and processing natural language phenomena, and implementing these grammars helps to advance linguistic studies in general in this increasingly technological age.
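The contrast between the two kinds of automata can be sketched in a few lines of Python. This is an illustrative example rather than material from the talk: the function names and the two toy languages, (ab)* and a^n b^n, are assumptions chosen because the first is regular and the second is a standard context-free language that no finite-state machine can recognize.

```python
def fsa_accepts(s):
    """Finite-state recognizer for the regular language (ab)*.

    It tracks only the current state, with no additional memory.
    """
    transitions = {("q0", "a"): "q1", ("q1", "b"): "q0"}
    state = "q0"
    for ch in s:
        state = transitions.get((state, ch))
        if state is None:  # no transition defined: reject
            return False
    return state == "q0"   # q0 is both the start and the accepting state


def pda_accepts(s):
    """Pushdown-style recognizer for the context-free language a^n b^n.

    The stack counts the a's so that each b can be matched against one,
    which mirrors how center-embedded structures are tracked.
    """
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:          # an 'a' after a 'b' breaks the nesting
                return False
            stack.append("A")
        elif ch == "b":
            seen_b = True
            if not stack:       # more b's than a's
                return False
            stack.pop()
        else:
            return False
    return not stack            # every 'a' must have been matched


print(fsa_accepts("abab"), fsa_accepts("aabb"))
print(pda_accepts("aabb"), pda_accepts("aab"))
```

The finite-state recognizer has a fixed number of states, so it cannot count how many a’s it has seen; the stack in the second function supplies exactly that unbounded memory.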

Research Colloquium 2013: Discussion of Bird & Simons 2003 and Maxwell 2012

To kick-start our LINGUIST List Research Colloquium, we read and discussed two articles concerning best practices in language documentation over two weeks: “The Seven Dimensions of Portability for Language Documentation and Description” by Steven Bird & Gary Simons 2003 and “Electronic Grammars and Reproducible Research” by Mike Maxwell 2012. Since language documentation is our business at the LINGUIST List, it is good to keep up to date on new methods in field work and research, and these two articles generated some good discussion on advantages and trials of language documentation.

In Week One, Bryn Hauk from Eastern Michigan University led the discussion on Bird & Simons 2003, which discusses the best methodology for archiving and documenting linguistic data in a way that is more accessible and lasting. Bird & Simons detailed the problems of language archiving and documentation, and how they think these problems should be addressed for the betterment of linguistic research, especially for the documentation of endangered languages. They believe that all linguistic data should be “portable”, that is, able to be “ported” to and accessed on multiple systems and technologies. The “seven dimensions of portability” mentioned in the title were:

1. Content (the quality of the data recorded)

2. Format (using XML and Unicode to streamline and standardize documentation)

3. Discovery (making resources easier to find by researchers)

4. Access (making it easier to obtain and access resources)

5. Citation (providing citations to online sources and reducing broken links)

6. Preservation (digitizing records and having back-ups for resources)

7. Rights (protecting intellectual property rights, but also limiting restrictions to research)

Essentially, as linguists, we should be aiming for clarity in our research and documentation, so that it can be accessible to future generations of linguists and language enthusiasts.

In Week Two, Brent Woo of Eastern Michigan University led the discussion of the second paper, “Electronic Grammars and Reproducible Research” by Maxwell 2012. We discussed Maxwell’s argument that linguists should use computational tools, such as XML tagging, to reduce ambiguity in annotation and rule-writing in a way that is understandable to humans as well as computers. The goal is to make linguistic research reproducible without tying it to “any particular linguistic theory” or any “particular computational tool”; in other words, research should not be bound to technology that may be obsolete or difficult to use in five years. This is very important to keep in mind as technology keeps advancing at an increasing rate, especially since new technologies can often become obsolete within months. If our annotations or documentation are recorded in obsolete formats, we may not be able to access them in the future, and there is the potential for this data to be lost.

Presenting the LINGUIST List’s Summer 2013 Research Colloquium Series

This summer, the LINGUIST List has been holding a research colloquium once a week. We have been reading papers that pertain mostly to computational linguistics and language documentation. In addition to reading papers, the LINGUIST List’s summer interns and graduate assistants even presented some of their own works-in-progress and original research in morphosyntax, cognitive linguistics, and corpus linguistics. Since our staff has a wide range of linguistic interests, the topics discussed during the research colloquium sessions have given our interns and graduate assistants an opportunity to read and discuss current research in the field outside of the classroom, as well as gain exposure and experience in presenting research to our fellow scholars and receiving immediate feedback on our research.

Over the next several blog posts, we’ll be sharing summaries of the following talks and presentations:

Weeks One and Two
Discussion of Bird & Simons 2003 and Maxwell 2012

Week Three
“Formal Languages & the Chomsky-Schützenberger Hierarchy” by Zac Smith from Cornell University

Week Four
“Index-Concord Mismatch in Russian” by Bryn Hauk from Eastern Michigan University

Week Five
“Non-Human Primates and Human Language” by Emily Remirez from Rice University

Week Six
“Error Annotation of English Learner German” by Eric Benzschawel from Indiana University

Week Seven
LINGUIST List Corpus Linguistics Project by Thomas Haider from Heidelberg University

Week Eight
“Visualizing Endangered Language Contexts” by Jacob Collard from Swarthmore College

If you are interested in giving a presentation during one of our future colloquium sessions, please contact us at

Ask-A-Linguist: Characteristics of Arabic

Answers for this blog excerpt were researched and provided by Carmen Cross with input from other panelists. For a full response, please see the Ask-A-Linguist FAQ section about Arabic.

Where is Arabic spoken?

Mauritania, Morocco, Algeria, Tunisia, Libya, Egypt, Sudan, Saudi Arabia, Yemen, Oman, The United Arab Emirates, Bahrain, Qatar, Kuwait, Iraq, Syria, Lebanon, and Jordan have Arabic as their primary official language, although not all of the citizens of these countries are speakers of Arabic. Arabic is also an official language of Israel, Djibouti, and Somalia. There are also Arabic-speaking populations in Turkey, Iran, Central Asia, and Sub-Saharan Africa.

One of the official languages of Malta, Maltese, is an interesting case. Even though it is related to the Algerian and Tunisian varieties of Arabic, and thus classified as a Semitic language, it is the only known form of Arabic that is written in the Latin script. Moreover, due to its linguistic isolation from the Arab world during the heyday of European colonization, it has been heavily influenced by Italian and English.

Are h (plosive) and h (pharyngeal fricative) two separate phonemes?

Before answering this question, it is useful to define both “phoneme” and “allophone.” Phonemes are the sound units that differentiate words, so substituting one phoneme for another changes the meaning of a word. For instance, “top” and “mop” begin with different phonemes, /t/ and /m/ respectively. Allophones, on the other hand, are predictable variants of a particular phoneme and do not change the meaning of a word. For example, if you pay careful attention to your pronunciation of the English word /kat/ “cat,” you will notice a slight aspiration after the /k/, as if the word were actually spelled [kʰat]. Since this type of variation is predictable in English (aspiration typically occurs after the voiceless stop consonants /t/, /p/, and /k/ at the beginning of a stressed syllable) and does not change the meaning of a particular word, [kʰ] is considered to be an allophone of /k/.

Now, let us consider an example in Arabic. Arabic has two separate letters, or phonemes (they are used to distinguish words): the first approximates the English /h/ and is classified as a glottal fricative (it will be represented as /h/), and the second has no English equivalent and is classified as a pharyngeal fricative (it will be represented as /ħ/). In the word /habba/ “gust of wind,” the first letter is /h/. However, in the word /ħabba/ “pill,” the first letter is /ħ/. Since these two letters serve to distinguish word meanings, they are considered to be separate phonemes and do not represent two allophones of a single phoneme.
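The test applied here is the minimal-pair test: two words with different meanings that differ in exactly one sound show that the two sounds are separate phonemes. A minimal Python sketch of that check follows; the function name `is_minimal_pair` and the representation of words as lists of segments are assumptions made for illustration.

```python
def is_minimal_pair(word_a, word_b):
    """Return True if two words, given as sequences of segments,
    have the same length and differ in exactly one position."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for x, y in zip(word_a, word_b) if x != y)
    return differences == 1

# Segment-by-segment transcriptions of the two Arabic words above.
habba = ["h", "a", "b", "b", "a"]        # "gust of wind"
hhabba = ["ħ", "a", "b", "b", "a"]       # "pill"

print(is_minimal_pair(habba, hhabba))
```

The check only compares forms. To conclude that /h/ and /ħ/ are distinct phonemes, the two words must also differ in meaning, as “gust of wind” and “pill” do.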

What characteristics are common to languages classified as Semitic?

Similarities among the Semitic languages are especially noticeable in phonology and morphology:


There are only six vowels, three short /a, i, u/ and their long counterparts, in the phonological inventory of Semitic languages. Of course, this does not take into account the dialects, especially the Arabic dialects, which have a more varied vowel inventory.

In addition, Semitic languages have rare consonant phonemes, such as the pharyngeal fricatives /ʕ/ and /ħ/.

There is a higher proportion of consonants to vowels.


Semitic nouns have only two genders (masculine or feminine) but three numbers (singular, plural, and dual).

Semitic languages distinguish gender in both the second and third person. So, for instance, at least in Arabic, “he studies” [yadrusu] is contrasted with “she studies” [tadrusu] and “you study, masculine” [tadrusu] with “you study, feminine” [tadruseen]. So, it is not merely a question of substituting a pronoun; prefixes and suffixes must also be added in order to conjugate a verb according to gender and person.

The majority of words are derived from three-consonant roots, in which the vowels are not written. For example, the root /drs/ from /darasa/, “he studied,” is used to form, with the addition of affixes, /madrasa/, “school,” /diraasa/, “studying,” /mudarris/, “male teacher,” /dars/, “class,” etc.
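This root-and-pattern system can be sketched in Python by treating each word as a template with numbered slots for the three root consonants. The template notation, with the digits 1 to 3 standing for the first, second, and third consonant, and the function name `apply_pattern` are assumptions made for this illustration, not a standard linguistic formalism.

```python
def apply_pattern(root, pattern):
    """Fill a template with the consonants of a three-consonant root.

    Digits 1, 2, and 3 in the pattern are replaced by the first, second,
    and third root consonant; every other character is copied as-is.
    """
    return "".join(
        root[int(ch) - 1] if ch in "123" else ch
        for ch in pattern
    )

root = "drs"
patterns = {
    "1a2a3a": "he studied",
    "ma12a3a": "school",
    "1i2aa3a": "studying",
    "mu1a22i3": "male teacher",
    "1a23": "class",
}

for pattern, gloss in patterns.items():
    print(apply_pattern(root, pattern), "-", gloss)
```

Swapping in a different root while keeping the same templates is how the sketch would model other word families, though real Arabic morphology has many patterns and irregularities that this toy function does not capture.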