Featured Linguist: Joakim Nivre

For this week’s featured linguist, we bring you a great piece from Professor Joakim Nivre!

Professor Joakim Nivre

I am delighted to support the fund drive for the LINGUIST List in the year of its 30th anniversary. Like so many of my colleagues, I have relied on the services of the LINGUIST List throughout the years, and this gives me a wonderful opportunity to share some glimpses from my career as a computational linguist as well as some reflections on the development of the field during these three decades.

When the LINGUIST List was started in 1990, I was a PhD student in general linguistics at the University of Gothenburg, trying to complete a thesis on situation semantics (a framework of formal semantics that has since faded into oblivion) and mostly ignorant of the computational side of linguistics that later became the focus of my career. The 1990s was the decade when computational linguistics was transformed by the so-called statistical revolution: a methodological shift from carefully hand-crafted rule-based systems, which delivered a deep linguistic analysis but often lacked coverage and robustness, to statistical models trained on corpus data, which went for breadth instead of depth.

The statistical turn in computational linguistics is also what got me into the field, more or less by accident. After graduating in 1992, I was hired as a lecturer in the linguistics department in Gothenburg, where around 1995 there was a pressing need for a course on statistical methods in computational linguistics but no one qualified to teach it. Young and foolish, and eager to learn something new, I decided to accept the challenge and started developing a course, using as my main sources Eugene Charniak’s beautiful little book Statistical Language Learning and a compendium on statistics for linguists by Brigitte Krenn and Christer Samuelsson with the words “Don’t Panic!” in big boldface letters on the cover. As it turned out, the University of Gothenburg was not the only institution that needed someone to teach statistical methods in computational linguistics at the time, and I almost ended up making a career as an itinerant lecturer in statistical NLP in Scandinavia and Northern Europe.

Eventually, I also managed to apply my newly acquired expertise to research, notably in a series of papers on statistical part-of-speech tagging. Fascinated by the power of inductive inference that allowed us to build practical systems for linguistic analysis from what was essentially just frequency counts from corpora, I found that statistical NLP was more fun than formal semantics and slowly but surely started converting from theoretical to computational linguistics.

The following decade meant great changes for me both personally and professionally. After switching gears and getting serious about computational linguistics, I realized I needed to strengthen my computational competence and decided to do a second PhD in computer science. In the process, I also moved from the University of Gothenburg to Växjö University, a young small university in the south of Sweden, with more limited resources for research but a lot of enthusiasm and pioneer spirit to make up for it. Looking for a topic for my second PhD thesis, I stumbled on dependency parsing, which at the time was a niche area with very little impact in mainstream computational linguistics. As an illustration of this, when giving my first conference presentation on dependency parsing in 2003, I had to devote almost half the talk to explaining what dependency parsing was in the first place and motivating why such a thing could be worth studying at all.

By another case of fortunate timing, however, I happened to be one of the first researchers to approach dependency parsing using the new kind of statistical methods, and together with colleagues like Yuji Matsumoto, Ryan McDonald, Sabine Buchholz and Kenji Sagae, building on foundational work by Jason Eisner and Mike Collins, among others, I was fortunate to become one of the leaders in a new and fruitful line of research that has turned dependency parsing into the dominant approach to syntactic analysis in NLP, especially for languages other than English. A milestone year in this development was 2006, when Sabine Buchholz led the organization of the first CoNLL shared task on multilingual dependency parsing and Sandra Kübler and I gave the first dependency parsing tutorial at the joint ACL-COLING conference in Sydney.

The rapidly increasing popularity of dependency parsing was in my view due to three main factors. First, dependency representations provide a more direct representation of predicate-argument structure than other syntactic representations, which makes them practically useful when building natural language understanding applications. Second, thanks to their constrained nature, these representations can be processed very efficiently, which facilitates large-scale deployment. And finally, thanks to efforts like the CoNLL shared tasks, multilingual data sets were made available, which together with off-the-shelf systems like MSTParser (by Ryan McDonald) and MaltParser (by my own group) facilitated parser development for many languages. Towards the end of the decade we also saw dependency parsing being integrated on a large scale in real applications like information extraction and machine translation.
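To make the first two of these points a bit more concrete, here is a minimal sketch in Python of how a dependency tree is typically stored: one head index and one relation label per word. The example sentence and its labels are invented for illustration, but they show why predicate-argument structure can be read off so directly and why such representations are cheap to process.

# A toy dependency tree: one head index (1-based, 0 = artificial root) and one
# relation label per word. Sentence and labels are invented for illustration.
sentence = ["She", "bought", "a", "book"]
heads = [2, 0, 4, 2]                      # "bought" is the root of the tree
labels = ["nsubj", "root", "det", "obj"]

# Predicate-argument structure can be read off directly: the arguments of the
# main verb are simply its dependents with subject or object relations.
predicate = sentence[heads.index(0)]
arguments = [w for w, h, l in zip(sentence, heads, labels)
             if h != 0 and sentence[h - 1] == predicate and l in ("nsubj", "obj")]
print(predicate, arguments)               # prints: bought ['She', 'book']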

The third decade of my co-existence with the LINGUIST List started with the biggest computational linguistics event in Sweden so far, the ACL conference in Uppsala in 2010. Together with my colleagues at Uppsala University, where I had moved to take up a professorship in computational linguistics, I was very happy to receive computational linguists from all corners of the world during a very hot week in July. The conference was considered huge at the time, with almost 1000 participants, but would be considered small by today’s standards (with over 3000 participants in Florence last year), so I am really glad that we took the opportunity while it was still possible to fit ACL into a small university town like Uppsala.

My own research during the last decade has to a large extent been concerned with trying to understand how we can build models that are better equipped to deal with the structural variation found in the world’s languages. In the case of parsing, for example, it is easy to see that models developed for English, a language characterized by relatively rigid word order constraints and limited morphological inflection, often do not work as well when applied to languages that exhibit different typological properties. However, it is much harder to see what needs to be done to rectify the situation. A major obstacle to progress in this area has been the lack of cross-linguistically consistent morphosyntactic annotation of corpora, making it very hard to clearly distinguish differences in language structure from more or less accidental differences in annotation standards. This is why I and many of my colleagues have devoted considerable time and effort to the initiative known as Universal Dependencies (UD), whose goal is simply to create cross-linguistically consistent morphosyntactic annotation for as many languages as possible.

Given that UD is an open community effort without dedicated funding, it has been remarkably successful and has grown in only six years from ten treebanks and a dozen researchers to 163 treebanks for 92 languages with contributions from 370 researchers around the world. I am truly amazed and grateful for the wonderful response from the community, and UD resources are now used not only for NLP research but increasingly also in other areas of linguistics, notably for empirical studies of word order typology. All members of the UD community deserve recognition for their efforts, but I especially want to thank Marie de Marneffe, Chris Manning and Ryan McDonald, for being instrumental in getting the project off the ground, and Filip Ginter, Sampo Pyysalo and (above all) Dan Zeman, for doing all the heavy lifting as our documentation and release team.
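For readers who have never looked inside a UD treebank, here is a minimal sketch in Python of the CoNLL-U format that all UD treebanks use: ten tab-separated columns per word, including the lemma, the universal part-of-speech tag, morphological features, the head and the dependency relation. The two-word example is invented and its feature annotation is abbreviated, but the format itself is the real one.

# A tiny, invented CoNLL-U fragment with the ten columns used in UD treebanks:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
conllu = """
# text = Dogs bark.
1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_
2\tbark\tbark\tVERB\t_\tTense=Pres\t0\troot\t_\t_
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""

for line in conllu.splitlines():
    if line.startswith("#") or not line.strip():
        continue
    idx, form, lemma, upos, xpos, feats, head, deprel, deps, misc = line.split("\t")
    print(f"{form}: {upos} [{feats}] --{deprel}--> head {head}")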

But is there really a need for something like UD in computational linguistics today? You may think that, if I was fortunate to experience a few cases of good timing earlier in my career, the decision to start the UD initiative in 2014 looks with hindsight like a case of extremely bad timing. The field of computational linguistics, and especially its more practical NLP side, has in recent years undergone a second major transformation known as the deep learning revolution. This has meant a switch from discrete symbolic representations to dense continuous representations learnt by deep neural networks trained holistically for end-to-end tasks, in which the role of traditional linguistic representations has been reduced to a minimum. In fact, it is very much an open question whether traditional linguistic processing tasks like part-of-speech tagging and dependency parsing have any role to play in the NLP systems of the future.

Looking back at the three decades of the LINGUIST List, there is no question that computational linguistics has gradually diverged from other branches of linguistics both theoretically and methodologically. The statistical revolution of the 1990s meant a shift from knowledge-driven to data-driven methods, but theoretical models from linguistics such as formal grammars were still often used as the backbone of the systems. The shift from generative to discriminative statistical models during the next decade further emphasized the role of features learned from data, and so-called grammarless parsers became the norm, especially for dependency parsing, reducing the role of traditional linguistics to corpus annotation and (sometimes) clever feature engineering. During the last decade, the advent of deep learning has to a large extent eliminated the need for feature engineering in favor of representation learning, and the emphasis on end-to-end learning has further reduced the role of linguistic annotation.

Should we therefore conclude that there is no linguistics left in computational linguistics? I think not. Paradoxically, as the importance of explicit linguistic notions in NLP has decreased, the desire to know whether NLP systems nevertheless learn linguistic notions seems to have increased. There is a whole new subfield of computational linguistics, often referred to as interpretability studies, which is about understanding the inner workings of the complex deep neural networks now used for language processing. And a substantial part of this field is concerned with figuring out whether, say, a deep neural language model like ELMo or BERT (to mention just two of the most popular models on the market) implicitly learns linguistic notions like part-of-speech categories, word senses or even syntactic dependency structure. And resources like UD have turned out to be of key importance when trying to probe the black-box models in search of linguistic structure (the small sketch after this paragraph illustrates the basic idea). This opens up exciting new possibilities for research, which can ultimately be expected to influence other branches of linguistics as well. Exactly where this research will take us is impossible to say today, but I for one am eagerly looking forward to following the development over the coming decades in the company of the LINGUIST List. If you are too, please consider contributing to the fund drive.
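As a concrete illustration of the kind of probing study mentioned above, here is a minimal sketch in Python. It assumes the Hugging Face transformers and scikit-learn libraries; the model name, the toy sentences and their tags are my own illustrative choices rather than data from any published probe, and a real study would of course train and evaluate the classifier on a full UD treebank with proper held-out data.

import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

# Load a frozen pretrained model; we only use it to extract word vectors.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

# Toy probing data: (words, per-word UD part-of-speech tags).
data = [
    (["Dogs", "bark"], ["NOUN", "VERB"]),
    (["She", "reads", "books"], ["PRON", "VERB", "NOUN"]),
    (["Cats", "sleep"], ["NOUN", "VERB"]),
]

features, labels = [], []
for words, tags in data:
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # one vector per subword
    word_ids = enc.word_ids(0)                       # maps subwords back to words
    for i, tag in enumerate(tags):
        # Represent each word by the vector of its first subword.
        features.append(hidden[word_ids.index(i)].numpy())
        labels.append(tag)

# If the frozen representations encode part of speech, even a simple linear
# classifier trained on top of them should separate the categories well.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print(probe.score(features, labels))

The point of keeping the network frozen and the classifier deliberately simple is that any success can then be credited to the pretrained representations themselves rather than to the probe.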

Thanks for reading, and if you want to donate to the LINGUIST List, you can do so here: https://funddrive.linguistlist.org/donate/
All the best,
–the LL Team
