We are proud to share with our readers the next featured linguist of our 2017 Fund Drive: Nicoletta Calzolari. We hope that you enjoy reading Dr. Calzolari’s thoughts on her long and varied career as a computational linguist.
It is difficult to write about myself, but it can be an occasion to relive some moments of my life. I am grateful to Damir also for this. Here are some notes, with personal memories interspersed with moments of professional life.
The beginning: the role of chance
Immediately after I graduated in philosophy, with a thesis on Logical antinomies, I remember saying to myself: words, words, words, I have had enough of words! I did not know it, but my destiny was linked to words.
So many things in life happen by chance. I moved to Pisa from Ferrara for family reasons and I saw a notice for a grant at Pisa University in a completely new field: Computational Linguistics. I tried applying, thinking that it would be impossible. But I won it. This was the beginning.
I started studying that new area … and I loved it. It was not just words! I also started, as an autodidact, to write programs by myself, in the language of the time: PL1. The Pisa Summer Schools that Zampolli organised (in the ‘70s and ‘80s) were very influential for me (as for many others): I met the most brilliant researchers and I found them fascinating. I did not know that I would become friends with many of them. I attended the first one as a student, then I was involved in the organisation, and finally I gave some lectures.
CL was a young field, with many possible research paths. It was probably easier at that time: you could have a new idea and experiment with it even working alone, without the need for a big group. It is different today.
Since then we have made great advances, but the more we understand about language the more we see how many problems are still in front of us. And this is what makes this field so interesting and challenging: language is a very complex phenomenon.
The first steps: the most creative and innovative, from a research perspective
More and more, science is driven by data, and our field is no different. Natural Language Processing is a data-intensive field. Major achievements come from the use of large Language Resources (LRs). But it was not always like that. At the beginning, in the ‘80s, we had to fight for the value of working with data to be recognised.
I was probably one of the pioneers in the revolution of the ‘80s, when LRs (i.e. linguistic data) started to be understood as critical to making steps forward, while before data were even despised. I started a line of research that was quite new at the time: acquiring information from Machine Readable Dictionaries, instead of relying on linguists’ intuition. This soon became a trend, followed by many others on all continents. Relying on data was a change in the research paradigm, in the sense of Kuhn.
The great thing was that we succeeded in getting our first European project around this topic. This too happened somewhat by chance: I was discussing my work with Bran Boguraev, sitting in the sun in Stanford, and we had the idea of proposing a European project. We did it, and we got it: it was ACQUILEX, an ESPRIT Basic Research project that lasted 6 years and laid the foundation not only for stronger research but also for working relations with many interesting colleagues in Europe. Immediately afterwards we had another research project, SPARKLE, probably the first European project aiming at extracting linguistic information from texts.
I understood, working on the first funded project, that I had to create the conditions for new research trends that could possibly be funded afterwards. It was this way, through a virtuous circle, that we won so many EC projects, one after the other. I was involved – either coordinating the Pisa unit, or managing the whole European project – in more than 50 EC projects, in collaboration with hundreds of institutions all over the world.
There is more than research in science … or coming to adulthood
It was Antonio Zampolli who, in 1991, introduced the term “language resources” for our data: the term “resources” was meant to highlight their infrastructural nature (like electricity, railroads etc. for a country’s development). Some consequences derive from their infrastructural nature, among which the need to consider, in addition to research and technological aspects, also methodological and policy dimensions.
Working with data – expensive to create and annotate – made me realise that we needed to create the conditions to build on each other’s results. In 1991, I coined the term “reusability” to express the need not to start reinventing the wheel every time, but to re-use available data and join forces. It was the first step towards thinking about standards and interoperability. This term is reused today in the MetaNet Strategic Research Agenda: “2018: Ease re-use of linguistic resources in all parts of the data value chain across languages and sectors”.
The ideas and initiatives that led to the first European project on standards – EAGLES – were discussed at a breakfast table in Grosseto, during the Workshop “On Automating the Lexicon” (organised in 1986 by Walker, Zampolli and me). That Workshop was very influential: a Manifesto was drawn up at the end, where the essential role of language data was emphasised and a number of actions were recommended. It laid the foundations for a large number of initiatives that took place later in Europe.
In the ‘90s with Zampolli we also started to define a global vision of the field and its main components, identified in: creation of LRs, standards, distribution, and automatic acquisition of LRs. These were considered the main components of an infrastructure of LRs for Language Technology (LT). ELRA (European Language Resources Association) was founded in 1995 to take care of one of these components, distribution of LRs.
After those pioneering years, the importance of LRs for LT was recognised more and more, and the flow of data began. Today we have an LR community culture, also thanks to the many initiatives around LRs that we started, like ELRA, LREC, LRE Journal, CLARIN, FLaReNet, MetaShare. In the FLaReNet project we identified the major dimensions around which to structure our community recommendations for the future of the field: documentation, interoperability, availability, coverage/quality, sustainability, recognition, development, international cooperation. These dimensions – constituting the infrastructure around LRs – are at the basis of the current paradigm of LRs.
Acting on Policy issues for a (finally) mature field
Working with data one recognises the critical role of what is around data, i.e. of notions such as standardisation, sharing, openness, evaluation, interoperability, metadata, collaborative annotation, crowdsourcing, integration, replicability, integrity, citation. And the role of how to organise research work: we should create frameworks that enable effective cooperation of many groups on common tasks, adopting the paradigm of cooperative collection of knowledge so successful in more mature disciplines, such as biology, astronomy or physics. The relevance of these issues must not be underestimated.
Technical and scientific issues are obviously important, but organisational, coordination and political issues play a major role. Technologies exist and develop fast, but at the same time the infrastructure that sustains them must be created. The challenges ahead depend on a coherent strategy involving not only the best methods and research but also policy dimensions. The concept behind the relevance of policy issues and best practices around LRs can be summed up as “data as public good”.
I think that a coherent LR ecosystem also requires an effort towards a culture of “service to the community”, where everyone has to contribute. Adopting policies that go in the direction of Open Science must become common practice. This “cultural change” is not a minor issue. It was in this spirit that I introduced at LREC initiatives such as the LRE Map and Share your LRs as steps towards shaping an open scientific information space.
Recently I started to advocate the need for reproducibility and replicability of research results – at the basis of scientific practice – in our field. We discussed this issue at an ELRA workshop, where I pushed Antonio Branco to organise a workshop on these topics at LREC2016. The importance of the topic led me to think that we had to give a sign of its importance also in the LRE Journal: Nancy Ide agreed, and we recently decided to have in the journal a special type of paper devoted to these aspects.
I am proud to have the possibility – through ELRA, LREC and LREJ – to contribute to shaping an open scientific information space for the future of our field. I have always felt it is our duty to use the means that we have in our hands to try to shape the future. In this case, that means playing a role in changing scientific practice and having an impact on our overall scientific enterprise.
The importance of the people around you: a few anecdotes
In my long path through LRs, I became friends with so many colleagues all over the world (almost all the leading figures of a generation) and felt their closeness on many occasions. Over the years I realised how influential this was for me: they somehow shaped me, and sometimes it is difficult to disentangle professional and personal life.
Just a few scattered memories:
After my presentation at COLING 1982 in Prague, Don Walker invited me to a small workshop in Stanford. I was young and was sitting together with the most important people in the field, from Martin Kay to Sture Allen. Back in Pisa I thought I would never again have such a wonderful year! I was wrong. Since then I have had so many wonderful opportunities and recognitions, much more than I deserved. Lesson: so many unexpected things may happen in life.
From Zampolli I learned many things. I mention a simple one: you must both look at the details and be able to see the whole picture, projecting it into the future. I like both: precision and creativity. He had many visions for the future of the field; I hope I had some good ones too.
Ralph Grishman once said at a workshop in Pisa: “You go to dinner with Nicoletta and standards come up”.
I like Facebook also because through it I exchanged memories with Chuck Fillmore in his last years, when he wanted to remember the past with his friends.
I was not a feminist when it was trendy. I did not react when an old, important Italian university professor told me after a talk, when I was very young, “you are of a virile conciseness”, thinking it was a great compliment. But after so many meetings with so many more men than women, I am more feminist now than when I was younger. I remember a meeting in Rome with the President of CNR, 36 people around a table, and me the only woman. I do not know why but I felt ashamed for them.
I was for a long time among the youngest in so many meetings, and then, all of a sudden, it changed. I realised it when Adam Kilgarriff said: “Let’s listen to what Nicoletta thinks, she is always wise”. I saw the link between wisdom and age: I was on the other side, among those with experience.
Recently a Japanese colleague told me: “You are really tough in negotiations”, but he said this with a smile so I hope it was a sort of compliment.
John Sinclair, many years ago: “You are very determined and really good in making many people work”. My parents always told me: if you want something you are so determined that you usually get it.
And I must mention my friendship with Nancy Ide, which started when we were very young and was consolidated over the years. We had many projects and have been to many places together, and now we exchange emails almost every day because of the LRE Journal, of which we are co-editors.
Once at a meeting at the European Commission, one of the EC officers introduced me to the others as Mrs. Language Resources. Not bad. This explains the title I have given to these notes.
The motivation for being in the founding group of ACL Fellows says: “for significant contributions to computational lexicography, and for the creation and dissemination of language resources”. I took it also as a sign that LRs were recognised in the CL community, something not taken for granted a few years before. And a sign that what we did had an impact outside the LR community.
When I received an email from Bente Maegaard saying that I had been proposed for an Honorary Doctorate in Copenhagen, I was so astonished that I asked Sara if she thought it was a joke. It was not, and I was very proud to receive the Honorary PhD directly from the Queen of Denmark.
I was moved recently when the ELRA Board decided to make me Honorary President of ELRA. I was there when it started in 1995 and I served it for so many years in so many roles that I feel it is part of my life. I obviously feel the same about LREC.
Conclusion … with enthusiasm
I conclude with the final words I wrote for my invited talk at the 1st LREC in Granada in 1998: “At the end everything is tied together, which makes our overall task so interesting – and difficult. What we must have is the ability to combine the overall view with its decomposition into manageable pieces. No one perspective – the global and the sectorial – is really fruitful if taken in isolation. A strategic and visionary policy has to be debated, designed and adopted for the next few years, if we hope to be successful. To this end, the contribution of the main actors in the field is of extreme importance. Some of the events in this conference are hopefully moving in this direction.”
Despite my age, I still have the enthusiasm I had when I started, even more when I see that I am able to influence new strategic directions of research. I hope I was able to pass my enthusiasm to younger colleagues.