Projects hosted by The LINGUIST List

Huge steps have been taken in LINGUIST List projects – Thank you, 2016 Summer Interns!

The Fall breeze brought the beginning of a new semester along with it, and a new season for our team of highly motivated Summer Interns at LINGUIST List, who (for the most part) just left us for the continuation of their linguistics endeavors. We are very grateful for their hard work and the priceless contribution they brought to multiple LINGUIST List projects, including GORILLA, MultiTree, LL-Map and GeoLing! These projects were all started some time ago, and they were brought much closer to completion this summer. We are now very excited to let them tell you what they did over the last few months.



1) GORILLA

GORILLA is an exciting project currently being built. The goal of this project is to create a unified source of annotated corpora for languages around the world, with an emphasis on endangered and under-resourced languages. So Eun, Julian, Simon-Pierre, Clare and Will contributed hugely to this project by working on novel speech corpora for Korean, German, and Kinyarwanda, and by revamping and annotating the AHEYM speech corpus for Yiddish.


Clare

“This summer, I helped to develop the Yiddish Speech Corpus: I transcribed, transliterated, and annotated Yiddish speech and developed corpus metadata. I coordinated with Will and So Eun, and together we annotated over 5 hours of media for the corpus, including interviews, poetry and audio books.”

So Eun

“Over the course of the Linguist List internship, I have worked on collecting and producing speech corpora on the Yiddish and Korean languages. For the Korean corpus, I gathered texts in Korean from non-copyright-restricted online sources, made recordings of said texts, and annotated each recording using ELAN. As to the Yiddish corpus, I helped with annotating the Yiddish recordings available at Indiana University’s Archives of Historical and Ethnographic Yiddish Memories (AHEYM) by segmenting audio files as well as converting and copying Yiddish (orthographic and YIVO/romanized) transcriptions onto the ELAN annotations.”


“While interning at LINGUIST List this summer, I was involved in one main project, and several smaller ones as well. I was told about the speech corpus I would be working on, and shown how to use the program necessary for it. I started off making audio recordings, and then transcribing them to text using ELAN. This took up the majority of my time interning here, but was very useful. After I had completed the transcriptions, I was given some smaller tasks, such as improving LINGUIST List’s website by cleaning up old links. I feel that my time interning here was useful and well spent, and has helped expand my skill set.”
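For readers curious what the ELAN annotations mentioned in these accounts look like as data: ELAN stores its work in an XML file (.eaf) in which time slots carry millisecond values and each annotation on a tier points to a start and an end slot. The following is only a minimal Python sketch under that assumption, not the interns' actual tooling. The sample document, the tier name "orthographic", and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the shape of an ELAN .eaf file (illustrative only).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="4000"/>
  </TIME_ORDER>
  <TIER TIER_ID="orthographic">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first segment</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second segment</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_segments(eaf_xml, tier_id):
    """Return (start_ms, end_ms, text) tuples for one tier of an .eaf document."""
    root = ET.fromstring(eaf_xml)
    # Map slot ids to millisecond offsets. This sketch assumes every slot
    # has a TIME_VALUE; real files may contain unaligned slots without one.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
    }
    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            text = ann.findtext("ANNOTATION_VALUE", default="")
            segments.append((start, end, text))
    return segments


def annotated_milliseconds(segments):
    """Sum the durations of all segments, in milliseconds."""
    return sum(end - start for start, end, _ in segments)


if __name__ == "__main__":
    segs = read_segments(SAMPLE_EAF, "orthographic")
    print(segs)
    print(annotated_milliseconds(segs), "ms annotated")
```

A script along these lines is one way a figure such as "hours of annotated media" could be tallied across many files.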


2) GeoLing, LL-Map and MultiTree

These three projects are valuable tools that have been in the making for quite some time here at LINGUIST List. Thanks to some of our 2016 interns, these tools are now improved!

MultiTree is a digital library of scholarly hypotheses about language relationships and subgroupings, organized in a searchable database with a fancy web interface. Noah, Chloe and Arjuna spent the summer working on the structure of this useful web interface, providing you with the new and improved MultiTree!

MultiTree interacts with the LL-MAP Project, a geolinguistic database which provides users with a fully functional Geographical Information System (GIS) through which linguistic data – including subgrouping information – can be viewed in its geographical context. Jacob led this project, assisted by Chloe.

GeoLing is also an interactive map service, but with a different goal. It displays linguistics information around the world on a map: jobs, conferences, internships, and for the first time on LINGUIST List: local events. Lewis spent much time and effort reorganizing the data for this project, and with the help of Noah and Arjuna he was able to implement it on the website!


Jacob

“I have spent the summer working on the LL-MAP project, which had been offline for several years. I began by identifying and correcting issues with the geometry and attribute data of the maps in our PostGIS database and KML files to allow them to display properly in viewers like QGIS, Google Earth, and OpenLayers. I also corrected the styles corresponding to the maps, according to recommendations by Jacob Henry, in order to show the colors, labels, and other visual aspects as they appear in the original source. Once the maps had been uploaded into Geoserver, I went through them to identify specific problems and fixed display issues with several dozen maps. Finally, I contributed along with several other interns to the new LL-MAP viewer. I would like to thank Lwin Moe and Damir Cavar for their help at every step of the process, and Damir and Malgosia Cavar for the opportunity to take part in this project.”


“As a summer intern at the Linguist List, I worked on improving the MultiTree and LL-MAP sites. Before I started, I had played around with the old and new MultiTree but didn’t know how the trees were generated. With some training in Django and D3 data visualizations, I was able to get behind the scenes of MultiTree and start exploring different tree views using the data from the Linguist List. Because of the variety of visualization options, I learned to put myself in the user’s shoes and to decide what features to prioritize in order for the site to be more helpful to the linguist community.

After MultiTree, I helped the LL-MAP team with their project. Working on the new LL-MAP was a dynamic process because we constantly adjusted our tasks based on user feedback. The result was an elegant viewer page that provides as much information as possible in a simple and organized way.

One thing I learned from my internship experience is the difference between a classroom assignment and a real project. For both MultiTree and LL-MAP, we had a lot of freedom deciding what to work on as a team as opposed to being assigned specific tasks, with the goal to make the site more informative and easier to use. I’m glad to have gained the experience of collaborating with teammates, and learning to solve issues creatively and efficiently.”


We sincerely enjoyed having these burgeoning linguists join our team, and we even have the pleasure of having Jacob and Clare stay on at LINGUIST List after the end of their internship! Thanks to the devoted work of the 2016 LINGUIST List summer interns, some novel and valuable language resources have now been created: their contribution goes beyond the limits of LINGUIST List, and is truly a contribution to the Linguistics community around the world. We now invite you all to enjoy these new tools that have been developed over the years by many different hands, and most recently by the LINGUIST List 2016 Interns crew!

A new year, a new LINGUIST List crew: introducing the 2016-2017 GAs!

Dear Readers,

With the waning of the hot season here in Indiana, and the wrapping up of some of the summer projects at LINGUIST List (you’ll get to read more exciting news about this soon!), and after having said goodbye to our deeply missed predecessors, it is time to start a new semester with a new LINGUIST List crew!

You have already encountered most of us, and we’ve actually already been working here for some time, but here is the official introduction of us new GAs at LINGUIST List. Glad to meet you all!


Yue Chen

Yue is a new graduate assistant at the LINGUIST List. She comes from Chengdu, China. She is currently a second year M.A./Ph.D. student in Computational Linguistics here at Indiana University. Her academic interests are natural language processing, machine learning and, more recently, parsing. In daily life, she enjoys cooking, baking, hiking, crocheting and reading.


Kenneth Steimel

Kenneth Steimel is a student editor at LINGUIST List. St. Louis and Columbia, Missouri, were his home before moving to Bloomington. He works primarily with conferences and calls for papers at LINGUIST List. However, he also edits ask-a-linguist, summaries, FAQ, queries and discussions. His research, outside of LINGUIST List, is concerned with documenting African languages. He is specifically interested in developing computational tools and corpora for the languages he studies. In his free time, he also enjoys roasting coffee, geeking out over cars and backpacking.


Michael Czerniakowsky

Mike is a student editor here at LINGUIST List, where he works primarily on Books and Publications, while pursuing his MS in Computational Linguistics at Indiana University. In his free time he enjoys reading, crossword puzzles, and trivia nights.


Amanda Foster

Amanda started working at LINGUIST List in October 2015. She is now the Jobs and Support Editor, as well as the editor for Journal-related posts, Software announcements, and Programs and Institutions. She is originally from a small town in Northern France, but also spent some time in Paris and in Ireland before coming to IU to pursue an MA in General Linguistics. She is passionate about the study and documentation of under-resourced and endangered languages. When she is not entertaining herself with language puzzles, she loves reading, hiking, and discovering the nature and culture around Bloomington!


Clare Harshey

Clare feels lucky to have been a summer intern for the LINGUIST List this year, and even luckier to be able to work here for the school year as well! This summer, she focused on the Yiddish Speech Corpus, part of the GORILLA project. She’s continuing work on the corpus this fall; she’s also in training as an editor for the Reviews, Books, Jobs and Support sections of the LINGUIST List. She is at IU to pursue her MS in Computational Linguistics, and is grateful for the opportunity to do work that builds on her education and her passion for this field. Outside of linguistics, she enjoys music, reading and exploring Bloomington with her dog.

We are excited to have a role to play in connecting the Linguistics community around the world. We’ll be in touch soon (and now, you can even put a face to these editing emails you receive!).


The LINGUIST List Editors

LINGUIST List Internships 2016

Dear linguists, colleagues, students,

LINGUIST List will host another internship program during summer 2016. See the announcement on LINGUIST List for details.

Please keep in mind that the dates of the core internship program are flexible and can be adapted to suit the summer break period of different systems, countries, and continents. Please contact us to discuss particular arrangements that you might need.

We would be happy to assist you with applications for supplemental funding and stipends. Various countries and educational or research organizations offer support opportunities to students. Please consider contacting your advisor and local University administration about funding opportunities and let us know how we could help you with the application.


Your LINGUIST List Team


LINGUIST List at the LSA 2016

Please join us for the LINGUIST List office hours at the LSA 2016 Annual Meeting:

date: Friday, 8th of January 2016

time: 6-7 PM

location: George Washington University room

We will talk about the launch of:

  • GeoLing, a GIS-based linguistic information system (linguistic information on a global map).
  • GORILLA, a service that links language documentation and linguistic research with corpus linguistics, computational linguistics, natural language processing, corpus development, and speech and language technology engineering.

and many other LINGUIST List related things.

Please join us!

This year LINGUIST List will not host a booth in the exhibition area, but we are happy to meet with you during the LSA 2016 annual meeting. Please email us or call us to make meeting arrangements.

See you at the LSA 2016 annual meeting!

Your LINGUIST List Team


Linda Lanz Visits The LINGUIST List!

On Friday, December 4, the Indiana University Linguistics Club hosted Dr. Linda Lanz as part of their Colloquium Series. While Dr. Lanz was in Bloomington, she stopped by The LINGUIST List office to visit!


Dr. Lanz, IU Graduate Student Yue Chen, and LINGUIST Editor Ashley Parker


Dr. Lanz currently works as a computational linguist at Interactive Intelligence. While she was at IU, she gave a talk about the work she’s done with language documentation and revitalization in Iñupiaq and Virginia Algonquian.

We always welcome visitors at The LINGUIST List office. If you are in Bloomington, please stop by!

Michael Abramov at LINGUIST List

This summer was a great one for collaborating with fellow scholars on our projects here at The LINGUIST List! Over the past few months, Michael Abramov accompanied Hilaria Cruz at our office to help with the transcription and time alignment for a corpus of Chatino, an Otomanguean language spoken in the mountains of Oaxaca, Mexico. For the past 15 years, he has worked as a librarian at the Austin Public Library in Texas. Though not a trained linguist, Michael occasionally assists Hilaria with her research on Chatino. Michael has studied Romance languages in the past and can speak Spanish and a little Italian.

Michael at LINGUIST List

Thank you, Michael, for all of your help this summer!

Hilaria Cruz at LINGUIST List

We at the LINGUIST List are always happy to collaborate with fellow scholars on our projects. We were lucky to host Dr. Hilaria Cruz, a researcher and speaker of Chatino, for a week while she worked on creating a spoken corpus of the language for an ongoing project. If you’re interested in collaborating on spoken corpora with us, please contact us!

Dr. Cruz at LINGUIST List

Hilaria Cruz is a linguist and a native speaker of San Juan Quiahije (SJQ) Chatino, an endangered Zapotecan language, spoken in the mountains of Oaxaca, Mexico. She has been documenting and revitalizing the Chatino languages since 2003. Hilaria founded the Chatino Language Documentation Project (CLDP) together with her sister Emiliana Cruz (now an assistant professor at UMass Amherst), and their advisor Tony Woodbury of The University of Texas at Austin.

The CLDP aims to carry out linguistic documentation projects and research integrating the advancement of linguistic science with the wishes of the Chatino people to promote and honor their language. During the course of Hilaria’s fieldwork on Chatino, she has personally collected and archived more than one hundred hours of audio recordings of naturalistic speech in formal and informal settings.

Hilaria earned her Ph.D. in linguistics in 2014 at the University of Texas at Austin. Her dissertation, entitled “Linguistic Poetics and Rhetoric of Eastern Chatino of San Juan Quiahije,” analyzes the poetic patterns of SJQ discourse.

Hilaria is currently working on a project with LINGUIST List to create tools for speech recognition in SJQ Chatino. Beginning in the fall of 2015 Hilaria will be a Lyman T. Johnson Postdoctoral fellow at the University of Kentucky. There Hilaria will investigate the Chatino concepts of death in four Eastern Chatino communities: Santa Maria Yolotepec (YOL), Santa Maria Amialtepec (AMIA), San Juan Quiahije (SJQ) and San Marcos Zacatepec (ZAC). Hilaria’s research interests include Chatino poetics and verbal art, language revitalization, and automatic speech recognition in Chatino.

The LINGUIST List Operation

Dear LINGUIST List supporters,

Many of you have heard that the LINGUIST List relocated from Eastern Michigan University in Ypsilanti to Indiana University in Bloomington in 2014. Please allow us to summarize what this relocation involved.

In spring 2014 we started cleaning out the former space of LINGUIST List and the Institute for Language Information and Technology (ILIT) at EMU and planning the relocation to Indiana University. Some team members decided to join us in the relocation and continue their work and lives at the new location. Unfortunately, not everybody could join us. Our editors Uliana and Danuta continue to support LINGUIST List remotely, but decided to stay in Michigan.

As you can imagine, the LINGUIST List operation involves a significant amount of technology and equipment. The servers that the LINGUIST List was using in Michigan supported among others the following systems:

It was clear that it would not be possible to relocate the hardware (7 servers of varying age and capacity) and the other equipment. One of the problems we were facing was that policies and restrictions at our new hosting institution would not allow us to operate the respective servers there. It also quickly became clear that the LINGUIST List would not have the funds to pay the expensive licenses for commercial software, e.g. LISTSERV, Adobe ColdFusion, or Oracle database server at the new location.

Many of our online linguistic tools (e.g. LEGO, GOLD, etc.) were developed long ago with funds from research grants. They used software that is now outdated, ran untouched for years on outdated infrastructure, and were written in programming languages that have since been overhauled. As in any research environment with IT systems and software, as soon as the software is ready and installed, the environment, programming language, and systems are outdated and need updates. Many systems could not be updated at all anymore, because they relied on components that were removed from modern Linux distributions years ago, or because the programming languages and libraries they used were no longer available in the required version.

All these issues together posed a serious problem. LINGUIST had no resources to fund new servers or the redevelopment and adaptation of the software and applications. No research funding agency could be approached in such a short time to help find a solution and preserve the data and applications. LINGUIST had no funds for a basic IT infrastructure, or for the commercial software licenses mentioned above. In short, the basics needed to run an operation like LINGUIST, with all its projects and online applications, were missing. The infrastructure demands are huge: large digital storage and considerable computational power are needed to serve millions of access requests every day, handle large amounts of data transfer, and so on. On top of that, the labor necessary to handle the setup, installation, administration, programming and data management was simply overwhelming. We had no funds to pay an external IT person to help us with the launch of the systems and services.

As you can imagine, on top of all this there was no obvious way to get help with these technical problems. There was not even time to ask for help, to start a new fund drive, or to explain to willing helpers and volunteers what needed to be done and how they could help us. As we were running out of funds, we were running out of time. We were already in over our heads.

Just before the move we took two significant steps: we asked companies for help. We approached Google with an application to grant us free access to their applications and services as a Charitable Non-Profit organization. They approved us. Our problems with data storage, operational email and management tools were solved. We approached GitHub and Bitbucket to grant us free access to their services to manage our code-base for all the systems and software development projects that we had, and we had quite a few. Since Bitbucket approved our application first, we decided to go with their service. We are grateful that Google and Bitbucket decided to support us and significantly reduce our workload. Software development with the help of services like GitHub or Bitbucket is significantly easier and faster. We have a very good versioning system now, and collaboration between team members and external helpers is much, much better.

Since various policies at the new institution do not allow us to operate our own list- or email-server within the hosting institution’s intranet, we had to set up the necessary servers outside of the institution through commercial means. We also had to find fast and easy solutions for the LINGUIST List website and various other services to minimize the downtime during the move as much as possible. We chose Amazon EC2 and A2 Hosting virtual servers for that. These virtual server instances have significant advantages, but they also come with a price tag. The price for the virtual servers is still lower than the cost of investing in new hardware, server hosting at any location, and hardware maintenance and administration. We estimate that the LINGUIST List saves significantly on operational costs with the new infrastructure. In addition, the virtual server infrastructure opens up new flexible solutions. Any server instance can be backed up as an image that we can download and even run in a virtualization tool on our desktop machines. The new management tools for tablets, for example, offer an easy and neat administration interface. It has a touch of Star Trek to open up the tablet and add a new CPU or more memory to the servers, reboot the machine from a mobile phone, and so on.

The LINGUIST List team decided to stick with Linux as the operating system for all servers. We also decided to use only open source and free software for everything from now on. The database was replaced by PostgreSQL. The LISTSERV system was replaced by Mailman. Adobe ColdFusion was replaced with the open source and free Railo system. All operating systems were replaced by free and open Debian-based Linux systems. Even the desktop systems for the editors, developers and managers were replaced by Linux PCs. Our development environment is based on Vim, Eclipse, and other open and free tools. We have to confess that we also make use of PyCharm (the free community edition, or the student and faculty edition; thanks to JetBrains for providing those free of charge).

The changes from a commercial database software to an open source one, or the switch from Adobe ColdFusion to Railo, do not just mean no licensing fees and therefore savings. They actually came with an incredible investment upfront. Most of the code, all SQL database commands and code sequences, the ColdFusion code – essentially everything had to be checked and rewritten. This could not be done in a month, two months or half a year. Given the aforementioned problems with hardware, outdated software, and the lack of money and time, this looked like a very bad move: one cannot switch running systems to free and open ones at the same time as everything else. Well, we could, and we did. Since we had to invest in updating the systems anyway, we thought that we could also rewrite and change everything and make the move to Open and Free. We have rewritten so much of the old vintage LINGUIST List website that it is an entirely new system in the back-end.

We paid for the switch from commercial and expensive software to free and open source systems with our free time. We invested our weekends, nights, and holidays in the port and the relocation. More than once we reached a point of total frustration, of physical and mental exhaustion, where no more coffee or sugar would help. Can you imagine? At the same time, we had to run the operations, continue editing and posting, talk to colleagues who wanted to make changes to postings, job ads, and conference announcements, and also rent trucks for the relocation, commute back and forth for negotiations, check out new housing and office spaces, etc. May to August 2014 were the wildest months of our lives.
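As a small illustration of the kind of SQL rewriting described above, here is a minimal Python sketch that translates a few Oracle-specific constructs into forms PostgreSQL accepts. It is not the code the team used, and this post does not say which constructs the LINGUIST List queries actually contained. The three rules (NVL, SYSDATE, and the DUAL dummy table) are common examples chosen for illustration, and the function name `port_query` is hypothetical.

```python
import re

# Illustrative rewrite rules only; a real port would need a far longer list
# and a proper SQL parser, since plain regexes also touch string literals.
ORACLE_TO_POSTGRES = [
    # Oracle's NVL(a, b) corresponds to the standard COALESCE(a, b).
    (re.compile(r"\bNVL\s*\(", re.IGNORECASE), "COALESCE("),
    # Oracle's SYSDATE has no PostgreSQL equivalent by that name.
    (re.compile(r"\bSYSDATE\b", re.IGNORECASE), "CURRENT_TIMESTAMP"),
    # PostgreSQL allows SELECT without FROM, so the DUAL dummy table is dropped.
    (re.compile(r"\s+FROM\s+DUAL\b", re.IGNORECASE), ""),
]


def port_query(sql: str) -> str:
    """Apply each rewrite rule in turn and return the translated query."""
    for pattern, replacement in ORACLE_TO_POSTGRES:
        sql = pattern.sub(replacement, sql)
    return sql


if __name__ == "__main__":
    print(port_query("SELECT NVL(title, 'untitled'), SYSDATE FROM DUAL"))
    # prints: SELECT COALESCE(title, 'untitled'), CURRENT_TIMESTAMP
```

Even with helpers of this kind, every rewritten query still has to be checked by hand against the new database, which is where much of the upfront investment goes.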

Many of you have experienced some glitches and broken or dysfunctional pages. We are sorry for that. Given the short time for the relocation and the switch of paradigms and systems, we were not able to test the new systems before taking them live, but rather had to use user feedback to fix issues as they occurred. We transferred the lists to the new Mailman system. This caused some of the deactivated accounts to be activated again. Colleagues and subscribers started getting mails and were quite surprised to receive the full LINGUIST List email collection every day; some were even angry with us. We are sorry for causing you this inconvenience, but there was no other way for us to transfer the list server mails, archives and subscriptions to the new system.

The team at LINGUIST List was massively reduced. Only Malgosia, Lwin, and Damir relocated from Michigan to Indiana, together with three GAs, Andrew, Sara, and Anna. The relocation meant not only a relocation of families, children, and households, it also meant the relocation of resources, the acquisition of equipment, the setting up of a new office space for the operation, and also the cleaning up of the old one. The team did an incredible job. Within just 6 months all that was accomplished, and the operation of LINGUIST List was interrupted only for a matter of hours and minutes. Many people did not realize that. Many in fact feared that this endeavor would fail, that it was basically impossible to achieve all this in such a short time.

We are lucky that IU provided us with a nice building to restart our operations. We were able to acquire a few PCs to start working again and we got some furniture from surplus to equip a meeting room and basic office space. We have a coffee machine again in the office, and things have calmed down somewhat. We sleep again, and life has some rhythm again. There is still a lot of work, a lot to do, and a lot we need to arrange and organize.

In the meantime we can report that:

  • The LINGUIST List website is up and running, faster and more stable than before. This includes both the newly written ‘vintage website’ with the new PostgreSQL database and Railo ColdFusion engine, and the new website (based on Django and Python), whose development has been on hold since spring 2014 because of the move.
  • EMELD is up and running, with some minor issues to fix from time to time. The code has been transferred from Adobe ColdFusion and Oracle to Railo and PostgreSQL.
  • The list server is back and all the archives and other functionalities are up, hosting not just LINGUIST and LINGLITE, but also many other lists that some of you might be subscribed to. We are now using the GNU open source server Mailman.
  • LEGO is up, with some issues that we still need to fix. This site was written in PHP with specific extensions and libraries. In the back end it uses the Apache Solr indexing engine running on a Tomcat server. Reinstalling it and setting it up was a lot of work. Some minor issues with the Solr communication in searches still need to be fixed.
  • GOLD is up and running. It was also written in PHP using the Zend framework. We had to port old code to new server and software environments.
  • MultiTree is up, both the new and the old site. The old system still needs to be fixed, and the new one, which was developed using Django and D3.js, needs some more development. The old system was written in one of the early Ruby on Rails versions. The port to more recent Rails versions was quite complex.
  • OLAC is connected again, thanks to the help of many colleagues, e.g. Gary Simons, Steven Bird and others.
  • ODIN is up, and needs some minor corrections.
  • LL-Map is installed and needs to be activated again. Soon we should have the system and the connections up again, and all the polygons and maps available for browsing and search, linked to MultiTree, and even to LEGO, GOLD, etc. There are now new ways to contribute your own maps and information.
  • Etc.

There is still a lot to do. Most of the transfer has been accomplished. We did everything we could to preserve the data, port the applications, make the new site and operations more sustainable, cheaper, more open, and robust.

We are all set for a new start. After 25 years of the LINGUIST List, the technology and environment are again up to date, ready for the next 25 years.

As many of you know, the LINGUIST List has a very low operational budget. It has operated at its financial limits since spring 2013, without a fund drive in 2013, and with a limited fund drive in 2014. LINGUIST started in the new location without any significant funds, just with the help and support of its hosting institution, the team, and some supporters.

The team and the operation now need your help. We depend on the Fund Drive 2015 to be able to continue with normal operations during the summer 2015, and during the next academic year. Graduate assistantships do not cover the summer. Although IU supports us with two fully covered GAs, and two partially covered ones (in addition to all the other support that we get from IU and the Department of Linguistics), we need to cover the summer months by paying editors. We also need more person-power to cover the next academic year.

Please consider helping LINGUIST List to continue its operations and donate during the 2015 Fund Drive.

The LINGUIST List Team

Making the Most of LINGUIST: Additional and Special Interest Resources

Some LINGUIST resources aren’t so easy to classify. In this last letter, we’ve grouped some of the lesser-known features that may be of interest to you.

LINGUIST has established a presence on a variety of social networking sites. Connect with us by clicking the links below:

Various linguistic resources can only be found on the World Wide Web. Luckily, LINGUIST has an area for that!

  • Web Resources/Software: This area of the LINGUIST List contains links to websites and software devoted to natural and constructed languages, to writing systems, and to language resources on the web (such as dictionaries).
  • FYI: As mentioned in our previous letter, the FYI area contains information that doesn’t neatly fit into any single LINGUIST posting topic, such as calls for book chapters, award recipient announcements, new journal editor announcements, scholarship announcements, etc.
  • Discussion: The Discussion area is one of LINGUIST’s best kept secrets (but we’d like it to be not-so-secret). Discussions posted on the LINGUIST site have spawned many publications, collaborations, and thought-provoking linguistic observations and ponderings. Join the discussion!
  • Mailing Lists: There are a number of mailing lists linked in here that are related to different facets of linguistics and language.

LINGUIST’s projects also cater to various linguistic interests.

  • Tutorials: These tutorials were designed by programmers to help train linguistics students for work at LINGUIST. They’re very helpful introductions (or, for some of you, refreshers) for the technical work linguists engage in.
  • Linguistic Blogs: Here you can see what linguists on the web have to say about language.
  • Learning Languages Other than English: These resources will help you find language learning resources.
  • English Language Learning (EFL/ESL): LINGUIST also contains a variety of resources for learning English.

We hope you’ve enjoyed this Making the Most of LINGUIST letter series! As always, if you have any questions about the services LINGUIST offers its readers and subscribers, don’t hesitate to ask.