It has been a busy summer here at The LINGUIST List! Please take a moment to check out the projects that our 2015 summer interns and volunteers have been working on!
Edvard Bikbaev works on the GORILLA project at the LINGUIST List, creating and annotating a speech corpus for Russian, his native language. The corpus includes multiple annotated tiers and will be used to train a forced aligner. In addition, Edvard translates content for the GORILLA website and updates MultiTree with linguistic publications in Russian. Edvard plans to apply for a PhD program in Computational Linguistics and use the Russian speech corpus he has created at the LINGUIST List for his
Alec spends most of his time at the LINGUIST List creating the official LINGUIST List Google Chrome App, which will soon provide easy access to the upcoming GeoLing map and other LINGUIST List resources. He is also in the process of writing a script that automatically collects language data from Wiktionary and other open-source databases, and has so far used the program to extend the LINGUIST List’s Yiddish lexicon.
Clara García Gómez
Clara is mainly involved in the GORILLA Project, creating a speech corpus for Castilian Spanish, of which she is a native speaker. She is creating the materials necessary for automatic alignment and transcription. She also works on translating parts of the website into Spanish and on some editing tasks for the LINGUIST List. She is interested in the study of undocumented languages, so she is happy to participate in GORILLA and hopes to contribute to the project further after creating the corpus for Castilian Spanish.
Jacob has spent most of his time working on the LL-MAP project, a large collection of maps containing linguistic and geographic information to be used by linguists, anthropologists, and other researchers. The LINGUIST List's relocation to Indiana University became an opportunity to relaunch and redesign the technologies. This has involved porting all of the accumulated data to new servers and testing various file formats to find the easiest to work with for our purposes. We’ve made some progress and, ideally, will be able to relaunch LL-MAP by the end of the summer.
Seyed started working on the Baharlu dialect of the South Azeri Turkic language. It is spoken in western Iran, in an area neighboring the Persian, Kurdish, and Lori languages. He studied the different writing styles in use in order to produce the most suitable transcriptions. He also needed to study the standards for the romanization of Baharlu Turkic. He worked on sample recordings, creating transcriptions, romanizations, and translations.
During this work he has also started preparing a Baharlu-English dictionary that includes the original word, its romanization, and its English translation, and that will be completed with other elements such as lemma, PoS, and pronunciation information.
For the last two weeks, Petar has been working mainly on the Automatic Speech Recognition Project. Currently, he is working on the Croatian speech corpus and ASR. The first part of the project consists of making recordings and transcribing them. Along with building the corpus, he has been going through the documentation on Chrome Apps, and starting this week he will work alongside Alec on the LINGUIST List Chrome App. By the end of his internship, he would like to have a working Croatian speech recognizer and an application that will ease the use of various LINGUIST List features.
Zac has been working primarily on the front and back end of GeoLing, which can be found at geoling.linguistlist.org. Zac has also contributed to the GORILLA project (gorilla.linguistlist.org), including the development of resources to be provided by GORILLA.