We plan to create a corpus of spontaneous speech in Tsez, an endangered language of the Caucasus spoken by about 6,000 people, and in three endangered Mayan languages. The project will involve collecting, transcribing, and annotating the data so that other researchers can use them. We will then compare these languages with spoken production in several heritage languages (Russian, Chinese, Avar, Spanish, and Mam), whose corpora will also be transcribed and annotated.
Recording of speakers; help with transcriptions; help with glossing and annotations; literature research (articles on corpus linguistics and language
Hours per Week: 10-15
Total Number of Weeks: 30
Contact: Maria Polinsky