Scientific publications
Rodionova A.P., Krizhanovskaya N.B., Pellinen N.A.
VepKar speech corpus as a tool to preserve the dialect speech of the Baltic-Finnish people of Karelia
// Yearbook of Finno-Ugric Studies. Vol. 17, iss. 3. 2023. Pp. 343-351
Keywords: Speech corpus, Vepsian language, Karelian language, corpus linguistics, audio samples, text markup
The article describes the Speech Corpus of the Baltic-Finnish languages, built on the platform of the Open Corpus of the Vepsian and Karelian Languages (VepKar), along with its architecture and capabilities. The speech corpus was developed by staff of the Institute of Linguistics, Literature and History (ILLH) and the Institute of Applied Mathematical Research. It comprises a collection of spoken texts in different dialects of the Karelian and Vepsian languages, each provided with a transcription, markup and a translation into Russian. The corpus also offers search filters: by language or dialect, place and year of recording, informant, collector, and source. Development of the VepKar corpus is timely: the corpus is in great demand both in scientific research and in the development of the literary forms of the Karelian and Vepsian languages. Modern technologies and methods, field material accumulated over many decades, and the latest data will make it possible to fill a number of gaps that linguists have previously identified in this field. Researchers draw on three main sources to fill the corpus with audio recordings of Karelian and Vepsian speech: audio collections of the Phonogram Archive of the ILLH KRC RAS, audio recordings of broadcasts in the Livvi dialect of the Karelian language, and field materials recorded by the authors during expeditions. The scientific novelty of the work lies in the lack of existing speech corpora for the Baltic-Finnish languages. Digitizing archival and field audio samples of Karelian and Vepsian speech in the Speech Corpus format will simplify the processing and storage of these materials. It will also bring into scientific circulation unique audio materials that reflect the state of the Karelian and Vepsian dialects since the middle of the twentieth century, and make them available to the public.
Indexed in RSCI, RSCI (WS)
Last modified: January 18, 2024