Speech recognition and transcription tools have been widely available in Latvian for several years. Thanks to the staff of the Artificial Intelligence Laboratory of the Institute of Mathematics and Computer Science of the University of Latvia (AiLab), more and more digital solutions are now available for the Latgalian language.
AiLab has created not only the first artificial intelligence (AI) model that recognizes Latgalian speech, but also a freely available speech transcription tool in Latgalian. The model was trained with data from the Modern Latgalian Speech Corpus and from Bolsu tolka.
“The Latgalian language is an integral part of the Latvian language and cultural heritage, therefore it is important to ensure the full existence and development of the Latgalian language in the digital space. (…) Currently, the provision of speech technologies for the Latgalian language is approaching the level of the Latvian language,” says Normunds Grūzītis, Head of AiLab.
The transcription tool is not yet completely accurate, which is why everyone is given the opportunity to make corrections to the transcript. Also, to improve the tool's performance, especially for recognizing different dialects, more speech data is needed! You can contribute to the recognition of Latgalian speech in the digital environment by recording, listening to and correcting sentences on the Bolsu tolka website!
“Diverse speakers are very important to us. By diversity, I mean age, gender, which also includes speech variations. Some speak faster, some slower. Some may have an accent, and the peculiarities of dialects are also really important,” says Kristīne Pokratniece, a research assistant at AiLab, in an on-air conversation with LTV news. You can watch the program here.
The trained AI model is available to anyone at https://ltg.late.ailab.lv/; there is no need to create a user account.
You can watch a demonstration of how LATE, the digital tool for transcribing the Latgalian language, works here. The text in this demonstration video is read by another AI model, created in collaboration with the Latvian Library for the Blind.
Researchers from the Rēzekne Academy of Technology, AiLab, and the Institute of Literature, Folklore and Art of the University of Latvia are participating in the creation and development of the data sets (speech corpora) necessary for training the Latgalian and Latvian language artificial intelligence models. Cooperation with the Latvian Open Technology Association has played a significant role in the successful organization and implementation of “Balsu talka”.
The training of Latvian and Latgalian language speech recognition models and the development of the LATE tool were initiated within the National Research Program “Letonika” and are being actively continued within the Language Technology Initiative project.

The project “Language Technology Initiative” (No. 2.3.1.1.i.0/1/22/I/CFLA/002) is co-financed by an investment from the European Union Recovery and Resilience Facility and by the state budget.