Speech to text
Supported formats
Korpusomat can automatically transcribe audio and video recordings into text. The following input formats are supported:
.m4a – Apple audio format (MPEG-4 Audio)
.wav – Waveform Audio File Format
.mp3 – MPEG Audio Layer III format
YouTube URL – direct processing of videos from YouTube
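The list above can be turned into a quick local check before uploading anything. The following is a minimal sketch, not part of Korpusomat: the function name `is_supported_source` and the set of YouTube host names are assumptions made for illustration, and the check looks only at the file extension or URL host, not at the actual file contents.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File extensions listed in the documentation above.
SUPPORTED_EXTENSIONS = {".m4a", ".wav", ".mp3"}

# Assumption: host names commonly used for YouTube links.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_supported_source(source: str) -> bool:
    """Return True if `source` looks like a supported file or a YouTube URL."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Only YouTube links are listed as supported URLs.
        return (parsed.hostname or "").lower() in YOUTUBE_HOSTS
    # Otherwise treat the value as a file name and compare its extension,
    # ignoring letter case.
    return PurePosixPath(source).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_source("interview.WAV")` accepts the file regardless of the extension's letter case, while a `.flac` file or a non-YouTube link is rejected.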
Supported languages
The system detects the language of the recording automatically. Detection is independent of the corpus language and is not affected by the corpus settings.
Full list of supported languages (100 languages):
English
Chinese
German
Spanish
Russian
Korean
French
Japanese
Portuguese
Turkish
Polish
Catalan
Dutch
Arabic
Swedish
Italian
Indonesian
Hindi
Finnish
Vietnamese
Hebrew
Ukrainian
Greek
Malay
Czech
Romanian
Danish
Hungarian
Tamil
Norwegian
Thai
Urdu
Croatian
Bulgarian
Lithuanian
Latin
Maori
Malayalam
Welsh
Slovak
Telugu
Persian
Latvian
Bengali
Serbian
Azerbaijani
Slovenian
Kannada
Estonian
Macedonian
Breton
Basque
Icelandic
Armenian
Nepali
Mongolian
Bosnian
Kazakh
Albanian
Swahili
Galician
Marathi
Punjabi
Sinhala
Khmer
Shona
Yoruba
Somali
Afrikaans
Occitan
Georgian
Belarusian
Tajik
Sindhi
Gujarati
Amharic
Yiddish
Lao
Uzbek
Faroese
Haitian Creole
Pashto
Turkmen
Nynorsk
Maltese
Sanskrit
Luxembourgish
Myanmar
Tibetan
Tagalog
Malagasy
Assamese
Tatar
Hawaiian
Lingala
Hausa
Bashkir
Javanese
Sundanese
Cantonese
Tutorial
First, click the Pobierz (Download) button.
Next, fill in the metadata fields. Some of them (Title, Author, Date) can be filled in automatically.
Once the transcription is complete, you can edit the text by clicking the Edytuj tekst (Edit text) button.