Speech to text

Supported formats

Korpusomat enables automatic transcription of audio and video recordings into text format. The system supports the following file formats:

  • .m4a – Apple audio format (MPEG-4 Audio)

  • .wav – Waveform Audio File Format

  • .mp3 – MPEG Audio Layer III format

  • YouTube URL – direct processing of videos from YouTube
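The list above can be turned into a simple pre-upload check. The following is a minimal sketch, not part of Korpusomat itself: the function names `is_youtube_url` and `is_supported_source`, the set of YouTube host names, and the rule that any other URL is rejected are all assumptions made for illustration. Only the three file extensions and the YouTube option come from this page.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File extensions listed on this page as supported.
SUPPORTED_EXTENSIONS = {".m4a", ".wav", ".mp3"}

# Assumption: host names treated as YouTube. The service itself may
# accept a different set of addresses.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(source: str) -> bool:
    """Return True if `source` looks like an http(s) link to YouTube."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and (parsed.hostname or "") in YOUTUBE_HOSTS


def is_supported_source(source: str) -> bool:
    """Return True for a YouTube URL or a file name with a listed extension."""
    if is_youtube_url(source):
        return True
    # Assumption: any other URL is rejected, since only YouTube links are listed.
    if "://" in source:
        return False
    # Compare extensions case-insensitively, so "talk.WAV" is accepted.
    return PurePosixPath(source).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_source("interview.WAV")` and `is_supported_source("https://youtu.be/abc123")` (a placeholder video ID) both return `True`, while `is_supported_source("notes.txt")` returns `False`.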

Supported languages

The system detects the recording language automatically. Detection is independent of the corpus language and is not affected by the corpus settings.

Full list of the 100 supported languages:

English

Chinese

German

Spanish

Russian

Korean

French

Japanese

Portuguese

Turkish

Polish

Catalan

Dutch

Arabic

Swedish

Italian

Indonesian

Hindi

Finnish

Vietnamese

Hebrew

Ukrainian

Greek

Malay

Czech

Romanian

Danish

Hungarian

Tamil

Norwegian

Thai

Urdu

Croatian

Bulgarian

Lithuanian

Latin

Maori

Malayalam

Welsh

Slovak

Telugu

Persian

Latvian

Bengali

Serbian

Azerbaijani

Slovenian

Kannada

Estonian

Macedonian

Breton

Basque

Icelandic

Armenian

Nepali

Mongolian

Bosnian

Kazakh

Albanian

Swahili

Galician

Marathi

Punjabi

Sinhala

Khmer

Shona

Yoruba

Somali

Afrikaans

Occitan

Georgian

Belarusian

Tajik

Sindhi

Gujarati

Amharic

Yiddish

Lao

Uzbek

Faroese

Haitian Creole

Pashto

Turkmen

Nynorsk

Maltese

Sanskrit

Luxembourgish

Myanmar

Tibetan

Tagalog

Malagasy

Assamese

Tatar

Hawaiian

Lingala

Hausa

Bashkir

Javanese

Sundanese

Cantonese

Tutorial

Step 1

Step 2

First, click the Pobierz (Download) button.

Step 3

Next, fill in the metadata fields. Some of them (Title, Author, Date) can be filled in automatically.

Step 4

Step 5

Step 6

Step 7

Once the transcription is complete, you can edit the text by clicking the Edytuj tekst (Edit text) button.

Step 8