Predicting Pronunciation Types in the Sloleks Morphological Lexicon of Slovene

We present an experiment on the automatic prediction of pronunciation types for lemmas in the Sloleks Morphological Lexicon of Slovene. We perform a statistical analysis of a number of mostly n-gram-based features and use the statistically significant ones to train and test several machine learning models. The models discriminate between lemmas for which a phonetic transcription can be generated automatically using Slovene grapheme-to-phoneme (G2P) conversion rules (e.g. Novak) and lemmas whose pronunciation follows other G2P rules (e.g. Shakespeare).
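
To make the notion of an n-gram-based feature concrete, the following is a minimal sketch of counting character n-grams in a lemma. It is an illustration only and not necessarily the feature set used in the experiment: the function name char_ngrams, the '#' word-boundary marker, and the lowercasing step are all assumptions introduced here.

```python
from collections import Counter


def char_ngrams(lemma: str, n: int = 2) -> Counter:
    """Count character n-grams in a lemma.

    Assumption for illustration: the lemma is lowercased and padded with
    '#' on both sides so that word-initial and word-final n-grams are
    distinguishable from word-internal ones.
    """
    padded = f"#{lemma.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


if __name__ == "__main__":
    # The two example lemmas from the abstract.
    for lemma in ("Novak", "Shakespeare"):
        print(lemma, sorted(char_ngrams(lemma, 2)))
```

Counts of this kind could then be tested for association with the pronunciation type before being passed to a classifier; for instance, the bigram "sh" occurs in Shakespeare but not in Novak.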