Predicting Pronunciation Types in the Sloleks Morphological Lexicon of Slovene
We present an experiment on the automatic prediction of pronunciation types for lemmas in the Sloleks Morphological Lexicon of Slovene. We perform a statistical analysis of a number of mostly 𝑛-gram-based features and use a set of statistically significant features to train and test several machine learning models. The models discriminate between lemmas for which a phonetic transcription can be generated automatically using Slovene grapheme-to-phoneme (G2P) conversion rules (e.g. Novak) and lemmas whose pronunciation follows other G2P rules (e.g. Shakespeare).
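To make the notion of 𝑛-gram-based features concrete, the following minimal sketch extracts character 𝑛-gram counts from a lemma. It is an illustration only: the function names `char_ngrams` and `ngram_features`, the `#` boundary marker, the lowercasing step, and the choice of bigrams and trigrams are all assumptions made here, not the feature set used in the experiment.

```python
from collections import Counter


def char_ngrams(lemma: str, n: int) -> list[str]:
    """Return the character n-grams of a lemma.

    The lemma is lowercased and padded with '#' on both sides so that
    word-initial and word-final n-grams are distinguishable (an assumption
    of this sketch, not something taken from the paper).
    """
    padded = f"#{lemma.lower()}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_features(lemma: str, sizes: tuple[int, ...] = (2, 3)) -> Counter:
    """Count all character n-grams of the given sizes in one lemma."""
    features: Counter = Counter()
    for n in sizes:
        features.update(char_ngrams(lemma, n))
    return features


# The two example lemmas mentioned in the text.
novak = ngram_features("Novak")
shakespeare = ngram_features("Shakespeare")

# N-grams that occur in "Shakespeare" but not in "Novak".
only_in_shakespeare = sorted(set(shakespeare) - set(novak))
```

Count vectors of this kind could then be passed to a statistical test or a classifier. An 𝑛-gram such as `sh`, which is present in "Shakespeare" and absent from "Novak", is the sort of cue such a model might pick up on, though whether it is actually informative would have to be established on the lexicon data.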