Unsupervised Training of an HMM-based Speech Recognizer for Topic Classification

We address the problem of performing topic classification of speech when no transcriptions from the speech corpus of interest are available. The approach we take is one of incremental learning about the speech corpus starting with adaptive segmentation of the speech, leading to the generation of discovered acoustic units and a segmental recognizer for these units, and finally to an initial tokenization of the speech for the training of a HMM speech recognizer. The recognizer trained is BBN's Byblos system. We discuss the performance of this system and also consider the case when a small amount of transcribed data is available.

RELATED CATEGORIES

UNSUPERVISED LEARNING

Unsupervised Training of an HMM-based Speech Recognizer for Topic Classification

Herb Gish

RELATED CATEGORIES

MORE VIDEOS FROM THE EVENT

MORE VIDEOS FROM THE SAME CATEGORIES