Abstract

We explore morphology-based and sub-word language modeling approaches proposed for morphologically rich languages, and we evaluate and contrast them on a Turkish broadcast news transcription task. In addition, we improve our previously proposed morphology-integrated model for automatic speech recognition. This model is built by composing the finite-state transducer of the morphological parser with a language model over lexical morphemes. The approach yields a morphology-integrated search network with an unlimited vocabulary that generates only valid word forms, reducing the out-of-vocabulary rate and hence the word error rate. We also analyze the effect of morphotactics and morphological disambiguation on speech recognition accuracy for the morphology-integrated model. The improved morphology-integrated model performs better than statistically derived sub-word models, with the added benefit of generating morpho-syntactic and semantic features.
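As a rough illustration of the two ideas in the abstract (morphotactic constraints admit only valid word forms, and modeling over morphemes lowers the out-of-vocabulary rate), the following toy sketch may help. It is not the authors' implementation: the stem lexicon, the lexical-morpheme labels (`+lAr`, `+Hm`, `+DA`), the suffix classes, and the transition table are all simplified assumptions made for this example, and a real system would use a full finite-state morphological parser composed with an n-gram language model.

```python
# Toy sketch only. The lexicon, morpheme labels, and transition table below
# are hypothetical simplifications, not the parser described in the paper.

STEMS = {"ev", "kitap"}  # "house", "book"

# Assumed lexical-morpheme labels mapped to coarse suffix classes.
SUFFIX_CLASS = {"+lAr": "PLURAL", "+Hm": "POSS", "+DA": "CASE"}

# Assumed morphotactics: which suffix class may follow which state.
NEXT = {
    "STEM": {"PLURAL", "POSS", "CASE"},
    "PLURAL": {"POSS", "CASE"},
    "POSS": {"CASE"},
    "CASE": set(),
}


def is_valid(morphemes):
    """Return True if the morpheme sequence is accepted by the toy automaton."""
    if not morphemes or morphemes[0] not in STEMS:
        return False
    state = "STEM"
    for morpheme in morphemes[1:]:
        suffix_class = SUFFIX_CLASS.get(morpheme)
        if suffix_class is None or suffix_class not in NEXT[state]:
            return False
        state = suffix_class
    return True


def oov_rate(items, vocabulary):
    """Fraction of items that do not appear in the vocabulary."""
    if not items:
        return 0.0
    return sum(1 for item in items if item not in vocabulary) / len(items)


# Tiny made-up "training" and "test" sets of segmented words.
train = [["ev", "+lAr"], ["kitap", "+Hm"], ["ev", "+DA"]]
test = [["ev", "+lAr", "+Hm", "+DA"], ["kitap", "+lAr"]]

# Word-level vocabulary: each full word form is one unit.
word_vocab = {"".join(word) for word in train}
# Morpheme-level vocabulary: each stem or suffix is one unit.
morph_vocab = {m for word in train for m in word}

word_oov = oov_rate(["".join(word) for word in test], word_vocab)
morph_oov = oov_rate([m for word in test for m in word], morph_vocab)

print("word-level OOV rate:", word_oov)
print("morpheme-level OOV rate:", morph_oov)
print("valid:", is_valid(["ev", "+lAr", "+Hm", "+DA"]))
print("invalid order rejected:", not is_valid(["ev", "+DA", "+lAr"]))
```

In this toy data, both test word forms are unseen as whole words, yet every morpheme in them was seen in training. The automaton still rejects an ill-ordered suffix sequence. This is the intuition behind an unlimited vocabulary that generates only valid word forms.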

Keywords

Computer science, Morpheme, Natural language processing, Word error rate, Vocabulary, Artificial intelligence, Language model, Parsing, Turkish, Word (group theory), Agglutinative language, Speech recognition, Linguistics

Publication Info

Year: 2010
Type: article
Pages: 5402-5405
Citations: 40
Access: Closed

Citation Metrics

40 (OpenAlex)

Cite This

Haşim Sak, Murat Saraçlar, Tunga Güngör (2010). Morphology-based and sub-word language modeling for Turkish speech recognition. , 5402-5405. https://doi.org/10.1109/icassp.2010.5494927

Identifiers

DOI: 10.1109/icassp.2010.5494927