1-1hit |
Freddy PERRAUD Christian VIARD-GAUDIN Emmanuel MORIN Pierre-Michel LALLICAN
This paper incorporates statistical language models into an on-line handwriting recognition system for devices with limited memory and computational resources. The objective is to minimize the error recognition rate by taking into account the sentence context to disambiguate poorly written texts. Probabilistic word n-grams have been first investigated, then to fight the curse of dimensionality problem induced by such an approach and to decrease significantly the size of the language model an extension to class-based n-grams has been achieved. In the latter case, the classes result either from a syntactic criterion or a contextual criteria. Finally, a composite model is proposed; it combines both previous kinds of classes and exhibits superior performances compared with the word n-grams model. We report on many experiments involving different European languages (English, French, and Italian), they are related either to language model evaluation based on the classical perplexity measurement on test text corpora but also on the evolution of the word error rate on test handwritten databases. These experiments show that the proposed approach significantly improves on state-of-the-art n-gram models, and that its integration into an on-line handwriting recognition system demonstrates a substantial performance improvement.