On Disambiguation in Czech Corpora

Authors

POPELÍNSKÝ Lubomír, PAVELEK Tomáš, PTÁČNÍK Tomáš

Year of publication 2000
MU Faculty or unit

Faculty of Informatics

Description Lemma disambiguation means finding the basic form of a word, typically the nominative singular for nouns or the infinitive for verbs. We developed a multistrategy method for lemma disambiguation of unannotated text. The method combines inductive logic programming with instance-based learning. We present results for the most important subtasks of lemma disambiguation for the Czech language. Although no expert knowledge of Czech grammar was used, the accuracy reaches 90%, with a fraction of words remaining ambiguous. We also present the first results of tag disambiguation.