A New Czech Pipeline in Sketch Engine

Authors

OHLÍDALOVÁ Vlasta, JAKUBÍČEK Miloš

Year of publication 2024
Type Article in Proceedings
Conference Recent Advances in Slavonic Natural Language Processing, RASLAN 2024
MU Faculty or unit

Faculty of Informatics

Citation
web https://nlp.fi.muni.cz/raslan/2024/paper15.pdf
Keywords Morphological analysis; corpora annotation
Description This paper introduces a new Czech pipeline that is now available in Sketch Engine. It describes the tools used in the pipeline and, for some of them, details how they have been altered in recent years. The most complex part discusses the adjustment of the training data used for Czech – the DESAM corpus – and its effect on the accuracy of the POS tagging performed by RFTagger.