Manatee, Bonito and Word Sketches for Czech

Authors

RYCHLÝ Pavel, SMRŽ Pavel

Year of publication 2004
Type Article in Proceedings
Conference Proceedings of the Second International Conference on Corpus Linguistics
MU Faculty or unit

Faculty of Informatics

Citation
web http://nlp.fi.muni.cz/publications/corpora2004_pary_smrz/
Field Informatics
Keywords corpora; corpus management; statistics; word sketches
Description This paper presents Manatee, a newly designed and developed system for managing corpora, especially extremely large ones with billions of words, that enables efficient evaluation of complex queries and computation of advanced statistics. The main functions of the tool are presented, together with its web-based graphical user interface, Bonito. The sophisticated statistical processing is demonstrated on the example of computing Word Sketches. Special attention is paid to the definition of word sketches for Czech and to the problems connected with its free word order.