Manatee, Bonito and Word Sketches for Czech
Authors | |
---|---|
Year of publication | 2004 |
Type | Article in Proceedings |
Conference | Proceedings of the Second International Conference on Corpus Linguistics |
web | http://nlp.fi.muni.cz/publications/corpora2004_pary_smrz/ |
Field | Informatics |
Keywords | corpora; corpus management; statistics; word sketches |
Description | This paper presents Manatee, a newly designed and developed system for managing corpora, especially extremely large ones with billions of words, that enables the efficient evaluation of complex queries and the computation of advanced statistics. The main functions of the tool are presented, together with its web-based graphical user interface, Bonito. The sophisticated statistical processing is demonstrated on the example of computing Word Sketches. Special attention is paid to the definition of word sketches for Czech and to the problems connected with the free word order of Czech. |