Text Corpus with Errors

Pala, Karel; Rychlý, Pavel; Smrž, Pavel

Authors	PALA Karel RYCHLÝ Pavel SMRŽ Pavel
Year of publication	2003
Type	Article in Proceedings
Conference	Text, Speech and Dialogue: Sixth International Conference, TSD 2003
MU Faculty or unit	Faculty of Informatics
web	http://nlp.fi.muni.cz/publications/tsd2003_pala_smrz_pary/
Field	Informatics
Keywords	error detection
Description	This paper describes Chyby, a Czech text corpus containing various kinds of errors: spelling, typographical, grammatical, stylistic, and lexical. We explain how Chyby was built and how the errors in it were discovered, marked, and annotated. We present the classification of the errors, give statistics on the error types, and describe the tools used for annotating the errors. To the best of our knowledge, this is the first text corpus of this sort prepared for Czech.
Related projects	Human-computer interaction, dialog systems and assistive technologies