Fantastic Examples and Where to Find Them - Compiling Czech Dataset for Evaluating Dictionary Examples

Denisová, Michaela; Rychlý, Pavel

Authors	DENISOVÁ Michaela; RYCHLÝ Pavel
Year of publication	2024
Type	Article in Proceedings
Conference	Proceedings of the Eighteenth Workshop on Recent Advances in Slavonic Natural Languages Processing
MU Faculty or unit	Faculty of Informatics
web	Full text; Workshop homepage
Keywords	Dictionary examples; GDEX; Evaluation
Description	Examples are an important part of a dictionary entry, helping users better understand a word and its usage in context. However, selecting good examples is a challenging and time-consuming task due to varying selection criteria and the vast amount of data to choose from. While different tools have been developed to address this, their evaluation remains flawed and lacks standardisation. In this paper, we compile an evaluation dataset for the Czech language, using the GDEX tool and manual annotations to classify examples and explain the classification. Based on our findings, we propose general annotation guidelines to improve consistency. This dataset serves as a foundation for the unified evaluation of dictionary example scoring tools and opens a discussion on how to annotate examples. Additionally, we make the dataset publicly available.
Related projects	Using artificial intelligence techniques for data processing, complex analysis and visualization of large-scale data