Využití corpus driven metod při corpus based výzkumu
Title in English | The Corpus-driven and Corpus-based Approach in Practice |
---|---|
Authors | |
Year of publication | 2015 |
Type | Article in Proceedings |
Conference | Proměna jazyka a jeho výzkumu v době nových médií a technologií |
MU Faculty or unit | |
Citation | |
Web | http://www.phil.muni.cz/wucj/home/News/2015/sbornik-promena-jazyka-a-jeho-vyzkumu-v-dobe-novych-medii-a-technologii |
Field | Linguistics |
Keywords | corpus; corpus based; corpus driven; overgeneration; undergeneration; lemma; tag; word formation |
Description | Overgeneration is a property of a formal rule that covers more than the language data it was designed for. It is equivalent to low precision and occurs when a formal rule (corpus query) is defined too broadly. Undergeneration is equivalent to low recall and occurs when a formal rule (corpus query) is specified too narrowly. Both are caused by the ambiguity of natural language. In this article we demonstrate how a corpus-driven method can be used to optimize the retrieval technique for a corpus-based analysis. Using the retrieval of candidates for one word-formation model (kutil) as a specific example, we show how observation of corpus data can be used to specify a corpus query progressively. |
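The relation between overgeneration and low precision, and between undergeneration and low recall, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy token list, the one-letter tags, the hand-assigned labels marking wanted candidates, and the three queries, which merely stand in for corpus queries of different breadth.

```python
import re

# Hypothetical toy sample: (word form, part-of-speech tag, wanted candidate?).
# The labels are assigned by hand for illustration only.
TOKENS = [
    ("kutil", "N", True),
    ("kutil", "V", False),
    ("loudal", "N", True),
    ("šťoural", "N", True),
    ("mluvil", "V", False),
    ("hasil", "V", False),
    ("kal", "N", False),
    ("kostel", "N", False),
]


def broad(word, tag):
    """Too widely defined: any form ending in -al or -il, whatever its tag."""
    return re.search(r"[ai]l$", word) is not None


def narrow(word, tag):
    """Too narrowly specified: only nouns ending in -il."""
    return word.endswith("il") and tag == "N"


def refined(word, tag):
    """Specified step by step after inspecting the hits: -al/-il forms tagged as nouns."""
    return re.search(r"[ai]l$", word) is not None and tag == "N"


def evaluate(query, tokens):
    """Return (precision, recall) of a query over labelled tokens."""
    hits = [t for t in tokens if query(t[0], t[1])]
    relevant = sum(1 for t in tokens if t[2])
    true_positives = sum(1 for t in hits if t[2])
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


for name, query in [("broad", broad), ("narrow", narrow), ("refined", refined)]:
    precision, recall = evaluate(query, TOKENS)
    print(f"{name:8s} precision={precision:.2f} recall={recall:.2f}")
```

On this toy sample the broad query finds every wanted form but also returns verb forms (overgeneration), the narrow query returns only correct hits but misses most wanted forms (undergeneration), and the refined query sits between the two. Real corpus data would of course need manual inspection of the hits at each step.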