Corpus Factory

Warning

This publication does not fall under the Institute of Computer Science, but under the Faculty of Informatics. The official publication page is on the muni.cz website.

Authors

KILGARRIFF Adam, REDDY Siva, POMIKÁLEK Jan

Year of publication: 2009
Type: Article in proceedings
MU Faculty or unit

Faculty of Informatics

Citation
Web: http://www.kilgarriff.co.uk/Publications/2009-KilgReddyPomikalek-asialex-CorpFactory.doc
Description: State-of-the-art lexicography requires corpora, but for many languages there are no large, general-language corpora available. Until recently, all but the richest publishing houses could do little but shake their heads in dismay, as corpus-building was long, slow and expensive. But with the advent of the Web it can be highly automated and thereby fast and inexpensive. We have developed a ‘corpus factory’ where we build lexicographic corpora. In this paper we describe the method we use, how it has worked, and how various problems were solved, for five languages: Dutch, Hindi, Telugu, Thai and Vietnamese. The corpora we have developed are available for use in the Sketch Engine corpus query tool.

