Effective Corpus Virtualization
Authors | |
---|---|
Year of publication | 2014 |
Type | Article in proceedings |
Conference | Challenges in the Management of Large Corpora (CMLC-2) |
Faculty / department (MU) | |
Citation | |
www | http://corpora.ids-mannheim.de/cmlc.html |
Field | Informatics |
Keywords | corpus; corpus linguistics; virtualization; indexing; database |
Attached files | |
Abstract | In this paper we describe an implementation of corpus virtualization within the Manatee corpus management system. By corpus virtualization we mean the logical manipulation of corpora or their parts, grouping them into new (virtual) corpora. We discuss the motivation for such a setup in detail and demonstrate the space and time efficiency of this approach, evaluated on an 11-billion-word corpus of Spanish. |
Related projects | |