An efficient algorithm for building a distributional thesaurus

Authors

RYCHLÝ Pavel, KILGARRIFF Adam

Year of publication 2007
Type Article in Proceedings
Conference Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions
MU Faculty or unit

Faculty of Informatics

Citation
Web http://www.aclweb.org/anthology/P/P07/P07-2011
Field Informatics
Keywords text corpus; distributional thesaurus
Description Gorman and Curran (2006) argue that thesaurus generation for billion+-word corpora is problematic as the full computation takes many days. We present an algorithm with which the computation takes under two hours. We have created, and made publicly available, thesauruses based on large corpora for (at time of writing) seven major world languages. The development is implemented in the Sketch Engine.