An efficient algorithm for building a distributional thesaurus

Rychlý, Pavel; Kilgarriff, Adam

Authors	RYCHLÝ Pavel, KILGARRIFF Adam
Year of publication	2007
Type	Article in Proceedings
Conference	Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Proceedings of the Demo and Poster Sessions
MU Faculty or unit	Faculty of Informatics
web	http://www.aclweb.org/anthology/P/P07/P07-2011
Field	Informatics
Keywords	text corpus; distributional thesaurus
Description	Gorman and Curran (2006) argue that thesaurus generation for corpora of a billion words or more is problematic, as the full computation takes many days. We present an algorithm with which the computation takes under two hours. We have created, and made publicly available, thesauruses based on large corpora for (at the time of writing) seven major world languages. The work is implemented in the Sketch Engine.
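The abstract does not spell out the algorithm itself. As a rough illustration of how such a speed-up is possible, the following is a minimal sketch, not necessarily the method of the paper: instead of comparing every pair of words, it walks an inverted index of contexts and only accumulates scores for word pairs that share a context. Assumptions, all for illustration only: each word is given as a mapping from grammatical contexts (e.g. "drink/obj") to positive association scores, similarity is a Lin (1998)-style measure, and `build_thesaurus` and `max_context_size` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def build_thesaurus(scores, max_context_size=1000, top_n=10):
    """Illustrative sketch: nearest neighbours via an inverted context index.

    scores: {word: {context: association score}} (assumed input format).
    max_context_size: hypothetical cut-off; contexts shared by more words
    than this are skipped, so no single context costs more than ~n^2/2 pairs.
    Returns {word: [(neighbour, similarity), ...]} sorted by similarity.
    """
    # Inverted index: context -> [(word, score), ...], positive scores only
    by_context = defaultdict(list)
    for word, ctxs in scores.items():
        for ctx, s in ctxs.items():
            if s > 0:
                by_context[ctx].append((word, s))

    # Numerator of the Lin-style measure, built context by context:
    # only pairs that actually share a context are ever touched.
    shared = defaultdict(float)
    for entries in by_context.values():
        if len(entries) > max_context_size:
            continue  # heuristic: very widespread contexts are ignored
        entries.sort()  # fixed order, so each pair key is (w1, w2) with w1 < w2
        for (w1, s1), (w2, s2) in combinations(entries, 2):
            shared[(w1, w2)] += s1 + s2

    # Denominator: each word's total association score over all its contexts
    total = {w: sum(s for s in ctxs.values() if s > 0) for w, ctxs in scores.items()}

    neighbours = defaultdict(list)
    for (w1, w2), num in shared.items():
        sim = num / (total[w1] + total[w2])
        neighbours[w1].append((sim, w2))
        neighbours[w2].append((sim, w1))

    # Highest similarity first; ties broken alphabetically
    return {
        w: [(n, sim) for sim, n in sorted(lst, key=lambda x: (-x[0], x[1]))[:top_n]]
        for w, lst in neighbours.items()
    }


if __name__ == "__main__":
    toy = {
        "beer": {"drink/obj": 2.0, "cold/mod": 1.0},
        "wine": {"drink/obj": 2.0, "red/mod": 1.0},
        "car": {"drive/obj": 2.0, "red/mod": 1.0},
    }
    for word, nbrs in build_thesaurus(toy).items():
        print(word, nbrs)
```

Because the work is organised per context, the cost grows roughly with the sum of the squared context sizes rather than with the square of the vocabulary. The cut-off bounds that cost at the price of some accuracy; in this sketch the skipped contexts still count in the denominator, so similarities are slightly underestimated. Words that share no retained context with any other word are simply absent from the result.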
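Related projects:	Intelligent Models, Algorithms, Methods and Tools for the Semantic Web (realization); Centre for Computational Linguistics (Centrum komputační lingvistiky); Tools for Building a Complex Knowledge Base for Natural-Language Communication with the Semantic Web (Prostředky tvorby komplexní báze znalostí pro komunikaci se sémantickým webem v přirozeném jazyce)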