Finding Semantically Related Words in Large Corpora

Smrž,  Pavel; Rychlý,  Pavel

Authors	SMRŽ Pavel; RYCHLÝ Pavel
Year of publication	2001
Type	Article in Proceedings
Conference	Text, Speech and Dialogue, 4th International Conference, TSD 2001
MU Faculty or unit	Faculty of Informatics
web	http://nlp.fi.muni.cz/publications/tsd2001_smrz_pary/
Field	Computer hardware and software
Keywords	natural language processing; large corpus; semantically related words
Description	The paper addresses the linguistic problem of fully automatic grouping of semantically related words. We discuss measures of semantic relatedness between basic word forms and describe the treatment of collocations. Next, we present a procedure for hierarchical clustering of a very large number of semantically related words and give examples of the resulting partitioning of the data in the form of a dendrogram. Finally, we show an output presentation that facilitates inspection of the resulting word clusters.
Related projects	Human-computer interaction, dialog systems and assistive technologies
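
The description above mentions relatedness measures and hierarchical clustering without giving details, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes cosine similarity over windowed co-occurrence counts as the relatedness measure and average linkage for merging; the function names (`context_vectors`, `cosine`, `cluster`) and the toy sentences are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(words, vectors):
    """Average-linkage agglomerative clustering.

    Returns the dendrogram as nested 2-tuples of words.
    """
    # Each entry is (subtree, list of member words).
    clusters = [(w, [w]) for w in words]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                total = sum(cosine(vectors[a], vectors[b]) for a in mi for b in mj)
                sim = total / (len(mi) * len(mj))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Toy corpus (invented for illustration only).
sentences = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat eats fish".split(),
    "the dog eats meat".split(),
    "a car needs fuel".split(),
    "a truck needs fuel".split(),
]
tree = cluster(["cat", "dog", "car", "truck"], context_vectors(sentences))
print(tree)
```

On this toy input the animals and the vehicles end up in separate subtrees of the nested tuple. The pairwise loop is quadratic in the number of clusters at every merge step, so a corpus-scale vocabulary would need a more efficient scheme than this sketch provides.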