Searching for Significant Word Associations in Text Documents Using Genetic Algorithms

Žižka,  Jan; Šrédl,  Michal; Bourek,  Aleš

Authors	ŽIŽKA Jan; ŠRÉDL Michal; BOUREK Aleš
Year of publication	2003
Type	Article in Proceedings
Conference	Computational Linguistics and Intelligent Text Processing
MU Faculty or unit	Faculty of Informatics
Field	Informatics
Keywords	machine learning; text document processing; genetic algorithms; naive Bayes method
Description	The paper describes experiments that used genetic algorithms to search for important word associations (phrases) in unstructured text documents obtained from the Internet in a specialized branch of medicine. Genetic algorithms can evolve sets of word associations with assigned significance weights for document categorization (relevant versus irrelevant documents). The resulting categorization is about as reliable as naive Bayes classification based on individual words. In addition, the genetic algorithms produced phrases consisting of one, two, and three words, which were quite meaningful from a human point of view.
Related projects:	Human-computer interaction, dialog systems and assistive technologies