Distributed Aspects of the System for Discovering Similar Documents

Kasprzak, Jan; Brandejs, Michal; Brandejsová, Jitka

Authors	KASPRZAK Jan; BRANDEJS Michal; BRANDEJSOVÁ Jitka
Year of publication	2009
Type	Article in Proceedings
Conference	Proceedings of the Third International Conference on Internet Technologies and Applications
MU Faculty or unit	Faculty of Informatics
Citation
web	http://www.ita09.org/
Field	Informatics
Keywords	Theses; Archive; Plagiarism; Similar documents; Distributed computing
Description	With the wide deployment of e-learning methods such as computer-mediated communication between students and teachers, including the submission and evaluation of papers and essays, it has become much easier for students to base their work on electronic resources, and also to plagiarize the work of other people. In this paper we briefly present a system for discovering similarities in a large collection of documents, which has been in production use in the Czech National Archive of Graduate Theses since January 2008. We then focus on the distributed aspects of such a system, especially on the task of creating and maintaining the similarity index on a cluster of commodity computers.
Related projects	Czech Republic membership in the European Research Consortium for Informatics and Mathematics