Diverse queries and feature type selection for plagiarism discovery: Notebook for PAN at CLEF 2013
Authors | |
---|---|
Year of publication | 2013 |
Type | Article in Proceedings |
Conference | 2013 Cross Language Evaluation Forum Conference, CLEF 2013, CEUR Workshop Proceedings Volume 1179 |
MU Faculty or unit | |
Citation | |
Web | http://ceur-ws.org/Vol-1179/ |
Field | Informatics |
Keywords | suspicious document; plagiarism detection; search engine; source retrieval; stop word; text alignment; contextual n gram; word n gram; representative sentence; overlapping detection; snippet similarity; global postprocessing |
Description | This paper describes the approaches used for the Plagiarism Detection task in the PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present a modified three-way search methodology for the Source Retrieval subtask and analyse snippet similarity performance. The results show that the presented approach is adaptable to real-world plagiarism situations. For the Detailed Comparison task, we discuss feature type selection and global postprocessing. The resulting performance is significantly better with the described modifications, and further improvement is still possible. |