Creating an Annotated Health Record Dataset in a Limited-Resource Environment.

Anetta, Krištof

Authors	ANETTA Krištof
Year of publication	2023
Type	Article in Proceedings
Conference	Proceedings of the Seventeenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2023
MU Faculty or unit	Faculty of Informatics
Citation
web	https://nlp.fi.muni.cz/raslan/2023/paper11.pdf
Keywords	Electronic health records; EHR; annotation; named entity recognition; NER; medical concept mining
Description	This paper demonstrates a workflow for creating a dataset of annotated electronic health records in an environment where both language resources and expert availability are limited. It covers preannotation with rule-based methods, the redundancy of multiple annotators per document and the resulting degree of confidence for each annotation, and possible avenues of data augmentation for training large language models. Throughout, it discusses practical considerations for making the best of the resource-strapped situation shared by many researchers who analyze health records.
Related projects:	Using artificial intelligence techniques for data processing, complex analysis and visualization of large-scale data
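
The description mentions that having multiple annotators per document yields a degree of confidence for each annotation. The record does not say how the paper computes that degree, so the following is only a minimal sketch under one assumed scheme: confidence is the fraction of annotators who marked exactly the same span with the same label. The `Span` type, the `annotation_confidence` function, and the example labels are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Span(NamedTuple):
    """Hypothetical annotation: character offsets plus an entity label."""
    start: int
    end: int
    label: str


def annotation_confidence(annotations_by_annotator: List[List[Span]]) -> Dict[Span, float]:
    """Map each distinct span to the fraction of annotators who marked it.

    Assumption: agreement means an exact match on offsets and label.
    """
    n_annotators = len(annotations_by_annotator)
    if n_annotators == 0:
        return {}
    counts: Counter = Counter()
    for spans in annotations_by_annotator:
        # Use a set so one annotator counts at most once per span.
        counts.update(set(spans))
    return {span: count / n_annotators for span, count in counts.items()}


# Three hypothetical annotators working on the same document.
annotators = [
    [Span(0, 9, "DRUG"), Span(15, 21, "DOSE")],
    [Span(0, 9, "DRUG")],
    [Span(0, 9, "DRUG"), Span(15, 21, "DOSE")],
]
confidence = annotation_confidence(annotators)
for span, score in sorted(confidence.items()):
    print(span, round(score, 2))
```

Under this assumed scheme, a span marked by every annotator gets the highest confidence, while a span marked by only some of them gets a proportionally lower one. A real pipeline might also need to handle partially overlapping spans, which this sketch treats as disagreement.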