Creating an Annotated Health Record Dataset in a Limited-Resource Environment

Warning

This publication is not assigned to the Institute of Computer Science; it is assigned to the Faculty of Informatics. The official publication website can be found on muni.cz.
Authors

ANETTA Krištof

Year of publication 2023
Type Article in Proceedings
Conference Proceedings of the Seventeenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2023
MU Faculty or unit

Faculty of Informatics

web https://nlp.fi.muni.cz/raslan/2023/paper11.pdf
Keywords Electronic health records; EHR; annotation; named entity recognition; NER; medical concept mining
Description This paper demonstrates a workflow for creating a dataset of annotated electronic health records in an environment limited in both language resources and expert availability. Covering rule-based preannotation, the redundancy of multiple annotators per document and the resulting degree of confidence for each annotation, and possible avenues of data augmentation that make it feasible to train large language models, the paper discusses practical considerations for making the best of the resource-strapped situation shared by many researchers who analyze health records.
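To make two steps of this workflow more concrete, here is a minimal Python sketch: rule-based preannotation using a small term lexicon, and a per-annotation confidence computed as the share of annotators who marked the same span. The lexicon, the labels, the exact-span agreement criterion, and the function names (`preannotate`, `span_confidence`, `accepted`) are illustrative assumptions only; they are not the method, data, or tooling described in the paper.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only (not taken from the paper).
LEXICON = {
    "hypertension": "DIAGNOSIS",
    "diabetes": "DIAGNOSIS",
    "metformin": "MEDICATION",
}

# A span is (start offset, end offset, label); an assumed representation.
Span = tuple[int, int, str]


def preannotate(text: str, lexicon: dict[str, str]) -> list[Span]:
    """Rule-based preannotation: case-insensitive whole-word lexicon matches."""
    spans: list[Span] = []
    for term, label in lexicon.items():
        pattern = rf"\b{re.escape(term)}\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def span_confidence(annotations: list[list[Span]]) -> dict[Span, float]:
    """Fraction of annotators who marked each exact (start, end, label) span."""
    n = len(annotations)
    if n == 0:
        return {}
    # set() ensures one annotator counts at most once per span.
    counts = Counter(span for ann in annotations for span in set(ann))
    return {span: c / n for span, c in counts.items()}


def accepted(conf: dict[Span, float], threshold: float = 0.5) -> list[Span]:
    """Keep spans whose agreement reaches the (assumed) threshold."""
    return sorted(span for span, c in conf.items() if c >= threshold)


if __name__ == "__main__":
    text = "Patient with hypertension, started on metformin."
    pre = preannotate(text, LEXICON)
    # Three hypothetical annotators: two accept the preannotation as-is,
    # one drops the medication span.
    conf = span_confidence([pre, pre, [(13, 25, "DIAGNOSIS")]])
    for span, c in sorted(conf.items()):
        print(span, round(c, 2))
    print(accepted(conf, threshold=0.9))
```

Exact-span matching is the simplest possible agreement criterion; a real setup might instead credit partially overlapping spans or treat label disagreements separately, and the confidence values could then weight examples during model training.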