From Examples to Patterns: LLM-Generated Regular Expressions for Entity Extraction in Czech Clinical Texts

Zelina, Petr

Authors	ZELINA Petr
Year of publication	2024
Type	Article in Proceedings
Conference	Proceedings of the Eighteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2024
MU Faculty or unit	Faculty of Informatics
Citation
web	https://nlp.fi.muni.cz/raslan/2024/paper6.pdf
Keywords	NLP; LLM; regex; text mining; clinical notes
Description	Entity extraction in clinical texts is essential for converting unstructured data in clinical notes into structured formats, facilitating large-scale analysis and clinical decision support. Traditional methods often rely on handcrafted regular expressions (regexes), which are effective but demand significant time and specialized knowledge to create, resources that healthcare professionals may lack. We introduce a novel approach that uses large language models (LLMs) to automate regex generation for clinical entity extraction. Our method prompts an LLM to generate regex patterns from examples and then refines them iteratively in a feedback loop. Despite the limitations of regexes, this approach is practical for extracting the regularly patterned information that is common in clinical texts, such as dates and specific data about medical procedures, and for event detection. Our experiments on Czech clinical notes show that this method outperforms current state-of-the-art genetic-programming-based methods for generating regular expression patterns from examples, especially when only a few examples are available.
Related projects:	Using artificial intelligence techniques for data processing, complex analysis and visualization of large-scale data
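
The generate-and-refine loop summarized in the description can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`evaluate`, `refine`, `make_stub_generator`), the feedback format, the round limit, and the two example sentences are all assumptions made for illustration, and the LLM call is replaced by a stub that returns pre-written candidate patterns so the example runs without any external service.

```python
import re
from typing import Callable, List, Optional, Tuple

# (input text, entity the regex should extract); invented examples, not real clinical data
Example = Tuple[str, str]
Generator = Callable[[List[Example], List[str]], str]


def evaluate(pattern: str, examples: List[Example]) -> List[str]:
    """Return one error message per example the pattern fails on."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return [f"pattern does not compile: {exc}"]
    errors = []
    for text, expected in examples:
        match = compiled.search(text)
        got = match.group(0) if match else None
        if got != expected:
            errors.append(f"text={text!r} expected={expected!r} got={got!r}")
    return errors


def refine(generate: Generator, examples: List[Example],
           max_rounds: int = 3) -> Tuple[Optional[str], List[str]]:
    """Ask the generator for a pattern, test it, and feed the errors back."""
    feedback: List[str] = []
    best_pattern: Optional[str] = None
    best_errors: Optional[List[str]] = None
    for _ in range(max_rounds):
        pattern = generate(examples, feedback)
        errors = evaluate(pattern, examples)
        if best_errors is None or len(errors) < len(best_errors):
            best_pattern, best_errors = pattern, errors
        if not errors:
            break
        feedback = errors  # the next prompt would include these failures
    return best_pattern, best_errors or []


def make_stub_generator(candidates: List[str]) -> Generator:
    """Hypothetical stand-in for an LLM call: yields fixed candidates in order."""
    remaining = iter(candidates)

    def generate(examples: List[Example], feedback: List[str]) -> str:
        return next(remaining)

    return generate


EXAMPLES: List[Example] = [
    ("Kontrola dne 12.3.2021 v ambulanci.", "12.3.2021"),
    ("CT provedeno 5. 11. 2020.", "5. 11. 2020"),
]
CANDIDATES = [
    r"\d{1,2}\.\d{1,2}\.\d{4}",        # misses dates written with spaces
    r"\d{1,2}\.\s?\d{1,2}\.\s?\d{4}",  # allows an optional space after each dot
]

pattern, errors = refine(make_stub_generator(CANDIDATES), EXAMPLES)
print(pattern, errors)
```

In a real setting the stub would be replaced by a function that builds a prompt from the examples and the accumulated error messages and sends it to an LLM; how the paper formats that prompt and decides when to stop is not stated in this record.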