Information Extraction from Business Documents

Geletka, Martin; Bankovič, Mikuláš; Meluš, Dávid; Ščavnická, Šárka; Štefánik, Michal; Sojka, Petr

Warning

This publication is not affiliated with the Institute of Computer Science; it belongs to the Faculty of Informatics. The official publication page can be found on muni.cz.

Authors	GELETKA Martin; BANKOVIČ Mikuláš; MELUŠ Dávid; ŠČAVNICKÁ Šárka; ŠTEFÁNIK Michal; SOJKA Petr
Year of publication	2022
Type	Article in Proceedings
Conference	Recent Advances in Slavonic Natural Language Processing (RASLAN 2022)
MU Faculty or unit	Faculty of Informatics
Keywords	OCR; Multi-modal learning; Information extraction; Transformers; Structured Documents
Description	Document AI is a relatively new research topic covering techniques for automatically reading, understanding, and analyzing business documents. Today, many companies extract data from business documents manually, a process that is time-consuming and expensive and that requires manual customization or configuration. This paper describes techniques that address these problems, applies them to real-world data, and implements them in an end-to-end solution for automatic information extraction from business documents.
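Related projects:	Applied research in the areas of search, analysis, and visualization of large-scale data, natural language processing, and applied artificial intelligence (Aplikovaný výzkum v oblastech vyhledávání, analýz a vizualizací rozsáhlých dat, zpracování přirozeného jazyka a aplikované umělé inteligence); Intelligent back office (Inteligentní back office)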