Project information
Intelligent Software for Semantic Document Search
(Inteligentní software pro sémantické hledání dokumentů, ISSHD)
- Project Identification
- TD03000295
- Project Period
- 1/2016 - 12/2017
- Investor / Programme / Project type
-
Technology Agency of the Czech Republic
- OMEGA - Programme of support of applied social science research and experimental development
- MU Faculty or unit
-
Faculty of Informatics
- doc. RNDr. Petr Sojka, Ph.D.
- RNDr. Martin Líška
- RNDr. Michal Růžička, Ph.D.
- RNDr. Vít Starý Novotný, Ph.D.
- James Edward Thomas, M.A.
- Project Website
- https://scaletext.com
- Keywords
- scalable semantic search systems; semantic search; document topic modeling; machine learning; search; deep learning
- Cooperating Organization
-
RaRe Technologies s.r.o.
- Responsible person RNDr. Radim Řehůřek, Ph.D.
- Responsible person RNDr. Jan Pomikálek, Ph.D.
- Responsible person RNDr. Jan Rygl
Our society, research, and culture are defined by words, which in today's information society
constitute _documents_.
The goal of the project is to develop a database system (software)
that allows searching for related documents based on their _meaning_ (semantics).
The ScaleText system consists of three parts:
- semantic analysis: an arbitrary unstructured document in natural language (English, Czech) is analyzed
- indexing: document topics and structure are represented and stored internally using a _semantic_
representation in such a way that the system is then capable of semantic similarity search given a query document
- search: given an input query document, the system finds the documents that are semantically closest to the [latent] meaning of the query, even if they do not share the same keywords
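The three parts above can be illustrated with a minimal sketch. It is not the ScaleText implementation: the word vectors, the two made-up topics, the document identifiers, and the function names `analyze`, `build_index`, and `search` are all assumptions chosen for illustration. A real system would learn its semantic representation from a corpus instead of using a hand-written table.

```python
import math
import re

# Hypothetical toy word vectors over two made-up topics (vehicles, cooking).
# Assumption for illustration only; a real system would learn these.
TOY_VECTORS = {
    "car": (1.0, 0.0),
    "automobile": (1.0, 0.0),
    "engine": (0.9, 0.1),
    "garage": (0.8, 0.0),
    "recipe": (0.0, 1.0),
    "soup": (0.0, 1.0),
    "cooking": (0.1, 0.9),
    "kitchen": (0.0, 0.8),
}


def analyze(text):
    """Semantic analysis: map a text to the mean vector of its known words."""
    words = re.findall(r"[a-z]+", text.lower())
    vectors = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_index(documents):
    """Indexing: store one semantic vector per document identifier."""
    return {doc_id: analyze(text) for doc_id, text in documents.items()}


def search(index, query_text, top_k=1):
    """Search: rank indexed documents by similarity to the query document."""
    query_vector = analyze(query_text)
    ranked = sorted(
        index,
        key=lambda doc_id: cosine(query_vector, index[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


documents = {
    "doc_vehicles": "The engine was repaired in the garage.",
    "doc_food": "A soup recipe from my kitchen.",
}
index = build_index(documents)
result = search(index, "My automobile broke down.")
print(result)
```

In this sketch the query shares no word with `doc_vehicles` and shares the word "my" with `doc_food`, yet the vehicle document ranks first because the comparison is made on the topic vectors and not on the keywords.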
Results
https://www.rvvi.cz/cep?s=jednoduche-vyhledavani&ss=detail&n=0&h=TD03000295
Publications
Total number of publications: 9
2018
-
Implementation Notes for the Soft Cosine Measure
Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM '18), year: 2018
-
Weighting of Passages in Question Answering
Proceedings of the Twelfth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2018, year: 2018
2017
-
Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines
CEUR Workshop Proceedings, Vol. 1923, year: 2017
-
Math Information Retrieval for Digital Libraries
Year: 2017
-
ScaleText
Year: 2017
-
Semantic Similarities between Locations based on Ontology
Proceedings of the Eleventh Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2017, year: 2017
-
Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines
Proceedings of the 2nd Workshop on Representation Learning for NLP, RepL4NLP 2017 c/o ACL 2017, year: 2017
-
Vector Space Representations in Information Retrieval
Year: 2017
2016
-
ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text
Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016, year: 2016