Cthulhu Hails from Wales: N-gram Frequency Analysis of R'lyehian

Authors

NOVOTNÝ Vít, STARÁ Marie

Year of publication 2020
Type Article in Proceedings
Conference Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020
MU Faculty or unit

Faculty of Informatics

Keywords H. P. Lovecraft; language identification; N-grams; R'lyehian
Description

R'lyehian is a unique fictional language penned by the prolific 20th-century horror fiction author H. P. Lovecraft. Prior work on the Lovecraftian mythos has not studied the similarities between R'lyehian and natural languages, which are crucial for determining its true origins.

We produced a comprehensive wordlist of R'lyehian and used open-source $N$-gram-based language identification tools to find the most similar natural languages to R'lyehian. From the comprehensive wordlist, we also constructed a frequency table of all unigraphs and digraphs in R'lyehian.
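As an illustration of what a unigraph and digraph frequency table involves, here is a minimal Python sketch. It is not the authors' code: the function names, the two-word sample, the lowercasing step, and the decision to treat the apostrophe as an ordinary character are all assumptions made for this example.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count character n-grams inside each word.

    Assumption: n-grams never span word boundaries, words are lowercased,
    and the apostrophe is counted like any other character.
    """
    counts = Counter()
    for word in words:
        w = word.lower()
        for i in range(len(w) - n + 1):
            counts[w[i:i + n]] += 1
    return counts


def relative_frequencies(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}


# Hypothetical two-word sample, not the wordlist compiled for the paper.
sample_words = ["Cthulhu", "R'lyeh"]

unigraphs = ngram_counts(sample_words, 1)
digraphs = ngram_counts(sample_words, 2)

# Print the digraph table, most frequent first.
for gram, freq in sorted(relative_frequencies(digraphs).items(),
                         key=lambda item: -item[1]):
    print(f"{gram}\t{freq:.3f}")
```

Running the same functions over a full wordlist would give the two tables; whether n-grams should cross word boundaries or include padding characters is a design choice this sketch does not settle.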

We show that R'lyehian is most similar to Celtic languages, which lays the groundwork for our hypothesis that R'lyeh, where Cthulhu lies dreaming, might be a place in Wales.

Our frequency tables will prove a useful resource for future work in the area of the Lovecraftian mythos.
