A.N. Kirillov, A.A. Krizhanovsky.
Synset geometry structure model
// Transactions of the Karelian Research Centre of the Russian Academy of Sciences. No 8. Mathematical Modeling and Information Technologies. 2016. Pp. 45-54
Key words: synonym; synset; neural network; corpus linguistics; word2vec; RusVectores; gensim; Russian Wiktionary
The goal of the formalization proposed in this paper is to bring the theoretical linguistic problem of defining synonymy as close as possible to computational linguistic methods, which are generally based on empirical, intuitive, and theoretically unjustified factors. Using word vector representations, we propose a geometric approach to the mathematical modeling of a synset. The word embeddings are built with neural network models (Skip-gram, CBOW) developed by T. Mikolov and implemented in the word2vec program. The standard cosine similarity is used as the measure of proximity between word vectors. Several geometric characteristics of synset words are introduced: the interior of a synset, the rank of a synset word, and its centrality. These notions are intended to select the most significant words of a synset, i.e. the words whose senses are nearest to the sense of the synset. Experiments with the proposed notions, based on RusVectores resources, are presented.
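The cosine similarity mentioned in the abstract, cos(u, v) = u·v / (|u| |v|), can be sketched in a few lines. This is a minimal illustration, not the authors' code: the three-dimensional vectors are made-up toy values standing in for real word2vec embeddings (such as RusVectores models), and `rank_by_mean_similarity` is a hypothetical helper that orders synset words by their average similarity to the other members. It is only a simplified proxy and is not the paper's definition of synset word rank or centrality.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return u.v / (|u| |v|), the cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_mean_similarity(synset: Dict[str, Sequence[float]]) -> List[str]:
    """Hypothetical proxy: sort synset words by their mean cosine similarity
    to the other words of the same synset (highest mean first)."""
    words = list(synset)
    if len(words) < 2:
        return words
    scores = {}
    for w in words:
        sims = [cosine_similarity(synset[w], synset[o]) for o in words if o != w]
        scores[w] = sum(sims) / len(sims)
    return sorted(words, key=lambda w: scores[w], reverse=True)


# Toy 3-dimensional vectors (made-up values, not real embeddings).
toy_synset = {
    "big": [0.9, 0.1, 0.0],
    "large": [0.8, 0.2, 0.1],
    "huge": [0.6, 0.1, 0.7],
}

print(round(cosine_similarity(toy_synset["big"], toy_synset["large"]), 3))  # 0.984
print(rank_by_mean_similarity(toy_synset))  # ['large', 'big', 'huge']
```

With real embeddings loaded through gensim, `KeyedVectors.similarity(w1, w2)` returns the same cosine measure for two vocabulary words, so the toy dictionary could be replaced by vectors looked up from a trained model.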
Indexed in RISC