Evaluating Word Embedding Models for Malayalam
Abstract
This paper evaluates static word embedding models for Malayalam. We created a well-documented,
pre-processed Malayalam corpus and trained word vectors on it using three word embedding models:
GloVe, FastText and Word2Vec. The resulting vectors were evaluated with intrinsic evaluators, which
test the quality of the word representations through word analogy, word similarity and concept
categorization, independently of any downstream language processing task. The experimental results
show that higher-dimensional word representations and larger window sizes gave better results on
the intrinsic evaluators.
Paper Details
PaperID: 2406
Authors: Merin Cherian and Kannan Balakrishnan
Volume: Volume 11
Issues: Volume 11
Keywords: Malayalam, Word Embedding, Intrinsic Evaluation.
Year: 2021
Month: July
Pages: 3769-3783