Evaluating Word Embedding Models for Malayalam

Authors

  • Merin Cherian
  • Kannan Balakrishnan

DOI:

https://doi.org/10.47059/revistageintec.v11i4.2406

Abstract

This paper evaluates static word embedding models for Malayalam. We created a well-documented, pre-processed Malayalam corpus, trained word vectors on it with three word embedding models (GloVe, FastText and Word2Vec), and assessed the resulting vectors with intrinsic evaluators. The quality of the word representations is tested through word analogy, word similarity and concept categorization, independently of any downstream language processing task. Experimental results for all three models are reported. The results show that higher-dimensional word representations and larger context window sizes gave better scores on the intrinsic evaluators.
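Two of the intrinsic evaluators named above can be sketched in a few lines. The abstract does not state the exact scoring protocol, so this is a minimal illustration under assumptions: analogies are solved with the common 3CosAdd rule (pick the word closest to b - a + c), and word similarity is scored as the Spearman rank correlation between model cosine similarities and human ratings. The function names (`solve_analogy`, `analogy_accuracy`, `word_similarity_score`) are hypothetical, and the 3-dimensional vectors with English placeholder words are toy data, not Malayalam embeddings or results from the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def spearman(xs, ys):
    """Spearman rank correlation (simplified: assumes no tied values)."""
    rx = np.argsort(np.argsort(xs))  # rank of each element in xs
    ry = np.argsort(np.argsort(ys))  # rank of each element in ys
    return float(np.corrcoef(rx, ry)[0, 1])

def word_similarity_score(vectors, pairs):
    """Correlate model cosine similarities with human ratings.

    pairs: list of (word1, word2, human_score); pairs with
    out-of-vocabulary words are skipped.
    """
    model, human = [], []
    for w1, w2, gold in pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(gold)
    return spearman(model, human)

def solve_analogy(vectors, a, b, c):
    """3CosAdd: a is to b as c is to ?  Returns the word closest to b - a + c."""
    target = vectors[b] - vectors[a] + vectors[c]
    best, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):  # the question words themselves are excluded
            continue
        sim = cosine(vec, target)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, d) questions where the predicted word equals d."""
    correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)

# Hypothetical toy vectors, for illustration only.
toy = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([1.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
    "apple": np.array([0.0, 0.1, -1.0]),
}
# Hypothetical human similarity ratings.
pairs = [("man", "woman", 5.0), ("king", "queen", 6.0), ("man", "apple", 1.0)]

print(solve_analogy(toy, "man", "woman", "king"))  # → queen
print(word_similarity_score(toy, pairs))
```

Concept categorization, the third evaluator, would typically cluster the vectors of category words and compare the clusters with gold categories; it is omitted here to keep the sketch short.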

Published

2021-07-29

Section

Articles