|
New Word Vector Representation for Semantic Clustering
Une nouvelle représentation vectorielle pour la classification sémantique
Keywords: clustering, semantic concepts, word vector representation
Abstract: In this paper we argue that significant semantic concepts can be obtained using clustering methods. We start by defining semantic measures that quantify the semantic relations between words. We then use clustering methods to build concepts automatically. We test two well-known methods: the K-means algorithm and Kohonen maps. We then propose the use of AutoClass, a Bayesian network designed for clustering. To group the words of the vocabulary into classes, we test three vector representations of words. The first is a simple contextual representation. The second associates with each word a vector that represents its similarity with each word of the vocabulary. The third is a combination of the first and the second.
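The three word representations described in the abstract can be illustrated with a minimal sketch. Several choices here are assumptions rather than details taken from the paper: the contextual vector is built from co-occurrence counts in a window of one word on each side, cosine similarity stands in for the semantic measures (which the abstract does not specify), and the combined representation is taken to be a simple concatenation. The function names `context_vectors`, `similarity_vectors`, and `combined_vectors` are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=1):
    """Representation 1 (assumed form): counts of words seen within `window` positions."""
    vocab = sorted({w for s in sentences for w in s})
    counts = {w: Counter() for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][s[j]] += 1
    return vocab, {w: [float(counts[w][c]) for c in vocab] for w in vocab}


def cosine(u, v):
    """Cosine similarity; an assumed stand-in for the paper's semantic measures."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_vectors(vocab, ctx):
    """Representation 2: each word's similarity with every word of the vocabulary."""
    return {w: [cosine(ctx[w], ctx[x]) for x in vocab] for w in vocab}


def combined_vectors(vocab, ctx, sim):
    """Representation 3 (assumed to be concatenation of representations 1 and 2)."""
    return {w: ctx[w] + sim[w] for w in vocab}


# Toy corpus, invented for illustration only.
sentences = [
    ["the", "cat", "eats", "fish"],
    ["the", "dog", "eats", "meat"],
]
vocab, ctx = context_vectors(sentences)
sim = similarity_vectors(vocab, ctx)
combo = combined_vectors(vocab, ctx, sim)
```

In this toy corpus, "cat" and "dog" occur in identical contexts, so their contextual vectors coincide and their similarity is maximal. Any of the three dictionaries could then be passed to a clustering method such as K-means to group words into classes.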
|