Application Research of Computers (计算机应用研究), 2012
Joint incremental topic modeling by fusing text and link
Abstract:
This paper proposes an incremental topic-modeling algorithm, based on link-PLSA, that integrates both content and links. First, it performs topic modeling on the initial dataset. It then presents a technique for updating the model parameters so that newly arriving documents and links are integrated into the original model. Furthermore, it proposes an adaptive asymmetric learning approach to fuse the latent topics of the content and link modalities: for each webpage, the topic distributions from the two models are fused by multiplying them by different weights, which are determined by the entropy of the distribution of words. Better topic modeling can be achieved because the probabilistic structure associates the content and link modalities properly. Experiments on two datasets with different link structures show that the approach saves time and that the model leads to systematic improvements in classification quality. The paper also presents some interesting visualizations generated by the model.
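
The entropy-weighted fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything below is an assumption made for illustration: the function names (`entropy`, `fuse_topics`), the choice of a convex combination, and the rule that the content weight equals one minus the normalized entropy of the page's word distribution (so a page with a peaked word distribution trusts its content topics more). It is not the paper's actual method.

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def fuse_topics(content_topics, link_topics, word_dist):
    """Fuse two per-page topic distributions with an entropy-derived weight.

    ASSUMPTION (not from the paper): the content weight is
    1 - H(word_dist) / log(V), where V is the vocabulary size, and the
    link weight is its complement.
    """
    if len(content_topics) != len(link_topics):
        raise ValueError("topic distributions must have the same length")

    vocab_size = len(word_dist)
    # log(V) is the maximum possible entropy; guard the degenerate V <= 1 case.
    max_entropy = math.log(vocab_size) if vocab_size > 1 else 1.0

    w_content = 1.0 - entropy(word_dist) / max_entropy
    w_link = 1.0 - w_content

    mixed = [w_content * c + w_link * l
             for c, l in zip(content_topics, link_topics)]

    # Renormalize so the result is a proper distribution even if the
    # inputs were only approximately normalized.
    total = sum(mixed)
    if total <= 0:
        raise ValueError("fused distribution has no probability mass")
    return [m / total for m in mixed], w_content


if __name__ == "__main__":
    # Hypothetical page: 2 topics, 4-word vocabulary.
    fused, w = fuse_topics([0.9, 0.1], [0.2, 0.8], [0.7, 0.1, 0.1, 0.1])
    print("content weight:", round(w, 3))
    print("fused topics:", [round(p, 3) for p in fused])
```

Under this assumed rule, a page whose words are spread uniformly over the vocabulary falls back entirely on the link-derived topics, while a page dominated by a single word uses only its content-derived topics.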