Application Research of Computers (计算机应用研究), 2013
XML document latent information extraction algorithm based on D-S evidence theory
Abstract:
Traditional XML document retrieval methods rely mainly on keyword matching, which ignores both the semantics of the keywords and the latent information carried by combinations of information. This paper proposes an algorithm for extracting latent information from XML documents based on D-S (Dempster-Shafer) evidence theory. First, it uses an ontology to define the relationships between semantic concepts and the modes in which they combine. Next, it proposes a retrieval model based on D-S evidence theory. It then presents a method for computing evidence weights, and finally designs a dynamic threshold using the plausibility function. Together these address the uncertainty of semantic matching and the retrieval of latent information. The paper also presents an application of the algorithm to detecting sensitive personal and enterprise information in the e-government domain. Experiments show that the proposed algorithm achieves higher precision and recall.
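
The abstract does not give the paper's evidence-weight formula or its dynamic threshold, so the following is only a minimal sketch of the standard building blocks of D-S evidence theory that such a retrieval model would rest on: Dempster's rule of combination and the belief and plausibility functions. The frame of discernment (`sensitive`, `benign`), the two evidence sources, and all mass values are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from itertools import product


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses of one function sum to 1.
    """
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting; rule undefined")
    # Renormalise by the non-conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in fused.items()}


def belief(m, hypothesis):
    """Bel(A): total mass of focal sets contained in A."""
    return sum(v for s, v in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Pl(A): total mass of focal sets that intersect A."""
    return sum(v for s, v in m.items() if s & hypothesis)


# Hypothetical frame of discernment for an XML fragment.
S = frozenset({"sensitive"})
B = frozenset({"benign"})
THETA = S | B  # "unknown": mass not committed to either hypothesis

# Hypothetical evidence from two sources, e.g. two matched concepts.
evidence_1 = {S: 0.6, THETA: 0.4}
evidence_2 = {S: 0.5, B: 0.2, THETA: 0.3}

fused = combine(evidence_1, evidence_2)
bel_sensitive = belief(fused, S)
pl_sensitive = plausibility(fused, S)

print(f"Bel(sensitive) = {bel_sensitive:.4f}")
print(f"Pl(sensitive)  = {pl_sensitive:.4f}")
```

The interval between `bel_sensitive` and `pl_sensitive` bounds the support for the hypothesis. A threshold built on the plausibility function, as the abstract describes, would compare a value such as `pl_sensitive` against a cutoff, but how the paper adapts that cutoff dynamically is not stated here.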