OALib Journal期刊
ISSN: 2333-9721
费用：99美元

投递稿件

查看量	下载量

相关文章
更多...

模式识别与人工智能 2011

有向标记根树之间的语义编辑距离

, PP. 816-824

康琪,马军

Keywords: 树编辑距离,文档聚类,结构相似度,语义相似性

Full-Text Cite this paper Add to My Lib

Abstract:

有向标记根树之间的编辑距离(TED)被广泛应用在文档的结构化相似度计算上。文中提出有向标记根树之间的语义编辑距离(TSED)的概念，并给出计算公式。组合TED和TSED形成距离测度，并应用在XML文档的结构聚类上。实验表明该距离模型在结构化聚类的准确率和召回率上明显优于单纯利用TED算法的聚类结果。该算法在时间复杂性上也等同于利用动态规划计算TED的最好算法。

References

[1]	Fiesca S, Manco G, Masciari E, et al. Fast Detection of XML Structural Similarity. IEEE Trans on Knowledge and Data Engineering, 2005, 17(2): 160-175
[2]	Ma Jun, Yi Yingnan, Tian Tian, et al. Retrieving Digital Artifacts from Digital Libraries Semantically // Proc of the International Conference on Intelligent Computing. Hefei, China, 2005: 340-349
[3]	Ma Jun, Hemmje M. Knowledge Management Support for Cooperative Research // Proc of the 17th World Computer Congress. Montreal, Canada, 2002: 280-284
[4]	Ma Jun, Shao Lu.An Optimal Algorithm for Fuzzy Classification Problem. Journal of Software, 2001,12 (4): 578-581 (in Chinese)(马军,邵陆.模糊聚类计算的最佳算法.软件学报, 2001, 12(4): 578-581)
[5]	Lei Jingsheng, Ma Jun, Jin Ting. A Fuzzy Clustering Technology Based on Hierarchical Neural Networks for Web Document. Journal of Computer Research and Development, 2006, 43(10): 1695-1699 (in Chinese) (雷景生,马军,靳婷.基于分级神经网络的Web文档模糊聚类技术.计算机研究与发展, 2006, 43(10):1696-1699)
[6]	Ma Jun, Chen Zhumin, Zhao Yan, et al. Computation of Document Structural Similarity Based on Part-Whole Matching. Pattern Recognition and Artificial Intelligence, 2007, 20(5): 630-635 (in Chinese)(马军,陈竹敏,赵嫣,等.基于部分-整体匹配的文档结构相似度计算.模式识别与人工智能, 2007, 20(5): 630-635)
[7]	Bertinoa E, Guerrinib E, Mesiti M. A Matching Algorithm for Measuring the Structural Similarity between an XML Document and a DOM and Its Applications. Information System, 2004, 29(1): 23-46
[8]	Marian A. Detecting Changes in XML Documents // Proc of the 18th International Conference on Data Engineering. San Jose, USA, 2002: 137-146
[9]	Buttler B. A Short Survey of Document Structure Similarity Algorithms // Proc of the International Conference on Internet Computing. Las Vegas, USA, 2004: 3-9
[10]	Chen Weimin. New Algorithm for Ordered Tree-to-Tree Correction Problem. Journal of Algorithms, 2001, 40(2): 135-158
[11]	Shasha D, Zhang Kaizhong. Fast Algorithms for the Unit Cost Editing Distance between Trees. Journal of Algorithms, 1990, 11(4): 135-145
[12]	Zhang Kaizhong. Algorithms for the Constrained Editing Problem between Ordered Labeled Trees and Related Problems. Pattern Recognition, 1995, 28(3): 463-474
[13]	Zhang Kaizhong, Shasha D. Simple Fast Algorithms for the Editing Distance between Trees and Related Problems. SIAM Journal of Computing, 1989, 18(6): 1245-1262
[14]	Nieman A, Jagadish H V. Evaluating Structural Similarity in XML Documents // Proc of the 5th International Workshop on the Web and Databases. Madison, USA, 2002: 61-66
[15]	Nayak R. Investigating Semantic Measures in XML Clustering // Proc of the IEEE/WIC/ACM International Conference on Web Intelligence. Hong Kong, China, 2006: 1042-1045
[16]	Sager T, Bernstein A, Pinzger M, et al. Detecting Similar Java Classes Using Tree Algorithms // Proc of the International Workshop on Mining Software Repositories. Shanghai, China, 2006: 65-71
[17]	Tai K. The Tree-to-Tree Correction Problem. Journal of ACM, 1979, 26(3): 422-433
[18]	Klein P N. Computing the Edit-Distance between Unrooted Ordered Trees // Proc of the 6th Annual European Symposium on Algorithms. Venice, Italy, 1998: 91-102
[19]	Wang Lian, Cheung D, Mamoulis N, et al. An Efficient and Scalable Algorithm for Clustering XML Documents by Structure. IEEE Trans on Knowledge and Data Engineering, 2004, 16(1): 82-96

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133