计算机科学 (Computer Science), 2010
Novel Approach for Extracting XML Schema Definition Based on Content Model Graph
Abstract:
Although XML Schema can be used to validate, query, and transform XML documents, many XML documents in real applications have no XML Schema defined. This paper presents an approach, XSDInfer, that automatically extracts an XML Schema Definition (XSD) from XML documents. First, schema information harvested during XML parsing is merged into Content Model Graphs by applying a set of rules. The graphs are then transformed into content model expressions, from which the XSD is generated. XSDInfer scales to very large and deeply recursive XML documents. It supports context-sensitive content models, and the generated XSD is more human-readable. Experiments show that XSDInfer achieves better scalability and expressiveness than previous techniques.
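
The first step described in the abstract, merging child-element information gathered during parsing into one graph per element, can be illustrated with a minimal Python sketch. This is not XSDInfer's actual rule set, which the abstract does not specify. The function names `build_content_graphs` and `repeatable_children`, the `START`/`END` markers, and the edge-set representation are all assumptions made for illustration, and the sketch keys graphs by element name only, so it does not model context sensitivity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sentinel nodes marking the start and end of a child sequence.
START, END = "^", "$"


def build_content_graphs(xml_text):
    """Map each element name to a set of (from, to) edges.

    Every occurrence of an element contributes the path
    START -> child1 -> child2 -> ... -> END, and paths from all
    occurrences of the same element name are merged into one edge set.
    """
    graphs = defaultdict(set)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        path = [START] + [child.tag for child in elem] + [END]
        graphs[elem.tag].update(zip(path, path[1:]))
    return dict(graphs)


def repeatable_children(edges):
    """Children seen immediately after themselves (a self-loop in the graph)."""
    return {a for a, b in edges if a == b and a not in (START, END)}


SAMPLE = (
    "<lib>"
    "<book><title/><author/><author/></book>"
    "<book><title/></book>"
    "</lib>"
)

graphs = build_content_graphs(SAMPLE)
print(sorted(graphs["book"]))
print(repeatable_children(graphs["book"]))
```

In this sample, the merged graph for `book` contains both a `title -> author` edge and a `title -> END` edge, which suggests that `author` is optional, while the `author -> author` self-loop suggests it may repeat. Turning such a graph into a content model expression (the second step in the abstract) is left out of the sketch.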