Journal of Computer Science and Technology (计算机科学技术学报), 2000
Incremental mining of the schema of semistructured data
Abstract:
Semistructured data have no fixed, rigid schema, even though some implicit structure typically appears in the data. The huge number of on-line applications makes it important and imperative to mine the schema of semistructured data, both for users (e.g., to gather useful information and facilitate querying) and for systems (e.g., to optimize access). The critical problem is to discover the hidden structure in the semistructured data. Current methods for extracting the structure of Web data are either general and independent of the application background, or bound to a concrete environment such as HTML or XML. Both, however, are expensive and have difficulty keeping up with the frequent and complicated changes in Web data. This paper discusses the problem of incrementally mining the schema of semistructured data after the raw data are updated. An algorithm for incrementally mining the schema of semistructured data is provided, together with some experimental results showing that incremental mining of semistructured data is more efficient than non-incremental mining.