A Data Quality Methodology for Heterogeneous Data

Keywords: Data quality, Methodology, Structured data, Semistructured data, Unstructured data

Abstract: We present a Heterogeneous Data Quality Methodology (HDQM) for Data Quality (DQ) assessment and improvement. It considers all types of data managed in an organization: structured data represented in databases, semistructured data usually represented in XML, and unstructured data represented in documents. We also define a meta-model to describe the relevant knowledge managed in the methodology. The different types of data are translated into a common conceptual representation. We consider two DQ dimensions that are widely analyzed in the specialist literature and used in practice: Accuracy and Currency. The methodology provides stakeholders involved in DQ management with a complete set of phases for DQ assessment and improvement. A non-trivial case study from the business domain is used to illustrate and validate the methodology.
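To make the two dimensions concrete, here is a minimal sketch that scores records once they have been mapped to a single common representation, whatever their original source format. It rests on simplifying assumptions and does not reproduce the HDQM definitions. Accuracy is taken as the fraction of values that exactly match a trusted reference value, and Currency as the fraction of values updated within a chosen age threshold. The `Record` type, the `accuracy` and `currency` functions, the `reference` mapping, and the `max_age_days` parameter are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical common representation: one attribute value per record,
    # whether it came from a database table, an XML element, or a document.
    key: str            # identifier of the real-world entity
    source: str         # "structured", "semistructured", or "unstructured"
    value: str          # attribute value as found in the source
    last_updated: date  # when the value was last updated


def accuracy(records, reference):
    """Fraction of records whose value exactly matches the reference value.

    Simplification: exact match against a reference assumed to be correct;
    the literature also uses distance-based (e.g. edit distance) measures.
    """
    if not records:
        return 0.0
    matches = sum(1 for r in records if reference.get(r.key) == r.value)
    return matches / len(records)


def currency(records, today, max_age_days):
    """Fraction of records updated no more than max_age_days before today."""
    if not records:
        return 0.0
    fresh = sum(1 for r in records if (today - r.last_updated).days <= max_age_days)
    return fresh / len(records)


# Illustrative data only: one entity ("c1") appears in two heterogeneous sources.
records = [
    Record("c1", "structured", "Rome", date(2024, 1, 10)),
    Record("c1", "semistructured", "Roma", date(2023, 6, 1)),
    Record("c2", "unstructured", "Milan", date(2024, 2, 1)),
]
reference = {"c1": "Rome", "c2": "Milan"}

acc = accuracy(records, reference)                        # 2 of 3 values match
cur = currency(records, date(2024, 3, 1), max_age_days=90)  # 2 of 3 are recent
```

Because both scores are computed on the shared representation rather than on each native format, the same functions apply to values drawn from tables, XML, and documents alike. This is the practical benefit of translating heterogeneous data into one conceptual model before assessment.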