Systems Engineering - Theory & Practice (系统工程理论与实践), 2003
The Discretization of Continuous Attributes Using Genetic Algorithms
Abstract:
Discretization of continuous attributes is an important method for compressing data and simplifying analysis, and it is one of the focuses of research in pattern recognition, machine learning, and rough sets. Several discretization algorithms are in use, such as MD and entropy-based discretization, but they have disadvantages. For example, the initial set of cut points is hard to determine. Optimal discretization has been proved to be NP-hard. The heuristics used by most algorithms usually yield only local minima, although the results are sometimes satisfactory. This paper first discusses these problems on the basis of rough set theory. We then transform the discretization of continuous attributes into a 0-1 integer program, which can be solved by existing software such as LINDO. Furthermore, a genetic algorithm using decimal encoding is proposed to compute the optimal discretization.
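The abstract does not spell out the 0-1 integer program, so the following is only a minimal sketch under stated assumptions. It assumes the formulation common in the rough-set literature: one binary variable per candidate cut point, one constraint per pair of objects with different decisions requiring that at least one selected cut separates the pair, and the objective of minimizing the number of selected cuts. The function names (`candidate_cuts`, `build_constraints`, `solve_brute_force`) and the toy data are hypothetical, and exhaustive search stands in for a solver such as LINDO or for the genetic algorithm of the paper.

```python
from itertools import combinations

def candidate_cuts(values):
    """Midpoints between consecutive distinct values of one attribute (assumed candidate set)."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

def build_constraints(X, y):
    """Return all candidate cuts and, per pair of objects with different
    decisions, the set of cut indices that separate that pair."""
    n_attrs = len(X[0])
    cuts = [(a, c)
            for a in range(n_attrs)
            for c in candidate_cuts([row[a] for row in X])]
    constraints = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] == y[j]:
            continue  # only pairs with different decisions must be discerned
        covering = {k for k, (a, c) in enumerate(cuts)
                    if min(X[i][a], X[j][a]) < c < max(X[i][a], X[j][a])}
        if covering:  # pairs identical on every attribute cannot be separated
            constraints.append(covering)
    return cuts, constraints

def solve_brute_force(cuts, constraints):
    """Minimize the number of selected cuts subject to every constraint
    containing at least one selected cut. Exponential time: toy sizes only."""
    n = len(cuts)
    for size in range(n + 1):
        for chosen in combinations(range(n), size):
            selected = set(chosen)
            if all(selected & c for c in constraints):
                return [cuts[k] for k in chosen]
    return None

# Toy decision table (hypothetical): two continuous attributes, binary decision.
X = [(1.0, 2.0), (1.5, 3.0), (2.5, 2.0), (3.0, 3.0)]
y = [0, 0, 1, 1]
cuts, constraints = build_constraints(X, y)
best = solve_brute_force(cuts, constraints)
print(best)  # [(0, 2.0)]: a single cut at 2.0 on attribute 0 suffices
```

Because the search enumerates subsets in order of increasing size, the first feasible subset found is a minimum one. That property is what makes the problem NP-hard in general, and it is why a heuristic or evolutionary search becomes attractive on larger tables.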