%0 Journal Article
%T Cluster analysis of a price index series based on the hierarchical division algorithm
%A 褚洪洋
%A 柴跃廷
%A 刘义
%J Journal of Tsinghua University (Science and Technology)
%D 2015
%R 10.16511/j.cnki.qhdxxb.2015.21.010
%X At present, the consumer price index (CPI) published by the National Bureau of Statistics of China does not cover online shopping. With the rapid development of e-commerce, publishing an online CPI has become an urgent problem. Online transaction data can be acquired in real time and reflect actual transactions, so an online CPI should be more timely and more accurate than the traditional CPI. However, because different enterprises use different commodity classification standards, computing a classification price index first requires classifying the elementary price indexes. This paper describes a hierarchical division (divisive hierarchical clustering) algorithm for clustering price index series. It uses a correlation-coefficient-based distance and the Manhattan distance as distance measures and divides the series in two steps. The algorithm stops dividing when the configured termination conditions are met, so the number of clusters need not be set in advance. Applied to a practical case, the method effectively divided 219 of 226 price index series, indicating a good clustering result.
%K price index series
%K divisive hierarchical clustering method
%K correlation coefficient based distance
%K Manhattan distance
%U http://jst.tsinghuajournals.com/CN/Y2015/V55/I11/1178
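The two-step divisive procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: the correlation-based distance is taken to be 1 - r (Pearson r), each split bisects a cluster around its two most distant members, and `corr_threshold` and `l1_threshold` are hypothetical termination conditions (stop splitting once a cluster's diameter falls below the threshold). Step 1 separates series by trend shape; step 2 separates each resulting group by price level.

```python
import numpy as np


def corr_distance(x, y):
    """Correlation-based distance, ASSUMED here to be 1 - Pearson r (range [0, 2]).
    Constant series have undefined r and should be removed beforehand."""
    r = np.corrcoef(x, y)[0, 1]
    return 1.0 - r


def manhattan_distance(x, y):
    """Manhattan (L1) distance between two equal-length series."""
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(y))))


def divide(series, idx, dist, threshold):
    """Recursively bisect the cluster `idx` until its diameter is <= threshold.

    Stopping on a distance threshold (a hypothetical parameter) means the
    number of clusters is not fixed in advance.
    """
    if len(idx) < 2:
        return [idx]
    # The two most distant members seed the two sub-clusters.
    pairs = [(dist(series[i], series[j]), i, j)
             for a, i in enumerate(idx) for j in idx[a + 1:]]
    diameter, s1, s2 = max(pairs)
    if diameter <= threshold:
        return [idx]
    # Assign every member to the nearer seed; both halves are non-empty.
    left = [k for k in idx
            if dist(series[k], series[s1]) <= dist(series[k], series[s2])]
    right = [k for k in idx if k not in left]
    return (divide(series, left, dist, threshold)
            + divide(series, right, dist, threshold))


def two_step_clustering(series, corr_threshold, l1_threshold):
    """Step 1 splits by shape (correlation distance); step 2 splits each
    resulting cluster by level (Manhattan distance)."""
    all_idx = list(range(len(series)))
    clusters = []
    for c in divide(series, all_idx, corr_distance, corr_threshold):
        clusters.extend(divide(series, c, manhattan_distance, l1_threshold))
    return clusters


# Toy data (invented for illustration): rising, falling, and a higher-level rising series.
series = np.array([
    [100.0, 101.0, 102.0, 103.0, 104.0],
    [100.0, 101.1, 102.2, 103.1, 104.2],
    [100.0, 99.0, 98.0, 97.0, 96.0],
    [100.0, 99.1, 98.0, 97.2, 96.1],
    [120.0, 121.0, 122.0, 123.0, 124.0],
])
clusters = two_step_clustering(series, corr_threshold=0.5, l1_threshold=5.0)
print(clusters)  # grouping (in some order): {0, 1}, {2, 3}, {4}
```

In this toy run, the correlation step separates the rising series from the falling ones, and the Manhattan step then splits the higher-level series 4 away from series 0 and 1, even though all three share the same trend.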