An Improved K-Nearest Neighbor Algorithm Based on Entropy Method

DOI: 10.12677/csa.2025.155145, PP. 735-740

Keywords: K-Nearest Neighbor Algorithm, Entropy Method, Dynamic Weighting, Binary Classification Problem


Abstract:

Traditional K-Nearest Neighbor (KNN) classifiers often lose accuracy on multidimensional data whose features are measured on different scales, because unevenly distributed features and scale differences distort the distance computation. This paper proposes an improved KNN algorithm based on entropy weighting. The method combines entropy weighting with the standardized Euclidean distance: information entropy quantifies the information content of each feature, and an adaptive weighting scheme built on the resulting importance scores treats each feature differently in the distance calculation, reducing the over-sensitivity of traditional distance measures to high-weight features. Experiments on several datasets from the UCI repository show that the improved algorithm achieves higher accuracy than traditional KNN in most cases, with particularly significant gains on high-dimensional datasets. Once the best K value has been determined through parameter tuning, the algorithm substantially improves classification accuracy and effectively addresses the shortcomings of traditional KNN under uneven feature distributions and scale differences.
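The abstract names the ingredients (entropy-based feature weights, a standardized Euclidean distance, and a KNN vote) but gives no formulas, so the following is only a minimal sketch under stated assumptions and not the authors' implementation. It assumes the textbook entropy weight method (min-max scaling, column-wise proportions, normalized Shannon entropy, weights proportional to one minus entropy) and a distance of the form sqrt(sum_j w_j * ((x_j - y_j) / s_j)^2), where s_j is the standard deviation of feature j. The function names `entropy_weights`, `column_stds`, `weighted_distance`, and `knn_predict`, and the toy data, are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def entropy_weights(rows):
    """Entropy-method weights, one per feature column (assumed textbook form).

    Expects at least two rows. The returned weights sum to 1.
    """
    n, m = len(rows), len(rows[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in rows]
        lo, span = min(col), max(col) - min(col)
        # Min-max scale so every value is non-negative and unit-free.
        scaled = [(v - lo) / span if span > 0 else 0.0 for v in col]
        total = sum(scaled)
        if total == 0:
            # Constant column: treated as carrying no information.
            divergences.append(0.0)
            continue
        probs = [v / total for v in scaled]
        # Shannon entropy normalised to [0, 1]; 0 * log(0) is taken as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    total_div = sum(divergences)
    if total_div == 0:
        return [1.0 / m] * m  # fall back to uniform weights
    return [d / total_div for d in divergences]


def column_stds(rows):
    """Population standard deviation of each column (1.0 if the column is constant)."""
    n, m = len(rows), len(rows[0])
    stds = []
    for j in range(m):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero
    return stds


def weighted_distance(a, b, weights, stds):
    """Entropy-weighted standardized Euclidean distance between two points."""
    return math.sqrt(
        sum(w * ((x - y) / s) ** 2 for x, y, w, s in zip(a, b, weights, stds))
    )


def knn_predict(train_x, train_y, query, k, weights, stds):
    """Majority vote among the k training points nearest to the query."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: weighted_distance(train_x[i], query, weights, stds),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Invented toy data: feature 0 separates the two classes,
# feature 1 is on a much larger scale and does not.
train_x = [[1.0, 1000.0], [1.2, 3000.0], [0.9, 2000.0],
           [3.0, 1500.0], [3.2, 2500.0], [2.9, 3500.0]]
train_y = [0, 0, 0, 1, 1, 1]

weights = entropy_weights(train_x)
stds = column_stds(train_x)
prediction = knn_predict(train_x, train_y, [1.1, 3400.0], 3, weights, stds)
print(weights, prediction)
```

In this toy example the query lies closest, in raw units, to a class-1 point because the large-scale second feature dominates the unweighted Euclidean distance; dividing by each feature's standard deviation and applying the entropy weights lets the first feature decide the vote. The choice of K is fixed at 3 here; the parameter tuning mentioned in the abstract would replace that constant with a search over candidate values.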
