%0 Journal Article %T Classification for Imbalanced Microarray Data Based on Oversampling Technology and Random Forest
基于过采样技术和随机森林的不平衡微阵列数据分类方法研究 %A YU Hua-long %A GAO Shang %A ZHAO Jing %A QIN Bin %A
于化龙 %A 高尚 %A 赵靖 %A 秦斌 %J 计算机科学 %D 2012 %I %X In recent years, applying DNA microarray technology to disease diagnosis, especially cancer diagnosis, has become a hot topic in bioinformatics. Compared with many other types of data, microarray data has some unique characteristics. A novel oversampling technique based on probability distribution is proposed to address the problem caused by the imbalanced sample distribution of microarray data. With this technique, plausible pseudo-samples are created for the minority class to balance the two classes. A random forest is then used to classify the samples. The effectiveness and feasibility of the method were verified on two benchmark microarray datasets. Experimental results show that the proposed method achieves better classification performance than several traditional approaches. %K Microarray data %K Sample distribution imbalance %K Oversampling technology %K Probability distribution %K Random forest
微阵列数据 %K 样本分布不平衡 %K 过采样技术 %K 概率分布 %K 随机森林 %U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=64A12D73428C8B8DBFB978D04DFEB3C1&aid=4207430C62DE8EBE4FEBEBBA600B73D6&yid=99E9153A83D4CB11&vid=7C3A4C1EE6A45749&iid=94C357A881DFC066&sid=6235172E4DDBA109&eid=5D9D6A8FC2C66FD8&journal_id=1002-137X&journal_name=计算机科学&referenced_num=0&reference_num=0