Analysis of Specific Binding between TCR and Peptides Based on SVM Ensemble Learning

DOI: 10.12677/aam.2024.138344, PP. 3618-3624

Keywords: Ensemble Learning, Specific Binding, SVM, Autoencoder


Abstract:

This article proposes a two-stage approach to predicting the specific binding of peptides to T cell receptors (TCRs), with the aim of improving accuracy by refining the prediction process step by step. In the first stage, a stacked autoencoder numerically embeds the peptide and TCR sequences, focusing on the CDR3 region of the TCR β chain, a key determinant of peptide recognition. Atchley factor encoding converts the biochemical properties of the amino acids into a numerical matrix, and unsupervised learning captures the key features of each sequence. Experimental results show that the autoencoder reconstructs the original sequences with high fidelity, which supports the validity of the numerical embedding. In the second stage, an ensemble learning model built on the numerical encodings from the first stage predicts the specific binding of peptides to TCRs. The model uses support vector machines (SVMs) with different kernels as base learners and combines their predictions by stacking, to improve generalization and exploit the complementarity between the base learners. Experimental results show that the ensemble model significantly outperforms a single SVM model; the improvement in its ROC value indicates that ensemble learning predicts peptide-TCR specific binding more accurately. The contribution of this article lies in combining autoencoder-based numerical embedding with an ensemble learning predictor, which not only improves prediction accuracy but also offers a new methodology for sequence analysis in bioinformatics.
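
The second stage described in the abstract, SVMs with different kernels combined by stacking, can be illustrated with a minimal sketch. The abstract does not specify the kernels, the meta-learner, or the software used, so everything here is an assumption: scikit-learn as the library, linear, RBF, and polynomial kernels as base learners, logistic regression as the meta-learner, and synthetic feature vectors standing in for the stage-one autoencoder embeddings. It is not the authors' implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the stage-one embeddings (assumption: NOT real
# peptide/TCR data; label 1 plays the role of "binds", 0 of "does not bind").
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def make_svm(kernel):
    """Return a feature-scaled SVM with the given kernel."""
    return make_pipeline(StandardScaler(), SVC(kernel=kernel, random_state=0))


# Base learners: one SVM per kernel (the kernel choice is an assumption).
base_learners = [(k, make_svm(k)) for k in ("linear", "rbf", "poly")]

# Stacking: out-of-fold decision values of the base SVMs become the input
# features of a logistic-regression meta-learner (also an assumption).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    stack_method="decision_function",
    cv=5,
)
stack.fit(X_train, y_train)

# Compare the area under the ROC curve of the ensemble and a single RBF SVM.
stack_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
single_svm = make_svm("rbf").fit(X_train, y_train)
single_auc = roc_auc_score(y_test, single_svm.decision_function(X_test))

print(f"stacked SVM ensemble AUC: {stack_auc:.3f}")
print(f"single RBF SVM AUC:       {single_auc:.3f}")
```

On synthetic data the two scores may be close, and the ensemble is not guaranteed to win; the sketch only shows how the base learners and the meta-learner fit together. Because the meta-learner is trained on cross-validated (out-of-fold) base predictions, it does not simply reward base learners that overfit the training set.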
