嵌入结构信息的高频实时数据在线学习模型研究
Research on Online Learning Models for High-Frequency Real-Time Data Embedded with Structural Information

DOI: 10.12677/csa.2025.153056, PP. 39-53

Keywords: High-Frequency Trading, Structural Information, Online Learning, SOC Model, Two-Layer Clustering

Abstract:

High-frequency trading (HFT) has drawn wide attention in today's financial markets for its ability to capture price fluctuations quickly and to arbitrage efficiently. High-frequency data is complex, noisy, and subject to rapid trend changes, so traditional methods often fail to model it comprehensively, which poses significant challenges to the accuracy of real-time decisions and to model interpretability. To address these issues, this paper proposes an online learning model based on structural information embedding and momentum optimization, called SOC (Structural Online Classification). The SOC model uses multi-level feature engineering to construct time-series features, local extremum features, and global relationship features, so as to fully embed the structural information of high-frequency trading data. It applies a two-layer clustering method (K-Means combined with hierarchical clustering) to reduce the dimensionality of the high-dimensional features and optimize them, significantly enhancing the transparency and interpretability of the classifier. The model is further improved with L2 regularization and covariance regularization, and the Adam optimizer is used to achieve efficient momentum optimization. The SOC model was evaluated on high-frequency datasets including the CSI 300 Index and UR stocks. Experimental results show that it performs well on multiple metrics, including classification accuracy, mean squared error, and F1 score; on the CSI 300 Index the classification accuracy reached 98.73%, significantly outperforming traditional online learning models. By comparing traditional neural network models with the online learning model (SOC) on classification and regression tasks, the paper quantitatively analyzes directions for improving online learning models. The results show that the SOC model significantly outperforms traditional models in prediction accuracy, generalization ability, and memory efficiency (memory usage reduced by 67.5%), verifying the effectiveness of the online learning mechanism in dynamic data environments.
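The abstract names L2 regularization and the Adam optimizer for sample-by-sample updates, but this page gives no equations. The following is therefore only a minimal sketch under stated assumptions, not the authors' SOC implementation: it assumes a binary logistic classifier, it omits the covariance regularization and the two-layer clustering step, and every name in it (`window_features`, `OnlineAdamClassifier`, `demo_accuracy`), the three features, and all hyperparameter values are hypothetical illustrations.

```python
import math
import random


def window_features(prices):
    """Hypothetical structural features for one window of at least two prices."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    lo, hi = min(prices), max(prices)
    # Position of the latest price inside the window range: 1.0 means it is the
    # local maximum, 0.0 the local minimum (0.5 if the window is flat).
    position = (prices[-1] - lo) / (hi - lo) if hi > lo else 0.5
    return [returns[-1], sum(returns) / len(returns), position]


class OnlineAdamClassifier:
    """Logistic classifier trained one sample at a time with Adam and an L2 penalty."""

    def __init__(self, n_features, lr=0.05, l2=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        size = n_features + 1  # the last slot holds the bias
        self.w = [0.0] * size
        self.m = [0.0] * size  # first-moment (momentum) estimate
        self.v = [0.0] * size  # second-moment estimate
        self.lr, self.l2 = lr, l2
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.t = 0

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.w[-1]
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, x, y):
        """Update the weights from a single labelled sample (y is 0 or 1)."""
        error = self.predict_proba(x) - y
        # Gradient of log loss plus (l2 / 2) * ||w||^2; the bias is not penalised.
        grads = [error * xi + self.l2 * wi for xi, wi in zip(x, self.w)]
        grads.append(error)
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            self.w[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def demo_accuracy(seed=0, n_train=2000, n_test=500):
    """Train on a synthetic, linearly separable stream, then score on fresh samples."""
    rng = random.Random(seed)

    def sample():
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        return x, 1 if x[0] + x[1] > 0 else 0

    model = OnlineAdamClassifier(n_features=2)
    for _ in range(n_train):
        model.partial_fit(*sample())
    hits = 0
    for _ in range(n_test):
        x, y = sample()
        hits += int((model.predict_proba(x) >= 0.5) == bool(y))
    return hits / n_test
```

In a streaming setting, each new window of prices would be turned into a feature vector with `window_features` and passed to `partial_fit` once its label is known, so the model never has to store the full history. The synthetic stream in `demo_accuracy` only checks that the update rule learns; it says nothing about the accuracy figures reported in the abstract.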
