Computer Science and Application 2023

Research on Efficient Data Reading Technology Based on HBase

DOI: 10.12677/CSA.2023.133034, PP. 358-368

闵继勇, 史爱武, 武俊, 田贞才

Keywords: HBase, Redis, Cache, Secondary Index


Abstract:

In the era of big data, relational databases struggle to store massive volumes of data. HBase, a column-oriented NoSQL database, is widely used for big data storage, but its data retrieval still has shortcomings. This paper analyzes HBase's data retrieval mechanisms and proposes improvements for two problems. First, HBase queries must access the disk, which makes them slow. To address this, the paper proposes using Redis to index HBase's hot data and, considering the effects of query frequency, update frequency, and accumulated historical heat on the cache, designs a heat-value cache eviction strategy based on query frequency and update frequency that improves the Redis cache hit rate. Second, retrieving data by non-row-key fields requires a full table scan, which is inefficient. To address this, the paper proposes building secondary indexes on non-row-key fields and designs a secondary index scheme based on coprocessors and Redis. Experimental results show that the improved eviction strategy achieves a higher hit rate than the LRU strategy, and that introducing a Redis cache for hot data into the query module and building secondary indexes on non-row-key fields significantly improves data retrieval performance and query speed.
