A Distributed Spatial Data Analysis Framework Based on Shark/Spark

DOI: 10.3724/SP.J.1047.2015.00401, PP. 401-407

Keywords: Spark, Hadoop, spatial database, Shark, spatial query


Abstract:

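As spatial data volumes grow, traditional single-node approaches to spatial data management can no longer meet the demands of massive data and high concurrency. The rise of cloud computing brings both opportunities and challenges, and the complementary strengths of distributed computing and database technology make efficient data management in the cloud possible. This paper proposes key techniques for integrating a spatial database into a distributed computing engine (Shark/Spark), including spatial data mapping, spatial data loading, data backup, and spatial query. By combining the spatial database's efficient storage, indexing, and querying of spatial data with the distributed engine's strength in complex computation, it implements a distributed spatial data analysis framework based on Shark/Spark. In the implementation, spatial queries are realized in two ways: spatial user-defined functions and spatial function pushdown. The results show that spatial queries that affect the size of the returned result are better pushed down to the spatial database, whereas spatial queries that do not affect the size of the returned result are better computed directly by the distributed engine. In addition, a comparison with an existing distributed GIS solution (ArcGIS on Hadoop) shows that the spatial database's spatial index effectively improves query efficiency and that spatial data management becomes more independent.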
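
The difference between the two query paths named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: an in-memory SQLite table stands in for the spatial database, plain Python stands in for the Shark/Spark engine, and every name here (`make_db`, `query_with_udf`, `query_with_pushdown`, the `pts` table) is hypothetical. It only shows why a filtering predicate, which changes how many rows come back, tends to benefit from being evaluated at the data source.

```python
import sqlite3


def make_db():
    """Build a toy 'spatial' table: 100 points on a 10 x 10 grid (assumed data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pts (id INTEGER PRIMARY KEY, x REAL, y REAL)")
    rows = [(i, float(i % 10), float(i // 10)) for i in range(100)]
    conn.executemany("INSERT INTO pts VALUES (?, ?, ?)", rows)
    return conn


def in_box(x, y, box):
    """Engine-side predicate, playing the role of a spatial user-defined function."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def query_with_udf(conn, box):
    """Fetch every row from the source, then filter in the engine."""
    fetched = conn.execute("SELECT id, x, y FROM pts").fetchall()
    hits = [row for row in fetched if in_box(row[1], row[2], box)]
    return hits, len(fetched)  # (result rows, rows transferred)


def query_with_pushdown(conn, box):
    """Send the predicate to the source so only matching rows are transferred."""
    xmin, ymin, xmax, ymax = box
    sql = ("SELECT id, x, y FROM pts "
           "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?")
    fetched = conn.execute(sql, (xmin, xmax, ymin, ymax)).fetchall()
    return fetched, len(fetched)  # (result rows, rows transferred)


if __name__ == "__main__":
    conn = make_db()
    box = (0.0, 0.0, 2.0, 2.0)
    udf_hits, udf_moved = query_with_udf(conn, box)
    push_hits, push_moved = query_with_pushdown(conn, box)
    print("same result:", sorted(udf_hits) == sorted(push_hits))
    print("rows transferred (UDF vs pushdown):", udf_moved, push_moved)
```

Both paths return the same rows, but the pushdown path transfers only the matches. A function that does not change the row count (for example, a per-row measurement) gains nothing from this reduction, which is consistent with the abstract's finding that such queries suit the distributed engine.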

