Scientific computation and data-intensive analyses are increasingly common. On the one hand, the MapReduce programming model has gained a lot of attention for its applicability to large-scale parallel data analysis and Big Data applications. On the other hand, Cloud computing is increasingly attractive for solving computing problems that demand a lot of resources. This paper explores the potential symbiosis between MapReduce and Cloud computing, in order to create a robust and scalable environment for executing MapReduce workflows regardless of the underlying infrastructure. The main goal of this work is to provide an easy-to-install interface, so that non-expert scientists can deploy a suitable testbed for their MapReduce experiments on the local resources of their institution. Test cases were run on a real cluster to evaluate the time required for the whole execution process.
Cite this paper
Salgueiro, M., González, P., Pena, T.F. and Cabaleiro, J.C. (2014) Assessment, Design and Implementation of a Private Cloud for MapReduce Applications. Open Access Library Journal, 1, e526. http://dx.doi.org/10.4236/oalib.1100526