|
Introduction to Hadoop Distributed File SystemKeywords: Distributed File System DFS , Hadoop Distributed File System HDFS , Nutch Distributed File System NDFS High Performance Computing HPC Abstract: HDFS is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes), and provide high-throughput access to this information. Files are stored in a redundant fashion across multiple machines to ensure their durability to failure and high availability to very parallel applications. This paper includes the step by step introduction to the file system to distributed file system and to the Hadoop Distributed File System. Section I introduces What is file System, Need of File System, Conventional File System, its advantages, Need of Distributed File System, What is Distributed File System and Benefits of Distributed File System. Also the analysis of large dataset and comparison of mapreducce with RDBMS, HPC and Grid Computing communities have been doing large-scale data processing for years. Sections II introduce the concept of Hadoop Distributed File System. Lastly section III contains Conclusion followed with the References.
|