Hadoop is Open source Apache Framework which is written in java programming language. Hadoop allow distributed processing of large datasets across many computers which using programming models. Environment provides distributed storage and computation across clusters of computers. It is designed to scale up from single server to thousands of machine using the environment that we discuss above.
In 2006, Cutting join Yahoo and took with him the Nutch project which is further divided and hadoop is one of its part which is a distributed and processing portion. In 2008, Yahoo released Hadoop as an open-source Project. Now, Apache Software Foundation (ASF) maintain and managed the framework and ecosystem of technologies of hadoop which is a global community of software developers.
These contain java libraries which provides filesystem and OS level abstraction and contain necessary java files and scripts required to start hadoop.Hadoop YARN:
Framework for job scheduling and cluster resources management.Hadoop Distributed File System(HDFS):
This is a file system that provides high throughput access to application data.Hadoop MapReduce:
MapReduce is YARN based system for parallel processing of large data sets.
Since 2012, the term “Hadoop” often refers not just to the base modules mention above but also to the collection of additional software packages that can be installed on the top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache spark etc.
Both input and output are stored in the file system and take care of scheduling tasks, monitoring them and re-execute the failed tasks. This framework consist of a single master JobTracker and one slave TaskTracker it is a single point failure for the hadoop mapreduce means if it is goes down then the running all the tasks are halted. Master is responsible for scheduling the task for the tracker and the slave executes those task that are assigned by the master and provide task status periodically.
This is most common file system used by hadoop it is based on google file system(GFS). Uses Master slave architecture in which master consist of singleNameNode that manage the file system metadata and another one is slaveDataNodes that store the actual data. HDFS namespaces is split into several blocks and those blocks are stored in a set of DataNodes, it also take care of read and write operation in the file system. The NameNodes determine the mapping of blocks to the DataNodes,it also take care of block creation ,deletion and replication based on the instruction given by NameNode. HDFS provide shell like file system in which the list of commands interact with the file system.
<div style="clear:both"><a href="http://www.assignmenthelp.net/big-data-hadoop"> <img src="http://www.assignmenthelp.net/files/hadoop-infographics.png" title="Big Data Hadoop" alt="Big Data Hadoop" border="0" /></a></div> <div>Courtesy of: <a href="http://www.assignmenthelp.net/big-data-hadoop">Assignment Help</a></div>