Big data consists of datasets so large that they are very difficult to manage on a traditional single computer. It is characterized by huge volume, high velocity, and a wide variety of data.
Hadoop is an open-source framework written in Java that manages large datasets across clusters of computers using the MapReduce model. Hadoop MapReduce is a software framework in which the map step takes a large input and converts it into intermediate sets of key/value pairs, and the reduce step aggregates those pairs once the map step has finished.
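The map and reduce steps described above can be illustrated with a minimal word-count sketch. It is plain Java that runs on one machine and does not use the Hadoop API; the class name WordCountSketch and the methods map, shuffle, reduce, and countWords are hypothetical names chosen for this illustration, and the explicit shuffle step stands in for the grouping that a MapReduce framework performs between the two phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a single-machine imitation of map -> shuffle -> reduce.
// None of these names come from the Hadoop API.
class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values of one key into a single count.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in order over all input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data needs hadoop", "hadoop stores big data");
        // prints {big=2, data=2, hadoop=2, needs=1, stores=1}
        System.out.println(countWords(lines));
    }
}
```

In a real cluster the map calls would run in parallel on different machines, each over its own slice of the input, which is the part this single-process sketch leaves out.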
The most common file system used with Hadoop is the Hadoop Distributed File System (HDFS), which follows a master/slave architecture. HDFS is fault-tolerant and supports parallel processing. It is designed to run on low-cost hardware. It stores metadata and application data separately. The metadata is stored on a dedicated server called the NameNode, which holds the file system namespace. The application data is stored on other servers called DataNodes, which hold the actual data. All of these servers communicate with each other using TCP-based protocols.
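The separation of metadata from application data can be sketched as a toy in-memory model. This is not the HDFS API: the class names HdfsLayoutSketch, ToyNameNode, and ToyDataNode, the 8-byte block size, and the two replicas per block are all assumptions made for illustration (real HDFS blocks are far larger, and the servers talk over the network rather than through method calls).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: an in-memory stand-in for the NameNode/DataNode split.
class HdfsLayoutSketch {

    // Stands in for a DataNode: holds block contents and nothing else.
    static class ToyDataNode {
        final Map<String, byte[]> blocks = new HashMap<>();
    }

    // Stands in for the NameNode: holds only metadata, never file contents.
    static class ToyNameNode {
        final Map<String, List<String>> fileToBlocks = new HashMap<>();
        final Map<String, List<Integer>> blockToNodes = new HashMap<>();
    }

    static final int BLOCK_SIZE = 8; // bytes; tiny on purpose (assumption)
    static final int REPLICAS = 2;   // copies of each block (assumption)

    // Splits the data into blocks, stores each block on REPLICAS nodes,
    // and records where everything went in the name node's metadata.
    static void write(ToyNameNode nameNode, List<ToyDataNode> nodes, String path, byte[] data) {
        List<String> blockIds = new ArrayList<>();
        for (int offset = 0, i = 0; offset < data.length; offset += BLOCK_SIZE, i++) {
            String blockId = path + "#" + i;
            byte[] chunk = Arrays.copyOfRange(data, offset, Math.min(offset + BLOCK_SIZE, data.length));
            List<Integer> holders = new ArrayList<>();
            for (int r = 0; r < REPLICAS; r++) {
                int nodeIndex = (i + r) % nodes.size(); // simple round-robin placement
                nodes.get(nodeIndex).blocks.put(blockId, chunk);
                holders.add(nodeIndex);
            }
            nameNode.blockToNodes.put(blockId, holders);
            blockIds.add(blockId);
        }
        nameNode.fileToBlocks.put(path, blockIds);
    }

    // Reassembles a file from its blocks, skipping one node treated as failed
    // (pass -1 for no failure). Replication is what makes the read succeed.
    static byte[] read(ToyNameNode nameNode, List<ToyDataNode> nodes, String path, int failedNode) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String blockId : nameNode.fileToBlocks.get(path)) {
            boolean found = false;
            for (int nodeIndex : nameNode.blockToNodes.get(blockId)) {
                if (nodeIndex != failedNode) {
                    byte[] chunk = nodes.get(nodeIndex).blocks.get(blockId);
                    out.write(chunk, 0, chunk.length);
                    found = true;
                    break;
                }
            }
            if (!found) {
                throw new IllegalStateException("no live replica for " + blockId);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ToyNameNode nameNode = new ToyNameNode();
        List<ToyDataNode> nodes = Arrays.asList(new ToyDataNode(), new ToyDataNode(), new ToyDataNode());
        write(nameNode, nodes, "/logs/a.txt", "hello hdfs, hello blocks".getBytes(StandardCharsets.UTF_8));
        // Read the file back while pretending node 0 is down.
        byte[] restored = read(nameNode, nodes, "/logs/a.txt", 0);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The read still succeeds with one node excluded because every block has a second copy on a different node, which is the idea behind the fault tolerance mentioned above.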
HBase is a column-oriented distributed database management system in which table data is stored by column, whereas a traditional RDBMS stores data by row. Unlike HDFS on its own, it provides fast random access to huge volumes of data, along with real-time read/write access to big data. It stores data as key/value pairs, much like hash tables.
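The key/value layout behind this kind of random access can be sketched with nested maps. This is a toy model, not the HBase client API: the class name ColumnStoreSketch and its put, get, and scan methods are hypothetical, and sorted maps are used here on the assumption that rows are kept ordered by row key, as HBase's data model is commonly described.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustration only: row key -> ("family:qualifier" -> value), kept sorted.
class ColumnStoreSketch {

    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Writes one cell; a row only stores the columns it actually has.
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(family + ":" + qualifier, value);
    }

    // Random access to one cell by row key and column; null if absent.
    String get(String rowKey, String family, String qualifier) {
        NavigableMap<String, String> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(family + ":" + qualifier);
    }

    // Range scan over row keys in [startRow, stopRow), in sorted order.
    NavigableMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        ColumnStoreSketch table = new ColumnStoreSketch();
        table.put("user1", "info", "name", "Alice");
        table.put("user1", "info", "city", "Chennai");
        table.put("user2", "info", "name", "Bob"); // no city column for this row
        table.put("user9", "info", "name", "Zoe");

        System.out.println(table.get("user1", "info", "name"));
        System.out.println(table.scan("user1", "user3"));
    }
}
```

A lookup goes straight to the row and column it needs instead of reading a whole file, which is the contrast with sequential access over HDFS files that the paragraph above draws.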
Tools and Technologies
NetBeans IDE 8.0.1
Hadoop Distributed File System
For further details: http://slogix.in/projects-in-big-data/index.html