
How to modify Hadoop Source Code using IntelliJ IDEA

IntelliJ IDEA, a Java IDE, can be used to browse, modify, and build the Hadoop source packages.

Steps to modify and build Hadoop source code in IntelliJ IDEA

1) Download IntelliJ IDEA from the JetBrains website.

2) Download Hadoop along with its source code.

3) Import Hadoop project

(i) Start IntelliJ IDEA.

(ii) Click Import Project

(iii) Select the Hadoop version folder and then click Next.

(iv) Set the project name and project location in the “Import Project” wizard and then click Next.

(v) For the project SDK, select the Java SE Development Kit 7 (JDK) installation folder. Click Finish.

(vi) Hadoop is now imported into IntelliJ IDEA.

4) Configuring Module Dependencies and Libraries

(i) Select File->Project Structure.

(ii) Click on Modules under “Project Settings.”

(iii) Select the Dependencies tab, then click the + at the right of the screen. Select “JARs or directories”.

5) Modify the existing module as required and rebuild it.

6) Integrate the modified module into the existing Hadoop installation.

7) Run a Hadoop application with the modified Hadoop source code

(i) Start the Hadoop daemons

(ii) Run the sample program
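
The post does not say which sample program to run. Word count is a common choice for checking a rebuilt Hadoop, so the sketch below assumes it. It is plain Java and does not use the Hadoop API; the class name WordCountSketch and the method countWords are hypothetical. It only mirrors the logic of such a job (emit one count per word, then sum the counts per word), which gives a known-good result to compare against the output of the modified build.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain-Java word count, not a Hadoop job.
final class WordCountSketch {

    // Returns word -> number of occurrences, sorted by word.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: split the line into whitespace-separated tokens.
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank input line
                }
                // "Reduce" step: add 1 to the running total for this word.
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For example, calling countWords on the two lines "hadoop hbase" and "hadoop" yields a count of 2 for hadoop and 1 for hbase.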




HBase is an open-source, column-oriented, distributed database management system. It is fault-tolerant and recovers quickly from the failure of individual servers. It is built on top of Hadoop/HDFS, and the data stored in it can be processed using MapReduce.

HBase consists of three components: HMaster, HRegionServer, and HRegion. An HBase cluster has a master node, called the HMaster, and multiple region servers, each called an HRegionServer. Each region server hosts multiple regions, which are referred to as HRegions.

HMaster acts as the master server. It is responsible for monitoring every region server in the cluster and is the interface for all metadata changes. In a distributed cluster, the master typically runs on the NameNode host. A cluster can have several masters, but only one master is active at a time. If the active master loses its lease in ZooKeeper, one of the standby masters takes over as the active master and takes care of the region servers.
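
The failover behaviour described above can be sketched as a toy model. This is not the HBase or ZooKeeper API: the class MasterElection and its methods are hypothetical, and the first-registered standby always wins here, which is a simplification. In a real cluster the standby masters compete for the role through ZooKeeper.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy model of "one active master, several standbys"; not HBase code.
final class MasterElection {
    private String active;                       // current active master, or null
    private final Deque<String> standbys = new ArrayDeque<>();

    MasterElection(List<String> candidates) {
        for (String candidate : candidates) {
            register(candidate);
        }
    }

    // The first master to register becomes active; later ones wait as standbys.
    void register(String master) {
        if (active == null) {
            active = master;
        } else {
            standbys.addLast(master);
        }
    }

    String active() {
        return active;
    }

    // Simulates the active master losing its lease: the next standby
    // (if any) is promoted. Returns the new active master, or null.
    String expireActiveLease() {
        active = standbys.pollFirst();
        return active;
    }
}
```

With candidates master-a, master-b, and master-c, master-a starts as the active master; after one lease expiry master-b is active, and at no point are two masters active together.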


HRegionServer is the region server implementation. Each region server is responsible for serving and managing a set of regions. In a distributed cluster, the HRegionServer typically runs on a DataNode. A region is served by only one region server at a time.


A region is a subset of a table’s data: a contiguous, sorted range of rows. It is the basic unit of availability and distribution for a table. In HBase, regions are referred to as HRegions.
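
To illustrate how a row maps to a region and a region to exactly one region server, here is a minimal sketch. It is not the HBase client API: RegionDirectory, addRegion, and serverFor are hypothetical names, and row keys are compared as Java strings, whereas HBase compares raw bytes. Each region is identified by its start row key, and the region covering a row is the one with the greatest start key that is less than or equal to the row key.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy lookup table from row key to region server; not HBase code.
final class RegionDirectory {
    // Region start key (inclusive) -> name of the single server hosting it.
    // The empty string is the start key of the first region.
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionServer) {
        serverByStartKey.put(startKey, regionServer);
    }

    // Finds the region whose start key is the greatest key <= rowKey
    // and returns the region server that hosts it.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers row key: " + rowKey);
        }
        return region.getValue();
    }
}
```

With regions starting at "", "m", and "t" hosted on rs1, rs2, and rs3, the row key "apple" resolves to rs1, "melon" to rs2, and "zebra" to rs3.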

Tools and Technologies

1. JDK 1.8.0
2. NetBeans IDE 8.0.1
3. Hadoop Distributed File System
4. HBase 0.94.16
5. Mahout
6. MapReduce
7. Hadoop 1.2.1
