
Category Archives: Hadoop Projects

How to modify Hadoop Source Code using IntelliJ IDEA

IntelliJ IDEA (a Java IDE from JetBrains) can be used to import, modify, and build the Hadoop source packages.

Steps to modify and build the Hadoop source code in IntelliJ IDEA

1) Download IntelliJ IDEA from the following link:

https://www.jetbrains.com/idea/download/

2) Download Hadoop along with its source code.

3) Import Hadoop project

(i) Start IntelliJ IDEA.

(ii) Click Import Project

(iii) Select the Hadoop source folder and then click Next.

(iv) Set the project name and project location in the “Import Project” wizard, then click Next.

(v) Select the Java SE Development Kit 7 (JDK) installation folder as the project SDK. Click Finish.

(vi) Hadoop is now imported into IntelliJ IDEA.

4) Configure Module Dependencies and Libraries

(i) Select File->Project Structure.

(ii) Click on Modules under “Project Settings.”

(iii) Select the Dependencies tab, click the + on the right of the screen, and select “JARs or directories.”

5) Modify the existing module according to the requirement and  rebuild it

6) Integrate the modified module into the existing Hadoop installation.

7) Run a Hadoop application against the modified Hadoop source code

(i) Start the Hadoop daemons.

(ii) Run the sample program.

 

For further details, visit http://slogix.in/

 

 

 


Big Data

Big data refers to datasets so large that they are difficult to manage with traditional computing systems. Big data is characterized by high volume, high velocity, and a wide variety of data.

Hadoop is an open-source framework written in Java that manages large volumes of data across clusters of computers using the MapReduce model. Hadoop MapReduce is a software framework in which the map phase takes the large input data and converts it into intermediate sets of key/value pairs, and the reduce phase aggregates those sets after the map phase completes.
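
The classic illustration of this map/reduce split is word counting. The sketch below is a minimal WordCount written against the org.apache.hadoop.mapreduce API of Hadoop-1.2.1 (the version listed under Tools and Technologies); the input and output paths taken from the command line are placeholders. The map phase emits a (word, 1) pair for every token, and the reduce phase sums the counts per word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory (placeholder)
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory (placeholder)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}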

HDFS

The most common file system used in Hadoop is the Hadoop Distributed File System (HDFS), which follows a master/slave architecture. HDFS is fault-tolerant, supports parallel processing, and is designed to run on low-cost hardware. It stores metadata and application data separately. The metadata is stored on a dedicated server called the Namenode, which holds the file system namespace. The application data is stored on other servers called Datanodes, which hold the actual data. All these servers communicate with each other using TCP-based protocols.
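
As a concrete illustration of this separation, the following sketch uses the HDFS Java FileSystem API to write a file and read it back. It assumes a NameNode listening at hdfs://localhost:9000 and the path /user/demo/sample.txt, both of which are hypothetical; fs.default.name is the Hadoop-1.x property for the default file system.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The NameNode holds only the metadata; file blocks are stored on DataNodes.
    conf.set("fs.default.name", "hdfs://localhost:9000"); // assumed NameNode address
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/user/demo/sample.txt"); // hypothetical path

    // Write: the client asks the NameNode where to place blocks,
    // then streams the data to the DataNodes.
    FSDataOutputStream out = fs.create(file, true);
    out.writeBytes("hello hdfs\n");
    out.close();

    // Read the file back through the same FileSystem handle.
    BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)));
    System.out.println(in.readLine());
    in.close();
    fs.close();
  }
}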

Hbase

Hbase is a column-oriented distributed database management system in which data is stored as columns in tables, whereas a traditional RDBMS stores data as rows. Compared with HDFS alone, it provides quick random access to huge volumes of data as well as real-time read/write access to big data. Internally it stores its data in the form of hash tables.
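
The random read/write access described above is exposed through the HBase client API. The following is a minimal sketch against HBase-0.94.16 (the version listed below); the table name 'users', the column family 'info', and the row key 'user1' are hypothetical, and the table is assumed to already exist.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseReadWrite {
  public static void main(String[] args) throws Exception {
    // Reads hbase-site.xml from the classpath for ZooKeeper/cluster settings.
    Configuration conf = HBaseConfiguration.create();

    // Assumes a table 'users' with a column family 'info' already exists.
    HTable table = new HTable(conf, "users");

    // Random write: one cell (row key, column family:qualifier, value).
    Put put = new Put(Bytes.toBytes("user1"));
    put.add(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Chennai"));
    table.put(put);

    // Random read: fetch the row back by key.
    Get get = new Get(Bytes.toBytes("user1"));
    Result result = table.get(get);
    byte[] city = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"));
    System.out.println("city = " + Bytes.toString(city));

    table.close();
  }
}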
  
Tools and Technologies

JDK 1.8.0
Netbeans IDE 8.0.1
Hadoop Distributed File System
Hbase-0.94.16
Mahout
Map reduce
Hadoop-1.2.1

 

For further details: http://slogix.in/projects-in-big-data/index.html

 

What is CloudSim?

CloudSim is a simulation tool, or framework, for modelling a Cloud Computing environment. The CloudSim toolkit enables simulation of and experimentation with Cloud Computing systems. The CloudSim library is written in Java and contains classes for creating components such as Datacenters, Hosts, Virtual Machines, applications, and users.

These components are used to simulate new strategies in the Cloud Computing domain, such as scheduling algorithms, allocation policies, and load-balancing techniques. With the simulation results we can evaluate the efficiency of a newly implemented policy or strategy in a Cloud environment. The basic CloudSim classes can be extended to add new scenarios. To compose the desired scenario, one writes a Java program using the CloudSim components; a minimal sketch is given after the component list below.

The basic components in CloudSim that build up the Cloud computing environment are:

1. Datacenter: The Datacenter is the first component to be created, together with a VmAllocationPolicy. The Hosts and VMs are created inside the Datacenter, and resource provisioning is performed according to the allocation policy.

2. DatacenterBroker: A broker mediates between the user and the datacenter. The VM and Cloudlet requests issued by the user are submitted to the broker, which forwards them to the datacenter, collects the results, and returns them to the user.

3. Host: The Host class simulates a physical machine. It manages the VMs allocated to it.

4. Vm: The Vm class simulates a virtual machine, which runs inside a Host and executes applications or tasks.

5. Cloudlet: The applications or tasks to be executed in a Vm are simulated using the Cloudlet class. The class contains the basic application characteristics and runs inside a Vm.

6. VmAllocationPolicySimple: The policy that allocates a Host to each Vm in the datacenter.

7. VmScheduler and CloudletScheduler: The scheduling policies that define the execution order of Vms on a Host and of Cloudlets on a Vm, respectively.
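
Putting these components together, the following is a minimal sketch of a CloudSim 3.0.3 scenario: one Datacenter with a single Host, a DatacenterBroker, one Vm, and one Cloudlet. All of the numeric values (MIPS, RAM, bandwidth, cloudlet length, costs) are arbitrary illustrative figures, not recommendations.

import java.util.ArrayList;
import java.util.Calendar;
import java.util.LinkedList;
import java.util.List;

import org.cloudbus.cloudsim.Cloudlet;
import org.cloudbus.cloudsim.CloudletSchedulerTimeShared;
import org.cloudbus.cloudsim.Datacenter;
import org.cloudbus.cloudsim.DatacenterBroker;
import org.cloudbus.cloudsim.DatacenterCharacteristics;
import org.cloudbus.cloudsim.Host;
import org.cloudbus.cloudsim.Pe;
import org.cloudbus.cloudsim.Storage;
import org.cloudbus.cloudsim.UtilizationModelFull;
import org.cloudbus.cloudsim.Vm;
import org.cloudbus.cloudsim.VmAllocationPolicySimple;
import org.cloudbus.cloudsim.VmSchedulerTimeShared;
import org.cloudbus.cloudsim.core.CloudSim;
import org.cloudbus.cloudsim.provisioners.BwProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.PeProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.RamProvisionerSimple;

public class SimpleScenario {
  public static void main(String[] args) throws Exception {
    // 1. Initialise the simulation engine: one cloud user, no event tracing.
    CloudSim.init(1, Calendar.getInstance(), false);

    // 2. Datacenter: one host with one PE, VMs placed by VmAllocationPolicySimple.
    List<Pe> peList = new ArrayList<Pe>();
    peList.add(new Pe(0, new PeProvisionerSimple(1000)));   // one 1000-MIPS core
    List<Host> hostList = new ArrayList<Host>();
    hostList.add(new Host(0, new RamProvisionerSimple(2048), new BwProvisionerSimple(10000),
        1000000, peList, new VmSchedulerTimeShared(peList)));
    DatacenterCharacteristics characteristics = new DatacenterCharacteristics(
        "x86", "Linux", "Xen", hostList, 10.0, 3.0, 0.05, 0.001, 0.0);
    Datacenter datacenter = new Datacenter("Datacenter_0", characteristics,
        new VmAllocationPolicySimple(hostList), new LinkedList<Storage>(), 0);

    // 3. Broker: mediates between the user and the datacenter.
    DatacenterBroker broker = new DatacenterBroker("Broker_0");
    int brokerId = broker.getId();

    // 4. One VM, with its cloudlets scheduled by CloudletSchedulerTimeShared.
    List<Vm> vmList = new ArrayList<Vm>();
    vmList.add(new Vm(0, brokerId, 500, 1, 512, 1000, 10000, "Xen",
        new CloudletSchedulerTimeShared()));

    // 5. One cloudlet (task) of 40000 MI, bound to the broker.
    List<Cloudlet> cloudletList = new ArrayList<Cloudlet>();
    Cloudlet cloudlet = new Cloudlet(0, 40000, 1, 300, 300,
        new UtilizationModelFull(), new UtilizationModelFull(), new UtilizationModelFull());
    cloudlet.setUserId(brokerId);
    cloudletList.add(cloudlet);

    // 6. Submit the lists to the broker and run the simulation.
    broker.submitVmList(vmList);
    broker.submitCloudletList(cloudletList);
    CloudSim.startSimulation();
    CloudSim.stopSimulation();

    // 7. Report the finish time of each completed cloudlet.
    List<Cloudlet> finished = broker.getCloudletReceivedList();
    for (Cloudlet c : finished) {
      System.out.println("Cloudlet " + c.getCloudletId() + " finished at " + c.getFinishTime());
    }
  }
}

Running the program prints the finish time of the cloudlet; this is the kind of output used to compare scheduling algorithms, allocation policies, and load-balancing techniques.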

Tools and Technology

  • Cloudsim 3.0.3
  • Java
  • Netbeans or Eclipse

For Further Details: http://slogix.in/cloud-computing-source-code/index.html

For details contact

S-Logix (OPC) Private Limited

Registered Office:

#5, First Floor, 4th Street

Dr. Subbarayan Nagar, Kodambakkam

Chennai-600 024, India

Landmark: Samiyar Madam

Research Projects:

Email: pro@slogix.in, Mobile: +91-8124001111.

Ph.D. Guidance & Consulting:

Email: phd@slogix.in, Mobile: +91-9710999001.

Visit – http://www.slogix.in

 

Hbase

Hbase is an open source, column-oriented distributed database management system. It is fault-tolerant and provides quick recovery from individual server failures. It is built on top of Hadoop/HDFS, and the data stored in it can be processed using Hadoop's MapReduce capabilities.

Hbase consists of three components: HMaster, HRegionserver, and HRegions. An Hbase cluster has a master node called the HMaster and multiple region servers called HRegionservers. Each region server hosts multiple regions, referred to as HRegions.
  
HMaster

HMaster acts as the master server. It is responsible for monitoring every region server across the cluster and acts as the interface for all metadata changes. The master runs on the namenode in a distributed cluster. A cluster may contain several masters, but only one master is active at a time. Once the active master loses its lease in ZooKeeper, another server in the cluster becomes the master and takes over the region servers.

HRegionserver

The HRegionserver is the region server implementation, and each region server is responsible for serving and managing a set of regions. The HRegionserver runs on a datanode in a distributed cluster. A region can be served by only one region server at a time.

HRegions

Regions are subsets of a table's data. The region is the basic element of availability and distribution of the rows and columns in a table. The multiple regions in Hbase are referred to as HRegions.
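
Regions can be seen directly from the client side when a table is created with pre-defined split keys. The sketch below uses the HBase-0.94 HBaseAdmin API to create a hypothetical table 'events' with one column family 'cf' and three split keys, so that the table starts with four HRegions for the HMaster to distribute across the HRegionservers.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Hypothetical table 'events' with one column family 'cf'.
    HTableDescriptor desc = new HTableDescriptor("events");
    desc.addFamily(new HColumnDescriptor("cf"));

    // Split keys divide the row-key space into four HRegions up front;
    // the HMaster assigns each region to an HRegionserver.
    byte[][] splitKeys = new byte[][] {
        Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t")
    };
    admin.createTable(desc, splitKeys);
    admin.close();
  }
}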
  
Tools and Technologies

1. JDK 1.8.0
2. Netbeans IDE 8.0.1
3. Hadoop Distributed File System
4. Hbase-0.94.16
5. Mahout
6. Map reduce
7. Hadoop-1.2.1

For further details, visit http://www.slogix.in

 
