A simple way to install Hadoop and HBase

I was never able to install Hadoop without spending at least two days on it and re-installing the OS a few times. That may not be everyone's experience, but all I wanted was to run a few MR jobs and access HBase and HDFS. The instructions available online for Hadoop and HBase installation were mostly about installing on a production or clustered environment. I never found a guide that set up these components just for testing purposes. I was using Cloudera on Ubuntu; its instructions may be useful for a production environment, but they were never easy for me. So I decided to download Hadoop and HBase myself and configure them. Here is what I did.


Note: This is an old article, copied manually when moving to a new hosting provider.

1. Decide where you want to keep your Hadoop installation.

I have a user created by the name hdtest. For testing purposes, I am keeping all the Hadoop components in /home/hdtest/installs/. I also have to set up passwordless SSH. Log in as hdtest (or the user you want to use for Hadoop) and run the following commands.

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
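
If the keys are set up correctly, you should be able to connect to localhost without being asked for a password (you may have to accept the host key on the first connection):

$ ssh localhost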

2. Download Hadoop and HBase.

To start with, download all the required files. We will be downloading two archives (tar.gz files), one for Hadoop common and another one for HBase.

Save all the files to /home/hdtest/installs/, or your preferred directory.
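
If you prefer the command line, both archives can be fetched with wget. The URLs below point at the Apache release archive and are my best guess for these versions; pick a mirror or adjust the paths if they have moved:

$ cd /home/hdtest/installs/
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ wget https://archive.apache.org/dist/hbase/hbase-0.96.0/hbase-0.96.0-hadoop2-bin.tar.gz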

3. Extract and setup Hadoop

Extract the Hadoop common files into the same folder by running the following command:

$tar xvfz hadoop-2.2.0.tar.gz  

The command extracts the contents to a folder named ‘hadoop-2.2.0’. To keep things simple later, let's rename the folder to ‘hadoop’:

$mv hadoop-2.2.0 hadoop

Set the following environment variables in your startup script. In my case, I have to set them in /home/hdtest/.cshrc. (These are csh-style commands; if you use bash, the equivalent is export VAR=value in ~/.bashrc.)

setenv HADOOP_HOME /home/hdtest/hadoop
setenv JAVA_HOME /usr/local/jdk7/
setenv PATH $HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Start a new shell to make sure all these variables are set.
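
A quick way to confirm the variables took effect is to echo one and run the hadoop command itself:

$echo $HADOOP_HOME
/home/hdtest/hadoop
$hadoop version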

Edit the configuration files for Hadoop. As this is a basic setup, we only need to edit the file core-site.xml, found in $HADOOP_HOME/etc/hadoop. Add the following content between the <configuration> tags.

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hdtest/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation.</description>
</property>

4. Update /etc/hosts file

Sometimes Hadoop does not work because of an unexpected entry in the /etc/hosts file. Check whether you have an entry like the following in /etc/hosts:

127.0.1.1 myhostname

If you see such a line, change the first part to 127.0.0.1 instead of 127.0.1.1.

Next, we need to create the folder /home/hdtest/hadoop/tmp/ that we configured as hadoop.tmp.dir above:

$mkdir /home/hdtest/hadoop/tmp/

The next step is to run the namenode format command:

$hadoop namenode -format
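
The command prints a lot of output. If the format succeeded, you should see a line roughly like the following near the end (the path follows hadoop.tmp.dir; the exact wording may differ between versions):

INFO common.Storage: Storage directory /home/hdtest/hadoop/tmp/dfs/name has been successfully formatted.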

5. Start the hadoop processes

Run the following command.

$start-yarn.sh

Once the command has run successfully, let us verify that everything is working fine. Run the following command and check the result:

$jps

Expected output

9818 ResourceManager
10116 Jps
9933 NodeManager

The numbers on the left side may differ on your machine. If you get an error message saying jps is not found, it means you don't have PATH set to access your JDK properly; jps is an executable in the JDK installation.
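
Assuming the csh setup above, the fix is to add the JDK's bin directory to PATH in your startup script:

setenv PATH $JAVA_HOME/bin:$PATH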

The next step is to run the command start-dfs.sh, which will start the namenode, datanode and secondary namenode.

$start-dfs.sh

Once everything is up, check the status using the jps command. The expected output is:

9818 ResourceManager
10444 DataNode
10593 SecondaryNameNode
10316 NameNode
10705 Jps
9933 NodeManager

Hadoop will not be started automatically on reboot, so you have to run the commands start-yarn.sh and start-dfs.sh after every reboot.
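
Hadoop also ships matching stop scripts in $HADOOP_HOME/sbin; you can use them to shut the daemons down cleanly before a reboot:

$stop-dfs.sh
$stop-yarn.sh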

We will now create two folders in HDFS; they will be used in the HBase configuration. To create folders on HDFS, Hadoop provides commands that mirror the regular file system commands.

$cd $HADOOP_HOME/bin
$./hadoop fs -mkdir /sample
$./hadoop fs -mkdir /zookeeper

You can also verify the above setup by opening the Hadoop web UI at http://localhost:50070/ in your browser. You can also see the files in the HDFS file system: click the Browse File System link to see the sample and zookeeper folders you created.
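
As a further smoke test, you can copy a local file into HDFS, list it, and run one of the example MR jobs that ship with Hadoop (the examples jar path below matches the 2.2.0 layout; adjust the file name for your version):

$cd $HADOOP_HOME/bin
$./hadoop fs -put /etc/hosts /sample/hosts
$./hadoop fs -ls /sample
$./hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5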

6. Extract and setup HBase

Similar to what you did for Hadoop, extract HBase and rename the folder. Run the following commands from the location where you saved the HBase archive:

$tar xvfz hbase-0.96.0-hadoop2-bin.tar.gz
$mv hbase-0.96.0-hadoop2 hbase
  • Note: If you are using HBase version 0.96
    HBase 0.96 ships with an older, beta version of the Hadoop common jar files. They are not compatible with the Hadoop 2.2 we downloaded. If you keep using them, you will get errors like
    org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): Unknown out of band call #xxxxxx
    in the HBase log files. To fix the problem, we need to remove all the beta jar files from HBase and use the correct set of files. To start with, we remove all the incompatible files:

    $cd /home/hdtest/hbase/lib
    $rm -rf hadoop*.jar
    

    Once the files are removed, copy the correct files from hadoop installation

    $cd /home/hdtest/hbase/lib
    $cp $HADOOP_HOME/share/hadoop/common/hadoop*.jar .
    $cp $HADOOP_HOME/share/hadoop/hdfs/hadoop*.jar .
    $cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop*.jar .
    $cp $HADOOP_HOME/share/hadoop/tools/lib/hadoop*.jar .
    $cp $HADOOP_HOME/share/hadoop/yarn/hadoop*.jar .
    

Update the HBase configuration file at hbase/conf/hbase-site.xml. Add the following content between the <configuration> tags.

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://127.0.0.1:54310/sample</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>hdfs://127.0.0.1:54310/zookeeper</value>
</property>

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>Property from ZooKeeper's config zoo.cfg.
  The port at which the clients will connect.</description>
</property>

If you want to use a local file system instead of HDFS, replace the URL with file:///your/preferred/path/. Now let's start the HBase instance by running the following commands.

$cd /home/hdtest/hbase/bin
$./start-hbase.sh
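
Once the script finishes, jps should list an HMaster process alongside the Hadoop daemons; depending on your configuration you may also see HRegionServer and HQuorumPeer entries:

$jps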

HBase will not be started automatically in this configuration either; you have to run this command again on your next reboot. Once the command has executed and you are back at the shell prompt, you can check the log files to see if something is wrong. You can find the log files under the /home/hdtest/hbase/logs folder. If you don't see any issues, let's try to use HBase. Open the HBase shell and run a simple list command.

$./hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.96.0-hadoop2, r1531434, Fri Oct 11 15:28:08 PDT 2013

hbase(main):001:0> list
TABLE                                                                                                                                              
0 row(s) in 3.1580 seconds

=> []
hbase(main):002:0>

There are no tables in HBase yet; that is why you see the output as []. Now let's create a table and run the list command again.

hbase(main):003:0> create 'sample', 'r'
0 row(s) in 0.5550 seconds

=> Hbase::Table - sample
hbase(main):004:0> list
TABLE                                                                                                                                              
sample                                                                                                                                             
1 row(s) in 0.0600 seconds

=> ["sample"]
hbase(main):005:0>
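
To confirm that writes work end to end, you can put a cell into the table and read it back. Here 'r' is the column family we created above; the row key, column qualifier and value are arbitrary examples:

hbase(main):006:0> put 'sample', 'row1', 'r:name', 'value1'
hbase(main):007:0> get 'sample', 'row1'
hbase(main):008:0> scan 'sample'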

Congrats..!!! You have Hadoop and HBase running.