I was never able to install Hadoop without spending at least two days on it and re-installing the OS a few times. That may not always be the case, but all I wanted was to run a few MR jobs and access HBase and HDFS. The instructions available online for Hadoop and HBase installation were mostly about installing Hadoop in a production or clustered environment. I never found a simple way to install all these components just for testing. I was using Cloudera on Ubuntu; its instructions may be useful for a production environment, but they were never easy for me. So I decided to download Hadoop and HBase myself and configure them. Here is what I did.
Note: This is an old article, copied over manually when moving to a new hosting provider.
1. Decide where you want to keep your Hadoop installation.
I have a user created with the name hdtest. For testing purposes, I am keeping all the Hadoop components in /home/hdtest/installs/. I also have to set up passwordless ssh. Log in as hdtest (or the user you want to use for Hadoop) and run the following commands.
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2. Download Hadoop and HBase.
To start with, download all the required files. We will be downloading two archives, one for Hadoop common and another one for HBase.
- Hadoop common : Download Hadoop common from the Apache mirrors. I have used hadoop-2.2.0 and the file name is hadoop-2.2.0.tar.gz.
- HBase : Download HBase from http://www.apache.org/dyn/closer.cgi/hbase. I downloaded the version hbase-0.96.0, file hbase-0.96.0-hadoop2-bin.tar.gz.
Save all the files to /home/hdtest/installs/, or your preferred directory.
3. Extract and setup Hadoop
Extract Hadoop common files to the same folder. Run the following command
$tar xvfz hadoop-2.2.0.tar.gz
The command will extract the contents to a folder ‘hadoop-2.2.0’. To keep things simple later, let’s rename the folder to ‘hadoop’.
$mv hadoop-2.2.0 hadoop
Set the following environment variables in your startup script. In my case, I have to set them in /home/hdtest/.cshrc
setenv HADOOP_HOME /home/hdtest/hadoop
setenv JAVA_HOME /usr/local/jdk7/
setenv PATH $HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
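The setenv lines above are csh syntax, since my login shell uses .cshrc. If your shell is bash instead, the equivalent lines in ~/.bashrc would be (same paths as this article; adjust to your setup):

```shell
# bash equivalents of the csh setenv lines above (goes in ~/.bashrc)
export HADOOP_HOME=/home/hdtest/hadoop
export JAVA_HOME=/usr/local/jdk7/
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```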
Start a new shell session to make sure all these variables are set.
Edit the configuration files for Hadoop. As this is a basic setup, we just need to edit the file core-site.xml found in $HADOOP_HOME/etc/hadoop. Add the following content between the configuration tags.
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hdtest/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.</description>
</property>
4. Update /etc/hosts file
Sometimes Hadoop does not work because of an unexpected entry in the /etc/hosts file. Check whether you have a line starting with 127.0.1.1 in /etc/hosts.
If you see one such line, change the first part to 127.0.0.1 instead of 127.0.1.1.
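After the change, the relevant part of /etc/hosts should look something like this (myhost is a placeholder for your machine's actual hostname):

```
127.0.0.1    localhost
127.0.0.1    myhost
```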
We also need to create the folder /home/hdtest/hadoop/tmp/, which we configured as hadoop.tmp.dir above.
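Creating it is a one-liner (the path is the one used in this article; adjust it if you configured a different hadoop.tmp.dir):

```shell
# create the directory configured as hadoop.tmp.dir in core-site.xml
mkdir -p /home/hdtest/hadoop/tmp
```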
Next step is to run the format command.
$hadoop namenode -format
5. Start the hadoop processes
Run the command start-yarn.sh.
Once the command has run successfully, let us verify that everything is working fine. Run the jps command and check the result.
9818 ResourceManager
10116 Jps
9933 NodeManager
The numbers on the left side may be different. If you get an error message saying jps is not found, it means you don’t have your PATH set to access your JDK properly; jps is an executable in the JDK installation.
The next step is to run the command start-dfs.sh, which will start the namenode, datanode and secondary namenode.
Once everything is running, check the status using the jps command. The expected output is
9818 ResourceManager
10444 DataNode
10593 SecondaryNameNode
10316 NameNode
10705 Jps
9933 NodeManager
Hadoop will not be started automatically on reboot, so you have to run the commands start-yarn.sh and start-dfs.sh on every reboot.
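If typing both commands gets tedious, you could wrap them in a tiny helper script to run after each reboot. This is just a sketch; start-hadoop.sh is a name I made up, and it assumes HADOOP_HOME is set as described above.

```shell
# write a small helper script that starts both yarn and dfs in one go
cat > ~/start-hadoop.sh <<'EOF'
#!/bin/sh
# assumes HADOOP_HOME points at your hadoop folder
"$HADOOP_HOME"/sbin/start-yarn.sh
"$HADOOP_HOME"/sbin/start-dfs.sh
EOF
chmod +x ~/start-hadoop.sh
```

After a reboot, run ~/start-hadoop.sh and then verify the processes with jps as before.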
We will create two folders in HDFS; they will be used in the HBase configuration. To create folders on HDFS, Hadoop provides commands that follow regular file system commands.
$cd $HADOOP_HOME/bin
$./hadoop fs -mkdir /sample
$./hadoop fs -mkdir /zookeeper
You can also verify the above settings by opening the Hadoop URL http://localhost:50070/ in your browser. You can see the files in the HDFS file system there: click the Browse File System link to see the folders sample and zookeeper we created.
6. Extract and setup HBase
Similar to what you did for Hadoop, extract HBase and rename the folder. Run the following commands from the location where you saved the HBase file.
$tar xvfz hbase-0.96.0-hadoop2-bin.tar.gz
$mv hbase-0.96.0-hadoop2 hbase
Note: If you are using HBase version 0.96
HBase 0.96 ships with an older version of the Hadoop libraries. They are not compatible with the Hadoop-2.2 we have downloaded; it uses beta versions of the hadoop common jar files. If you continue using them, you will get errors like,
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): Unknown out of band call #xxxxxx
in the HBase log files. To fix the problem, we need to remove all the beta jar files from HBase and use the correct set of files. To start with, we remove all the incompatible files:
$cd /home/hdtest/hbase/lib
$rm -rf hadoop*.jar
Once the files are removed, copy the correct files from hadoop installation
$cd /home/hdtest/hbase/lib
$cp $HADOOP_HOME/share/hadoop/common/hadoop*.jar .
$cp $HADOOP_HOME/share/hadoop/hdfs/hadoop*.jar .
$cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop*.jar .
$cp $HADOOP_HOME/share/hadoop/tools/lib/hadoop*.jar .
$cp $HADOOP_HOME/share/hadoop/yarn/hadoop*.jar .
Update the HBase configuration file at hbase/conf/hbase-site.xml. Add the following content between the configuration tags.
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://127.0.0.1:54310/sample</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>hdfs://127.0.0.1:54310/zookeeper</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
</property>
If you want to use a local file system instead of HDFS, replace the URL with file:///your/preferred/path/. Now let’s start the HBase instance by running the following command.
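For example, with a local file system the hbase.rootdir property would look something like this (the path here is only an example; pick any writable directory):

```
<property>
  <name>hbase.rootdir</name>
  <value>file:///home/hdtest/hbase-data</value>
</property>
```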
$cd /home/hdtest/hbase/bin
$./start-hbase.sh
HBase will not be started automatically in this configuration either; you have to run this command again on your next reboot. Once this command is executed and you are back at the shell prompt, you can check the log files if something is wrong. You can find the log files under the /home/hdtest/hbase/logs folder. If you don’t see any issues, let’s try to use HBase. Open the HBase shell and run a simple list command.
$./hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.0-hadoop2, r1531434, Fri Oct 11 15:28:08 PDT 2013

hbase(main):001:0> list
TABLE
0 row(s) in 3.1580 seconds
=> []
hbase(main):002:0>
There are no tables in HBase yet; that is why you see the output as an empty list. Now let’s create a table and run the list command again.
hbase(main):003:0> create 'sample', 'r'
0 row(s) in 0.5550 seconds
=> Hbase::Table - sample
hbase(main):004:0> list
TABLE
sample
1 row(s) in 0.0600 seconds
=> ["sample"]
hbase(main):005:0>
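With the table in place, you can also try a quick put and get round-trip from the same shell. Here 'row1' and 'value1' are just example data, and 'r' is the column family we created above:

```
hbase(main):006:0> put 'sample', 'row1', 'r:name', 'value1'
hbase(main):007:0> get 'sample', 'row1'
```

The get should print the r:name cell back with its timestamp and value.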
Congrats..!!! You have your Hadoop and HBase running.