Install Hadoop 1.1.2 on Ubuntu 12.04

Here is an example of installing Hadoop on a single machine, an Ubuntu box in this case. For a simple cluster installation, please refer to Running Hadoop on Clusters of Two Nodes using Ubuntu and CentOS.
1. Install Hadoop
1.1 Download Hadoop-1.1.2 (the stable version of Hadoop at the time of writing):
http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
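For example, fetching it from the command line with wget:
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz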
1.2 Untar the package to /opt
1.3 Find your JAVA_HOME and HADOOP_PREFIX directories and add them to your ~/.bashrc file
# note: no spaces around '=', and the extracted directory is lowercase hadoop-1.1.2
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk' >> ~/.bashrc
echo 'export HADOOP_PREFIX=/opt/hadoop-1.1.2' >> ~/.bashrc
1.4 add $HADOOP_PREFIX/bin to your $PATH variable
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
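Putting it together, a quick way to finish step 1 and verify the variables (a sketch that assumes the tarball sits in the current directory and that your user can write to /opt):
tar -xzf hadoop-1.1.2.tar.gz -C /opt
source ~/.bashrc    # reload so the new variables take effect in this shell
echo $JAVA_HOME $HADOOP_PREFIX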
2. Decide the running mode of your Hadoop:
local/standalone mode
pseudo-distributed mode
2.1 If you want to start Hadoop in pseudo-distributed mode, edit the following three configuration files in $HADOOP_PREFIX/conf: core-site.xml, hdfs-site.xml, and mapred-site.xml
#core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/username/projects/hdfs_test</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
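hadoop.tmp.dir has to point somewhere your user can write; creating it up front avoids permission surprises (replace username with your own user name):
mkdir -p /home/username/projects/hdfs_test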
#hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
#mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/username/projects/mapred/system</value>
    <final>true</final>
  </property>
</configuration>
2.2 Edit hadoop-env.sh in $HADOOP_PREFIX/conf/ (the file ships with the release) and set JAVA_HOME in it:
vim $HADOOP_PREFIX/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
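One prerequisite this guide relies on implicitly: start-all.sh launches the daemons over ssh, so pseudo-distributed mode needs passwordless ssh to localhost. A minimal setup, assuming openssh-server is already installed:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
# the last command should log you in without asking for a password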
3. Format the HDFS system
hadoop namenode -format
4. Start Hadoop
start-all.sh
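To check that everything came up, run jps (it ships with the JDK); in pseudo-distributed mode it should list all five Hadoop daemons:
jps
# expected: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker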
5. Useful commands to check the HDFS system
hadoop fs -ls /: list the root of HDFS
hadoop fs -ls (without '/'): list your HDFS home directory
# If you get nothing but the following error message:
# "ls: Cannot access .: No such file or directory."
# you need to do the following steps to make it work:
hadoop fs -mkdir /user
hadoop fs -mkdir /user/username
# Now you should be able to run "hadoop fs -ls": by default Hadoop looks for a
# "/user/username" directory in HDFS, and this error message means no such
# directory exists yet, so we create it.
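After creating the directory, re-run the short form to confirm:
hadoop fs -ls
# this should now succeed (the listing stays empty until you add files)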
6. Copy files from the local Linux filesystem to HDFS. I used two text files for testing, file1.txt and file2.txt, located at /home/username/projects/hadoop.
6.1 Copy the files from the local filesystem to the Hadoop file system
hadoop dfs -copyFromLocal /home/username/projects/hadoop /user/username
# check copy result
hadoop dfs -ls /user/username/
6.2 Download the word-count example from http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-examples/1.0.3/hadoop-examples-1.0.3.jar and put it in /home/username/projects/hadoop
6.3 Run the MapReduce job
hadoop jar hadoop-examples-1.0.3.jar wordcount /user/username /user/username-output
# notes for this command:
# (1) if you see an IO exception, try giving the full path to the jar package
# (2) "/user/username-output" is the output path for MapReduce; it must not
#     exist before you run the job.
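For example, note (1) with the jar's full path from step 6.2:
hadoop jar /home/username/projects/hadoop/hadoop-examples-1.0.3.jar wordcount /user/username /user/username-output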
6.4 Retrieve the job result from HDFS
# merge the MapReduce outputs and copy them to the local path /home/username/projects/hadoop/output
hadoop dfs -getmerge /user/username-output /home/username/projects/hadoop/output
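Alternatively, you can print the result straight from HDFS without copying it; the part-* glob matches the per-reducer output files:
hadoop dfs -cat /user/username-output/part-*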
7. Hadoop Web Interfaces
NameNode daemon: http://localhost:50070/
JobTracker daemon: http://localhost:50030/
TaskTracker daemon: http://localhost:50060/
8. Questions
Q. The warning "$HADOOP_HOME is deprecated" always shows up when you run Hadoop commands.
A. Replace HADOOP_HOME with HADOOP_PREFIX in your ~/.bashrc file, then check whether $HADOOP_HOME is still defined somewhere else by echoing it when you open a new terminal.
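For example, in a fresh terminal:
echo $HADOOP_HOME
# should print an empty line once the variable is no longer defined anywhere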
