Install Hadoop 1.1.2 on Ubuntu 12.04

Here is an example of installing Hadoop on a single machine, an Ubuntu box in this case. For a simple cluster installation, please refer to Running Hadoop on Clusters of Two Nodes using Ubuntu and CentOS.
1. Install Hadoop
1.1 Download Hadoop-1.1.2 (the stable version of Hadoop at the time of writing):
http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
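For example, fetching it from the command line with wget:
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz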
1.2 Untar the package to /opt
1.3 Find your JAVA_HOME and HADOOP_PREFIX directories and add them to your ~/.bashrc file
# note: no spaces around '=', and the extracted directory is lowercase hadoop-1.1.2
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk' >> ~/.bashrc
echo 'export HADOOP_PREFIX=/opt/hadoop-1.1.2' >> ~/.bashrc
1.4 add $HADOOP_PREFIX/bin to your $PATH variable
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
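Putting it together, a quick way to finish step 1 and verify the variables (a sketch that assumes the tarball sits in the current directory and that your user can write to /opt):
tar -xzf hadoop-1.1.2.tar.gz -C /opt
source ~/.bashrc    # reload so the new variables take effect in this shell
echo $JAVA_HOME $HADOOP_PREFIX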
2. Decide the running mode of your Hadoop:
local/standalone mode
pseudo-distributed mode
2.1 If you want to start Hadoop in pseudo-distributed mode, edit the following three configuration files in $HADOOP_PREFIX/conf: core-site.xml, hdfs-site.xml, and mapred-site.xml
#core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/username/projects/hdfs_test</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
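hadoop.tmp.dir has to point somewhere your user can write; creating it up front avoids permission surprises (replace username with your own user name):
mkdir -p /home/username/projects/hdfs_test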
#hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
#mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/username/projects/mapred/system</value>
    <final>true</final>
  </property>
</configuration>
2.2 Edit hadoop-env.sh in $HADOOP_PREFIX/conf/ (the file ships with the release) and set JAVA_HOME in it:
vim $HADOOP_PREFIX/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
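One prerequisite this guide relies on implicitly: start-all.sh launches the daemons over ssh, so pseudo-distributed mode needs passwordless ssh to localhost. A minimal setup, assuming openssh-server is already installed:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
# the last command should log you in without asking for a password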
3. Format the HDFS system
hadoop namenode -format
4. Start Hadoop
start-all.sh
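To check that everything came up, run jps (it ships with the JDK); in pseudo-distributed mode it should list all five Hadoop daemons:
jps
# expected: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker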
5. Useful commands to check the HDFS system
hadoop fs -ls /: list the root of HDFS
hadoop fs -ls (without '/'): list your HDFS home directory
# If you get nothing but the following error message:
# "ls: Cannot access .: No such file or directory."
# you need to do the following steps to make it work:
hadoop fs -mkdir /user
hadoop fs -mkdir /user/username
# Now you should be able to run "hadoop fs -ls": by default Hadoop looks for a
# "/user/username" directory in HDFS, and this error message means no such
# directory exists yet, so we create it.
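After creating the directory, re-run the short form to confirm:
hadoop fs -ls
# this should now succeed (the listing stays empty until you add files)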
6. Copy files from the local Linux filesystem to HDFS. I used two text files for testing, file1.txt and file2.txt, located at /home/username/projects/hadoop.
6.1 Copy the files from the local filesystem to the Hadoop file system
hadoop dfs -copyFromLocal /home/username/projects/hadoop /user/username
# check copy result
hadoop dfs -ls /user/username/
6.2 Download the word-count example from http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-examples/1.0.3/hadoop-examples-1.0.3.jar and put it in /home/username/projects/hadoop
6.3 Run the MapReduce job
hadoop jar hadoop-examples-1.0.3.jar wordcount /user/username /user/username-output
# notes for this command:
# (1) if you see an IO exception, try giving the full path to the jar package
# (2) "/user/username-output" is the output path for MapReduce; it must not
#     exist before you run the job.
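For example, note (1) with the jar's full path from step 6.2:
hadoop jar /home/username/projects/hadoop/hadoop-examples-1.0.3.jar wordcount /user/username /user/username-output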
6.4 Retrieve the job result from HDFS
# merge the MapReduce outputs and copy them to the local path /home/username/projects/hadoop/output
hadoop dfs -getmerge /user/username-output /home/username/projects/hadoop/output
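Alternatively, you can print the result straight from HDFS without copying it; the part-* glob matches the per-reducer output files:
hadoop dfs -cat /user/username-output/part-*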
7. Hadoop Web Interfaces
NameNode daemon: http://localhost:50070/
JobTracker daemon: http://localhost:50030/
TaskTracker daemon: http://localhost:50060/
8. Questions
Q. The warning "$HADOOP_HOME is deprecated" always shows up when you run Hadoop commands.
A. Replace HADOOP_HOME with HADOOP_PREFIX in your ~/.bashrc file, then check whether $HADOOP_HOME is still defined somewhere else by echoing it when you open a new terminal.
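For example, in a fresh terminal:
echo $HADOOP_HOME
# should print an empty line once the variable is no longer defined anywhere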
