1. Install Hadoop

1.1 Download Hadoop 1.1.2 (the stable version of Hadoop):
http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz

1.2 Untar the package to /opt.
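For example, assuming wget is available and you have write access to /opt:

cd /opt
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
tar -xzf hadoop-1.1.2.tar.gz
# the package extracts to /opt/hadoop-1.1.2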
1.3 Find the JAVA_HOME and HADOOP_PREFIX directories and add them to your .bashrc file:

echo 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk' >> ~/.bashrc
echo 'export HADOOP_PREFIX=/opt/hadoop-1.1.2' >> ~/.bashrc
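Reload the file so the variables take effect in your current shell:

source ~/.bashrc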
1.4 Add $HADOOP_PREFIX/bin to your $PATH variable:

export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
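At this point you can sanity-check the installation; assuming the paths above match your system, this should print "Hadoop 1.1.2" among other details:

hadoop version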
2. Decide the running mode of your Hadoop:

- local (standalone) mode
- pseudo-distributed mode

2.1 If you want to start Hadoop in pseudo-distributed mode, edit the following three configuration files in $HADOOP_PREFIX/conf: core-site.xml, hdfs-site.xml and mapred-site.xml.
# core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/username/projects/hdfs_test</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

# hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

# mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/username/projects/mapred/system</value>
    <final>true</final>
  </property>
</configuration>
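hadoop.tmp.dir above points at a local path; it may help to create it up front so the daemons do not run into permission problems (the path is the example one from core-site.xml):

mkdir -p /home/username/projects/hdfs_test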
2.2 Edit hadoop-env.sh in $HADOOP_PREFIX/conf/ and set JAVA_HOME in it:

vim $HADOOP_PREFIX/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
3. Format the HDFS filesystem

hadoop namenode -format
4. Start Hadoop

start-all.sh
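You can verify that all the daemons came up with jps (part of the JDK); in pseudo-distributed mode you should see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker:

jps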
5. Useful commands to check the HDFS filesystem

hadoop fs -ls /
# lists the root of HDFS

hadoop fs -ls
# (without '/') if you get nothing but the following error message:
# "ls: Cannot access .: No such file or directory."
# you need to do the following steps to make it work:
hadoop fs -mkdir /user
hadoop fs -mkdir /user/username
# Now you should be able to run "hadoop fs -ls", because by default Hadoop looks for
# the "/user/username" directory within HDFS. The error message means that directory
# does not exist in HDFS yet; creating it avoids the error.

6. Copy files from the local Linux filesystem to HDFS

I used two text files for testing: file1.txt and file2.txt, located at /home/username/projects/hadoop.
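If you want to reproduce the test setup, something like this creates the two files (their contents are arbitrary):

mkdir -p /home/username/projects/hadoop
echo "hello world hello" > /home/username/projects/hadoop/file1.txt
echo "hello hadoop" > /home/username/projects/hadoop/file2.txt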
6.1 Copy the files from the local filesystem to HDFS:

hadoop dfs -copyFromLocal /home/username/projects/hadoop /user/username
# check the copy result
hadoop dfs -ls /user/username/

6.2 Download the word-count example from
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-examples/1.0.3/hadoop-examples-1.0.3.jar
and put it in /home/username/projects/hadoop.
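For example, again assuming wget:

cd /home/username/projects/hadoop
wget http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-examples/1.0.3/hadoop-examples-1.0.3.jar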
6.3 Run the MapReduce job:

hadoop jar hadoop-examples-1.0.3.jar wordcount /user/username /user/username-output
# notes for this command:
# (1) if you see an IO exception, try using the full path of the jar package
# (2) "/user/username-output" is the output path for MapReduce; it must not exist
#     before you run the job.
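Once the job finishes, you can peek at the result directly in HDFS; with this example jar the output file should be named part-r-00000:

hadoop fs -ls /user/username-output
hadoop fs -cat /user/username-output/part-r-00000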
6.4 Retrieve the job result from HDFS:

# merge the mapreduce outputs and copy them to the local path /home/username/projects/hadoop/output
hadoop dfs -getmerge /user/username-output /home/username/projects/hadoop/output
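The merged result is now a plain local file, so the usual tools work on it:

head /home/username/projects/hadoop/output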
7. Hadoop Web Interfaces

NameNode daemon:    http://localhost:50070/
JobTracker daemon:  http://localhost:50030/
TaskTracker daemon: http://localhost:50060/
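A quick way to check from the shell that the NameNode web UI is answering (assuming curl is installed):

curl -s http://localhost:50070/ > /dev/null && echo "NameNode web UI is up"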
8. Questions

Q. The warning "$HADOOP_HOME is deprecated" always shows up when you run hadoop commands.
A. Replace HADOOP_HOME with HADOOP_PREFIX in your ~/.bashrc file. Then open a new terminal and use echo to check whether $HADOOP_HOME is still defined somewhere else.
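For example (the files being grepped are just a guess at the usual places such a variable gets set):

echo $HADOOP_HOME
grep -n "HADOOP_HOME" ~/.bashrc ~/.profile /etc/profile 2>/dev/null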