Install Hadoop 1.1.2 on Ubuntu 12.04

This is an example of installing Hadoop on a single machine, an Ubuntu box in this case. For a simple cluster installation, please refer to Running Hadoop on Clusters of Two Nodes using Ubuntu and CentOS.
1. Install Hadoop
1.1 Download Hadoop 1.1.2 (the stable version of Hadoop):
http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
1.2 Untar the package to /opt
1.3 Find the JAVA_HOME and HADOOP_PREFIX directories and add them to your .bashrc file (no spaces around "=", otherwise the export fails)
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk' >> ~/.bashrc
echo 'export HADOOP_PREFIX=/opt/hadoop-1.1.2' >> ~/.bashrc
1.4 add $HADOOP_PREFIX/bin to your $PATH variable
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
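One thing to watch with the echo approach in step 1.3: running it twice leaves duplicate lines in ~/.bashrc. Here is a small sketch of a helper that appends a line only when it is not already there. The function name append_once is my own and not part of Hadoop or Ubuntu.

```shell
# append_once FILE LINE
# Append LINE to FILE unless an identical line is already present.
# (append_once is a hypothetical helper name used for illustration.)
append_once() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls, commented out so that pasting the function has no side effects:
# append_once ~/.bashrc 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk'
# append_once ~/.bashrc 'export HADOOP_PREFIX=/opt/hadoop-1.1.2'
```

After adding the lines, run source ~/.bashrc or open a new terminal so the variables take effect.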
2. Decide the running mode of your Hadoop:
local (standalone) mode
pseudo-distributed mode
2.1 If you want to start Hadoop in pseudo-distributed mode, edit the following three configuration files in $HADOOP_PREFIX/conf: core-site.xml, hdfs-site.xml and mapred-site.xml
#core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/username/projects/hdfs_test</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>
#hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
#mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/username/projects/mapred/system</value>
<final>true</final>
</property>
</configuration>
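Before formatting HDFS it can help to double check what actually ended up in the three files. The following is a rough sketch, not a real XML parser: it assumes each <name> and <value> sits on its own line, as in the files above. The helper name get_prop is made up for this example.

```shell
# get_prop FILE NAME
# Print the <value> that follows <name>NAME</name> in a Hadoop config file.
# Line-based sketch: relies on the one-tag-per-line layout shown above.
get_prop() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Example calls (commented out):
# get_prop "$HADOOP_PREFIX/conf/core-site.xml" fs.default.name
# get_prop "$HADOOP_PREFIX/conf/mapred-site.xml" mapred.job.tracker
```

If a property is not found, the function prints nothing, which is a quick hint that the name was mistyped.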
2.2 Create or edit hadoop-env.sh in $HADOOP_PREFIX/conf/, and add the following line to the file (no spaces around "="):
vim $HADOOP_PREFIX/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
3. Format the HDFS system
hadoop namenode -format
4. Start Hadoop
start-all.sh
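To see whether everything came up, I like to look at the jps output. In pseudo-distributed mode, start-all.sh in Hadoop 1.x should leave five daemons running: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The sketch below reads jps output and prints whichever of those are missing. The function name missing_daemons is my own invention.

```shell
# missing_daemons
# Read `jps` output ("PID Name" per line) on stdin and print every
# expected Hadoop 1.x daemon that does not appear in it.
missing_daemons() {
    listing=$(cat)
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        printf '%s\n' "$listing" |
            awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }' || echo "$d"
    done
}

# Example call (commented out):
# jps | missing_daemons
```

No output means all five daemons were found. If one is missing, its log file under $HADOOP_PREFIX/logs is the first place to look.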
5. Useful commands to check the HDFS system
hadoop fs -ls /    # list the root of HDFS
hadoop fs -ls      # without '/': list your HDFS home directory
# If all you get is the following error message:
# "ls: Cannot access .: No such file or directory."
# you need to do the following steps to make it work.
hadoop fs -mkdir /user
hadoop fs -mkdir /user/username
# Now you should be able to run hadoop fs -ls, because by default Hadoop looks for
# the "/user/username" directory within HDFS. The error message means there is no such directory in HDFS yet.
# To avoid the error message, we need to create that directory within HDFS.
6. Copy files from the local Linux file system to HDFS
I used two text files for testing, file1.txt and file2.txt, located at /home/username/projects/hadoop
6.1 Copy the files from the local file system to the Hadoop file system
hadoop dfs -copyFromLocal /home/username/projects/hadoop /user/username
# check copy result
hadoop dfs -ls /user/username/
6.2 Download the word-count example from http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-examples/1.0.3/hadoop-examples-1.0.3.jar and put it in /home/username/projects/hadoop
6.3 Run the MapReduce job
hadoop jar hadoop-examples-1.0.3.jar wordcount /user/username /user/username-output
# Notes for this command:
# (1) If you see an IO exception, try using the full path to the jar package.
# (2) "/user/username-output" is the output path for MapReduce; it must not exist before the job runs.
6.4 Retrieve the job result from HDFS
# 1. merge mapreduce outputs and copy to local path: /home/username/projects/hadoop/output
hadoop dfs -getmerge /user/username-output /home/username/projects/hadoop/output
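Once the merged file is on the local disk, a quick look at the most frequent words is a handy sanity check. This sketch assumes the wordcount output has one word and its count per line, separated by a tab. The function name top_words is made up for this example.

```shell
# top_words FILE [N]
# Print the N most frequent words (default 10) from a merged wordcount
# output file, assuming "word<TAB>count" on each line.
top_words() {
    sort -t "$(printf '\t')" -k2,2nr -k1,1 "$1" | head -n "${2:-10}"
}

# Example call (commented out):
# top_words /home/username/projects/hadoop/output 5
```

Ties on the count are broken by sorting on the word itself, so the result is stable between runs.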
7. Hadoop Web Interfaces
NameNode daemon: http://localhost:50070/
JobTracker daemon: http://localhost:50030/
TaskTracker daemon: http://localhost:50060/
8. Questions
Q. The warning "$HADOOP_HOME is deprecated" always shows up when you run hadoop commands.
A. Replace HADOOP_HOME in your ~/.bashrc file with HADOOP_PREFIX. Then open a new terminal and use the echo command to check whether $HADOOP_HOME is still defined somewhere else.

Use Apache Virtual Host/ Reverse Proxy on Ubuntu

1. install apache2 on Ubuntu
sudo apt-get install apache2
You may notice that the httpd.conf file located at /etc/apache2/ is empty, which is normal since we are using apache2.conf instead.
2. Create symbolic links to enable the proxy modules in Apache2, then restart the server
sudo ln -s /etc/apache2/mods-available/proxy.load /etc/apache2/mods-enabled
sudo ln -s /etc/apache2/mods-available/proxy_http.load /etc/apache2/mods-enabled
sudo /etc/init.d/apache2 restart
3. Create a virtual host file:
sudo gedit /etc/apache2/sites-available/proxiedhosts
# and edit the file so that it resembles:
# do not forget the NameVirtualHost line
NameVirtualHost *:80
<VirtualHost *:80>
ServerName example.com
ProxyRequests off
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
</VirtualHost>
Here we are forwarding every request on port 80 to port 8080.
4. Activate the virtual host file by making a symbolic link to it in the Apache2 sites-enabled folder, then restart Apache2:
sudo ln -s /etc/apache2/sites-available/proxiedhosts /etc/apache2/sites-enabled
sudo /etc/init.d/apache2 restart
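If you proxy more than one backend, typing the same block by hand gets tedious. Below is a small sketch that prints the virtual host definition shown above for a given server name and backend URL. The function name proxy_vhost is my own, and the output simply mirrors the example configuration, so treat it as a starting point rather than a complete setup.

```shell
# proxy_vhost SERVER_NAME BACKEND_URL
# Print a reverse-proxy virtual host definition to stdout.
# BACKEND_URL should end with a slash, e.g. http://localhost:8080/
proxy_vhost() {
    cat <<EOF
NameVirtualHost *:80
<VirtualHost *:80>
    ServerName $1
    ProxyRequests off
    ProxyPass / $2
    ProxyPassReverse / $2
</VirtualHost>
EOF
}

# Example call (commented out):
# proxy_vhost example.com http://localhost:8080/
```

Redirect the output into your virtual host file, then restart Apache2 as in step 4.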

Slow Authentication when SSH Connecting To Your Ubuntu Box

I noticed that authentication was dramatically slower when connecting over ssh to my Ubuntu box than to a CentOS box.
time ssh 102.14.98.241
After logging in, type exit and it reveals the time consumed, as follows:
real 0m30.397s
user 0m0.012s
sys  0m0.020s
After a little research I found that modifying one line in /etc/ssh/sshd_config could fix the problem.
sudo vim  /etc/ssh/sshd_config
# add the following line
UseDNS no
#save file and restart sshd server
sudo /etc/init.d/ssh restart
Test the new timing; time ssh 102.14.98.241 now shows
real 0m5.309s
user 0m0.013s
sys  0m0.015s
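If you manage several boxes, it is easy to forget which ones already have the change. Here is a minimal sketch that checks a config file for an uncommented "UseDNS no" line. The function name usedns_disabled is hypothetical, and it only looks at the text of the file, not at what the running sshd actually uses.

```shell
# usedns_disabled FILE
# Succeed (exit 0) if FILE contains an uncommented "UseDNS no" line.
usedns_disabled() {
    grep -Eiq '^[[:space:]]*UseDNS[[:space:]]+no[[:space:]]*$' "$1"
}

# Example call (commented out):
# usedns_disabled /etc/ssh/sshd_config && echo "UseDNS is off" || echo "UseDNS not set to no"
```

A commented line such as "#UseDNS no" does not count, which matches how sshd reads the file.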

Configure sqlldr for batch uploading data to Oracle

I was trying to batch load data into Oracle from the client side (Ubuntu 64-bit) and found that sqlldr was not installed. Here are simple instructions for installing/enabling sqlldr on 64-bit Ubuntu.
1. Make sure that Oracle Client is installed. If not, follow this link to install it.
2. Download Oracle Database Express Edition (oracle-xe-11.2.0-1.0.x86_64.rpm.zip) from here:
3. extract rpm file from the downloaded file
unzip -d oracle-ex oracle-xe-11.2.0-1.0.x86_64.rpm.zip
cd oracle-ex/Disk1
# extract files from rpm package
rpm2cpio oracle-xe-11.2.0-1.0.x86_64.rpm | cpio -idmv
# copy file sqlldr and folder mesg to local $ORACLE_HOME
sudo cp u01/app/oracle/product/11.2.0/xe/bin/sqlldr /usr/lib/oracle/11.2/client64/bin
# create folder rdbms/mesg at $ORACLE_HOME
sudo mkdir -p /usr/lib/oracle/11.2/client64/rdbms/mesg
#then copy files from mesg folder
sudo cp u01/app/oracle/product/11.2.0/xe/rdbms/mesg/* /usr/lib/oracle/11.2/client64/rdbms/mesg/
4. now you could run sqlldr to batch load data
sqlldr user/pass@//oracle_server:1521/devl control="RXNCONSO.ctl";

Install cx_Oracle on Ubuntu from rpm package

Make sure that you have installed Oracle instant client. If not, please follow this post.
1. make sure that the path in your oracle.conf file contains libclntsh.so.11.1 file
cat /etc/ld.so.conf.d/oracle.conf
#it shows
/usr/lib/oracle/11.2/client64/lib
# verify that the path contains libclntsh.so.11.1
ll /usr/lib/oracle/11.2/client64/lib | grep libclntsh
lrwxrwxrwx 1 root root        17 Aug  5  2013 libclntsh.so -> libclntsh.so.11.1
-rw-r--r-- 1 root root  52761218 Sep 17  2011 libclntsh.so.11.1
2. update ld.so.cache
sudo ldconfig
3. download and install cx_Oracle library from rpm package
sudo alien -i cx_Oracle-5.1.2-11g-py27-1.x86_64.rpm
4. Configure cx_Oracle to work with Python2.7 within Ubuntu
cd /usr/lib/python2.7
sudo mv site-packages/cx_Oracle* dist-packages/
sudo rmdir site-packages/
sudo ln -s dist-packages site-packages

Install Oracle SQL Plus on Ubuntu

Recently, I switched my development environment from CentOS to Ubuntu and had difficulty running Oracle SQL Plus. The installation went fine by following my previous post. However, when I tried to run sqlplus, it complained that libsqlplus.so was missing: "error while loading shared libraries: libsqlplus.so".
Here is the solution I used to fix the problem: create a file oracle.conf in /etc/ld.so.conf.d
# create file oracle.conf
sudo vim /etc/ld.so.conf.d/oracle.conf
# add the following line to the file
/usr/lib/oracle/11.2/client64/lib
# then run ldconfig to update the configuration
sudo ldconfig
But when you run sqlplus again, you may find it shows another error about a missing libaio.so.1. To fix that you need to install libaio1:
sudo apt-get install libaio1
Now you may be able to connect to Oracle:
sqlplus user/password@//oracle_server:port/SID
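Instead of finding the missing libraries one error at a time, you can ask ldd which shared libraries the dynamic linker cannot resolve. The sketch below filters ldd output down to the unresolved names. The function name missing_libs is my own, and I am assuming sqlplus is already on your PATH.

```shell
# missing_libs
# Read `ldd` output on stdin and print the name of every library
# reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example call (commented out):
# ldd "$(command -v sqlplus)" | missing_libs
```

An empty result means the linker can resolve every library that ldd lists for the binary.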

Install rpm Packages in Ubuntu

If an rpm package is all you have for a piece of software, you can follow these instructions to install it on Ubuntu:
# install alien and all the dependencies it needs
sudo apt-get install alien dpkg-dev debhelper build-essential
# Convert a package from rpm to debian format
sudo alien packagename.rpm
# Use dpkg to install the package
sudo dpkg -i packagename.deb
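In my experience the .deb that alien writes usually does not have exactly the same name as the rpm, because it follows Debian naming and may carry a different release number. Rather than guessing the name, this sketch picks the most recently written .deb in a directory. The function name newest_deb is hypothetical.

```shell
# newest_deb [DIR]
# Print the most recently modified .deb file in DIR (default: current directory).
newest_deb() {
    ls -t "${1:-.}"/*.deb 2>/dev/null | head -n 1
}

# Example call (commented out):
# sudo dpkg -i "$(newest_deb)"
```

This relies on file names without newlines, which is fine for package files but worth knowing if you reuse it elsewhere.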

Datatable static image not found on the server

When you use ```datatables.min.css``` and ```datatables.min.js``` locally, instead of datatables CDN, you may have encountered that ```sort...