
Hadoop Installation (Single-Node) - 2/3

Hadoop 1.0.4 on Ubuntu Linux 12.04 (Single Node) - Part 2/3

Download Hadoop

Download a stable Hadoop version from http://hadoop.apache.org/. I downloaded hadoop 1.0.4; there is a link to it on the stable Hadoop download page. Download the file (hadoop-1.0.4.tar.gz) and copy it to the home directory of hadoop_usr (i.e. /home/hadoop_usr/).

Steps to Install

Log in to a terminal as hadoop_usr and extract the contents of the gz file:

tar -xvf hadoop-1.0.4.tar.gz

We have to edit the following configuration files:

$HADOOP_HOME/conf/hadoop-env.sh
$HADOOP_HOME/conf/hdfs-site.xml
$HADOOP_HOME/conf/core-site.xml
$HADOOP_HOME/conf/mapred-site.xml
$HADOOP_HOME/conf/masters
$HADOOP_HOME/conf/slaves

$HADOOP_HOME/conf/hadoop-env.sh - set the Java home:

# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

$HADOOP_HOME/conf/hdfs-site.xml - before editing this file create the f...
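The excerpt cuts off before the XML files, but for reference a minimal single-node Hadoop 1.x configuration typically looks like the sketch below. The port numbers and the replication value are common choices for a single-node setup, not necessarily the exact settings the post goes on to use:

```xml
<!-- conf/core-site.xml : where the NameNode listens (port is an assumption) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml : only one node, so keep a single copy of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml : where the JobTracker listens (port is an assumption) -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

For a single-node setup, conf/masters and conf/slaves can usually both be left containing just localhost.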

Hadoop Installation (Single-Node) - 1/3

Hadoop 1.0.4 on Ubuntu Linux 12.04 (Single Node) - Part 1/3

There are a few ways to install Hadoop:

Cloudera Distribution
Hadoop PPA
Stable download from hadoop.org

I prefer the download from hadoop.org. One of the main reasons for this choice is that whenever there is a new version of Hadoop, I don't have to wait for someone else to release their distribution that includes it.

Should we not be installing Hadoop on multiple nodes? Yes. But my objective is to set up a Hadoop environment on my laptop so that I can play around and get a better understanding of Map-Reduce. For that, a single-node setup is sufficient.

Prerequisites

Make sure you have already installed Java 1.6 on your machine. If not, please follow this link - Install Java 1.6.

Create a separate user and user group for Hadoop. I like to keep a dedicated user for Hadoop; it makes things much easier when it comes to granting permissions and doing various admin activities. It is ...

Hadoop Setup and Architecture

Hadoop Index

Hadoop Setup and Architecture
Hadoop | Installation (Single-Node) - 1/3
Hadoop | Installation (Single-Node) - 2/3
Hadoop | Installation (Single-Node) - 3/3
Hadoop File System Commands
Run a Hadoop Example - Word Count

Linux | Install Apache Maven 3

Install Apache Maven 3 on Ubuntu 12.10

Follow the steps below to install Maven 3. Maven is an important tool that we may need in order to build some of the Apache projects, so let's install it. Open your terminal and type the following command:

sudo apt-get install maven

If you get an error message like:

E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.

then run the command below:

sudo dpkg --configure -a

After running the above command, run the first command again:

sudo apt-get install maven

Once the installation is over, check the version to make sure the installation was successful:

mvn -version

Linux | Install Sun Java

Install Oracle (Sun) Java 1.7.x on Ubuntu 12.10

My Windows Vista laptop crashed recently and I decided to get rid of Windows Vista (enough is enough) and install Linux on it. So I did a fresh install of Ubuntu 12.10 on my old laptop and had to install all the required software on it, and I thought, why not blog it. I'm also planning to install Hadoop, HBase, and Mahout on this machine, and the first step toward Hadoop and the others is installing Java.

So here are the steps to install Java on Ubuntu 12.10. Open your terminal and type in the following commands:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer

Once the installation is over, check the version of the Java installed to make sure the installation was successful:

java -version

Linux | File_Combiner4Hadoop

File_Combiner4Hadoop

This shell script can be used to combine a set of small files into one or more big files. The script is very useful when working with Hadoop (at least it was for me): with Hadoop, the overhead involved in processing small files is very high, so it is better to combine all the small files into one or more big files and use those big files for Hadoop processing. At the same time, if the "big/combined file" is larger than the Hadoop block size (64MB by default), the file may get split during processing (i.e. one half of the file will be processed by one node and the other half by another node). If you don't want the files to be split, then one of the easiest solutions is to combine the small files into one or more big files and make sure each big file's size does not go above the Hadoop block size (in my case, 64MB). This shell script has a parameter "-size" where you can specify the maximum all...
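The original script is not shown in this excerpt, but the core idea described above can be sketched in a few lines of POSIX shell. This is a hypothetical illustration, not the actual File_Combiner4Hadoop script: it concatenates small files into numbered output files, starting a new output file whenever adding the next small file would push the current one past a size limit (the analog of the script's "-size" parameter). The file and directory names are made up for the demo:

```shell
#!/bin/sh
# Sketch of a small-file combiner. MAX_BYTES plays the role of the "-size"
# parameter; it is kept tiny here so the demo rolls over to a second output
# file. For real Hadoop use you would set it to the block size, e.g. 67108864.
MAX_BYTES=100

# Create some sample small files to combine (demo input only).
mkdir -p small_files
for i in 1 2 3 4 5; do
  printf 'this is small file %s\n' "$i" > "small_files/part$i.txt"
done

n=1
cur=0
out="combined_$n.txt"
: > "$out"
for f in small_files/*; do
  sz=$(wc -c < "$f")
  # Roll over to a new output file if this file would exceed the limit.
  if [ "$cur" -gt 0 ] && [ $((cur + sz)) -gt "$MAX_BYTES" ]; then
    n=$((n + 1))
    out="combined_$n.txt"
    : > "$out"
    cur=0
  fi
  cat "$f" >> "$out"
  cur=$((cur + sz))
done
echo "produced $n combined file(s)"
```

With the five 21-byte demo files and a 100-byte limit, the first four land in combined_1.txt and the fifth starts combined_2.txt. A file that is by itself larger than the limit still goes into its own output file, which matches the goal: no output file mixes content that should have been kept together.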