Running a Hadoop Example - WordCount
I know this is one of the most common examples you will find when searching for Hadoop examples. The code for it ships with the Hadoop installation.
It is a very simple example that you can use to understand how Hadoop code works.
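To get a feel for what the job computes before touching Hadoop, the same idea can be sketched as a plain shell pipeline. This is only a local analogy, not the code of the Hadoop example: the function names `map_words`, `reduce_counts` and `word_count` are made up for illustration, and the `sort` in the middle stands in for Hadoop's shuffle-and-sort step.

```shell
#!/bin/sh
# Map phase: split the input on whitespace and emit "word<TAB>1" per token.
map_words() {
    tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t1" }'
}

# Reduce phase: the input is sorted, so equal words are adjacent; sum their counts.
reduce_counts() {
    awk -F '\t' '
        NR > 1 && $1 != key { print key "\t" sum; sum = 0 }
        { key = $1; sum += $2 }
        END { if (NR > 0) print key "\t" sum }
    '
}

# The sort between map and reduce groups identical words together.
word_count() {
    map_words | LC_ALL=C sort | reduce_counts
}

printf 'hello hadoop\nhello world\n' | word_count
```

For the two input lines above this prints `hadoop 1`, `hello 2` and `world 1`, one per line, with a tab between word and count.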
Steps:
- Make some sample files. I made two files; you can download them from this link: test sample files
- Now load these files into Hadoop's HDFS:
./hadoop-1.0.4/bin/hadoop dfs -copyFromLocal /home/venkat/Documents/*.txt /source_data/
- You can also see the uploaded files using the Hadoop web portal.
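If you do not have the sample files at hand, a couple of small text files are enough. Here is a minimal sketch that creates them locally; the helper `make_samples`, the file names and the contents are placeholders of mine, not the files linked above.

```shell
#!/bin/sh
# Write two small text files into the directory given as $1.
# Names and contents are placeholders, not the original sample files.
make_samples() {
    dir=$1
    mkdir -p "$dir" || return 1
    printf 'hello hadoop\nhello world\n' > "$dir/sample1.txt"
    printf 'hadoop counts words\n' > "$dir/sample2.txt"
}

make_samples "${TMPDIR:-/tmp}/wordcount_samples"
ls "${TMPDIR:-/tmp}/wordcount_samples"

# With a running Hadoop, upload and list them the same way as in the step above
# (assumption: create /source_data first with "hadoop dfs -mkdir" if it is missing):
# ./hadoop-1.0.4/bin/hadoop dfs -copyFromLocal "${TMPDIR:-/tmp}"/wordcount_samples/*.txt /source_data/
# ./hadoop-1.0.4/bin/hadoop dfs -ls /source_data/
```

The Hadoop commands are left as comments so that the snippet also runs on a machine without Hadoop.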
As I said earlier, the Hadoop installation should contain an examples jar that includes "word count" as one of its examples. Here is where you can find that jar: <hadoop_home>/hadoop-examples-1.0.4.jar
Use this command to see the classes related to "word count":
jar -tvf ./hadoop-1.0.4/hadoop-examples-1.0.4.jar | grep 'wordcount.class' -i
If you would like to see the source, you can either use a Java decompiler to reverse engineer the class file or look at the source files that Hadoop also provides:
<hadoop_home>/src/examples/org/apache/hadoop/examples/WordCount.java
Now run the example:
./hadoop-1.0.4/bin/hadoop jar ./hadoop-1.0.4/hadoop-examples-1.0.4.jar wordcount /source_data/*.txt /HDFS_data/test1
If the job is successful, you will see its result on the console.
You can see the result using the Hadoop web portal too. Use the "Browse the filesystem" link, move to /HDFS_data/test1 and have a look at part-r-00000. This is your output file.
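The output file holds one word per line followed by a tab and its count, so it is easy to post-process from the command line as well. Here is a small sketch that lists the most frequent words; `top_words` is a hypothetical helper name, and the sample input is made up to mimic that layout.

```shell
#!/bin/sh
# Read "word<TAB>count" lines on stdin and print the N most frequent words
# (default 10). Ties are ordered alphabetically by word.
top_words() {
    n=${1:-10}
    LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}

# Stand-in for the real output file, in the same tab-separated layout.
printf 'hadoop\t2\nhello\t2\nworld\t1\ncounts\t1\n' | top_words 2
```

Against the real job output, assuming the output directory used above, it could be fed like this: `./hadoop-1.0.4/bin/hadoop dfs -cat '/HDFS_data/test1/part-r-*' | top_words 5`.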