How to run simple Hadoop programs


Let us write a simple Hadoop program and try to run it on Hadoop.

Copy the following code into a class file in your Eclipse Java project.

Configure the Eclipse build path to resolve any errors. If you need help setting up Eclipse for Hadoop, please see the other post.


package org.jagat.hdfs;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSCopyAll {

    public static void main(String[] args) throws IOException {

        // Load the Hadoop configuration from the classpath
        Configuration conf = new Configuration();

        // Handles to the default file system (HDFS, not used below)
        // and to the local file system
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);

        // List everything in a local directory
        FileStatus[] localinput = local.listStatus(new Path(
                "/home/hadoop/software/20/pig-0.10.0/docs/api"));

        // Print the length (in bytes) of each entry
        for (int i = 0; i < localinput.length; i++) {
            System.out.println(localinput[i].getLen());
        }

    }

}

It just uses the Hadoop FileSystem API to print the length of each file present in the api directory.
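Notice that the program also grabs a handle to HDFS itself (the hdfs variable) but never uses it. As a purely illustrative sketch, the same listStatus call works against HDFS as well; the class name HDFSListExample and the /user/hadoop path below are my own assumptions, so replace the path with a directory that actually exists in your HDFS:

package org.jagat.hdfs;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSListExample {

    public static void main(String[] args) throws IOException {

        Configuration conf = new Configuration();

        // FileSystem.get(conf) returns the default file system, which is
        // HDFS when the configuration points at a running namenode
        FileSystem hdfs = FileSystem.get(conf);

        // /user/hadoop is an assumed path used only for illustration
        FileStatus[] files = hdfs.listStatus(new Path("/user/hadoop"));

        // Print each path together with its length in bytes
        for (int i = 0; i < files.length; i++) {
            System.out.println(files[i].getPath() + " : " + files[i].getLen());
        }
    }
}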

The intention of this post is not to teach anything about the Hadoop API or about MapReduce programs, but simply how to run a Hadoop program you write.



Change the path above (/home/hadoop/software/20/pig-0.10.0/docs/api) to a real path that exists on your computer.
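As a small variation (my own suggestion, not part of the original program), you could read the directory from the command line instead of hard-coding it. Inside main(), something like this would do it:

// Hypothetical change inside main(): use args[0] when given,
// otherwise fall back to the hard-coded path
String dir = (args.length > 0)
        ? args[0]
        : "/home/hadoop/software/20/pig-0.10.0/docs/api";
FileStatus[] localinput = local.listStatus(new Path(dir));

With that change you would pass the directory after the JAR name when running it.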

Now it's time to package this as a JAR.

Go to File > Export.

Eclipse will show an export wizard.

Choose JAR file as the export type, set the main class to HDFSCopyAll (the class above), and create the JAR somewhere on your computer.
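If you prefer the command line over the Eclipse wizard, the JDK's jar tool can do the same packaging. This assumes your compiled classes are under bin, which is the Eclipse default output folder:

$jar cfe learnHadoop.jar org.jagat.hdfs.HDFSCopyAll -C bin .

The e flag writes the given class into the manifest as the entry point, which is what the Eclipse wizard does when you pick a main class.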

Now it's time to run it.

Open a terminal, go to the directory where you created the JAR, and invoke it as follows:

$hadoop jar learnHadoop.jar
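If you did not set a main class when exporting the JAR, you can name the class explicitly on the command line instead:

$hadoop jar learnHadoop.jar org.jagat.hdfs.HDFSCopyAll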

This will run the program and print the file lengths as output.

If you are stuck somewhere, just post a message in the comments below.

Thanks for reading.
