Wednesday, February 17, 2010

Project Hadoop : Part 1

Since I am once again fortunate enough to be the lifeblood of my project, I will just be describing it the way we (Nitin, Rahul, Shashank, Sohil, Sunny and I) are doing it.

Before we start, there is one thing about file permissions that you need to know.

r stands for Read and has value 4

w stands for write and has value 2

x stands for execute and has value 1

Take, for instance, this line from the terminal:

drwxr-xr-x 2 root root 4096 2009-12-07 18:42 bin

Here d stands for directory, and the nine characters after it come in three groups of three: owner, group, and others. How does one know the owner? Simple: after the permission string there is a digit, and then root, which shows that root is the owner of the directory; the second root shows that this folder belongs to the group root. So the first rwx shows that root, the owner, has rwx (4+2+1=7) permission. That is, he can read, write and execute the file (execute in Linux means he can run the file, and mind it that everything in Linux is nothing but files). The next r-x shows the permission of group members: here they can read and execute but not write. The last r-x shows the permission of others; they too are eligible for reading and executing.
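If you want to check those numbers yourself, the same information can be read with stat. This is just a sketch, assuming the GNU coreutils stat that ships with Ubuntu; the /tmp/permdemo path is a throwaway example of mine, not anything from the real system:

```shell
#!/bin/sh
# Sketch: reading permissions with stat instead of ls -l.
# Assumes GNU coreutils stat (the -c format flags below).
mkdir -p /tmp/permdemo/bin
chmod 755 /tmp/permdemo/bin            # rwxr-xr-x, i.e. (4+2+1)(4+1)(4+1)
stat -c '%a %U %G' /tmp/permdemo/bin   # octal mode, owner, group
echo $((4+2+1))                        # rwx for the owner adds up to 7
rm -r /tmp/permdemo
```

The %a format prints the octal mode directly, so you can see 755 without adding up the rwx letters in your head.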

So if you are the root user and your present directory is /home/hadoop, your prompt looks something like this:

root@shreyansh:/home/hadoop#

you can change the permissions of a file named one in hadoop's home directory with the following command:

root@shreyansh:/home/hadoop# chmod 777 one

(777 gives rwx permission to everyone: the owner, the group, and all other users)

This is how you change permissions. Why did I need this? Because sometimes I will tell you to edit a file and you won't be able to, because you don't have the permissions. What you can do is: first make a backup copy of the file, then change the permissions of the original, make your changes, and then revert the permissions back to the original. (Do not forget to revert them back, please!)
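The backup-edit-revert routine above can be sketched like this. The file name one and the starting mode 644 are example values I made up for the demonstration, not anything from a real setup:

```shell
#!/bin/sh
# Sketch of the backup / loosen / edit / restore routine described above.
cd /tmp && echo 'hello' > one && chmod 644 one
cp one one.bak                 # 1. keep a backup copy first
orig=$(stat -c '%a' one)       #    remember the original permissions
chmod 777 one                  # 2. open it up so you can edit
echo 'edited' >> one           # 3. make your changes (your editor goes here)
chmod "$orig" one              # 4. do not forget: revert the permissions!
rm one one.bak
```

Saving the original mode in a variable before touching it means you can't forget what to revert to.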

Installing the Java JDK:

1) Open terminal and give yourself the root privileges.

2) If you are an Ubuntu user then probably you won't be able to become the root user directly. In that case use the 'sudo' command as follows:

sudo apt-get install sun-java6-jdk

But as is with me, I don't like sudo much; I like root. So in order to give yourself root privileges, do as follows:

Open a terminal and write:

sudo passwd

[sudo] password for user: "enter your password here"

Enter new UNIX password: "enter the password you want the root to have"

Retype new UNIX password: "confirm the password"

For all the beginners:

Type the actual values in the places where I wrote something in " ".

This sets a password for root, after which you can switch to the root user with su.

Now, a problem I faced: my package was not getting downloaded when I ran the apt-get command.

In that case, update your package lists with the command:

aptitude update

and then try the apt-get command I wrote above again.

After this it asks for license confirmation, and when that is done what you get on your machine is a complete Java installation. Use the TAB key to move between the 'Yes' and 'No' options as required.

Now the post installation part:


Set JAVA_HOME as an environment variable:
copy the following statement and append it to /etc/profile and to the .bashrc file, to make the system set JAVA_HOME as a system environment variable.

export JAVA_HOME="/usr/lib/jvm/java-6-sun-1.6.0.06"
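A quick way to check that the variable actually took effect (the JDK path here is the one from the export line above; note there must be no spaces around the = in a shell assignment, or the line fails):

```shell
#!/bin/sh
# No spaces around '=' in a shell assignment, or the line fails.
export JAVA_HOME="/usr/lib/jvm/java-6-sun-1.6.0.06"
# Verify in the current shell (or in a new terminal, or after: source /etc/profile)
echo "$JAVA_HOME"
```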

SSH installation

This won't take much time or effort. Simply update your package lists again.

For newbies:

aptitude update

and now write the following command:

For a user other than root:

sudo apt-get install openssh-server

For the root user:

apt-get install openssh-server

And now it is time to generate the RSA key pair.

root@shreyansh:~# su - hadoop

hadoop@shreyansh:~$ ssh-keygen -t rsa -P ""

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Created directory '/home/hadoop/.ssh'.

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

9d:47:ab:d7:22:54:f0:f9:b9:3b:64:93:12:75:81:27 hadoop@ubuntu

hadoop@shreyansh:~$

The second line will create an RSA key pair with an empty password. Generally, using an empty password is not recommended, but in this case it is needed to unlock the key without your interaction (you don't want to enter the passphrase every time Hadoop interacts with its nodes).

Second, you have to enable SSH access to your local machine with this newly created key.

hadoop@shreyansh:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
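One thing that can bite you here: with sshd's default StrictModes setting, the key is ignored if ~/.ssh or authorized_keys is writable by anyone other than you, and then ssh localhost will still ask for a password. A small sketch of tightening the permissions, run as the hadoop user:

```shell
#!/bin/sh
# With sshd's default StrictModes, the key is ignored if ~/.ssh or
# authorized_keys is writable by anyone other than the owner. Tighten both:
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```

If key login still fails after this, the sshd log (/var/log/auth.log on Ubuntu) usually says why.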

The final step is to test the SSH setup by connecting to your local machine with the hadoop user. The step is also needed to save your local machine's host key fingerprint to the hadoop user's known_hosts file.

hadoop@shreyansh:~$ ssh localhost

The authenticity of host 'localhost (127.0.0.1)' can't be established.

RSA key fingerprint is 76:d7:61:86:ea:86:8f:31:89:9f:68:b0:75:88:52:72.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (RSA) to the list of known hosts.

Ubuntu 8.04

...

hadoop@shreyansh:~$

Disabling IPv6

To disable IPv6 on Ubuntu Linux, open /etc/modprobe.d/blacklist in the editor of your choice and add the following lines to the end of the file:

# disable IPv6

blacklist ipv6

You have to reboot your machine in order to make the changes take effect.

Installing Hadoop

You have to download Hadoop from the Apache Download Mirrors and extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop. Make sure to change the owner of all the files to the hadoop user and group, for example:

$ cd /usr/local

$ sudo tar xzf hadoop-0.20.0.tar.gz

$ sudo mv hadoop-0.20.0 hadoop

$ sudo chown -R hadoop:hadoop hadoop
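A note on that ownership step: chown without -R only changes the top-level directory, while the text says to change the owner of all the files, so you want the recursive form. Here is a sketch of it, demonstrated on a throwaway tree with the current user so it runs anywhere; on the real machine you would run it as root against /usr/local/hadoop:

```shell
#!/bin/sh
# Sketch: why -R matters when handing the extracted tree to the hadoop user.
# Demonstrated on a temporary tree with the current user; on the real box:
#   sudo chown -R hadoop:hadoop /usr/local/hadoop
tmp=$(mktemp -d)
mkdir -p "$tmp/hadoop/conf"
touch "$tmp/hadoop/conf/hadoop-env.sh"
# Without -R, chown touches only the top directory, not conf/ or files below.
# With -R, every file and subdirectory gets the new owner and group:
chown -R "$(id -un):$(id -gn)" "$tmp/hadoop"
stat -c '%U %G' "$tmp/hadoop/conf/hadoop-env.sh"   # owner and group of a leaf file
rm -rf "$tmp"
```

The user:group form does the chown and chgrp in one command instead of two.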