Installing Hadoop :(
Installing software on Linux can sometimes be uncomfortable and test one's patience, but I ran into a huge pile of errors in an unsuccessful attempt to natively install and configure Hadoop on Ubuntu 9.04!
The simplest suggestion to save some (significant) hassle and time when installing Hadoop on Windows is to:
(1) Download and install a copy of Sun VirtualBox (don't try VMware Player, at least for the time being). Be sure to download the right VirtualBox build for your OS.
(2) Cloudera already has a virtual machine with Hadoop fully installed and configured, along with Eclipse (http://www.cloudera.com/hadoop-training-virtual-machine). Download the file (it's approximately 1.18 GB), unzip it, and load it in VirtualBox (create a new VM and point to "cloudera-training-0.2-cl4.vmdk" as the primary master).
The VM boots up smoothly with a "Cloudera" wallpaper. If you get an error stating "This kernel requires the following features not present on the CPU: pae", simply select the PAE/NX check box in the settings of the VM (right-click the VM -> Settings -> General -> Advanced (2nd tab) -> check PAE/NX).
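If you prefer the command line, the same check box can probably be flipped with VBoxManage. This is only a minimal sketch, not something from Cloudera's instructions: the helper names has_pae and enable_pae and the VM name "cloudera-training" are my own placeholders, and the VM has to be powered off before its settings can be modified.

```shell
# has_pae: succeeds if the "flags" line of a cpuinfo file lists the pae feature.
# Defaults to /proc/cpuinfo, so it can be run on any Linux machine (host or guest).
has_pae() {
    cpuinfo=${1:-/proc/cpuinfo}
    grep '^flags' "$cpuinfo" | grep -qw pae
}

# enable_pae: turn on PAE/NX for a VirtualBox VM from the host.
# The VM name is whatever you typed when creating the VM (placeholder below).
enable_pae() {
    VBoxManage modifyvm "$1" --pae on
}

# Example usage (not executed here):
#   enable_pae "cloudera-training"
#   has_pae && echo "CPU exposes PAE"
```

Running has_pae inside the guest after the change is a quick way to confirm that the setting actually reached the virtual CPU.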
VirtualBox is compatible with most editions of Linux, and a blog post on Cloudera's website explains how. Cloudera's site also recommends using VMware Player. I tried VMware Player first as well. Strangely, it continuously triggered errors, forcing me to try VirtualBox, which was rather straightforward and smooth.
However, I started with Michael G. Noll's tutorial on installing Hadoop on Ubuntu Linux. I simply thought Hadoop on Linux would be more stable than on the shaky, frequently crashing 64-bit Windows Vista. The tutorial had worked fine for a lot of people, but these are some variations I had with respect to my system.
(1) I installed Java 6. According to the tutorial, the jvm.cfg file in /etc/jvm should look as follows:
# /etc/jvm
# This file defines the default system JVM search order.
# Each JVM should list their JAVA_HOME compatible
# directory in this file. The default system JVM is the first
# one available from top to bottom.
/usr/lib/jvm/java-6-sun
/usr/lib/jvm/java-gcj
/usr/lib/jvm/ia32-java-1.5.0-sun
/usr/lib/jvm/java-1.5.0-sun
/usr
But I had a file that looked like this!
# @(#)jvm.cfg 1.9 05/11/17
#
# Copyright 2006 Sun Microsystems, Inc. All rights reserved.
# SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
# List of JVMs that can be used as an option to java, javac, etc.
# Order is important -- first in this list is the default JVM.
# NOTE that this both this file and its format are UNSUPPORTED and
# WILL GO AWAY in a future release.
#
# You may also select a JVM in an arbitrary location with the
# "-XXaltjvm=<jvm_dir>" option, but that too is unsupported
# and may not be available in a future release.
#
/usr/lib/jvm/java-6-sun
-client IF_SERVER_CLASS -server
-server KNOWN
-hotspot ALIASED_TO -client
-classic WARN
-native ERROR
-green ERROR
Looking at the example, I inserted the line "/usr/lib/jvm/java-6-sun" into the file. This looked correct, but it does not seem to be the right way to do it. I sent a mail to the Hadoop core user group and am still awaiting a reply. This file specifies the virtual machines that Java should use, in order of preference. By inserting the line, we instruct it to use java-6 as the default JVM.
(2) When I tried the following command, Ubuntu spat at me, complaining "too many parameters":
hadoop@ubuntu:~$ ssh-keygen -t rsa -P ""
Alternatively, "ssh-keygen -t" seems to work, but I'm still not sure!
(3) Confirm that ssh is installed before trying to connect to localhost. I must confess I attempted this without even installing ssh; I had actually confused it with the Secure Shell daemon.
hadoop@ubuntu:~$ ssh localhost
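For notes (2) and (3), here is a hedged sketch of the key setup written out in full. I can't say for sure what caused the "too many parameters" complaint; the sketch simply uses plain ASCII double quotes for the empty passphrase and passes the key file with -f so nothing is asked interactively. The helper names have_ssh_server and setup_passwordless_ssh are made up for illustration.

```shell
# have_ssh_server: succeeds if an sshd binary can be found.
# `ssh localhost` needs the server (sshd), not just the ssh client.
have_ssh_server() {
    command -v sshd >/dev/null 2>&1 || [ -x /usr/sbin/sshd ]
}

# setup_passwordless_ssh: create an RSA key with an empty passphrase and
# authorize it for logins to this machine. Directory defaults to ~/.ssh.
setup_passwordless_ssh() {
    ssh_dir=${1:-"$HOME/.ssh"}
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    # Only generate a key if none exists, so an existing key is never overwritten
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -P "" -f "$ssh_dir/id_rsa" || return 1
    fi
    # Append the public key so the same user can log in without a password
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Example usage (not executed here):
#   have_ssh_server || echo "install the ssh server package first"
#   setup_passwordless_ssh && ssh localhost
```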
(4) Downloading and installing Hadoop should not be much of a problem. The JAVA_HOME variable must be set to the right Java implementation, in this case Java 6.
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
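Before starting Hadoop, it may be worth checking that the JAVA_HOME directory really contains a Java binary. A small sketch of such a check follows; check_java_home is a hypothetical helper of mine, not part of Hadoop or the tutorial.

```shell
# check_java_home: succeeds if the given directory (default: $JAVA_HOME)
# contains an executable bin/java, which is what Hadoop's scripts end up calling.
check_java_home() {
    candidate=${1:-$JAVA_HOME}
    if [ -n "$candidate" ] && [ -x "$candidate/bin/java" ]; then
        echo "ok: $candidate"
    else
        echo "no bin/java found under: $candidate" >&2
        return 1
    fi
}

# Example usage (not executed here):
#   check_java_home /usr/lib/jvm/java-6-sun
```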
(5) Although the tutorial is written for Hadoop 0.20.0, that release isn't available anymore. The site now offers Hadoop 0.20.1, which keeps a few variables in different files compared to 0.20.0.
(6) I could successfully format the node but faced an annoying number of warnings. The error actually propagated from (1), where I set the default JVM.
This continued to pop up some 20 times whenever I started or stopped nodes. I'm not sure if it's of any significance! I just tried to suppress the warnings in Linux. I tried "jps" and netstat according to the tutorial, but ended up getting different results! I finally paused the attempt, unsure of proceeding further with the above errors!
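To make the jps comparison less of a guessing game, here is a small sketch that reads jps output and prints the daemons that are missing. It assumes the five daemons a single-node Hadoop 0.20 setup normally starts (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker); missing_daemons is a hypothetical helper name.

```shell
# missing_daemons: reads `jps` output on stdin and prints, one per line,
# every expected Hadoop 0.20 daemon that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # jps prints "<pid> <MainClass>", so compare the second column exactly
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Example usage (not executed here):
#   jps | missing_daemons
```

An empty result means all five daemons are up; anything printed is a good place to start reading the logs.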
But I still had a heck of a lot of problems running Cloudera's VM!! Insisting on Ubuntu, I downloaded the 1.2 GB file and decompressed it. Ubuntu hung a couple of times, but then I realised I was running out of space.
(1) I googled and found that GParted is the most widely used software for managing partitions. This, however, would not work. I remembered that I had not created a separate partition for Ubuntu but had opted for "install inside Windows". This option comes with Ubuntu 9.04, unlike the previous editions. The internet was filled with solutions that used GParted, Partition Commander and other software. Sadly, none of them noticed that Ubuntu 9.04 had the option to install inside Windows. I couldn't find any solution to this problem.
(2) I rebooted to Windows. Ubuntu creates a folder, "C:\ubuntu". Navigating through it, there is a file titled ".fuse_hidden0000000400000001", which is nothing but the entire Ubuntu image. It could be renamed "ubuntu_image.iso" or something similar and used for installation.
(3) The folder also has "root.disk" and "swap.disk". There seems to be no way to read the .disk extension yet, and I think the previous editions of Ubuntu didn't have it either!
(4) I finally had the Cloudera VM in Windows with both VirtualBox and VMware Player installed. VMware Player seems to have this unresolved error that reads "One of the disks in this virtual machine is already in use by a virtual machine or by a snapshot". All the solutions on the net say to delete the .lck folder (the lock), but it kept sprouting back. I tried a reboot, but that still didn't work! There are some threads in the VMware community, but all of them end abruptly without a solution!
(5) Sun VirtualBox booted with a small window size. It seems editing the xorg.conf file fixes this. I tried to do this, but I had almost run out of patience at this point. I had banged into far more errors and bugs than I should have! (I also tried installing Samba in between, but dumped it as it also pointed to the same error in jvm.cfg.)
I am finally content with Hadoop on Windows Vista, on VirtualBox with a small screen.
Hopefully I get to run some programs rather than working on secondary and tertiary errors, and soon natively install Hadoop on Ubuntu!
– Installed successfully! Reboot –