The error is similar to what is reported here:
http://www.mail-archive.com/common-user@hadoop.apache.org/msg03019.html
or here:
http://www.mail-archive.com/common-user@hadoop.apache.org/msg01935.html
How it was solved:
First, clean up the files that hadoop generates under /tmp. These files store the pids of the namenode, jobtracker, datanode, tasktracker, and so on. To be safe, back them up instead of deleting them:
-bash-4.0$ mkdir HadoopBackup
-bash-4.0$ mv hadoop* HadoopBackup/
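The two commands above are run from /tmp, where hadoop drops its pid files by default. A self-contained sketch of the same backup step, run against a scratch directory with stand-in pid files (the filenames here are illustrative; the real names depend on your hadoop version and user):

```shell
# Sketch of the backup step, using a scratch directory and stand-in
# pid files so it can run anywhere; on a real node you would do the
# same from /tmp.
work=$(mktemp -d)
cd "$work"
touch hadoop-hdfs-namenode.pid hadoop-hdfs-datanode.pid  # stand-ins
mkdir -p HadoopBackup
mv hadoop* HadoopBackup/   # the capitalized backup dir is not matched by the lowercase glob
ls HadoopBackup
```

Note that `mv hadoop*` does not try to move HadoopBackup into itself, because the glob is case-sensitive.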
Then, check all the running processes related to java:
-bash-4.0$ ps -ef|grep java
It turns out that there are lots of hadoop-related processes. Given that every version of hadoop has been stopped, this means some processes were not correctly shut down. When hadoop tries to start, these leftover processes somehow prevent the corresponding new ones from running; and when hadoop is stopped, it cannot find the corresponding services to stop.
Next, kill all the orphaned processes (caution: this kills every java process on the machine, so make sure nothing else on the box needs a JVM):
-bash-4.0$ ps -ef|grep java |awk '{print $2}'|xargs kill -9
And recheck:
-bash-4.0$ ps -ef|grep java
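One caveat about the pipeline above: `grep java` also matches the grep command itself (its own command line contains "java"), and it matches JVMs that have nothing to do with hadoop. A slightly safer sketch uses the bracket trick so grep does not list itself, and narrows the match to hadoop; the kill line is left commented out so you can inspect the list first:

```shell
# '[j]ava' matches "java" in other processes' command lines, but not in
# the grep's own command line (which contains "[j]ava", not "java"),
# so grep does not list itself.
ps -ef | grep '[j]ava' | grep -i hadoop || true   # empty when nothing is left
# After inspecting, kill only those PIDs (try plain kill before -9):
# ps -ef | grep '[j]ava' | grep -i hadoop | awk '{print $2}' | xargs kill
```

Sending plain `kill` (SIGTERM) first gives the daemons a chance to clean up their pid files; fall back to `-9` only for processes that refuse to die.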
At this point the problem should be solved.
If necessary, do some preparation before restarting (note that `namenode -format` erases the HDFS metadata, so only run it on a fresh or expendable filesystem):
-bash-4.0$ bin/hadoop namenode -format
-bash-4.0$ cd conf
-bash-4.0$ vim hadoop-env.sh
-bash-4.0$ vim hdfs-site.xml
-bash-4.0$ vim mapred-site.xml
-bash-4.0$ vim core-site.xml
-bash-4.0$ vim conf/masters
-bash-4.0$ vim conf/slaves
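As a sketch of what goes into those files: a minimal core-site.xml for a 0.20-era pseudo-distributed setup might look like the following (the hostname, port, and directory are placeholders, not values from this cluster). Pointing hadoop.tmp.dir somewhere outside /tmp, and likewise setting HADOOP_PID_DIR in hadoop-env.sh, also helps avoid the stale-file problem above, since both default to locations under /tmp:

```xml
<!-- conf/core-site.xml: placeholder values, adjust for your cluster -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- default is under /tmp; relocating it is safer -->
    <value>/var/hadoop/tmp</value>
  </property>
</configuration>
```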
Restart!
-bash-4.0$ bin/start-all.sh
-bash-4.0$ cd logs/
-bash-4.0$ tail -f *.log
-bash-4.0$ cd ..
-bash-4.0$ bin/hadoop dfsadmin -report
Everything is fine up to this point.
The likely cause of this problem in my case is that two (actually three, while I was debugging) versions of hadoop were installed on the server. These versions use different machines as the namenode and jobtracker, so there can be some confusion after a wrong operation such as launching one version first but stopping with another version.
Thank you!
Hello Felix, I too had the same problem: my namenode was not starting up. I followed your post and finally it started popping up in the jps results. I am running in pseudo-distributed mode with only a single version of hadoop's source code, so I do not think the issue is due to multiple versions of hadoop. I also see that many others have the same issue. Wonder why? :)