This video is about programming with Hadoop. I guess I couldn't follow a good 20 minutes of it.

I picked up the following points from the video; there are a few more that I still need to understand.

(1) Some Hadoop terminology
Job: A full program that includes both the mapper and the reducer, run together across a data set.
Task: A job is broken down into tasks. Each individual mapper or reducer is referred to as a task.
Task Attempt: If a mapper crashes on one machine, it may be restarted. It is still the same task, but a different task attempt.
(2) The system retries a failing task over and over, but it eventually gives up, for example when a particular input slice keeps triggering the failure. At this point,
(a) either the entire job fails,
(b) or a quality factor can be set beforehand that specifies how many completed map/reduce tasks suffice for the overall goal of the job, so the job can still succeed.
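To make points (1) and (2) concrete for myself, here is a minimal sketch of the retry logic in plain Java. It is not Hadoop code: the names RetrySketch, MAX_ATTEMPTS and toleratedFailureFraction are my own, and the limit of 4 attempts is just an assumed value. Each task is modelled as a function from the attempt number to "did this attempt succeed".

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

class RetrySketch {
    // Assumed retry limit for this sketch, not read from any real configuration.
    static final int MAX_ATTEMPTS = 4;

    /** Runs one task; returns the attempt number that succeeded, or -1 if every attempt failed. */
    static int runTask(IntPredicate attempt) {
        for (int n = 1; n <= MAX_ATTEMPTS; n++) {
            if (attempt.test(n)) {
                return n; // same task, n-th task attempt
            }
        }
        return -1; // gave up on this task
    }

    /** The job succeeds if the fraction of permanently failed tasks stays within the tolerance. */
    static boolean runJob(List<IntPredicate> tasks, double toleratedFailureFraction) {
        int failed = 0;
        for (IntPredicate task : tasks) {
            if (runTask(task) < 0) {
                failed++;
            }
        }
        return failed <= toleratedFailureFraction * tasks.size();
    }

    public static void main(String[] args) {
        IntPredicate healthy = n -> true;     // succeeds on the first attempt
        IntPredicate flaky = n -> n >= 3;     // crashes twice, then succeeds
        IntPredicate poisoned = n -> false;   // bad input slice: fails every time
        List<IntPredicate> tasks = Arrays.asList(healthy, flaky, poisoned, healthy);

        System.out.println(runJob(tasks, 0.0));  // case (a): prints false
        System.out.println(runJob(tasks, 0.25)); // case (b): prints true
    }
}
```

With a tolerance of 0 the one poisoned slice fails the whole job; with a tolerance of 25% the job still counts as done.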
(3) The same task can be attempted in parallel by different mappers; whichever attempt completes first wins, and the others are killed. This is known as speculative execution.
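The idea in (3) can be imitated on a single machine with threads. This is only an analogy under my own assumptions: SpeculativeSketch and its method names are made up, and the "nodes" are simulated with sleeps. The standard ExecutorService.invokeAny call returns the result of the first attempt that finishes and cancels the rest.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SpeculativeSketch {
    /** One attempt of the task: "works" for the given time, then reports which attempt it was. */
    static Callable<String> attempt(String name, long millis) {
        return () -> {
            Thread.sleep(millis); // throws InterruptedException if this attempt is cancelled
            return name;
        };
    }

    /** Starts all attempts at once and returns the result of whichever finishes first. */
    static String runSpeculatively(List<Callable<String>> attempts) {
        ExecutorService pool = Executors.newFixedThreadPool(attempts.size());
        try {
            // invokeAny returns the first successful result and cancels the other attempts.
            return pool.invokeAny(attempts);
        } catch (Exception e) {
            throw new RuntimeException("all attempts failed", e);
        } finally {
            pool.shutdownNow(); // interrupt anything still running
        }
    }

    public static void main(String[] args) {
        String winner = runSpeculatively(Arrays.asList(
                attempt("attempt-on-slow-node", 60_000),
                attempt("attempt-on-fast-node", 50)));
        System.out.println(winner); // prints attempt-on-fast-node
    }
}
```

The slow attempt never gets to finish its 60 seconds; it is interrupted as soon as the fast one returns.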
(4) A JobTracker runs on the master node and tells the slave nodes which tasks they must run. A TaskTracker runs on each slave and is responsible for managing all the tasks on that node. Tasks run in a separate JVM from the TaskTracker, so even if a particular task crashes, the TaskTracker is isolated and continues to run.
(5) There is exactly one TaskTracker per node, and all of the tasks on that node report to it.
(6) The JobTracker is decided well in advance, and its IP address is published in the configuration file sent to the slaves.
A job cannot be run from a set of loose .class files; they must all be assembled into a .jar file. The client then uploads the JAR into HDFS and writes the configuration specs for the job, typically as an XML file. Using RPC, the client sends the JobTracker a pointer to the location of the JAR in HDFS, along with the XML config file. The JobTracker then notifies all the TaskTrackers to download the JAR from the shared HDFS.
(7) Sometimes a record spills over into the next block, but Hadoop takes care of this by reading past the end of the block.
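Here is how I picture (7) for newline-terminated records, as a small plain-Java sketch. It is my own simplification, not Hadoop's actual reader: SplitLineReader and its methods are made-up names, and the data is a byte array instead of an HDFS file. The rule assumed here is that a reader which does not start at offset 0 throws away the partial line it lands in, and every reader keeps reading until it has finished the last line that begins at or before its split's end, even if that means running past the end.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SplitLineReader {
    /** Returns the offset just after the next '\n' at or after 'from', or data.length if there is none. */
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    /** Reads the records that belong to the split [start, end). */
    static List<String> readSplit(byte[] data, int start, int end) {
        // A split that starts mid-file skips the line it lands in:
        // the previous split's reader is responsible for that one.
        int pos = (start == 0) ? 0 : nextLineStart(data, start);
        List<String> records = new ArrayList<>();
        // Keep going while a line *begins* at or before 'end'; the line itself may
        // extend past 'end', which is the "reading past the end of the block" part.
        while (pos < data.length && pos <= end) {
            int next = nextLineStart(data, pos);
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next; // strip the newline
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 8; // deliberately cuts through the middle of records
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            System.out.println("[" + start + "," + end + ") -> " + readSplit(data, start, end));
        }
    }
}
```

Running it with 8-byte splits, every record shows up in exactly one split, even though "bravo" and "charlie" both straddle a boundary.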
(8) The JobConf object describes the specs of a job. It gets serialized, sent across the network, and deserialized back into a JobConf object on the machine that receives it.
(9) FileInputFormat.addInputPath(conf, path): used to specify the input to the mapper. The path can simply be a file, or a directory (in which case all the files in the directory are used).
FileOutputFormat.setOutputPath(conf, path): the reducers need to write their results somewhere, and that location is specified using this.
runJob(): blocks, i.e. the program waits until the MapReduce job finishes and only then goes on. submitJob(): sends the job to the JobTracker's queue and returns a handle to the job.
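The difference between runJob() and submitJob() is the usual blocking versus non-blocking split. The sketch below is only an analogy in plain Java, not the Hadoop API: MiniJobClient is a made-up class, a "job" is just a Supplier, and the handle is a standard CompletableFuture instead of whatever Hadoop returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class MiniJobClient {
    /** Analogue of submitJob(): queues the job and returns a handle right away. */
    static <T> CompletableFuture<T> submitJob(Supplier<T> job) {
        return CompletableFuture.supplyAsync(job);
    }

    /** Analogue of runJob(): submits the job and blocks until it has finished. */
    static <T> T runJob(Supplier<T> job) {
        return submitJob(job).join();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a MapReduce job: count the words in a sentence.
        Supplier<Integer> wordCount = () -> "to be or not to be".split(" ").length;

        // Blocking style: nothing else happens here until the job is done.
        int result = runJob(wordCount);
        System.out.println(result); // prints 6

        // Non-blocking style: keep the handle, do other work, ask for the result later.
        CompletableFuture<Integer> handle = submitJob(wordCount);
        System.out.println("job submitted, doing other work...");
        System.out.println(handle.join()); // prints 6
    }
}
```

The handle style is useful when the driver program wants to launch several jobs, or poll for progress, instead of sitting idle.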
(10) The client sends a message to the master every 10 seconds, and in the reply it may receive new data to map/reduce.
The JAR is cached, so it is not downloaded again for later tasks of the same job.
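A toy simulation of (10), assuming the "client" here is the worker node that runs the tasks. Everything in it is hypothetical (HeartbeatSketch, Master, Worker, the job id "job-1"), there is no real network or timer, and each heartbeat is just a method call; in a real system the worker would send it on a schedule, such as the 10 seconds mentioned above.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

class HeartbeatSketch {
    /** A unit of work: which job it belongs to and which input slice to process. */
    static final class Task {
        final String jobId;
        final int slice;
        Task(String jobId, int slice) { this.jobId = jobId; this.slice = slice; }
    }

    /** The master only hands out work in reply to a heartbeat. */
    static final class Master {
        private final Queue<Task> pending = new ArrayDeque<>();
        void enqueue(Task task) { pending.add(task); }
        /** Returns the next pending task, or null if there is nothing to do. */
        Task onHeartbeat(String workerName) { return pending.poll(); }
    }

    static final class Worker {
        final String name;
        final Set<String> cachedJars = new HashSet<>();
        int jarDownloads = 0;
        int tasksRun = 0;
        Worker(String name) { this.name = name; }

        /** One heartbeat: report in, and run whatever task comes back in the reply. */
        void heartbeat(Master master) {
            Task task = master.onHeartbeat(name);
            if (task == null) {
                return; // no work this time
            }
            if (cachedJars.add(task.jobId)) {
                jarDownloads++; // first task of this job on this node: fetch the JAR once
            }
            tasksRun++; // pretend to run the task
        }
    }

    public static void main(String[] args) {
        Master master = new Master();
        for (int slice = 0; slice < 3; slice++) {
            master.enqueue(new Task("job-1", slice));
        }
        Worker worker = new Worker("slave-1");
        for (int beat = 0; beat < 4; beat++) {
            worker.heartbeat(master);
        }
        // prints tasksRun=3 jarDownloads=1
        System.out.println("tasksRun=" + worker.tasksRun + " jarDownloads=" + worker.jarDownloads);
    }
}
```

Three tasks of the same job cost only one JAR download, and the fourth heartbeat comes back empty.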
(11) Once a task is completed, the node shuts down the JVM and spawns a new one for the next task, which is wasteful. JVM reuse would solve this and is being thought about.