HackedExistence Hadoop Talk
I found this talk by a hacking group called “HackedExistence”, which they presented at DEF CON 17. The first two parts of the video talk in general about cloud computing, Hadoop, MapReduce, HDFS and Streaming. By this point most of these concepts were familiar to me, but their explanation of the Netflix problem was interesting.
I did pick up a few points from these videos as well:
(1) HDFS breaks files into 64 MB chunks (blocks), so if a file is 65 MB in size, it would be divided into two blocks: one of 64 MB and the other of 1 MB. So one mapper may run at full capacity while the other doesn't.
(2) A refresher on HBase: it's a non-relational database which, according to the talk, is not built for real-time querying.
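To make point (1) concrete for myself, here is a small sketch of the arithmetic. It assumes the 64 MB block size mentioned in the talk (the value is configurable in HDFS) and roughly one mapper per block, which depends on the input format. The function name `hdfs_block_sizes` is my own and not part of any Hadoop API.

```python
def hdfs_block_sizes(file_size_mb, block_size_mb=64):
    """Return the sizes (in MB) of the blocks a file would be split into.

    Illustrative only: 64 MB is the block size quoted in the talk,
    and this helper is hypothetical, not a Hadoop call.
    """
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("sizes must be non-negative, block size positive")
    # Number of completely filled blocks, plus whatever is left over.
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds the leftover data.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    print(hdfs_block_sizes(65))   # [64, 1]
    print(hdfs_block_sizes(128))  # [64, 64]
```

With a 65 MB file the second block holds only 1 MB, so the mapper assigned to it has very little to do compared with the first.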
Following a 20-minute refresher on the general concepts, the hackers explain how they tackled the Netflix Prize using Hadoop. They say they had to run some 17K-odd mappers for this. The whole code of their attempt at improving the recommendation engine is here. I, however, couldn't understand much of the solution. They seemed to be constrained by the time limit on stage, so it was pretty quick. Maybe trying to understand the code would be a good exercise.
This is part 3, which shows them explaining the code!