Summary slides after each model (Hadoop, NoSQL, and MPP): 3 bullets on actual implementation to tie back to a later section.
Sean to 27
- Hadoop is optimized for large streaming reads, not for low latency or fast writes
- HDFS is optimized for fewer, larger files (> 100 MB), with a block size of 128 MB or higher
- Files are currently write-once (append support is available in 0.21, but mainly for HBase; otherwise not recommended)
- Blocks are replicated 3x by default, on three different data nodes
- NameNode stores file metadata in fsimage (e.g. /usr/sean/foo.txt: blk_1, blk_2, blk_3), but it doesn't know which data nodes hold those blocks until they report in
- Blocks are just files on the underlying filesystem (ext3, etc.), e.g. blk_1234
- No metadata on the slave node describes the data contained on that slave (or any other)
- When the NameNode starts up, it enters safe mode and won't leave until block reports account for at least one copy of 99.999% of blocks (configurable); it then waits 30 seconds and exits safe mode
- The NameNode block map is built solely from slave block reports, always cached in memory, nothing persistent
- All data nodes heartbeat into the NameNode every 3 seconds; the NameNode will evict a node if no heartbeat arrives for 5 minutes, and re-replicate its "lost" blocks if no heartbeat arrives for 10 minutes
- As blocks are written, checksums are calculated and stored with the block (blk_1234.meta); upon read, the calculated checksum is compared against the stored checksum (see the sketch after this list)
- To avoid bit rot, a daemon re-checks each block's checksum every 3 weeks after the block was written
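To make the checksum idea concrete, here is a minimal, self-contained Java sketch of the verify-on-read step described above. It is illustrative only, not HDFS source code: the file names (blk_1234, blk_1234.meta), the on-disk checksum format, and the helper names are assumptions for the example.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

// Illustrative sketch: recompute a checksum over a block file and compare
// it with the value stored alongside the block, failing the read on mismatch.
public class BlockChecksumCheck {

    // Compute a CRC32 over the block file's bytes.
    static long computeChecksum(Path blockFile) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(blockFile)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical on-disk layout: the block and its stored checksum.
        Path block = Paths.get("blk_1234");
        Path meta  = Paths.get("blk_1234.meta");

        long stored   = Long.parseLong(Files.readString(meta).trim());
        long computed = computeChecksum(block);

        if (stored != computed) {
            // A real HDFS client would report the corrupt replica and retry
            // the read against another data node holding a copy of the block.
            throw new IOException("Checksum mismatch for " + block);
        }
        System.out.println("Block " + block + " verified OK");
    }
}

The same recompute-and-compare loop is what the background verification daemon mentioned above would run periodically, rather than only at read time.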
- JobTracker assigns map or reduce tasks to TaskTracker slaves (data nodes) with available "slots"; for map tasks, the JobTracker tries to assign work on local blocks to avoid expensive shipping of blocks across the network
- Each task (mapper or reducer) runs in its own child JVM on the slave node; the TaskTracker process kicks off its child tasks based on a preconfigured number of task slots
- Each child task JVM eats up a chunk of RAM, placing a limit on the total number of slots
- Rule of thumb: set aside 25-30% of space, outside of HDFS, for temp storage to hold intermediate map output before it is sent to reducers
- If a child JVM dies, the TaskTracker removes it and reports the death to the JobTracker; the JobTracker attempts to reassign that task to a different TaskTracker
- If any specific task fails 4 times, the whole job fails (see the sketch after this list)
- If a TaskTracker reports a high number of failed tasks, it gets blacklisted for that job
- If a TaskTracker gets blacklisted for multiple jobs, it is put on a global blacklist for 24 hours
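The retry/blacklist flow above can be summarized in a short Java sketch. Everything here is hypothetical and invented for illustration (class name, field names, the per-job blacklist threshold); it is not the actual JobTracker code, only the bookkeeping logic it implies: count failures per task and per tracker, fail the job at 4 attempts, and blacklist a tracker that accumulates too many failures within a job.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the retry and blacklist behavior described above.
public class TaskRetryTracker {

    static final int MAX_TASK_ATTEMPTS = 4;        // per-task failure limit from the notes
    static final int JOB_BLACKLIST_THRESHOLD = 3;  // assumed per-job failure count before blacklisting a tracker

    private final Map<String, Integer> failuresByTask = new HashMap<>();
    private final Map<String, Integer> failuresByTracker = new HashMap<>();

    /** Record a failed attempt; returns false if the whole job should fail. */
    boolean recordFailure(String taskId, String trackerId) {
        int taskFails = failuresByTask.merge(taskId, 1, Integer::sum);
        int trackerFails = failuresByTracker.merge(trackerId, 1, Integer::sum);

        if (trackerFails >= JOB_BLACKLIST_THRESHOLD) {
            System.out.println("Blacklisting " + trackerId + " for this job");
        }
        if (taskFails >= MAX_TASK_ATTEMPTS) {
            System.out.println("Task " + taskId + " failed " + taskFails
                    + " times; failing the job");
            return false;
        }
        // Otherwise the task would be reassigned to a different TaskTracker.
        System.out.println("Reassigning " + taskId + " away from " + trackerId);
        return true;
    }

    public static void main(String[] args) {
        TaskRetryTracker tracker = new TaskRetryTracker();
        // Simulate the same map task failing repeatedly on different nodes.
        tracker.recordFailure("map_0001", "node-a");
        tracker.recordFailure("map_0001", "node-b");
        tracker.recordFailure("map_0001", "node-a");
        tracker.recordFailure("map_0001", "node-c"); // 4th failure: job fails
    }
}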
As of Feb 2013
CEP: Complex Event Processing
Big data projects often start out commingled with existing general-purpose data center infrastructure, but eventually outgrow it and need to move to a dedicated "pod". This is usually where we come in.