Parallel processing in big data

Dr.GordenMorse, France, Professional
Published Date: 22-07-2017
Ghislain Fourny
Big Data
7. Massive Parallel Processing (Spark)

YARN
(Photo credit: kirtchanut / 123RF Stock Photo)

Last week: MapReduce
Input data, then Map tasks (eight in the diagram), then Intermediate data (shuffled), then Reduce tasks (eight in the diagram), then Output data

Hadoop infrastructure (version 1)
Master: Namenode + JobTracker (/dir/file)
Workers: six nodes, each running Datanode + TaskTracker

Issue 1: scalability
4,000 nodes, 40,000 tasks

Issue 2: bottleneck
The single JobTracker is the bottleneck in front of all TaskTrackers

Issue 3: Jack of all trades
The JobTracker does both scheduling and monitoring

Issue 4: Utilization (task slots)
Static, fixed-size (decide on M/R at configuration time)

YARN
Yet Another Resource Negotiator

YARN
Responsibilities: scheduling, monitoring, application management
Components: one Resource Manager, several Application Masters

Scales more
10,000 nodes, 100,000 tasks

YARN architecture
ResourceManager
NodeManagers, hosting Containers

Remember...
It does ring a bell, doesn't it?
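Aside (not on the slides): Issue 4 above, static task slots, can be illustrated with a toy calculation. This is a minimal sketch under simplifying assumptions: every task needs exactly one unit of capacity, and the function names and slot counts are hypothetical. Real YARN containers are sized by resources such as memory and cores, which this sketch does not model.

```python
def static_slots_running(map_slots, reduce_slots, pending_map, pending_reduce):
    """Hadoop 1 style: the map/reduce split is fixed at configuration time,
    so an idle reduce slot cannot run a waiting map task."""
    return min(map_slots, pending_map) + min(reduce_slots, pending_reduce)


def shared_pool_running(total_capacity, pending_map, pending_reduce):
    """Simplified YARN style: any free capacity can serve any kind of task."""
    return min(total_capacity, pending_map + pending_reduce)


# Hypothetical node with 4 map slots and 4 reduce slots, facing a
# map-only phase with 8 waiting map tasks and no reduce tasks yet.
static = static_slots_running(4, 4, pending_map=8, pending_reduce=0)
shared = shared_pool_running(8, pending_map=8, pending_reduce=0)

print(static)  # 4: half the node sits idle
print(shared)  # 8: the whole node is used
```

With the fixed split, half of the node is idle during the map phase; with a shared pool the same hardware runs all eight tasks at once.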
Master-slave architecture
One Master, several Slaves

HDFS server architecture
Namenode (/dir/file1, /dir/file2, /file3)
Datanodes

YARN
ResourceManager
NodeManagers, hosting Containers

ResourceManager
ResourceManager = Scheduler + Applications Manager

YARN: Client posts a job
The Client submits a Job to the ResourceManager over the ApplicationClient Protocol.

YARN: RM allocates an Application Master
The ResourceManager (Scheduler) schedules an Application Master for the Job in a Container on a NodeManager. The Application Master talks to the ResourceManager over the ApplicationMaster Protocol.

Scheduling strategies
FIFO scheduler
Capacity scheduler
Fair scheduler
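Aside (not on the slides): the difference between two of the scheduling strategies listed above can be sketched with a toy simulation. The assumptions are strong and hypothetical: every task takes one tick, the cluster has a fixed number of identical containers, and the functions `fifo` and `fair` are illustrative names, not YARN APIs. The Capacity scheduler, which works with queues, is not modeled here.

```python
def fifo(jobs, capacity):
    """FIFO: the earliest submitted job gets all containers it can use;
    later jobs only get what is left over.
    jobs maps a job name to its number of unit tasks, in submission order."""
    remaining = dict(jobs)
    finish, tick = {}, 0
    while remaining:
        tick += 1
        free = capacity
        for name in list(remaining):
            run = min(free, remaining[name])
            remaining[name] -= run
            free -= run
            if remaining[name] == 0:
                finish[name] = tick
                del remaining[name]
            if free == 0:
                break
    return finish


def fair(jobs, capacity):
    """Fair (simplified): containers are handed out one at a time,
    round-robin across all jobs that still have tasks."""
    remaining = dict(jobs)
    finish, tick = {}, 0
    while remaining:
        tick += 1
        free = capacity
        while free > 0 and remaining:
            for name in list(remaining):
                if free == 0:
                    break
                remaining[name] -= 1
                free -= 1
                if remaining[name] == 0:
                    finish[name] = tick
                    del remaining[name]
    return finish


# Hypothetical workload: a big job submitted just before a small one,
# on a cluster with 2 containers.
jobs = {"big": 8, "small": 2}
print(fifo(jobs, 2))  # {'big': 4, 'small': 5}
print(fair(jobs, 2))  # {'small': 2, 'big': 5}
```

In this toy run the small job waits behind the big one under FIFO and finishes at tick 5, while under the fair variant it finishes at tick 2 at the cost of one extra tick for the big job.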