Question? Leave a message!




Big Data Distributed file systems

Big Data Distributed file systems
Ghislain Fourny Big Data 4. Distributed file systems Kheng Ho Toh / 123RF Stock PhotoSo far... We've rehearsed relational databases 2So far... We've looked into scaling out 3So far... We've seen a simple model for object storage 4Why? There is Big Data Anna Liebiedieva / 123RF Stock Photo and Big Data Vadym Kurgak / 123RF Stock Photo 5Use cases A huge amount of large files? vs. A large amount of huge files? 6Use cases Billions of TB files Object Storage vs. File Storage Millions of PB files 7Where does the data come from? Sensors Aggregated data Measurements Intermediate data Events Logs Anton Starikov / 123RF Stock Photo Oleg Dudko / 123RF Stock Photo Derived Data Raw Data 8Technologies and models Key-Value Store File System Object Storage Block Storage Billions of Millions of vs. TB files PB files 9Distributed file systems: inception FS 10GFS genesis Characteristics Requirements File System Design 11Fault tolerance and robustness Local disk It might fail Vitaly Korovin / 123RF Stock Photo Cluster with 100s to10,000s of machines nodes will fail Kheng Ho Toh / 123RF Stock Photo 12Fault tolerance and robustness Fault tolerance Automatic Recovery Error detection Monitoring 13 Kheng Ho Toh / 123RF Stock PhotoFile update model vs. Upsert/append only Random access 14File update model suitable for immutable Sensors _____ _____ Logs _____ Intermediate data Append 15Appends 100s of clients in parallel atomic Append only GFS only 16Performance requirements Top priority: Throughput Secondary: ? Latency 17The progress made (1956-2010): Logarithmic 622,100,131x 11,719x 8x Throughput Capacity Latency 18 Source: Michael E. Friske, Claus Mikkelsen, The History of Storage, SHARE 2014 Picture: Ash Waechter/123RFThe progress made (1956-2010): Logarithmic 622,100,131x 11,719x Parallelize 8x Throughput Capacity Latency 19 Source: Michael E. Friske, Claus Mikkelsen, The History of Storage, SHARE 2014 Picture: Ash Waechter/123RFThe progress made (1956-2010): Logarithmic 622,100,131x 11,719x 8x Batch processing Throughput Capacity Latency 20 Source: Michael E. Friske, Claus Mikkelsen, The History of Storage, SHARE 2014 Picture: Ash Waechter/123RFHadoop 21Hadoop Initiated in 2006 22Hadoop Primarily: • Distributed File System (HDFS) • MapReduce • Wide column store (HBase) 23 Covered in this lectureHadoop Inspired by Google's • GFS (2003) • MapReduce (2004) • BigTable (2006) 24 Covered in this lectureSize timeline Date Size reported by Yahoo April 2006 188 May 2006 300 October 2006 600 April 2007 1,000 February 2008 10,000 (index generation) March 2009 24,000 (17 clusters) June 2011 42,000 (100+ PB) 25Distributed file systems: the model 26File Systems (Logical Model) Lorem Ipsum Lorem Ipsum Dolor sit amet Dolor sit amet Consectetur Consectetur Adipiscing Adipiscing vs. Elit. In Elit. In Imperdiet Ipsum ante Imperdiet Ipsum ante File Storage Key-Value Storage 27Block Storage (Physical Storage) 111010010110101… 1 2 3 4 5 6 7 8 Object Storage Block Storage 28Terminology HDFS: Block GFS: Chunk 29Files and blocks 2 3 1 Lorem Ipsum 4 Dolor sit amet 5 Consectetur Adipiscing 6 3 1 Elit. In 2 Imperdiet 7 1 Ipsum ante 8 2 3 4 30Why blocks? 1. Files bigger than a disk PBs 2. Simpler level of abstraction 31Single machine vs. distributed 32The right block size 4 kB 128 MB Distributed file system Simple file system 33The right block size 4 kB – 32 kB 128 MB Distributed file system Relational Database 34HDFS Architecture 35How do we connect the many machines? 36Master-slave architecture Master Slave Slave Slave Slave Slave Slave 37Peer-to-peer architecture 38HDFS server architecture Namenode Datanode Datanode Datanode Datanode Datanode Datanode 39From the file perspective master ... replicated for fault tolerance ...divided into 128MB chunks... File... 40Concurrently accessed Namenode Datanode Datanode Datanode Datanode Datanode Datanode 41Hadoop implementation (Packaged code) 42HDFS Architecture Namenode Datanode Datanode Datanode Datanode Datanode Datanode 43HDFS Architecture: NameNode Namenode Datanode Datanode Datanode Datanode Datanode Datanode 44NameNode: all system-wide activity 45NameNode: all system-wide activity Memory 1 File namespace (+Access Control) 46NameNode: all system-wide activity Memory 2 /dir/file1 /dir/file2 1 File namespace /file3 (+Access Control) File to block mapping 47NameNode: all system-wide activity Memory 2 /dir/file1 /dir/file2 1 File namespace /file3 (+Access Control) File to block mapping 3 Block locations 48HDFS Architecture Namenode Datanode Datanode Datanode Datanode Datanode Datanode 49HDFS Architecture: DataNode Namenode Datanode Datanode Datanode Datanode Datanode Datanode 50DataNode 51DataNode Blocks are stored on the local disk 52DataNode Proximity to hardware facilitates disk failure detection 53Block IDs 64 bits e.g., 7586700455251598184 54Subblock granularity: Byte Range 55Communication Control Namenode Control Control Client Control Data Datanode Datanode 56Client Protocol (RPC) Metadata operations Namenode Client 57Client Protocol (RPC) Metadata operations DataNode location Namenode Client 58Client Protocol (RPC) Metadata operations DataNode location Block IDs Namenode Client 59Client Protocol (RPC) Metadata operations DataNode location Block IDs Namenode Client Java API available 60DataNode Protocol (RPC) Namenode Datanode always Datanode initiates connection 61DataNode Protocol (RPC) Namenode Registration Datanode always Datanode initiates connection 62DataNode Protocol (RPC) Namenode Registration every 3s Heartbeat Datanode always Datanode initiates connection 63 customizableDataNode Protocol (RPC) Namenode Registration every 3s Heartbeat Block operations Datanode always Datanode initiates connection 64 customizableDataNode Protocol (RPC) Namenode Registration every 3s Heartbeat Block operations every 6h BlockReport Datanode always Datanode initiates connection 65 customizableDataNode Protocol (RPC) Namenode Registration every 3s Heartbeat Block operations every 6h BlockReport BlockReceived Datanode always Datanode initiates connection 66 customizableDataNode Protocol (RPC) Namenode Registration every 3s Heartbeat Block operations every 6h BlockReport BlockReceived Java API available Datanode 67 customizableDataTransfer Protocol (Streaming) Data blocks DataNode Client Replication pipelining (write only) DataNode DataNode 68Summary Client Protocol Namenode DataNode Client Protocol DataTransfer Datanode Protocol Datanode 69Summary metadata Namenode Client data Datanode Datanode 70Metadata functionality Create directory Delete directory Write file Read file Delete file 71Client reads a file 72Client reads a file Asks for file 1 73Client reads a file 2 Get block locations Multiple DataNodes for each block, sorted by distance 74Client reads a file Read Input 3 Stream 75Client writes a file 76Client writes a file Create 1 77Client writes a file 3 DataNodes for first block 78Client writes a file Organizes pipeline 4 79Client writes a file Sends data over 5 80Client writes a file Ack 6 81Client writes a file Block IDs received 8 82Client writes a file 3 DataNodes for second block 83Client writes a file Organizes pipeline 4 84Client writes a file Sends data over 5 85Client writes a file Ack 6 86Client writes a file Block IDs received 8 87Replicas Number of replicas specified per file default:3 88Replica placement: what to consider? Reliability Read/Write Bandwidth Block distribution 89Replica placement: Reminder on topology Cluster Rack Node 90Replica placement: Distance D(A,B)=1 A B 91Replica placement: Distance D(A,B)=2 A B 92Replica placement 93Replica placement Replica 1: same node as client (or random), rack A 94Replica placement Replica 1: same node as client (or random), rack A Replica 2: a node in a different rack B 95Replica placement Replica 1: same node as client (or random), rack A Replica 2: a node in a different rack B Replica 3: a node in same rack B 96Replica placement Replica 1: same node as client (or random), rack A Replica 2: a node in a different rack B Replica 3: a node in same rack B Replica 4 and beyond: random, but if possible: • at most one replica per node • at most two replicas per rack 97Replica placement Client 98Why replicas 2+3 on other rack? Client 99If replicas 1+2 were on same rack... Block concentration on same rack (2/3) 100Performance and availability 101The NameNode is a single point of failure Namenode /dir/file1 /dir/file2 /file3 Datanode Datanode Datanode Datanode Datanode Datanode 102The namenode is a single point of failure... Namenode /dir/file1 /dir/file2 What if it fails? /file3 Datanode Datanode Datanode Datanode Datanode Datanode 103NameNode: all system-wide activity Memory 2 /dir/file1 /dir/file2 1 File namespace /file3 (+Access Control) File to block mapping 3 Block locations 1041. You want to persist Memory 2 1 /dir/file1 not persisted 3 1051. You want to persist Memory 2 1 /dir/file1 not persisted 3 Persistent Storage Namespace file 1061. You want to persist Memory 2 1 /dir/file1 not persisted /dir/file2 3 Persistent Storage Namespace Edit log file 1071. You want to persist Memory 2 1 /dir/file1 not persisted /dir/file2 3 /file3 Persistent Storage Namespace Edit log file 1082. You want to backup Backup drives Shared drive Glacier Persistent Storage Namespace Edit log file 109The namenode is a single point of failure... Namenode /dir/file1 /dir/file2 What if it fails? /file3 Datanode Datanode Datanode Datanode Datanode Datanode 110The namenode is a single point of failure... Namenode /dir/file1 We need to start /dir/file2 /file3 it up again Datanode Datanode Datanode Datanode Datanode Datanode 111Namenodes: Startup Memory Persistent Storage Namespace Edit log file 112Namenodes: Startup Memory Filesystem Persistent Storage Namespace Edit log file 113Namenodes: Startup Memory Filesystem Persistent Storage Namespace Edit log file 114Namenodes: Startup Memory /dir/file1 /dir/file2 /file3 Filesystem Persistent Storage Namespace Edit log file 115Namenodes: Startup Memory /dir/file1 /dir/file2 /file3 Filesystem Block locations Persistent Storage Namespace Edit log file 116Namenodes: Startup Memory /dir/file1 /dir/file2 /file3 Filesystem Block locations Persistent Storage Namespace Edit log file 117Namenodes: Startup Memory /dir/file1 /dir/file2 /file3 Filesystem Block locations Persistent Storage Namespace Edit log file 118Starting a namenode... ... takes 30 minutes 119Starting a namenode... Can we do better? 1203. Checkpoints with secondary namenodes Old namespace Edit log file New namespace file 1214. High Availability (HA): Backup Namenodes Namenode Namenode /dir/file1 /dir/file1 /dir/file2 /dir/file2 /file3 /file3 Maintains mappings and locations in memory like the namenode. Ready to take over at all times Datanode Datanode Datanode Datanode Datanode Datanode 1225. Federated DFS Namenode /foo Namenode /bar /foo/file1 /bar/file1 /foo/file2 /bar/file2 Datanode Datanode Datanode Datanode Datanode Datanode 123Using HDFS 124HDFS Shell hadoop fs args 125HDFS Shell: POSIX-like hadoop fs –ls hadoop fs –cat /dir/file hadoop fs –rm /dir/file hadoop fs –mkdir /dir 126HDFS Shell: upload and download from HDFS hadoop fs –put localfile1 localfile2 /user/hadoop/hadoopdir hadoop fs –get /user/hadoop/file localfile 127HDFS Shell: Configuration core-site.xml properties property namefs.defaultFS/name valuehdfs://host:8020/value descriptionNameNode hostname/description /property /properties 128HDFS Shell: Configuration hdfs-site.xml properties property namedfs.replication/name value1/value descriptionReplication factor/description /property property namedfs.namenode.name.dir/name value/grid/hadoop/hdfs/nn/value descriptionNameNode directory/description /property property namedfs.datanode.name.dir/name value/grid/hadoop/hdfs/nn/value descriptionDataNode directory/description /property /properties 129Populating HDFS: Apache Flume _____ _____ __ _____ ___ _____ Collects, aggregates, moves log data (into HDFS) 130Populating HDFS: Apache Sqoop Imports from a relational database 131GFS 132GFS vs. HDFS: Terminology NameNode Master DataNode Chunkserver Block Chunk FS Image Checkpoint image Edit log Operation log HDFS GFS 133HDFS vs. GFS: Block size 64 MB 128 MB GFS/Apache HDFS Cloudera HDFS 134Pointers Official documentation http://hadoop.apache.org/docs/r2.7.3/ GFS Paper On course website Java API http://blog.woopi.org/wordpress/files/hadoop- 2.6.0-javadoc/ 135
Website URL
Comment