Question? Leave a message!

GFS, Mapreduce and Bigtable

GFS, Mapreduce and Bigtable
Dr.GordenMorse Profile Pic
Published Date:22-07-2017
Website URL
GFS, Mapreduce and Bigtable Seminar on big data management Lecturer: Jiaheng Lu Spring 2016 1.2.2016 1Google big data techniquesGoogle File System MapReduce model Bigtable data storage platform Google File System Google File System (GFS) A scalable distributed file system for large distributed data intensive applications Multiple GFS clusters are currently deployed. The largest ones (in 2003) have: 1000+ storage nodes 300+ TeraBytes of disk storage heavily accessed by hundreds of clients on distinct machines Shares many same goals as previous distributed file systems performance, scalability, reliability, etc GFS design has been driven by four key observation of Google application workloads and technological environment Observations 1 1. Component failures are the norm constant monitoring, error detection, fault tolerance and automatic recovery are integral to the system 2. Huge files (by traditional standards) Multi GB files are common I/O operations and blocks sizes must be revisited Observations 2 3. Most files are mutated by appending new data This is the focus of performance optimization and atomicity guarantees 4. Co-designing the applications and APIs benefits overall system by increasing flexibility Design Cluster consists of a single master and multiple chunkservers and is accessed by multiple clients Master Maintains all file system metadata. names space, access control info, file to chunk mappings, chunk (including replicas) location, etc. Periodically communicates with chunkservers in HeartBeat messages to give instructions and check state Master Helps make sophisticated chunk placement and replication decision, using global knowledge For reading and writing, client contacts Master to get chunk locations, then deals directly with chunkservers Master is not a bottleneck for reads/writes Files are broken into chunks. Each chunk has a immutable globally unique 64-bit chunk- handle. handle is assigned by the master at chunk creation Chunk size is 64 MB Each chunk is replicated on 3 (default) servers Linked to apps using the file system API. Communicates with master and chunkservers for reading and writing Master interactions only for metadata Chunkserver interactions for data Only caches metadata information Data is too large to cache. Locations Master does not keep a persistent record of locations of chunks and replicas. Polls chunkservers at startup, and when new chunkservers join/leave for this. Stays up to date by controlling placement of new chunks and through HeartBeat messages (when monitoring chunkservers) to MapReduceMapReduce: Insight  A Toy Problem:   We have 10 billion documents   Average documents size is 20KB   10 Billion docs == 200 TB   We want build a language model of the Web:  – Basically count how many times each word occurMapReduce: Insight  for each document d  for each word w in d word_countw++;  Approximately 1 month.  Assumptions:  1. All disk reads are sequential  2. Dictionary fits into the memoryMapReduce Programming Model  Inspired from map and reduce operations commonly used in functional programming languages like Lisp.  Users implement interface of two primary methods:  1. Map: (key1, val1) → (key2, val2)  2. Reduce: (key2, val2) → val3Map operation  Map, a pure function, written by the user, takes an input key/value pair and produces a set of intermediate key/value pairs.  e.g. (doc—id, doc-content)  Draw an analogy to SQL, map can be visualized as group-by clause of an aggregate query.Reduce operation  On completion of map phase, all the intermediate values for a given output key are combined together into a list and given to a reducer.  Can be visualized as aggregate function (e.g., average) that is computed over all the rows with the same group-by attribute.