GFS, Mapreduce and
Seminar on big data management
Lecturer: Jiaheng Lu
www.helsinki.fi 1.2.2016 1Google big data techniquesGoogle File System
Bigtable data storage platform
www.helsinki.fiThe Google File System
www.helsinki.fiThe Google File System (GFS)
A scalable distributed file system for large
distributed data intensive applications
Multiple GFS clusters are currently deployed.
The largest ones (in 2003) have:
1000+ storage nodes
300+ TeraBytes of disk storage
heavily accessed by hundreds of clients on distinct
Shares many same goals as previous
distributed file systems
performance, scalability, reliability, etc
GFS design has been driven by four key
observation of Google application workloads
and technological environment
www.helsinki.fiIntro: Observations 1
1. Component failures are the norm
constant monitoring, error detection, fault
tolerance and automatic recovery are integral to
2. Huge files (by traditional standards)
Multi GB files are common
I/O operations and blocks sizes must be
www.helsinki.fiIntro: Observations 2
3. Most files are mutated by appending new
This is the focus of performance optimization
and atomicity guarantees
4. Co-designing the applications and APIs
benefits overall system by increasing
Cluster consists of a single master and multiple
chunkservers and is accessed by multiple clients
Maintains all file system metadata.
names space, access control info, file to chunk
mappings, chunk (including replicas) location, etc.
Periodically communicates with chunkservers in
HeartBeat messages to give instructions and check
Helps make sophisticated chunk placement
and replication decision, using global
For reading and writing, client contacts Master
to get chunk locations, then deals directly with
Master is not a bottleneck for reads/writes
Files are broken into chunks. Each chunk has
a immutable globally unique 64-bit chunk-
handle is assigned by the master at chunk creation
Chunk size is 64 MB
Each chunk is replicated on 3 (default)
Linked to apps using the file system API.
Communicates with master and chunkservers
for reading and writing
Master interactions only for metadata
Chunkserver interactions for data
Only caches metadata information
Data is too large to cache.
Master does not keep a persistent record
of locations of chunks and replicas.
Polls chunkservers at startup, and when
new chunkservers join/leave for this.
Stays up to date by controlling placement of
new chunks and through HeartBeat
messages (when monitoring chunkservers)
A Toy Problem:
We have 10 billion documents
Average documents size is 20KB
10 Billion docs == 200 TB
We want build a language model of the Web:
– Basically count how many times each word occurMapReduce: Insight
for each document d
for each word w in d word_countw++;
Approximately 1 month.
1. All disk reads are sequential
2. Dictionary fits into the memoryMapReduce Programming Model
Inspired from map and reduce operations
commonly used in functional programming
languages like Lisp.
Users implement interface of two primary
1. Map: (key1, val1) → (key2, val2)
2. Reduce: (key2, val2) → val3Map operation
Map, a pure function, written by the user, takes
an input key/value pair and produces a set of
intermediate key/value pairs.
e.g. (doc—id, doc-content)
Draw an analogy to SQL, map can be visualized
as group-by clause of an aggregate query.Reduce operation
On completion of map phase, all the intermediate values for a
given output key are combined together into a list and given to
Can be visualized as aggregate function (e.g., average) that is
computed over all the rows with the same group-by attribute.