Big Data Column stores

Ghislain Fourny
Big Data
5. Column stores

Introduction

Relational model
• Schema

Issues with relational databases (RDBMS)
• Small scale
• Single machine

Can we fix an RDBMS?
• Scale up (remember)
• Scale out: cluster, replicate
• Scaling out is hard to set up and has very high maintenance costs

HBase
• By design running on a scalable cluster of commodity hardware
• Built on HDFS

Wide column stores: data model

Founding paper: Google's BigTable

The tabular model
• Expensive joins

Design paradigm of BigTable: store together what is accessed together

The columnar model: denormalized

Rows
• Each row is identified by a Row ID (example Row IDs: 000, 002, 0A1, 1E0, 22A, 4A2)

Columns
• Columns are grouped into column families
[Figure: column families A, B, C; columns 1, 2, I, II, III, IV; rows keyed by Row ID]

Column families must be known in advance...
... but columns can be added on the fly

Primary queries
• Get
• Put (in the example, a new row with ID 204 lands between 1E0 and 22A)
• Scan
• Delete

Some terminology
• Key-value model: key, value
• Column-oriented stores: Column1, Column2
• Column-oriented key-value stores; also: wide column stores, column-family-oriented stores
• Examples of column-oriented key-value stores: Google's BigTable

Warning on terminology
• NoSQL is very recent
• Words have a "life": key-value storage, relational table, block, NoSQL, file, object storage

HBase: physical level

Physical layer: regions
• A region is a range of Row IDs: min (inclusive) to max (exclusive)

Physical layer: column families
• Within a region, the columns of one column family are stored together

Architecture
"The same procedure as every year, James."
• HDFS: a Namenode (file namespace, e.g. /dir/file1, /dir/file2, /file3) and many Datanodes
• HBase: an HMaster and many RegionServers
• Replicas

HMaster
• DDL operations: create table, delete table
• Assigns regions to RegionServers
• Splits regions
• Handles RegionServer failovers

RegionServer

Physical storage
• Store = column family (within one region)
• A Store holds cells
• A Store is persisted as HFiles (on HDFS)

HFile
• An HFile is actually an SSTable (flat sorted list of key-value pairs)
• It stores KeyValues; a KeyValue is a cell

Versioning
• Different versions of the same cell can coexist; one of them is the latest

HFile: KeyValue
• Layout: key length | value length | key | value (prefix code)

HFile: Key
• Layout: row length | row (key) | column family length | column family | column qualifier | timestamp | key type
• The timestamp is for the versioning

Blocks
• The "quantity" of KeyValues that get read from an HFile at a time
• Default: 64 KB
• Long keys or values: if size(KeyValue) > block size, no split (longer block)

Levels of physical storage
Table > Region > Store > StoreFile > Block > KeyValue

HBase: Writing new cells
• On disk: Table > Region > Store > StoreFile > Block > KeyValue
• In memory: Table > Region > Store > MemStore > Cell
• New cells are written to the MemStore of the Store

Flush
• The MemStore is sorted and written out as a new StoreFile
• When:
  • Reaching max MemStore size in a store
  • Reaching overall max MemStore size
  • Reaching full write-ahead log

Reading from a Store
[Figure: a read draws on the MemStore and on the StoreFiles of the Store]

Compaction
• Several StoreFiles are merged into a single StoreFile (sort again)

The META table: a table like any other
• Stores region locations
• Row key: table + region start key + region id + replica id
• Columns: info:regioninfo, info:server (e.g. www.example.com:0), info:serverstartcode (e.g. 20161011T10:15:00)

RegionInfo
• Table name, start key, end key, region ID, replica ID, encodedName, split, offline

Architecture
• HMaster: create/delete/update table
• RegionServer hosting META: region location(s)
• RegionServers: query

HBase: Underlying APIs
• HBase implementation (packaged code)
• HBase APIs: REST

HBase: caching

HBase caches: reading faster
• Level 1: LRU block cache
• Level 2: bucket cache
• Then: HDFS

LRU block cache
• LRU = Least Recently Used
• On the heap
• Levels of priority: single access, multi access, in-memory access

When NOT to use the cache
• Batch processing
• Random access

Hash function
[Figure; source: Jorge Stolfi (Wikipedia)]

Bloom filter
• Tells very quickly whether an element belongs to a set (potentially with false positives)
• Empty filter: 0 0 0 0 0 0 0 0 0 0 0 0
• Add "John Smith" (hash function 1, hash function 2, ..., hash function k): 0 1 1 0 0 0 0 1 0 0 0 0
• Add "Mary Smith": 0 1 1 0 0 1 1 1 0 0 0 0
• Look up "Albert Einstein": not in set
• Look up "Mary Smith": in set (and correct)
• Look up "Louis de Broglie": in set (false positive)

Data locality
• HBase vs. HDFS
• With HDFS load balancer...
• HFile compaction brings back locality

Best practices
• Number of rows: millions: RDBMS; billions: HBase
• Number of nodes: 5

10 Design Principles of Big Data
1. Learn from the past
2. Keep the design simple
3. Modularize the architecture
4. Homogeneity in the large
5. Heterogeneity in the small
6. Separate metadata from data
7. Abstract logical model from its physical implementation
8. Shard the data
9. Replicate the data
10. Buy lots of cheap hardware
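
The data model slides earlier in the deck (rows sorted by Row ID, column families fixed in advance, columns added on the fly, and the four primary queries Get, Put, Scan, Delete) can be illustrated with a small in-memory toy. This is only a sketch under assumptions: the class name ToyWideColumnTable and its methods are made up for illustration and are not the HBase client API.

```python
import bisect


class ToyWideColumnTable:
    """Toy in-memory model of a wide column table (hypothetical, not HBase)."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        self.row_ids = []              # Row IDs, always kept sorted
        self.rows = {}                 # Row ID -> {(family, qualifier): value}

    def put(self, row_id, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_id not in self.rows:
            bisect.insort(self.row_ids, row_id)  # keep rows in sorted order
            self.rows[row_id] = {}
        # Any qualifier is accepted: columns can be added on the fly.
        self.rows[row_id][(family, qualifier)] = value

    def get(self, row_id):
        return dict(self.rows.get(row_id, {}))

    def scan(self, start, stop):
        """Return rows with start <= Row ID < stop, in sorted order."""
        lo = bisect.bisect_left(self.row_ids, start)
        hi = bisect.bisect_left(self.row_ids, stop)
        return [(r, dict(self.rows[r])) for r in self.row_ids[lo:hi]]

    def delete(self, row_id):
        if row_id in self.rows:
            del self.rows[row_id]
            self.row_ids.remove(row_id)


table = ToyWideColumnTable(["A", "B", "C"])
for rid in ["000", "002", "0A1", "1E0", "22A", "4A2"]:
    table.put(rid, "A", "1", f"value-{rid}")
table.put("204", "C", "IV", "new")  # new row, new column in a known family
print(table.row_ids)
print([rid for rid, _ in table.scan("0A1", "22A")])
```

Keeping the Row IDs sorted is what makes Scan a cheap range read, and it is also why row 204 ends up between 1E0 and 22A, as in the Put slide.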
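
The region slides (each region covers Row IDs from a minimum, inclusive, to a maximum, exclusive, and the META table stores region locations) suggest a simple lookup: find the last region whose start key is not greater than the Row ID. A minimal sketch, assuming made-up region boundaries and server names; none of these values come from the deck.

```python
import bisect

# Hypothetical region start keys (inclusive), sorted. Each region ends
# (exclusive) where the next one starts; the last one is open-ended.
REGION_STARTS = ["", "0A1", "22A"]
# Hypothetical server names, one per region, in the same order.
REGION_SERVERS = ["regionserver-1", "regionserver-2", "regionserver-3"]


def locate_region(row_id):
    """Return the server of the region whose range contains row_id."""
    index = bisect.bisect_right(REGION_STARTS, row_id) - 1
    return REGION_SERVERS[index]


for rid in ["000", "0A1", "1E0", "22A", "4A2"]:
    print(rid, "->", locate_region(rid))
```

Because the minimum is inclusive and the maximum exclusive, a Row ID equal to a boundary such as 0A1 belongs to the region that starts there, not to the one that ends there.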
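
The HFile slides describe a KeyValue as key length, value length, key, value, with the key itself made of row length, row, column family length, column family, column qualifier, timestamp and key type. The sketch below serializes one cell in that order. The byte widths (4-byte lengths, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type) and the type code PUT = 4 are assumptions chosen for illustration; the slides give only the field order.

```python
import struct

PUT = 4  # illustrative key type code (assumption, not taken from the slides)


def encode_key(row, family, qualifier, timestamp, key_type=PUT):
    """row length | row | family length | family | qualifier | timestamp | type"""
    return (struct.pack(">H", len(row)) + row
            + struct.pack(">B", len(family)) + family
            + qualifier
            + struct.pack(">qB", timestamp, key_type))


def encode_keyvalue(row, family, qualifier, timestamp, value, key_type=PUT):
    """key length | value length | key | value"""
    key = encode_key(row, family, qualifier, timestamp, key_type)
    return struct.pack(">II", len(key), len(value)) + key + value


def decode_keyvalue(buf):
    key_len, value_len = struct.unpack_from(">II", buf, 0)
    key = buf[8:8 + key_len]
    value = buf[8 + key_len:8 + key_len + value_len]
    (row_len,) = struct.unpack_from(">H", key, 0)
    row = key[2:2 + row_len]
    pos = 2 + row_len
    family_len = key[pos]
    family = key[pos + 1:pos + 1 + family_len]
    pos += 1 + family_len
    # The qualifier has no length field: it is whatever is left
    # before the trailing 9 bytes (timestamp + key type).
    qualifier = key[pos:key_len - 9]
    timestamp, key_type = struct.unpack_from(">qB", key, key_len - 9)
    return row, family, qualifier, timestamp, key_type, value


encoded = encode_keyvalue(b"22A", b"A", b"1", 1476180900000, b"hello")
print(len(encoded))
print(decode_keyvalue(encoded))
```

The two length prefixes are what make the format a prefix code: a reader can walk a flat block of KeyValues without any separator bytes.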
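
The slides on writing new cells, flushing, reading from a Store and compaction describe one loop: cells go to the MemStore, a full MemStore is sorted and written out as a StoreFile, reads draw on the MemStore and the StoreFiles, and compaction merges StoreFiles and sorts again. A toy version of that loop might look as follows. The names ToyStore and max_memstore_cells are hypothetical, the flush trigger is simplified to a cell count, and only the latest value per key is kept, whereas the deck describes several timestamped versions per cell.

```python
import bisect


class ToyStore:
    """Toy model of one Store: a MemStore plus sorted StoreFiles (hypothetical)."""

    def __init__(self, max_memstore_cells=3):
        self.max_memstore_cells = max_memstore_cells  # simplified flush trigger
        self.memstore = {}       # in memory, unsorted
        self.store_files = []    # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.max_memstore_cells:
            self.flush()

    def flush(self):
        """Sort the MemStore and write it out as a new StoreFile."""
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Check the MemStore first, then StoreFiles from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            i = bisect.bisect_left(store_file, (key,))
            if i < len(store_file) and store_file[i][0] == key:
                return store_file[i][1]
        return None

    def compact(self):
        """Merge all StoreFiles into one, newer values winning, and sort again."""
        merged = {}
        for store_file in self.store_files:  # oldest first, so newer overwrite
            merged.update(store_file)
        self.store_files = [sorted(merged.items())] if merged else []


store = ToyStore(max_memstore_cells=2)
store.put("22A", "v1")
store.put("000", "v1")   # second cell triggers a flush
store.put("22A", "v2")
store.put("1E0", "v1")   # triggers a second flush
store.put("4A2", "v1")   # stays in the MemStore
files_before = len(store.store_files)
store.compact()
print(files_before, len(store.store_files), store.get("22A"))
```

Each flush adds one more file that a read may have to consult, which is the cost that compaction pays down.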
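
The Bloom filter slides (a bit array, k hash functions, possible false positives but no false negatives) translate into a few lines of code. This is a sketch rather than HBase's implementation: deriving the k hash functions from salted SHA-256 digests is an assumption made here for convenience, and the 12-bit array simply mirrors the size drawn on the slides. Which bits a given name sets will differ from the slides, so no particular bit pattern is claimed.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (hash construction is an assumption)."""

    def __init__(self, num_bits=12, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Simulate k hash functions by salting one hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not in the set";
        # True means "possibly in the set" (could be a false positive).
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
empty_answer = bloom.might_contain("John Smith")
bloom.add("John Smith")
bloom.add("Mary Smith")
print(empty_answer, bloom.might_contain("Mary Smith"))
print(bloom.bits)
```

A negative answer lets a read skip a file entirely, while a positive answer still has to be confirmed by actually reading, since it may be a false positive.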