Big Data Object storage
Ghislain Fourny
Big Data
3. Object storage

Where are we
- Last lecture: reminder on relational databases
- Relational databases fit on a single machine
- Petabytes do not fit on a single machine

The lecture journey
- Monolithic: relational database
- Modular: "Big Data" technology stack

Not reinventing the wheel
- 99% of what we learned with 46 years of SQL and relational can be reused

Important takeaway points
- Relational algebra: selection, projection, grouping, sorting, joining
- Language: SQL, declarative languages, functional languages, optimizations, query plans
- What a table is made of: table, columns, primary key, row
- Denormalization: 1NF vs. nesting, 2NF/3NF vs. pre-join
- Transactions: Atomicity, Consistency, Isolation, Durability
- New: Atomic consistency, Availability, Partition tolerance

The stack
Layers: User interfaces, Querying, Data stores, Indexing, Processing, Validation, Data models, Syntax, Encoding, Storage
- Storage: local filesystem, NFS, GFS, HDFS, S3, Azure Blob Storage
- Encoding: ASCII, ISO-8859-1, UTF-8, BSON
- Syntax: text, CSV, XML, JSON, RDF/XML, Turtle, XBRL
- Data models: tables (relational model), trees (XML Infoset, XDM), graphs (RDF), cubes (OLAP)
- Validation: XML Schema, JSON Schema, relational schemas, XBRL taxonomies
- Processing: two-phase processing (MapReduce), DAG-driven processing (Tez, Spark), elastic computing (EC2)
- Indexing: key-value stores, hash indices, B-trees, geographical indices, spatial indices
- Data stores: RDBMS (Oracle/IBM/Microsoft), MongoDB, CouchBase, ElasticSearch, Hive, HBase, MarkLogic, Cassandra, ...
- Querying: SQL, XQuery, MDX, SPARQL, REST APIs
- User interfaces (UI): Excel, Access, Tableau, Qlikview, BI tools

Storage: from a single machine to a cluster

Storage
- A database's data needs to be stored somewhere: the storage layer
- Let's start from the 70s...

Files
- File storage is organized in a hierarchy
- What is a file made of: content + metadata

File metadata
    ls -l
    total 48
    drwxr-xr-x   5 gfourny  staff   170 Jul 29 08:11 2009
    drwxr-xr-x  16 gfourny  staff   544 Aug 19 14:02 Exercises
    drwxr-xr-x  11 gfourny  staff   374 Aug 19 14:02 Learning Objectives
    drwxr-xr-x  18 gfourny  staff   612 Aug 19 14:52 Lectures
    -rw-r--r--   1 gfourny  staff  1788 Aug 19 14:04 README.md
- File metadata follows a fixed "schema"

File content: block storage
- Content is stored in blocks (figure: blocks 1 to 8)

Local storage
- Local machine, LAN (NAS), WAN
- LAN = local-area network
- NAS = network-attached storage
- WAN = wide-area network

Scaling issues
- 1,000 files
- 1,000,000 files
- 1,000,000,000 files

Better performance: explicit block storage
- The application has control over the locality of blocks (figure: blocks 1 to 8)

So how do we make this scale
1. We throw away the hierarchy
2. We make metadata flexible
3. We make the data model trivial (objects addressed by ID)
4. We use commodity hardware

... and we get object storage
- "Black-box" objects
- Flat and global key-value model
- Flexible metadata
- Commodity hardware

Scale
One machine's not good enough.
How do we scale?
- Approach 1: scaling up
- Approach 2: scaling out
- Hardware price comparison: scale up vs. scale out
- Approach 3: be smart
  “You can have a second computer once you’ve shown you know how to use the first one.” (Paul Barham)
- In this lecture: approach 2, scale out

Data centers
- Numbers, computing: 1,000 to 100,000 machines in a data center; 1 to 100 cores per server
- Numbers, storage: 1 to 10 TB of local storage per server; 10 to 1,000 GB of RAM per server
- Numbers, network: 1 GB/s network bandwidth for a server
- Racks: height in "rack units" (e.g., 42 RU)
- Racks are modular: servers, storage, routers, ...
- Rack servers: 1 to 4 RU (example: Lenovo ThinkServer RD430 Rack Server)

Amazon S3

S3 model
- Objects live in buckets
- A bucket is identified by a bucket ID
- An object is identified by bucket ID + object ID

Scalability
- Max. 5 TB per object
- 100 buckets per account (more upon request)

Durability
- 99.999999999%: loss of 1 in 10^11 objects

Availability
- 99.99%: down about 1 h per year

API: REST

REST API
- A request is a method + a resource (URI)
- Methods: GET, PUT, DELETE, POST, HEAD, OPTIONS, TRACE, CONNECT
- PUT: idempotent
- GET: side-effect free
- DELETE
- POST: most generic, has side effects

S3 resources: buckets
    http://bucket.s3.amazonaws.com
    http://bucket.s3-region.amazonaws.com

S3 resources: objects
    http://bucket.s3.amazonaws.com/objectname
    http://bucket.s3-region.amazonaws.com/objectname

S3 REST API
- PUT Bucket, DELETE Bucket, GET Bucket
- PUT Object, DELETE Object, GET Object

Example
    GET /myimage.jpg HTTP/1.1
    Host: bucket.s3.amazonaws.com
    Date: Wed, 27 Sep 2016 09:30:00 GMT
    Authorization: authorization string

Folders: is S3 a file system?
Logical (browsing):
    food
        fruit
            orange
            strawberry
        vegetables
            tomato
            turnip
            lettuce
Physical (object keys):
    /food/fruits/orange
    /food/fruits/strawberry
    /food/vegetables/tomato
    /food/vegetables/turnip
    /food/vegetables/lettuce

Static website hosting
    http://jsoniq.org.s3-website-us-east-1.amazonaws.com/

More on storage

Replication
- Purpose: fault tolerance

Faults
- Local (node failure) versus regional (natural catastrophe)

Regions
- Data is replicated across regions (figure: world map)

Storage class
- Standard: high availability
- Standard - Infrequent Access: less availability, cheaper storage, cost for retrieving
- Amazon Glacier: low-cost, hours to GET

Azure Blob Storage

Overall comparison: Azure vs. S3
                S3                    Azure
    Object ID   Bucket + Object       Account + Partition + Object
    Object API  Black box             Blocks or pages
    Limit       5 TB                  195 GB (blocks), 1 TB (pages)

Azure architecture: storage stamp
- Layers of a storage stamp: Virtual IP address, Front-Ends, Partition Layer, Stream Layer
- Names used to address an object: account name, partition name, object name

Azure architecture: one storage stamp
- 10 to 20 racks
- 18 storage nodes per rack (30 PB)

Azure architecture: keep some buffer
- Usage is kept below 70 to 80% of storage capacity

Storage replication
- Intra-stamp replication (synchronous), within one stamp (Front-Ends, Partition Layer, Stream Layer)
- Inter-stamp replication (asynchronous), between two stamps

Location services
- An account name is mapped to one Virtual IP (the primary stamp)
- The location services update DNS, which resolves the account name to the Virtual IP of the primary stamp
- Stamps are spread over regions: North America, Europe, Asia

Key-value storage (sneak peek)

Can we consider object storage a database?
- Issue: latency. Typical database: 1 to 9 ms vs. S3: 100 to 300 ms

Key-value stores
1. Similar data model to object storage (values addressed by ID)
2. Smaller objects: 400 KB (DynamoDB) vs. 5 TB
3. No metadata

Key-value stores: data model and data shape
- Two columns, key and value, and lots of rows
- Which is the most efficient data structure for querying this?

Distributed hash tables: Chord
- IDs are hashed to n bits (DynamoDB: 128 bits)
- Each node picks a 128-bit hash
- Nodes are organized in a ring, from 000... to 111..., modulo 2^n
- An ID is stored at the next node "up" on the ring

Take-away messages: how to scale out
- Simplify the model
- Buy cheap hardware
- Remove schemas
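
The "Folders: is S3 a file system?" slide contrasts logical browsing with flat physical object keys. The following is a minimal sketch of how a folder view could be derived from flat keys by prefix matching. The function name `list_level` and its parameters are hypothetical, chosen for illustration, and are not part of any S3 client library.

```python
def list_level(keys, prefix="/", delimiter="/"):
    """Derive one 'folder' level from flat object keys (illustrative helper).

    Returns (subfolders, objects) found directly under `prefix`.
    Nothing hierarchical is stored: folders are only shared key prefixes.
    """
    folders, objects = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The key continues below this level, so it contributes a subfolder.
            folders.add(prefix + head + delimiter)
        else:
            # The key ends at this level, so it is an object listed here.
            objects.append(key)
    return sorted(folders), sorted(objects)


# Object keys taken from the slide.
keys = [
    "/food/fruits/orange",
    "/food/fruits/strawberry",
    "/food/vegetables/tomato",
    "/food/vegetables/turnip",
    "/food/vegetables/lettuce",
]

top_folders, top_objects = list_level(keys, "/food/")
fruit_folders, fruit_objects = list_level(keys, "/food/fruits/")
```

The store itself stays a flat key-value map; the hierarchy exists only in how the keys are read.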
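
The S3 model slides (bucket ID + object ID, black-box content, flexible metadata) and the REST slides (PUT is idempotent, GET is side-effect free) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `TinyObjectStore` and its method names are hypothetical, and it does not talk to S3 or mirror any real client API.

```python
class TinyObjectStore:
    """In-memory stand-in for the bucket ID + object ID model (hypothetical)."""

    def __init__(self):
        # bucket ID -> {object ID -> (content bytes, metadata dict)}
        self._buckets = {}

    def put_bucket(self, bucket):
        # Idempotent: creating an existing bucket leaves it unchanged.
        self._buckets.setdefault(bucket, {})

    def put_object(self, bucket, key, content, metadata=None):
        # Idempotent: repeating the same PUT leaves the same state.
        self._buckets[bucket][key] = (bytes(content), dict(metadata or {}))

    def get_object(self, bucket, key):
        # Side-effect free: reading never modifies the store.
        content, metadata = self._buckets[bucket][key]
        return content, dict(metadata)

    def delete_object(self, bucket, key):
        self._buckets[bucket].pop(key, None)

    def snapshot(self):
        # Copy of the full state, used below to compare before and after.
        return {b: dict(objs) for b, objs in self._buckets.items()}


store = TinyObjectStore()
store.put_bucket("bucket")
store.put_object("bucket", "myimage.jpg", b"\x89PNG", {"author": "unknown"})
state_after_first_put = store.snapshot()

# Repeating the PUT or issuing a GET does not change the state.
store.put_object("bucket", "myimage.jpg", b"\x89PNG", {"author": "unknown"})
content, metadata = store.get_object("bucket", "myimage.jpg")
state_after_repeat = store.snapshot()
```

The content is treated as opaque bytes, and the metadata is a free-form dictionary rather than a fixed schema, in line with the "flexible metadata" point.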
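
The durability and availability figures from the S3 slides can be checked with a short calculation. The sketch below uses exact fractions to avoid floating-point noise; the helper names are hypothetical, and a year is assumed to have 365 days.

```python
from fractions import Fraction

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def failure_fraction(percent):
    """Convert a percentage given as a string (e.g. '99.99') to 1 - p, exactly."""
    return 1 - Fraction(percent) / 100


def downtime_hours_per_year(availability_percent):
    """Expected unavailable hours per year for a given availability."""
    return failure_fraction(availability_percent) * HOURS_PER_YEAR


# Availability of 99.99% leaves 1/10,000 of the year unavailable.
downtime = downtime_hours_per_year("99.99")

# Durability of 99.999999999% corresponds to losing 1 object in 10^11.
loss = failure_fraction("99.999999999")

print(float(downtime), "hours of downtime per year")
print("1 object lost in", loss.denominator)
```

The downtime comes out slightly under one hour, which is consistent with the slide's rounded "down 1 h per year".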
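
The Chord slides describe hashing IDs to n bits, placing nodes on a ring modulo 2^n, and storing each ID at the next node "up". Here is a minimal sketch of that placement rule only. It is not the full Chord protocol (no finger tables, no joins or failures), MD5 is an assumption chosen because it yields 128 bits, and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib

RING_BITS = 128  # matches the 128-bit figure quoted for DynamoDB on the slide


def ring_position(name):
    """Hash a node name or key to a position on the ring, modulo 2^n."""
    digest = hashlib.md5(name.encode("utf-8")).digest()  # 16 bytes = 128 bits
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class Ring:
    """Static ring of nodes; each key belongs to the next node 'up' the ring."""

    def __init__(self, node_names):
        self._nodes = sorted((ring_position(n), n) for n in node_names)
        self._positions = [position for position, _ in self._nodes]

    def node_for(self, key):
        # First node whose position is >= the key's position...
        index = bisect.bisect_left(self._positions, ring_position(key))
        # ...wrapping around to the first node when the key is past the last one.
        return self._nodes[index % len(self._nodes)][1]


ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
```

Because only the hash of the key is needed to find the responsible node, a lookup requires no central directory, which fits the "simplify the model" take-away.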