Ghislain Fourny
Big Data
3. Object storage

Where are we?
Last lecture: a reminder on relational databases.
Relational databases fit on a single machine.
Petabytes do not fit on a single machine.

The lecture journey
From the monolithic relational database to the modular "Big Data" technology stack.

Not reinventing the wheel
99% of what we learned from 46 years of SQL and relational databases can be reused.

Important take-away points
Relational algebra: selection, projection, grouping, sorting, joining.
Language: SQL, declarative languages, functional languages, optimizations, query plans.
What a table is made of: table, columns, primary key, rows.
Denormalization: 1NF vs. nesting; 2NF/3NF vs. pre-join.
Transactions (ACID): atomicity, consistency, isolation, durability.
New (CAP): atomic consistency, availability, partition tolerance.

The stack
User interfaces, querying, data stores, indexing, processing, validation, data models, syntax, encoding, storage.

The stack: Storage
Local filesystem, NFS, GFS, HDFS, S3, Azure Blob Storage.

The stack: Encoding
ASCII, ISO-8859-1, UTF-8, BSON.

The stack: Syntax
Text, CSV, XML, JSON, RDF/XML, Turtle, XBRL.

The stack: Data models
Tables: relational model
Trees: XML Infoset, XDM
Graphs: RDF
Cubes: OLAP

The stack: Validation
XML Schema, JSON Schema, relational schemas, XBRL taxonomies.

The stack: Processing
Two-phase processing: MapReduce
DAG-driven processing: Tez, Spark
Elastic computing: EC2

The stack: Indexing
Key-value stores, hash indices, B-trees, geographical indices, spatial indices.

The stack: Data stores
RDBMS (Oracle/IBM/Microsoft), MongoDB, Couchbase, Elasticsearch, Hive, HBase, MarkLogic, Cassandra.

...
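As a quick refresher on the relational algebra operators listed among the take-away points (selection, projection, grouping, sorting, joining), here is a minimal sketch in Python over in-memory lists of dictionaries. The tables, column names, and helper functions (select, project, group_by, sort, join) are hypothetical illustrations, not taken from the lecture, and the join is a deliberately naive nested-loop equi-join rather than what a real query optimizer would pick.

```python
from collections import defaultdict

# Hypothetical example tables: each row is a dict mapping column name -> value.
students = [
    {"id": 1, "name": "Ada", "dept": "CS"},
    {"id": 2, "name": "Bo", "dept": "Math"},
    {"id": 3, "name": "Cy", "dept": "CS"},
]
grades = [
    {"student_id": 1, "course": "Big Data", "grade": 6.0},
    {"student_id": 1, "course": "Databases", "grade": 5.5},
    {"student_id": 3, "course": "Big Data", "grade": 5.0},
]

def select(rows, predicate):
    """Selection: keep only the rows satisfying the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    """Projection: keep only the given columns (set semantics, duplicates removed)."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[c] for c in columns)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(columns, t)))
    return out

def group_by(rows, key, column, aggregate):
    """Grouping: partition rows by key, then aggregate one column per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[column])
    return [{key: k, column: aggregate(vs)} for k, vs in groups.items()]

def sort(rows, column, descending=False):
    """Sorting: order rows by the value of one column."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)

def join(left, right, left_key, right_key):
    """Joining: naive nested-loop equi-join that merges matching rows."""
    return [{**l, **r} for l in left for r in right if l[left_key] == r[right_key]]

cs_students = select(students, lambda r: r["dept"] == "CS")
print(project(cs_students, ["name"]))  # → [{'name': 'Ada'}, {'name': 'Cy'}]

joined = join(students, grades, "id", "student_id")
averages = group_by(joined, "name", "grade", lambda vs: sum(vs) / len(vs))
print(sort(averages, "grade", descending=True))
# → [{'name': 'Ada', 'grade': 5.75}, {'name': 'Cy', 'grade': 5.0}]
```

Composing the operators this way mirrors a query plan: the join feeds the grouping, which feeds the sort, much as a declarative SQL query is compiled into a tree of such operators.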