Big Data Graph Databases

Ghislain Fourny
Big Data
12. Graph Databases

Why graph databases

The NoSQL paradigms
- Key-value stores
- Column stores
- Document stores
- Triple stores

Relational databases...
    [Figure: Entity, Relationship, Entity]
- ... have expensive joins
- ... are not that efficient at relationships

We already know how to partly solve this though
- 3NF
- 0NF
- ... but it has its limits, too

Traversals...
- ... translate into multiple joins

Reverse traversals...
- ... need even more indices

Traversals...
- ... what if links were more "direct"?

Index-free adjacency

Graphs

Graphs: ingredients
- Nodes
- Edges

Graphs: nodes

Graphs: edges

Graphs: directed graph

Graphs: undirected graph

Graph representation: adjacency list
    [Figure: graph with nodes A, B, C]

    Node   Edges
    A      (none)
    B      A, C
    C      A

Graph representation: adjacency matrix
    [Figure: graph with nodes A, B, C]

         A  B  C
    A    0  1  1
    B    0  0  0
    C    0  1  0

Graph representation: incidence matrix
    [Figure: graph with nodes A, B, C]

             Edges
    Nodes    1  2  3
    A        1  1  0
    B        1  0  1
    C        0  1  1

Labeled property graphs: ingredients
- Nodes
- Edges
- Labels
- Properties

Property graph

Properties
    Name: Einstein
    First name: Albert
    Profession: Physicist

Labeled graph

Labels on nodes

Names on relationships
    [Figure: graph whose edges carry the names A and B]

Labeled property graph

Node with properties and label
    Properties:
        Name: Einstein
        First name: Albert
        Profession: Physicist
    Labels:
        Person
        In Switzerland

Graph database

Graph databases: families
- Property graph
- Triple stores (RDF)

Graph databases: native or not
- Native graph database
- Graph stored as RDBMS, document store, ...

    Source   Target   Name
    Alice    Bob      knows
    Eve      Bob      eavesdrop
    Eve      Alice    eavesdrop
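The three "Graph representation" slides above describe the same small graph in three ways. The following is a minimal Python sketch of how one could derive the two matrices from the list. It rests on assumptions the slides do not state: the adjacency list is read as each node's outgoing edges, the adjacency matrix entry at row i, column j is read as an edge from j to i, and the numbering of the three edges is taken from the incidence-matrix columns. The opposite reading (incoming lists, rows as sources) yields the same tables. The function names are hypothetical.

```python
# Adjacency list as read from the slide (assumption: outgoing edges per node).
adjacency_list = {"A": [], "B": ["A", "C"], "C": ["A"]}
nodes = sorted(adjacency_list)  # ["A", "B", "C"]


def to_adjacency_matrix(adj, nodes):
    """Build a 0/1 matrix; assumed convention: row = target, column = source."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, targets in adj.items():
        for target in targets:
            matrix[index[target]][index[source]] = 1
    return matrix


def to_incidence_matrix(edges, nodes):
    """Build a nodes x edges matrix; a 1 marks that the node touches the edge."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(edges) for _ in nodes]
    for column, (u, v) in enumerate(edges):
        matrix[index[u]][column] = 1
        matrix[index[v]][column] = 1
    return matrix


# Edges 1, 2, 3 in the order implied by the incidence-matrix slide (direction ignored).
edges = [("A", "B"), ("A", "C"), ("B", "C")]

adjacency_matrix = to_adjacency_matrix(adjacency_list, nodes)
incidence_matrix = to_incidence_matrix(edges, nodes)

for row in adjacency_matrix:
    print(row)
for row in incidence_matrix:
    print(row)
```

The list needs space proportional to the number of edges, while the adjacency matrix always needs one cell per pair of nodes, which is why sparse graphs are usually kept as lists.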
RDF

Triple-based graph

RDF: one triple

    Subject       Property         Object
    ETH Zürich    Is located in    Switzerland

IRI
    http://www.ethz.ch/school
    http://www.example.com/Switzerland

Literal
    Foo
    2012-12-16
    3.1415926535
    includes XML Schema types

Blank Node
    ETH Zürich --Is built on ground--> (blank node) --Is subset of--> Switzerland

What can appear where

                  Subject   Property   Object
    IRI           yes       yes        yes
    Literal       no        no         yes
    Blank node    yes       no         yes

Generalized Graphs

                  Subject   Property   Object
    IRI           yes       yes        yes
    Literal       yes       yes        yes
    Blank node    yes       yes        yes

Syntax

RDF Formats
- RDF/XML
- Turtle
- JSON-LD
- RDFa
- N-Triples

RDF/XML

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:geo="http://www.example.com/geography#">
      <rdf:Description rdf:about="http://www.ethz.ch/self">
        <geo:isLocatedIn rdf:resource="http://www.example.com/Switzerland"/>
        <geo:population>8000000</geo:population>
      </rdf:Description>
    </rdf:RDF>

RDF/XML: Subject
    The rdf:about attribute of rdf:Description: http://www.ethz.ch/self

RDF/XML: Property
    The child element names: geo:isLocatedIn, geo:population

RDF/XML: Object
    The rdf:resource attribute (http://www.example.com/Switzerland) or the element content (8000000)

RDF/XML: qualified names
    geo:isLocatedIn expands to http://www.example.com/geography#isLocatedIn

RDF/XML: adding a type with rdf:type

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:geo="http://www.example.com/geography#">
      <rdf:Description rdf:about="http://www.ethz.ch/self">
        <geo:isLocatedIn rdf:resource="http://www.example.com/Switzerland"/>
        <rdf:type rdf:resource="http://www.example.com/geography#school"/>
        <geo:population>8000000</geo:population>
      </rdf:Description>
    </rdf:RDF>

RDF/XML: the same type, written as a typed element

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:geo="http://www.example.com/geography#">
      <geo:school rdf:about="http://www.ethz.ch/self">
        <geo:isLocatedIn rdf:resource="http://www.example.com/Switzerland"/>
        <geo:population>8000000</geo:population>
      </geo:school>
    </rdf:RDF>

JSON-LD

    {
      "@context": {
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "geo": "http://www.example.com/geography#"
      },
      "@id": "http://www.ethz.ch/self",
      "rdf:type": "geo:school",
      "geo:isLocatedIn": "http://www.example.com/Switzerland",
      "geo:population": 8000000
    }

Turtle

    @prefix geo: <http://www.example.com/geography#> .
    @prefix countries: <http://www.example.com/> .
    @prefix eth: <http://www.ethz.ch/> .

    eth:self geo:isLocatedIn countries:Switzerland .
    eth:self geo:population 8000000 .

Turtle: same subject, several properties (;)

    eth:self geo:isLocatedIn countries:Switzerland ;
             geo:population 8000000 .

Turtle: same subject and property, several objects (,)

    eth:self geo:isLocatedIn countries:Switzerland ,
                             countries:Europe ;
             geo:population 8000000 .

Querying

Querying paradigms
- Classical declarative querying
- Query by example

Two languages
- Cypher
- SPARQL
    [Figure: the triple "ETH Zürich, Is located in, Switzerland" repeated many times]

Querying labeled property graphs by example
    [Figure: a graph with relationships named A and B, in which a small pattern graph is matched]

Cypher pattern

    (alpha)-[:A]->(beta)-[:B]->(gamma)

Cypher pattern: anchoring a label

    (alpha)-[:A]->(beta:yellow)-[:B]->(gamma)

Cypher pattern: filtering a property

    (alpha {name: 'Einstein'})-[:A]->(beta)-[:B]->(gamma)

Cypher pattern: anchoring and filtering

    (alpha)-[:A]->(beta)-[:B]->(gamma:blue {name: 'ETH'})

Cypher pattern: right to left

    (alpha)-[:A]->(beta)-[:B]->(gamma)<-[:B]-(delta)

Cypher pattern: variable repetition

    (alpha)-[:A]->(beta)-[:B]->(gamma)<-[:B]-(delta)<-[:B]-(alpha)

Cypher pattern: variable-length path

    (alpha)-[*1..4]->(beta)

Cypher pattern: MATCH clause

    MATCH (alpha {name: 'Einstein'})-[:A]->(beta)-[:B]->(gamma)
    RETURN gamma

Cypher pattern: WHERE clause
    The property filter can be written inside the pattern or in a WHERE clause; the two queries are equivalent.

    MATCH (alpha {name: 'Einstein'})-[:A]->(beta)-[:B]->(gamma)
    RETURN gamma

    MATCH (alpha)-[:A]->(beta)-[:B]->(gamma)
    WHERE alpha.name = 'Einstein'
    RETURN gamma

Cypher pattern: CREATE clause

    CREATE (einstein:Scientist {name: 'Einstein', first: 'Albert'}),
           (eth:University {name: 'ETH Zurich'}),
           (einstein)-[:VISITED]->(eth)

Other clauses
- START
- MERGE
- WITH
- FOREACH
- DELETE
- SET
- UNION

Querying RDF: SPARQL

    PREFIX geo: <http://www.example.com/geography#>
    PREFIX countries: <http://www.example.com/>

    SELECT ?s
    WHERE {
      ?s geo:isLocatedIn countries:Switzerland
    }

SPARQL: several triple patterns on the same variable

    PREFIX geo: <http://www.example.com/geography#>
    PREFIX countries: <http://www.example.com/>

    SELECT ?s
    WHERE {
      ?s geo:isLocatedIn countries:Switzerland .
      ?s :deliversDiplom :bachelor .
    }

SPARQL: chaining triple patterns through a variable

    PREFIX geo: <http://www.example.com/geography#>
    PREFIX countries: <http://www.example.com/>

    SELECT ?s
    WHERE {
      ?s geo:isLocatedIn ?c .
      ?c geo:isInContinent geo:America .
    }

SPARQL: LIMIT

    PREFIX geo: <http://www.example.com/geography#>
    PREFIX countries: <http://www.example.com/>

    SELECT ?s
    WHERE {
      ?s geo:isLocatedIn countries:Switzerland .
      ?s :deliversDiplom :bachelor
    }
    LIMIT 10

SPARQL: ORDER BY

    PREFIX geo: <http://www.example.com/geography#>
    PREFIX countries: <http://www.example.com/>

    SELECT ?s ?name
    WHERE {
      ?s geo:isLocatedIn countries:Switzerland .
      ?s :deliversDiplom :bachelor .
      ?s :hasName ?name .
    }
    ORDER BY ?name
    LIMIT 10

Architecture (Neo4j)

No sharding
- Document stores don't like joins
- Graph databases don't like shards

Why
- Fast traversal

Master-slave architecture
    [Figure: one master and six slaves]

Data replication
    [Figure: one master and three slaves]
- Synchronization between the master and the slaves
- Data replication is full

Read scale-up
- Reads can be served by a slave

Writes
- Write to the master, or
- Write to a slave

Caching and pages
- Index-free adjacency
- Fixed-size records

Label storage
    Person, Jedi, Geek

Properties storage
    name: Einstein
    firstname: Albert

Relationship storage
    [Figure: relationships named A and B stored as linked records]
    Each relationship record points to:
    - Source
    - Target
    - sprevious
    - snext
    - tprevious
    - tnext

Typical sizes

    Record               Size
    Node                 9 bytes
    Relationship         33 bytes
    Relationship name    5 bytes
    Property             33 bytes

Semantics

RDF has no semantics
    Chuchichäschtli  Schwiiz  Schoggi

RDF Schema
- Class
- Property

Classes
- rdfs:Resource
- rdfs:Class
- rdf:Property
- rdfs:Literal
- rdfs:Datatype
- rdf:HTML
- rdf:XMLLiteral

Properties
    On any resources:
    - rdf:type
    - rdfs:label
    - rdfs:comment
    On properties:
    - rdfs:range
    - rdfs:domain
    - rdfs:subPropertyOf
    On classes:
    - rdfs:subClassOf

Self-awareness

    rdfs:Resource     rdf:type           rdfs:Resource
    rdfs:Class        rdfs:subClassOf    rdfs:Resource
    rdf:type          rdfs:range         rdfs:Class
    rdfs:subClassOf   rdf:type           rdf:Property

Simple Entailment (RDF semantics)
    E
    I
    I(E) = true

OWL
- (In principle) standalone
- (Much) more powerful than RDF(S)

OWL and description logic / AI

Entailment (and Syllogisms)
    Major:       All men are mortal.
    Minor:       Socrates is a man.
    Conclusion:  Therefore, Socrates is mortal.

Trees...

... and Graphs
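
The syllogism on the "Entailment (and Syllogisms)" slide can be mimicked with two RDFS entailment rules: type propagation along rdfs:subClassOf (rule rdfs9) and transitivity of rdfs:subClassOf (rule rdfs11). The following is a minimal Python sketch, not a full RDFS reasoner. The ex: names and the function name rdfs_closure are hypothetical and chosen only for this illustration; triples are plain tuples of strings.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Forward-chain rdfs9 and rdfs11 until no new triple can be derived."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            # (s rdfs:subClassOf o) is the "major premise" of each rule.
            for s2, p2, o2 in inferred:
                if p2 == RDF_TYPE and o2 == s:
                    # rdfs9: s2 is an s, every s is an o, hence s2 is an o.
                    new.add((s2, RDF_TYPE, o))
                elif p2 == SUBCLASS and o2 == s:
                    # rdfs11: subClassOf is transitive.
                    new.add((s2, SUBCLASS, o))
        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical data: "All men are mortal" and "Socrates is a man".
facts = {
    ("ex:Man", SUBCLASS, "ex:Mortal"),
    ("ex:Socrates", RDF_TYPE, "ex:Man"),
}

closure = rdfs_closure(facts)
for triple in sorted(closure - facts):
    print(triple)  # the derived conclusion(s)
```

Here the only derived triple states that ex:Socrates has type ex:Mortal, which corresponds to the conclusion of the syllogism.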