Big Data Data Models

Slide 1: Big Data, 9. Data Models
    Ghislain Fourny

Slide 2: Syntax vs. Data Models
    Physical view (syntax):
        <a>
          <d e="f"/>
          <c>This is <b>text</b>.</c>
        </a>

Slide 3: Syntax vs. Data Models
    Logical view (data model): a tree with root a, whose children are d (with e = f) and c; c contains the text "This is", the element b (containing "text"), and the text ".".
    Physical view (syntax): the same document as on slide 2.

Slide 4: Edge vs. Node labeling
    (Figure: trees with the labels foo, bar and foobar.)

Slide 5: XML Data models
    Information Set (Infoset): http://www.w3.org/TR/xml-infoset/
    Post-Schema-Validation Infoset (PSVI): http://www.w3.org/TR/xmlschema11-1/
    XQuery and XPath Data Model (XDM): http://www.w3.org/TR/xpath-datamodel/

Slide 6: HTML/XML Data model
    Document Object Model (DOM): http://www.w3.org/TR/REC-DOM-Level-1/

Slide 7: XML Information Set

Slides 8-9: Information Set
        <?xml version="1.0" encoding="UTF-8"?>
        <dc:metadata xmlns:dc="http://www.systems.ethz.ch">
          <title xml:lang="en" year="2008">Systems Group</title>
          <publisher>ETH Zurich</publisher>
        </dc:metadata>

Slides 10-11: The 11 XML Information Items
    Document, Element, Attribute, Processing Instruction, Character, Comment,
    Namespace, Unexpanded Entity Reference, DTD, Unparsed Entity, Notation

Slide 12: Document Information Items
    Document Information Item: doc
        children: Element Information Item metadata
        document element: Element Information Item metadata
        notations: empty
        unparsed entities: empty
        base URI: file:///Users/bigdata/Documents/info.xml
        character encoding scheme: UTF-8
        standalone: no value
        version: 1.0

Slide 13: Element Information Items
    Element Information Item: metadata
        namespace name: http://www.systems.ethz.ch
        local name: metadata
        prefix: dc
        children: Element Information Items title, publisher
        attributes: empty
        namespace attributes: Attribute Information Item xmlns:dc
        in-scope namespaces: Namespace Information Items dcsystems, xmlns
        base URI: file:///Users/bigdata/Documents/info.xml
        parent: Document Information Item

Slide 14: Attribute Information Items
    Attribute Information Item: xmlns:dc
        namespace name: http://www.w3.org/2000/xmlns/
        local name: dc
        prefix: xmlns
        normalized value: http://www.systems.ethz.ch
        specified: true
        attribute type: no value
        references: unknown
        owner element: Element Information Item metadata

Slide 15: Namespace Information Items
    Namespace Information Item: dcsystems
        prefix: dc
        namespace name: http://www.systems.ethz.ch
    Namespace Information Item: xmlns
        prefix: xml
        namespace name: http://www.w3.org/XML/1998/namespace

Slide 16: XML Infoset: the tree
    (Figure: the Infoset tree. Node labels: doc, metadata, xmlns:dc, title, publisher, lang, year, the namespace items dcsystems and xmlns, and the text "Systems Group" and "ETH Zurich".)

Slide 17: Post-Schema-Validation Infoset
    Infoset + Types = Post-Schema-Validation Infoset (PSVI)

Slide 18: XPath and XQuery Data Model

Slide 19: XDM: Sequences of Items
    (item, item, item, item, item, item)

Slide 20: XDM: Sequence of one item
    item = (item)

Slide 21: XDM: Sequences are flat
    ((item1, item2), item3) = (item1, item2, item3)

Slide 22: XDM: Items
    Atomic
    Node

Slide 23: XDM: Seven Kinds of XML Nodes
    Document node
    Element node
    Attribute node
    Text node
    Comment node
    Processing instruction node
    Namespace node

Slide 24: XDM: Seven Kinds of XML Nodes
    (Figure: Infoset and XDM side by side.)

Slide 25: XDM vs. Infoset
    (Figure: Infoset and XDM side by side; label xs:untyped.)

Slide 26: XDM: New Items in 3.0 and 3.1
    Functions
    Maps
    Arrays

Slide 27: XDM and Querying
    (Figure: an Expression built from keywords and operators such as while, =, let, any, return, if, for, then, +, where, else, order by, every, exit with.)

Slide 28: Types (In general)

Slide 29: Types (General)
    Atomic Types vs. Structured Types

Slide 30: Atomic Types
    Strings
    Numbers
    Booleans
    Dates and Times
    Time Intervals
    Binaries
    Null

Slide 31: Lexical Space vs. Value Space
    (Figure mapping the lexical space to the value space. Recoverable entries: 1, 01, 24.30, 3.1415, 5, 15e+0.)

Slide 32: Subtypes
    (Figure: the subtype's value space inside the supertype's value space.)

Slide 33: Structured Types
    Data structure                   Examples
    Associative arrays (a.k.a. maps) JSON Object, Protobuf Message, Set of XML Attributes
    Ordered lists                    JSON Array, XML Element, Protobuf repeated field

Slide 34: Cardinality
    How many       Common sign   Common adjective
    One                          required
    Zero or more   *
    Zero or one    ?             optional
    One or more    +

Slide 35: Protocol Buffers

Slide 36: Messages
        message Person {
          required string lastname = 1;
          repeated string firstname = 2;
          optional Title title = 3;
          optional Person boss = 4;
        }

Slide 37: Scalar types
    double, float
    int32, int64 and variants
    bool
    string
    bytes

Slide 38: Enums
        enum Title {
          MR = 1;
          MS = 2;
          MRS = 3;
        }

Slide 39: In C++
        person.boss().firstname()

Slide 40: Validation

Slide 41: Validation: The Pipeline
    Document -> Well-Formedness -> Validation

Slide 42: On the oXygen Cheat Sheet
    Validity
    Well-Formedness

Slide 43: Validation vs. Annotation
    Validation
    Annotation

Slide 44: Validation

Slide 45: DTD Validation

Slides 46-48: Document Type Declaration
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a>
        <a>
          <d e="f"/>
          <c>This is <b>text</b>.</c>
        </a>

Slide 49: Element Type Declaration
    Same document as on slides 46-48.

Slide 50: Element Type Declaration (Empty Content)
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a EMPTY>
        ]>
        <a/>

Slide 51: Element Type Declaration (Simple Content)
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a (#PCDATA)>
        ]>
        <a>
          This is text.
        </a>

Slide 52: Element Type Declaration (Complex Content)
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a (foo,bar)>
          <!ELEMENT foo EMPTY>
          <!ELEMENT bar EMPTY>
        ]>
        <a>
          <foo/>
          <bar/>
        </a>

Slide 53: Element Type Declaration (Complex Content)
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a (foo+,bar?,foobar)>
          <!ELEMENT foo EMPTY>
          <!ELEMENT bar EMPTY>
          <!ELEMENT foobar EMPTY>
        ]>
        <a>
          <foo/>
          <foo/>
          <foo/>
          <foobar/>
        </a>

Slide 54: Element Type Declaration (Complex Content)
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a (bar|(foo|foobar)+)>
          <!ELEMENT foo EMPTY>
          <!ELEMENT foobar EMPTY>
        ]>
        <a>
          <foo/>
          <foobar/>
          <foo/>
          <foobar/>
        </a>

Slide 55: Element Type Declaration (Mixed Content)
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a (#PCDATA|foo)*>
          <!ELEMENT foo EMPTY>
        ]>
        <a>
          <foo/>Lorem<foo/>Ipsum<foo/>
        </a>

Slide 56: Element Type Declaration (Mixed Content)
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a ANY>
          <!ELEMENT foo EMPTY>
          <!ELEMENT bar EMPTY>
        ]>
        <a>
          <foo/>
          Lorem
          <bar/>
          Ipsum
          <bar/>
          <bar/>
        </a>

Slide 57: Attribute-List Declaration
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a EMPTY>
          <!ATTLIST a foo CDATA #REQUIRED>
        ]>
        <a foo="This is a &quot;value&quot;"></a>

Slide 58: Attribute-List Declaration
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a EMPTY>
          <!ATTLIST a foo CDATA #IMPLIED>
        ]>
        <a foo="This is a &quot;value&quot;"></a>

Slide 59: Attribute-List Declaration
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a EMPTY>
          <!ATTLIST a foo CDATA "bar">
        ]>
        <a foo="This is a &quot;value&quot;"></a>

Slide 60: Attribute-List Declaration
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a EMPTY>
          <!ATTLIST a foo CDATA "bar">
        ]>
        <a></a>

Slide 61: Attribute-List Declaration
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a EMPTY>
          <!ATTLIST a foo CDATA #FIXED "bar">
        ]>
        <a foo="bar"></a>

Slide 62: Attribute-List Declaration
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a EMPTY>
          <!ATTLIST a foo NMTOKEN #REQUIRED>
        ]>
        <a foo="123AToken456"></a>

Slide 63: Attribute-List Declaration
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE a [
          <!ELEMENT a EMPTY>
          <!ATTLIST a foo NMTOKENS #REQUIRED>
        ]>
        <a foo="123AToken456"></a>

Slide 64: Attribute-List Declaration
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE root [
          <!ELEMENT root (foo+,bar,barlist)>
          <!ELEMENT foo EMPTY>
          <!ELEMENT bar EMPTY>
          <!ELEMENT barlist EMPTY>
          <!ATTLIST foo myid ID #REQUIRED>
          <!ATTLIST bar ref IDREF #REQUIRED>
          <!ATTLIST barlist ref IDREFS #REQUIRED>
        ]>
        <root>
          <foo myid="foobar"/>
          <foo myid="foobar2"/>
          <bar ref="foobar"/>
          <barlist ref="foobar foobar2"/>
        </root>

Slide 65: DTD Example: External Subset
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE root SYSTEM "schema.dtd">
        <a foo="bar"/>

Slide 66: Warning: DTDs and Namespaces
        <!ELEMENT eth (date, president, Rektor)>
        <!ATTLIST eth xmlns CDATA #FIXED "http://www.ethz.ch"
                      xmlns:xmldb CDATA #FIXED "http://www.dbis.ethz.ch"
                      date CDATA #IMPLIED
                      xmldb:date CDATA #IMPLIED>
        <!ELEMENT date (#PCDATA)>
        <!ELEMENT president (#PCDATA)>
        <!ATTLIST president number CDATA #IMPLIED>
        <!ELEMENT Rektor (#PCDATA)>

Slide 67: Notations and Unparsed Entities
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE foo [
          <!ENTITY presentation SYSTEM "/Users/bigdata/desktop/presentation.pptx" NDATA pptx>
          <!NOTATION pptx PUBLIC "powerpoint">
          <!ELEMENT foo EMPTY>
          <!ATTLIST foo foo ENTITY #REQUIRED>
        ]>
        <foo foo="presentation"></foo>

Slide 68: XML Schema

Slide 69: Empty Schema
        <?xml version="1.0" encoding="UTF-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        </xs:schema>

Slide 70: Simple Scenario
    schema.xsd:
        <?xml version="1.0" encoding="UTF-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="foo" type="xs:string"/>
        </xs:schema>
    file.xml:
        <?xml version="1.0" encoding="UTF-8"?>
        <foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="schema.xsd">
          This is text.
        </foo>

Slide 71: Simple Scenario
    schema.xsd:
        <?xml version="1.0" encoding="UTF-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="foo" type="xs:integer"/>
        </xs:schema>
    file.xml:
        <?xml version="1.0" encoding="UTF-8"?>
        <foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="schema.xsd">
          142857
        </foo>

Slide 72: Simple Types: Built-in
    Strings: string, anyURI, QName
    Numbers: decimal, integer, float, double, long, int, short, byte, positiveInteger, nonNegativeInteger..., unsignedLong, unsignedInt...
    Booleans: boolean

Slide 73: Simple Types: Built-in
    Dates and Times: dateTime, time, date, gYearMonth, gMonthDay, gYear, gMonth, gDay, dateTimeStamp
    Time Intervals: duration, yearMonthDuration, dayTimeDuration
    Binaries: hexBinary, base64Binary
    Null

Slide 74: Dates
    2014-12-02
    2014-12-02T10:15:00Z
    01:15:00-08:00

Slide 75: User-defined types
    Restriction
    Union (not atomic)
    List (not atomic)

Slides 76-77: Restriction
    schema.xsd:
        <?xml version="1.0" encoding="UTF-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:simpleType name="myFixedLengthString">
            <xs:restriction base="xs:string">
              <xs:length value="3"/>
            </xs:restriction>
          </xs:simpleType>
          <xs:element name="foo" type="myFixedLengthString"/>
        </xs:schema>
    file.xml:
        <?xml version="1.0" encoding="UTF-8"?>
        <foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="schema.xsd">ZRH</foo>

Slide 78: List
        <xs:simpleType name="myList">
          <xs:list itemType="xs:string"/>
        </xs:simpleType>

        <foo>foo bar foobar</foo>

Slide 79: Union
        <xs:simpleType name="myUnion">
          <xs:union memberTypes="xs:integer xs:boolean"/>
        </xs:simpleType>

        <foo>true</foo>

Slide 80: Complex Types
    Empty:
        <foo/>
    Simple Content:
        <foo>text</foo>
    Complex Content:
        <foo>
          <a/>
          <b/>
        </foo>
    Mixed Content:
        <foo>
          Text<a/>Text<b/>
        </foo>

Slides 81-82: Complex content
        <xs:complexType name="complexContent">
          <xs:sequence>
            <xs:element name="twotofour" type="xs:string" minOccurs="2" maxOccurs="4"/>
            <xs:element name="zeroorone" type="xs:boolean" minOccurs="0" maxOccurs="1"/>
          </xs:sequence>
        </xs:complexType>

        <foo>
          <twotofour>foobar</twotofour>
          <twotofour>foobar</twotofour>
          <twotofour>foobar</twotofour>
          <zeroorone>true</zeroorone>
        </foo>

Slide 83: Empty content
        <xs:complexType name="emptyType">
          <xs:sequence/>
        </xs:complexType>

        <foo/>

Slide 84: Simple content
        <xs:complexType name="dateCountry">
          <xs:simpleContent>
            <xs:extension base="xs:date">
              <xs:attribute name="country" type="xs:string"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>

        <foo country="Switzerland">2014-12-02</foo>

Slide 85: Mixed content
        <xs:complexType name="mixedContent" mixed="true">
          <xs:sequence>
            <xs:element name="b" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>

        <foo>Some text and some <b>bold</b> text.</foo>

Slide 86: Simple type on attributes
        <xs:complexType name="withAttribute">
          <xs:sequence/>
          <xs:attribute name="country" type="xs:string" default="Switzerland"/>
        </xs:complexType>

        <foo country="Switzerland"/>

Slide 87: Named Types
        <xs:complexType name="empty">
          <xs:sequence/>
        </xs:complexType>
        <xs:element name="c" type="empty">
        </xs:element>

Slide 88: Anonymous Types
        <xs:element name="c">
          <xs:complexType>
            <xs:sequence/>
          </xs:complexType>
        </xs:element>

Slide 89: No namespaces
        <?xml version="1.0" encoding="UTF-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="foo" type="xs:string"/>
        </xs:schema>

        <?xml version="1.0" encoding="UTF-8"?>
        <foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="schema.xsd">
          This is text.
        </foo>

Slide 90: With namespaces
        <?xml version="1.0" encoding="UTF-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   targetNamespace="http://www.example.com/bigdata"
                   xmlns:big="http://www.example.com/bigdata">
          <xs:element name="foo" type="xs:string"/>
        </xs:schema>

        <?xml version="1.0" encoding="UTF-8"?>
        <big:foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.example.com/bigdata schema.xsd"
                 xmlns:big="http://www.example.com/bigdata">
          This is text.
        </big:foo>

Slide 91: Keys
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="root">
            <xs:complexType>
              ..
            </xs:complexType>
            <xs:key name="foo-id">
              <xs:selector xpath="foo"/>      (what must be unique)
              <xs:field xpath="@id"/>         (what makes it unique)
            </xs:key>
          </xs:element>
        </xs:schema>

        <?xml version="1.0" encoding="UTF-8"?>
        <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:noNamespaceSchemaLocation="schema.xsd">
          <foo id="foo"/>
          <foo id="bar"/>
          <foo id="foobar"/>
        </root>

Slide 92: Bonus material: The Schema of Schemas
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   targetNamespace="http://www.w3.org/2001/XMLSchema">
          <xs:element name="schema" id="schema">
            <xs:complexType>
              <xs:complexContent>
                ..
              </xs:complexContent>
            </xs:complexType>
          </xs:element>
          <xs:element name="element" type="xs:topLevelElement" id="element"/>
          <xs:element name="simpleType" type="xs:topLevelSimpleType" id="simpleType"/>
          <xs:element name="complexType" type="xs:topLevelComplexType" id="complexType"/>
          <xs:complexType name="element" abstract="true">
            <xs:complexContent>
              ..
            </xs:complexContent>
          </xs:complexType>
        </xs:schema>
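
Sketch: what the key on the Keys slide asks for

The Keys slide separates what must be unique (the selector, here the foo elements under root) from what makes it unique (the field, here the id attribute). The following is a minimal sketch of that idea in Python, using only the standard library. It is not an XML Schema validator, and it does not read the schema at all: the helper name check_key and its parameters are assumptions made for illustration, and the document string is the instance from the Keys slide without its xsi attributes.

```python
import xml.etree.ElementTree as ET

# Instance document from the Keys slide (schema location attributes omitted).
DOCUMENT = """<root>
  <foo id="foo"/>
  <foo id="bar"/>
  <foo id="foobar"/>
</root>"""


def check_key(root, selector, field):
    """Return a list of key violations (empty if the key holds).

    selector: path, relative to root, of the elements that must be unique.
    field: name of the attribute that makes each selected element unique.
    Hypothetical helper; it only mimics the selector/field idea of xs:key.
    """
    seen = set()
    problems = []
    for element in root.findall(selector):
        value = element.get(field)
        if value is None:
            # A key field must be present on every selected element.
            problems.append(f"<{element.tag}> has no '{field}' attribute")
        elif value in seen:
            # A key value must not occur twice.
            problems.append(f"duplicate key value '{value}'")
        else:
            seen.add(value)
    return problems


root = ET.fromstring(DOCUMENT)
violations = check_key(root, "foo", "id")
print(violations)  # an empty list means the key constraint holds
```

Changing one of the id values so that it repeats, or removing an id attribute, makes check_key report a violation. A real validator would additionally check the rest of the schema, which this sketch leaves out.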
Document Information
Category:
Presentations
User Name:
Dr.GordenMorse
User Type:
Professional
Country:
France
Uploaded Date:
22-07-2017