Big Data Syntax

Ghislain Fourny
Big Data
8. Syntax

Introduction

The stack: Syntax
    Text, CSV, XML, JSON, RDF/XML, Turtle, XBRL

Data Shapes
    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vel erat nec dui aliquet vulputate sed quis nulla. Donec eget ultricies magna, eu dignissim elit. Nullam sed urna nec nisl rhoncus ullamcorper placerat et enim. Integer varius ornare libero quis consequat.

    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean eu efficitur orci. Aenean ac posuere tellus. Ut id commodo turpis. Praesent nec libero metus. Praesent at turpis placerat, congue ipsum eget, scelerisque justo. Ut volutpat, massa ac lacinia cursus, nisl dui volutpat arcu, quis interdum sapien turpis in tellus. Suspendisse potenti. Vestibulum pharetra justo massa, ac venenatis mi condimentum nec. Proin viverra tortor non orci suscipit rutrum. Phasellus sit amet euismod diam. Nullam convallis nunc sit amet diam suscipit dapibus. Integer porta hendrerit nunc. Quisque pharetra congue porta. Suspendisse vestibulum sed mi in euismod. Etiam a purus suscipit, accumsan nibh vel, posuere ipsum. Nulla nec tempor nibh, id venenatis lectus. Duis lobortis id urna eget tincidunt.

Trees...

... and Graphs

2000s: The NoSQL Era
    Triple stores
    Key-value stores
    Column stores
    Document stores
    (figure labels: foo, bar, foobar)

Semi-Structured Documents
    (photo: gajus / 123RF Stock Photo)

Semi-Structured Documents
    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
    Structured | Unstructured

Semi-Structured Documents
    [Figure: the same placeholder text, overlaid with the labels a, b, c, d, e:f, "This is b." and "text"]
    Structured | Unstructured | Semi-structured

Standards

For whom

Syntax

Well-formedness
    One syntax = one language
    D is well-formed: D ∈ L
    D is not well-formed: D ∉ L

XML
    <?xml version="1.0"?>
    <country code="CH">
      <name>Switzerland</name>
      <population>8014000</population>
      <currency code="CHF">Swiss Franc</currency>
      <cities>
        <city>Zurich</city>
        <city>Geneva</city>
        <city>Bern<!-- the Federal City --></city>
      </cities>
      <description>
        We produce <b>very</b> good chocolate.
      </description>
    </country>

JSON
    {
      "code": "CH",
      "name": "Switzerland",
      "population": 8014000,
      "currency": {
        "name": "Swiss Franc",
        "code": "CHF"
      },
      "confederation": true,
      "president": "Johann Schneider-Ammann",
      "capital": null,
      "cities": [ "Zurich", "Geneva", "Bern" ],
      "description": "We produce very good chocolate."
    }
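The well-formedness definition earlier in this section (a document D is well-formed when D ∈ L) can be tried out by treating a parser as the membership test for its language. The following is a minimal sketch, assuming that Python's standard `json` and `xml.etree.ElementTree` modules are acceptable stand-ins for the JSON and XML languages. The function names and the shortened sample documents are illustrative and do not come from the slides. Python's `json` module is slightly more permissive than the JSON grammar (it accepts `NaN`, for example), so the JSON check is an approximation.

```python
import json
import xml.etree.ElementTree as ET


def is_wellformed_json(document: str) -> bool:
    """Return True if the JSON parser accepts the document."""
    try:
        json.loads(document)
        return True
    except json.JSONDecodeError:
        return False


def is_wellformed_xml(document: str) -> bool:
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Shortened, illustrative versions of the country examples.
json_doc = '{"code": "CH", "name": "Switzerland", "population": 8014000}'
xml_doc = '<country code="CH"><name>Switzerland</name></country>'

# One syntax = one language: each document belongs to exactly one of the two.
for label, doc in [("json_doc", json_doc), ("xml_doc", xml_doc)]:
    print(label, "JSON:", is_wellformed_json(doc), "XML:", is_wellformed_xml(doc))
```

Each sample is accepted by one parser and rejected by the other, which is the sense in which every syntax defines its own language.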
HTML
    <!DOCTYPE html>
    <html>
      <head>
        <title>Country</title>
      </head>
      <body>
        <h1 class="Title">Switzerland</h1>
        <div>Population: 8014000<br>
        Currency: Swiss Franc (CHF)</div>
        <h2>Cities</h2>
        <ul>
          <li>Zurich</li>
          <li>Geneva</li>
          <li>Bern&nbsp;<!-- the Federal City --></li>
        </ul>
      </body>
    </html>

XML
    (photo: Robert Eastman / 123 Stock Photo)

XML: Element
    <foo>more XML</foo>        opening tag, closing tag
    <bar/> = <bar></bar>       empty tag

XML: Attribute
    <a attr="value"/>

XML: Text
    <a>This is text</a>

XML: Comment
    <!-- This is a comment -->

XML: Processing Instruction
    <?myapp do whatever?>
    <?xml version="1.0"?>

    "In a perfect world, processing instructions would not be necessary. However, as you might have noticed, the world is not perfect." (Charles Goldfarb)

(XML: Text declaration)
    <?xml version="1.0" encoding="UTF-8"?>

What Appears Where
                               Top-level   Between element tags   Inside opening element tag
    Elements                   once        yes                    no
    Attributes                 no          no                     yes
    Text                       no          yes                    no
    Comments                   yes         yes                    no
    Processing instructions    yes         yes                    no

XML: Well-formedness
    <a foo="bar" foo="bar2"/>      not well-formed
    <a foo="bar" bar="foo"/>       well-formed

XML: Well-formedness
    <a><b></a></b>                 not well-formed
    <a><b></b></a>                 well-formed

XML: Well-formedness
    <a>1 < 2</a>                   not well-formed
    <a>1 &lt; 2</a>                well-formed

XML: Entity References
    <?xml version="1.0"?>
    <document>2 &lt; 3</document>

    &lt;      <
    &gt;      >
    &apos;    '
    &quot;    "
    &amp;     &

XML: Double escaping
    <?xml version="1.0"?>
    <document>2 &amp;lt; 3</document>

XML: Character References (hex)
    <?xml version="1.0"?>
    <document>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do &#x03C0; eiusmod tempor incididunt ut labore et dolore magna aliqua.</document>

    &#x03C0;    π

XML: Character References (dec)
    <?xml version="1.0"?>
    <document>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do &#960; eiusmod tempor incididunt ut labore et dolore magna aliqua.</document>

    &#960;    π
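The entity and character reference slides above can be checked by letting an XML parser decode them. This is a small sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper name `text_of` is hypothetical and not from the slides.

```python
import xml.etree.ElementTree as ET


def text_of(xml_source: str) -> str:
    """Parse a one-element document and return its decoded text content."""
    return ET.fromstring(xml_source).text


# The five predefined entity references, separated by spaces.
predefined = text_of("<document>&lt; &gt; &apos; &quot; &amp;</document>")

# Hexadecimal and decimal character references to the same code point (pi).
hex_pi = text_of("<document>&#x03C0;</document>")
dec_pi = text_of("<document>&#960;</document>")

# Double escaping: &amp; decodes to "&", which leaves the literal text "&lt;".
double_escaped = text_of("<document>2 &amp;lt; 3</document>")

print(predefined)
print(hex_pi == dec_pi)
print(double_escaped)
```

The double-escaped document is decoded only once, so the reader of the parsed text sees `&lt;` rather than `<`.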
XML: Well-formedness
    <a attr="a "quote""/>                  not well-formed
    <a attr="a &quot;quote&quot;"/>        well-formed

XML: Well-formedness
    <!-- my comment -->

XML: CDATA sections (well-formedness)
    <?xml version="1.0"?>
    <document>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do <![CDATA[ <>&"' ]]> eiusmod tempor incididunt ut labore et dolore magna aliqua.</document>

XML: Document Type
    <?xml version="1.0"?>
    <!DOCTYPE document>
    <document>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</document>

XML: Document Type
    <?xml version="1.0"?>
    <!DOCTYPE document [
      (internal subset)
    ]>
    <document>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</document>

XML: Entity Declarations
    <?xml version="1.0"?>
    <!DOCTYPE document [
      <!ENTITY myownentity "foobar">
    ]>
    <document>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do &myownentity; eiusmod tempor incididunt ut labore et dolore magna aliqua.</document>

    &myownentity;    foobar

XML Names
    <1234/>  <ab/>  <xml/>
    <foo1234/>  <bar/>

Characters allowed
    Anywhere in a name:
        ":"  "_"  A-Z  a-z
        #xC0-#xD6  #xD8-#xF6  #xF8-#x2FF  #x370-#x37D
        #x37F-#x1FFF  #x200C-#x200D  #x2070-#x218F  #x2C00-#x2FEF
        #x3001-#xD7FF  #xF900-#xFDCF  #xFDF0-#xFFFD  #x10000-#xEFFFF
    Not as the first character:
        "-"  "."  0-9  #xB7  #x0300-#x036F  #x203F-#x2040

XML: Namespaces
    XML with Namespaces
    XML

XML Names
    Namespace        http://nosql.example.com
    + Local name     entity
    Expanded name    {http://nosql.example.com}entity

Life without QNames (Clark Notation)
    <{http://www.w3.org/1998/Math/MathML}math>
      <{http://www.w3.org/1998/Math/MathML}apply>
        <{http://www.w3.org/1998/Math/MathML}eq/>
        <{http://www.w3.org/1998/Math/MathML}ci> x </{http://www.w3.org/1998/Math/MathML}ci>
        <{http://www.w3.org/1998/Math/MathML}apply>
          <{http://www.w3.org/1998/Math/MathML}root/>
          <{http://www.w3.org/1998/Math/MathML}cn> 2 </{http://www.w3.org/1998/Math/MathML}cn>
        </{http://www.w3.org/1998/Math/MathML}apply>
      </{http://www.w3.org/1998/Math/MathML}apply>
    </{http://www.w3.org/1998/Math/MathML}math>

Life with Prefixes and QNames
    <m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
      <m:apply>
        <m:eq/>
        <m:ci> x </m:ci>
        <m:apply>
          <m:root/>
          <m:cn> 2 </m:cn>
        </m:apply>
      </m:apply>
    </m:math>

    The namespace is represented by a prefix.
    Prefix m is bound to a namespace using an xmlns:m attribute.
    QName m:apply
        Prefix: m
        Namespace: http://www.w3.org/1998/Math/MathML
        Local name: apply

Default Namespace
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply>
        <eq/>
        <ci> x </ci>
        <apply>
          <root/>
          <cn> 2 </cn>
        </apply>
      </apply>
    </math>

    No prefix: default namespace

Binding scopes
    In the prefixed example above, the lifetime of the prefix binding runs from the opening <m:math xmlns:m="http://www.w3.org/1998/Math/MathML"> tag to the closing </m:math> tag.

(Not) Well-Formed XML
    <?xml version="1.0" encoding="utf-16"?>
    <movies>
      <movie id=”56225”>
        <title>Love Story</title>
        <title></title>
        <year>1980</year>
        <director name='Coppola'></director>
        <comment text=”Five start” text=”Average”/>
        <xml>Introduce XML content</xml>
        <newcomment text="An <important> text">Oscar</newcomment>
        <comment lang=de>&copy; 1980 Warner Bros.</comment>
        Famous movie of the 80s
      </Movie>
    </movies>

Well-Formedness: How To Tell
    An editor (oXygen, ...) will tell you.
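Besides an editor, a namespace-aware parser can answer both questions raised in the preceding slides: whether a document is well-formed, and which expanded names its QNames stand for. The sketch below assumes Python's standard `xml.etree.ElementTree` module, which happens to report element names in Clark notation. The function names are hypothetical, and the MathML documents are compact copies of the slide examples.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

MATHML = "http://www.w3.org/1998/Math/MathML"

# The same document, once with a prefix and once with a default namespace.
prefixed = (
    f'<m:math xmlns:m="{MATHML}">'
    "<m:apply><m:eq/><m:ci> x </m:ci>"
    "<m:apply><m:root/><m:cn> 2 </m:cn></m:apply>"
    "</m:apply></m:math>"
)
defaulted = (
    f'<math xmlns="{MATHML}">'
    "<apply><eq/><ci> x </ci>"
    "<apply><root/><cn> 2 </cn></apply>"
    "</apply></math>"
)


def wellformedness_error(document: str) -> Optional[str]:
    """Return the parser's error message, or None if the document parses."""
    try:
        ET.fromstring(document)
        return None
    except ET.ParseError as error:
        return str(error)


def expanded_names(document: str) -> List[str]:
    """Return all element names in Clark notation, in document order."""
    return [element.tag for element in ET.fromstring(document).iter()]


# A duplicated attribute is reported with a message and a position.
print(wellformedness_error('<a foo="bar" foo="bar2"/>'))

# Prefix or default namespace: the expanded names are identical.
print(expanded_names(prefixed) == expanded_names(defaulted))
print(expanded_names(prefixed)[0])
```

The prefix itself never appears in the expanded names, which is why the prefixed and default-namespace versions compare equal.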
Well-Formed XML
    <?xml version="1.0" encoding="utf-16"?>
    <!DOCTYPE movies [
      <!ENTITY copy "&#169;">
    ]>
    <movies>
      <Movie id="56225">
        <title>Love Story</title>
        <title></title>
        <year>1980</year>
        <director name='Coppola'></director>
        <comment text="Five start"/>
        <comment text="Average"/>
        <newcomment text="An &lt;important&gt; text">Oscar</newcomment>
        <comment lang="de">&copy; 1980 Warner Bros.</comment>
        Famous movie of the 80s
      </Movie>
    </movies>

Which QNames are in which Namespaces?
    <?xml version="1.0"?>
    <!DOCTYPE eth>
    <eth xmlns="http://www.ethz.ch"
         xmlns:xmldb="http://www.dbis.ethz.ch"
         date="11.11.2006"
         xmldb:date="12.11.2006">
      <date>13.11.2006</date>
      <president number="1">Empty</president>
      <Rektor>Name 2</Rektor>
    </eth>

XML: Not covered
    § Notations
    § Unparsed entities
    § Parameter entities

JSON

JSON: String
    "foo"
    "foo\nbar\u005f"

JSON: Number
    3.1415
    1.2345E+5

JSON: Boolean
    true
    false

JSON: Null
    null

JSON: Array
    [
      3.14159265368979,
      true,
      "This is a string",
      { "foo" : false },
      null
    ]

JSON: Object
    {
      "foo": 3.14159265368979,
      "bar": true,
      "str": "This is a string",
      "obj": { "school" : "ETH" },
      "Q": null
    }

JSON: Well-formedness
    { "foo" : "bar", "foo" : "bar2" }      duplicate keys
    { "foo" : "bar", "bar" : "foo" }       keys SHOULD be unique

JSON: Well-formedness
    { 1 : "bar", 2 : "bar2" }              not well-formed
    { "1" : "bar", "2" : "foo" }           well-formed

JSON: Well-formedness
    { foo: "bar", bar: "bar2" }            not well-formed
    { "foo" : "bar", "bar" : "foo" }       well-formed

HTML
    XHTML syntax
    HTML syntax

HTML Syntax
    <!DOCTYPE html>
    <html>
      <head>
        <title>Untitled</title>
      </head>
      <body>
        Dear jane <br>
        <p>You are invited at the weekly meeting</p>
        <p>Yours sincerely, <br> John</p>
      </body>
    </html>

HTML Elements: Void
    <br>

HTML Elements: Raw text
    <style>
      body { color: black; background: white; }
      em { font-style: normal; color: red; }
    </style>

HTML Elements: Escapable raw text
    <title>This is a &quot;title&quot;</title>

HTML Elements: Foreign
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply>
        <eq/>
        <ci> x </ci>
        <apply>
          <root/>
          <cn> 2 </cn>
        </apply>
      </apply>
    </math>

HTML Elements: Normal
    <ol>
      <li>ett</li>
      <li>två</li>
      <li>tre</li>
    </ol>

XHTML Syntax
    <?xml version="1.0"?>
    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Untitled</title>
      </head>
      <body>
        Dear jane <br/>
        <p>You are invited at the weekly meeting</p>
        <p>Yours sincerely, <br/> John</p>
      </body>
    </html>

YAML

YAML
    %YAML 1.2
    ---
    Country:
      code: 'CH'
      name: 'Switzerland'
      population: 8014000
      currency:
        name: 'Swiss Franc'
        code: 'CHF'
      confederation: true
      president: 'Didier Burkhalter'
      capital: null
      cities:
        - 'Zurich'
        - 'Geneva'
        - 'Bern'
      description: 'We produce very good chocolate.'
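Returning to the two HTML syntaxes shown earlier, the difference between them can be observed by feeding both letters to an XML parser and to an HTML tokenizer. This is a rough sketch, assuming Python's standard `xml.etree.ElementTree` and `html.parser` modules. `html.parser` is a tolerant tokenizer rather than a conforming HTML parser, and the class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List

# Compact copies of the two letters from the slides.
html_syntax = (
    "<!DOCTYPE html><html><head><title>Untitled</title></head>"
    "<body>Dear jane <br><p>You are invited at the weekly meeting</p>"
    "<p>Yours sincerely, <br> John</p></body></html>"
)
xhtml_syntax = (
    '<?xml version="1.0"?><!DOCTYPE html>'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Untitled</title></head>"
    "<body>Dear jane <br/><p>You are invited at the weekly meeting</p>"
    "<p>Yours sincerely, <br/> John</p></body></html>"
)


class TagCollector(HTMLParser):
    """Record the name of every start tag seen by the HTML tokenizer."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(document: str) -> List[str]:
    """Return the start tag names of a document in HTML syntax."""
    collector = TagCollector()
    collector.feed(document)
    collector.close()
    return collector.tags


def parses_as_xml(document: str) -> bool:
    """Return True if the document is accepted by the XML parser."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(html_syntax), parses_as_xml(xhtml_syntax))
print(html_start_tags(html_syntax) == html_start_tags(xhtml_syntax))
```

The void element `<br>` has no closing tag in HTML syntax, so the XML parser rejects that version, while the HTML tokenizer sees the same sequence of start tags in both.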