Seminar on big data management

Lecturer: Jiaheng Lu
Spring 2016
Faculty of Science (Matemaattis-luonnontieteellinen tiedekunta), www.helsinki.fi, 6.1.2016

We are in the era of big data
• Lots of data is being collected
  • Web data, e-commerce
  • Bank and credit card transactions
  • Social networks
  • Scientific data

How much data
• Google processes 20 PB a day (2008)
• Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
• CERN's atomic facility generates 40 TB per second
• In 2009 the total amount of data was about 1 ZB; by 2020 it is estimated to reach 35 ZB

Types of data
• Relational data (tables, transactions, legacy data)
• Text data (Web)
• Semi-structured data (XML)
• Graph data
  • Social networks, Semantic Web (RDF), …
• Streaming data
  • You can only scan the data once

Four V's

• Watch two videos about big data

Outline
• About the seminar
• Practical information and requirements
• Seminar topics
• Our schedule

The seminar is about
• Big data management
• Data querying, exploration, sampling, sharing, cleansing, cloud data management, big data benchmarks, and applications

At the end of the seminar
• You should be able to tell what these terms stand for:
  • Hadoop
  • MapReduce
  • Cassandra
  • Spark
  • RDD
  • And more…

After this seminar
• Students are expected to
  • Have a decent understanding of the challenges of big data
  • Conduct research on one of the topics related to big data management
  • Know how to read, write, and review a technical paper
  • Know how to present a paper

More formally
• Pick a topic from the offered topics
• Read papers on that topic
• Present the paper
• Write a report on the topic
• Review two other reports written by your classmates
• Ask questions as an opponent during your classmates' presentations
• Attend the lectures (at least 80%)

Deadlines for each task
• Topic selection: 29 Jan
• Submit the first version of the report: 7 Mar
• Submit the peer review comments: 21 Mar
• Submit the final report: 2 May
• Present the paper, and ask questions as an opponent (no date given on the slide)

Topic assignment
• Submit a list of your 3 preferred topics
• If you have a topic in mind that is not listed, please send an email to the teacher
• Because several students may want the same topic, you may not get your first choice
• The same topic will be assigned to more than one person

Start researching your topic immediately after topic assignment

Topics of this seminar
• Big data survey
• Hadoop and Spark platforms
• Cloud data management
• Graph data management
• Data sampling
• Data exploration
• Approximate data processing
• Data cleansing
• Knowledge base
• Big data benchmark
• Big data applications

Hadoop and Spark platforms
• Two open-source platforms for big data processing

Cloud data management
• Cloud data management means deploying database systems in the cloud
• New challenges:
  • Data is stored at an untrusted host
  • Data is replicated across large geographic distances
  • Compute power is elastic

Data sampling
• It is not always possible to store big data in full
  • Many applications (telecoms, ISPs, search engines) can't keep everything
• It is inconvenient to work with data in full
  • It is faster to work with a compact summary
  • Better to explore data on a laptop than on a cluster

Graph data management
• Graph data management has long been a topic of interest for database researchers
• New application domains for big data include social networks and the Web of data
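
As an illustration of the "compact summary" idea on the Data sampling slide, the sketch below uses reservoir sampling (Algorithm R), a standard technique that the slides themselves do not name. It keeps a uniform random sample of fixed size k while reading the data exactly once, which also matches the single-scan restriction mentioned for streaming data. The function name `reservoir_sample` and its parameters are hypothetical and chosen only for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return up to k items drawn uniformly at random from a stream.

    The stream is read once and only k items are held in memory.
    (Hypothetical helper for illustration; not taken from the slides.)
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)  # randint includes both endpoints
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a stream of one million integers without storing them all.
    print(reservoir_sample(range(1_000_000), 5))
```

If the stream holds fewer than k items, the function simply returns all of them. Passing a seeded `random.Random` makes a run repeatable, which helps when comparing results computed on a sample with results computed on the full data.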

Data exploration
• Data exploration is about efficiently extracting knowledge from big data even if we do not know exactly what we are looking for
• Topics:
  • Query result visualization
  • Query by example
  • Approximate query processing
  • Interactive interfaces

Approximate string processing
• String data is ubiquitous
• Approximate string processing tolerates errors in string matching
[Figure: actual queries gathered by Google]

Data cleansing
• Data cleansing, or data cleaning, is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database
• Example:

  Gender   Frequency
  2        1
  F        12
  M        13
  X        1
  f        2

Knowledge base
• A knowledge base (KB) contains a set of concepts, instances, and relationships
• Applications:
  • Query understanding
  • Deep Web search
  • In-context advertisement
  • Event monitoring in social media
  • Product search and social mining

Big data benchmark
• Create a standard benchmark to assist in the evaluation of different big data systems
• Performance
• Scaleup
• Elastic speedup
• Availability

Big data applications
• Big data will have many applications in different areas:
  • Science and research
  • Public health
  • Customer relationship management
  • Machine and device performance
  • Security and law enforcement
  • Optimizing cities and countries

Task for next week
• Perform a first pass on the papers in the seed papers list
  • All papers are available on the course homepage
• Select interesting papers from their references

Logistics
• Office hours: Monday 15-17, Exactum A236
• Please send an email
  • To: Jiaheng.lu@helsinki.fi
• Course webpage:
  • http://www.cs.helsinki.fi/en/courses/58316103/2016/k/s/1

Finally, give Big Data a Warm Hug