Question? Leave a message!




Graph Data Management

Graph Data Management
Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn LeitowOverview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of frameworks a) Framework implementation 4) Framework analysis a) Performance comparison b) Optimization techniques by Fynn Leitow 5) Conclusion CriticismMotivation ➢How to implement graph analysis algorithms on huge graphs ➢How do they perform ➢How about parallel computingApplication for big data ➢Social Networks Web of Data ➢extremely large dynamic ➢can’t be handled by legacy programsSocial Networks ➢vertices: people, pictures, videos ➢edges: relations among nodes (friendship, follower) ➢scalefree: follow powerlaw distribution (few vertices have high popularity)Web of Data ➢Largescale structured data by governments, researchers, companies ➢Publishing principles: ○ Unique ressource identifier ○ publication at this URI in RDF triples (ressourcedataframework, statements similar to entity relationship model but more general) ○ links to similar online ressources ➢RDF example: “The sky has the color blue” subject: predicate: object: “the sky” “has” “the color blue”Typical queries ➢ social networks: ○ punctual updates (of vertices, adding edges) ○ transitive closures (“other people you might know”, ...) ○ betweenness queries (“common friends”, shortest path, ...) ➢ Web of data: ○ bulk inserts ○ joins ○ logical inference (deductions) All men are Socrates Therefore, mortal is a man Socrates is mortal.Objective of the paper ➢ Analyse existing graph frameworks also for machine learning ➢ native, handoptimized implementation as reference point ➢ give suggestions to improve performance gapAlgorithms PageRank (statistics) Breadthfirst search (graph traversal) Triangle Counting (statistics) Collaborative filtering (machine learning)1.PageRank (site popularity) ➢how many links go to this this site ➢technically: probability for a random walk to end on this vertex t iteration r probability for random jump e set of directed edges degree(j) number of outgoing edges2. Breadthfirst search (BFS) Graph traversal ➢calculates “distance” (smallest number of edges) from start to any other vertex ➢Distance(start) initialized to zero, all others to infinity ➢iteratively computes for neighboring vertices:3. Triangle Counting graph statistics ➢Triangle := two vertices are both neighbors of a common third vertex ➢Algorithm: ○ each vertex shares his neighborhood list with its neighbors ○ do neighbors overlap with the neighborhood lists triangle k i jCollaborative filtering machine learning ➢Recommendation system: predicts user ratings based on an incomplete set of (user, item) ratings (a)Given a rating matrix R, find P and Q so that R = PQ is approximated best. (b) find the bipartite graph with edge weights R u,vOverview Algorithm Application Graph type Vertex property Message size (Bytes/edge) PageRank Graph statistics Directed, Double (pagerank) Constant (8) unweighted edges Breadth First Graph traversal Undirected, Int (distance) Constant (4) Search unweighted edges Collaborative Machine learning Bipartite graph; Array of Doubles Constant (8K) Filtering Undirected, (p or q ) u v weighted edges 6 Triangle Counting Graph statistics Directed, Long ( triangles) Variable (010 ) unweighted edges ➢We focus on PageRank and BFSFrameworks Graphlab CombinatorialBLAS SocialLite Galois GiraphExplanation: Framework Ninja gap ➢Framework: layered structure indicating what kind of programs can/should be built and how they should interrelate ➢can include programs, specify interfaces, offer programming tools,... ➢Ninja performance gap: “Performance gap between naively written C++/C code that is parallelism unaware (often serial) and bestoptimized code on modern multi/manycore processors” 1Choice of Frameworks: (Austin Texas University) (Stanford University) COMBINATORIALBLAS (UCSB)Overview Framework Programming Language Graph Communication model Partitioning layer GraphLab Vertex C++ 1D (vertexpart.) Sockets CombBLAS Sparse (adjacency) matrix C++ 2D (edgepart.) MPI SociaLite Datalog (declarative, Java 1D Sockets deductive database tables) Galois Taskbased C/C++ N/A N/A Giraph (Hadoop) Vertex Java 1D Netty ➢Galois: parallel computing framework for irregular data structuresPageRank vertex programming ➢Graphlab Giraph, similar for Galois ➢runs on a single vertex and communicates with adjacent vertices ➢Vertex program for one iteration:PageRank sparse matrix (CombBLAS) ➢single iteration of PageRank: A adjacency matrix p vector of PageRank values t p = p / vector of vertex outdegrees t tPageRank declarative (SociaLite) ➢Head: PageRank of vertex n for iteration t+1 is sum of two rules: ➢1st: constant term, 2nd: normalized values from iteration t ➢InEdgen(s): vertices in 1st, neighbors in 2nd column ➢second version optimized for distributed machinesBFS vertex programing (GraphLab, Giraph) ➢Initialize Distance to zero or infinity, respectively ➢Continue until there are no more updatesBreadth First Search sparse matrix (CombBLAS) ➢matrixvector multiplication in each iteration: T v = A s v nonzeros indicate start vertices for next iteration A adjacency matrix s starting verticesBreadth First Search SociaLite ➢1st rule handles source, 2nd recursively follows neighborsBreadth First Search Galois ➢Work lists are maintained parallel processed for each level by GaloisFramework Analysis Datasets Performance on single and multiple nodesExperimental Setup Dataset FB, Wiki, Netflix Twitter Yahoo Music Synthetic Graph500 Synthetic Collaborative Livejournal (64 nodes) Filtering (64 nodes) vertices 3 5 M 480k users 62 M 1.0 M users 537 M 63 M users 3 (k = 10 ) 18k movies 0.6 M items 1.3 M items edges 42 85 M 99 M ratings 1468 M 253 M ratings 8539 M 16743 M ratings 6 (M = 10 ) ➢Twitter Yahoo large enough for multiple (4, 16 for triangle) nodes ➢Intel Xeon E52697, 24 cores 2.7 Ghz, 64 GB DRAM / node ➢Real data distributed by power law synthetic as wellPerformance results (single node) ➢native fastest ➢Galois very fast (single node only) ➢Giraph slow ➢synthetic data in line with realworld resultsPerformance (single node) ➢CombBlas, Graphlab and SociaLite perform well on average 2 ➢CombBlas ran out of memory on Triangle Count (A calculations)Performance (multiple nodes, synthetic) ➢“weak scaling”: graph data per node constant, ➢horizontal line denotes perfect scalingmultiple node (realworld / combined)Observations ➢Again: Native best, Giraph worst, CombBLAS OOM. ➢Graphlab SociaLite perform well on counting because data structure is optimized for UNION operations ➢CombBLAS performs well for the other three algorithms ➢GraphLab drops off for PageRank BFS due to network bottleneckFurther Analysis: Resource monitoring ➢CPU utilization ( larger is better) ➢Peak network transfer rate ( ) ➢memory footprint ( ) ➢network data volume ( ) = find out why certain trends are observedResource monitoring ➢Giraph only 16 CPU due to memory requirements for each worker ➢Pagerank: limited by network traffic for all performance differenceOptimization ➢ Data Structure ○ 2x : bit vectors (exploit bitlevel parallelism in hardware) ○ compressed sparserow (CSR) format for adjacency list ➢ Data Compression ○ 2x : bit vectors + delta coding (store differences rather than actual values) ➢ Overlap of Computation and Communication ○ 1.22x : start computation before receiving whole message, chunk large messages ➢ Message passing mechanisms ○ 2.53x : use MPI (message passing interface) instead of TCP sockets ○ 2x : use multiple sockets between two nodes ➢ Partitioning schemes (1d, 2d, vertexcut,...)Optimization PageRank DFS ➢optimization performed for the native algorithm DFS: bitvectors for list of already visited verticesCriticism SummaryCriticism ➢Not including sparkbased GraphX (7x slower than GraphLab on PageRank) ➢For those frameworks without preimplemented code, your implementation might be too good/too bad and distort the results. ✓ Creates value for endusers and framework developers alike ✓ Methods and algorithms well explain without being too technicalSummary ➢vertexbased: every vertex is an instance, communicates noniterative ➢Galois: best single node results. Giraph: slow ➢Optimization: bit vectors, delta coding, CSR, Overlap, MPI ⇒ Reduce the performance gap through recommended changes and let the end user choose by preference.Sources 1 Can Traditional Programming Bridge the Ninja Performance Gap for Parallel Computing Applications (Satish et al. ; 2012) Images: Giraph, Graphlab, SociaLite, Galois Introductory Paper: Graph Data Management Systems for New Application Domains (CudréMauroux, Elnikety; 2011) Main Paper, other tables and figures: Navigating the maze of graph analytics frameworks using massive graph datasets (Satish et al.; SIGMOD 2014)