Question? Leave a message!




Graph Data Structures

Graph Data Structures
ECE 250 Algorithms and Data Structures Graph Data Douglas Wilhelm Harder, M.Math. LEL Department of Electrical and Computer Engineering Structures University of Waterloo Waterloo, Ontario, Canada ece.uwaterloo.ca dwharderalumni.uwaterloo.ca © 20062013 by Douglas Wilhelm Harder. Some rights reserved.Matrix Data Structures 2 Outline • In this topic, we will cover the representation of graphs on a computer • We will examine: – an adjacency matrix representation – smaller representations and pointer arithmetic – sparse matrices and linked listsMatrix Data Structures 3 Background • Project 5 requires you to store a graph with a given number of vertices numbered 0 through n – 1 • Initially, there are no edges between these n vertices • The insert command adds edges to the graph while the number vertices remains unchangedMatrix Data Structures 4 Background • In this laboratory, we will look at techniques for storing the edges of a graph • This laboratory will focus on weighted graphs, however, for unweighted graphs, one can easily use bool in place of doubleMatrix Data Structures 5 Background • To demonstrate these techniques, we will look at storing the edges of the following graph:Matrix Data Structures 6 Adjacency Matrix A graph of n vertices may have up to n  n(n1) 2 O(n )  2 2  edges The first straightforward implementation is an adjacency matrixMatrix Data Structures 7 Adjacency Matrix Define an n× n matrix A = (a ) and if the vertices v and v are ij i j connected with weight w, then set a = w and a = w ij ji That is, the matrix is symmetric, e.g., Matrix Data Structures 8 Adjacency Matrix An unweighted graph may be saved as an array of Boolean values – vertices v and v are connected then set i j a = a = true ij jiMatrix Data Structures 9 Adjacency Matrix If the graph was directed, then the matrix would not necessarily be symmetric Matrix Data Structures 10 Adjacency Matrix First we must allocate memory for a twodimensional array C++ does not have native support for anything more than one dimensional arrays, thus how do we store a twodimensional array – as an array of arraysMatrix Data Structures 11 Adjacency Matrix Suppose we require a 16 × 16 matrix of doubleprecision floating point numbers Each row of the matrix can be represented by an array The address of the first entry must be stored in a pointer to a double: double Matrix Data Structures 12 Adjacency Matrix However, because we must store 16 of these pointerstodoubles, it makes sense that we store these in an array What is the declaration of this array Well, we must store a pointer to a pointer to a double That is: double Matrix Data Structures 13 Adjacency Matrix Thus, the address of the first array must be declared to be: double matrix;Matrix Data Structures 14 Adjacency Matrix The next question is memory allocation First, we must allocate the memory for the array of pointers to doubles: matrix = new double 16;Matrix Data Structures 15 Adjacency Matrix Next, to each entry of this matrix, we must assign the memory allocated for an array of doubles for ( int i = 0; i 16; ++i ) matrixi = new double16; Matrix Data Structures 16 Adjacency Matrix Accessing a matrix is done through a double index, e.g., matrix34 You can interpret this as (matrix3)4Matrix Data Structures 17 Adjacency Matrix Recall that in matrix34, the variable matrix is a pointertoa pointertoadouble:Matrix Data Structures 18 Adjacency Matrix Therefore, matrix3 is a pointertoadouble:Matrix Data Structures 19 Adjacency Matrix And consequently, matrix34 is a double:Matrix Data Structures 20 C++ Notation Warning Do not use matrix3, 4 because: – in C++, the comma operator evaluates the operands in order from left toright – the value is the last one Therefore, matrix3, 4 is equivalent to calling matrix4 Try it: int i = (3, 4); cout i endl;Matrix Data Structures 21 C++ Notation Warning Many things will compile if you try to use this notation: matrix = new doubleN, N; will allocate an array of N doubles, just like: matrix = new doubleN; However, this is likely not to do what you really expect...Matrix Data Structures 22 Adjacency Matrix Now, once you’ve used the matrix, you must also delete it...Matrix Data Structures 23 Adjacency Matrix Recall that for each call to new,you must have a corresponding call to delete Therefore, we must use a forloop to delete the arrays – implementation up to youMatrix Data Structures 24 Default Values Question: what do we do about vertices which are not connected – the value 0 – a negative number, e.g., –1 – positive infinity: ∞ The last is the most logical, in that it makes sense that two vertices which are not connected have an infinite distance between themMatrix Data Structures 25 Default Values To use infinity, you may declare a constant static member variable INF: include limits class Weightedgraph private: static const double INF; // ... // ... ; const double Weightedgraph::INF = std::numericlimitsdouble::infinity();Matrix Data Structures 26 Default Values As defined in the IEEE 754 standard, the representation of the doubleprecision floatingpoint infinity eight bytes: 0x 7F F0 00 00 00 00 00 00 Incidentally, negative infinity is stored as: 0x FF F0 00 00 00 00 00 00Matrix Data Structures 27 Default Values In this case, you can initialize your array as follows: for ( int i = 0; i N; ++i ) for ( int j = 0; j N; ++j ) matrixij = INF; matrixii = 0; It makes intuitive sense that the distance from a node to itself is 0Matrix Data Structures 28 Default Values If we are representing an unweighted graph, use Boolean values: for ( int i = 0; i N; ++i ) for ( int j = 0; j N; ++j ) matrixij = false; matrixii = true; It makes intuitive sense that a vertex is connected to itselfMatrix Data Structures 29 Adjacency Matrix Let us look at the representation of our example graph Initially none of the edges are recorded:Matrix Data Structures 30 Adjacency Matrix To insert the edge between 0 and 1 with weight 0.83, we set matrix01 = matrix10 = 0.83;Matrix Data Structures 31 Adjacency Matrix The final result is shown as follows Note, however, that these six arrays could be anywhere in memory...Matrix Data Structures 32 Adjacency Matrix We have now looked at how we can store an adjacency graph in C++ Next, we will look at: – Two improvements for the arrayofarrays implementations, including: • allocating the memory for the matrix in a single contiguous block of code, and • a lowertriangular representation; and – A sparse linkedlist implementationMatrix Data Structures 33 Adjacency Matrix Improvement To begin, we will look at the first improvement: 2 – allocating all of the memory of the arrays in a single array with n entriesMatrix Data Structures 34 Adjacency Matrix Improvement For those of you who would like to reduce the number of calls to new, consider the following idea: – allocate an array of 16 pointers to doubles 2 – allocate an array of 16 = 256 doubles Then, assign to the 16 pointers in the first array the addresses of entries 0, 16, 32, 48, 64, ..., 240Matrix Data Structures 35 Adjacency Matrix Improvement First, we allocate memory: matrix = new double 16; double tmp = new double256;Matrix Data Structures 36 Adjacency Matrix Improvement Next, we allocate the addresses: matrix = new double 16; double tmp = new double256; for ( int i = 0; i 16; ++i ) matrixi = ( tmp16i ); This assigns: matrix 0 = ( tmp 0 ); matrix 1 = ( tmp 16 ); matrix 2 = ( tmp 32 ); . . . matrix15 = ( tmp240 );Matrix Data Structures 37 Adjacency Matrix Improvement Deleting this array is easier: delete matrix0; delete matrix;Matrix Data Structures 38 Adjacency Matrix Improvement Our sample graph would be represented as follows:Matrix Data Structures 39 Lowertriangular adjacency matrix Next we will look at another improvement which can be used for undirected graphs We will store only half of the entries – To do this, we must also learn about pointer arithmeticMatrix Data Structures 40 Lowertriangular adjacency matrix Note also that we are not storing a directed graph: therefore, we really need only store half of the matrix Thus, instead of 256 entries, we really only require 120 entriesMatrix Data Structures 41 Lowertriangular adjacency matrix The memory allocation for this would be straightforward, too: matrix = new double 16; matrix0 = 0; matrix1 = new double120; for( int i = 2; i 16; ++i ) matrixi = matrixi– 1 + i 1; Matrix Data Structures 42 Lowertriangular adjacency matrix • What we are using here is pointer arithmetic: – in C/C++, you can add values to a pointer – the question is, what does it mean to set: ptr = ptr + 1; or ptr = ptr + 2;Matrix Data Structures 43 Lowertriangular adjacency matrix • Suppose we have a pointertoadouble: double ptr = new double( 3.14 ); where: – the pointer has a value of 0x53A1D780, and – the representation of 3.14 is 0x40091Eb851EB851FMatrix Data Structures 44 Lowertriangular adjacency matrix • If we just added one to the address, then this would give us the value 0x53A1D781, but this contains no useful information...Matrix Data Structures 45 Lowertriangular adjacency matrix • The only logical interpretation of ptr + 1 is to go to the next location a different double could exist, i.e., 0x53A1D788Matrix Data Structures 46 Lowertriangular adjacency matrix Therefore, if we define: double array = new double4; then the following are all equivalent: array0 array array1 (array + 1) array2 (array + 2) array3 (array + 3)Matrix Data Structures 47 Lowertriangular adjacency matrix • Thus, the following code simply adds appropriate amounts to the pointer: matrix = new double N; matrix0 = nullptr; matrix1 = new doubleN(N – 1)/2; for( int i = 2; i N; ++i ) matrixi = matrixi– 1 + i 1; Matrix Data Structures 48 Lowertriangular adjacency matrix Visually, we have, for N = 16, the following: matrix0 = nullptr; matrix1 = ( tmp0 ); matrix2 = ( tmp1 ); matrix3 = ( tmp3 ); matrix4 = ( tmp6 ); matrix5 = ( tmp10 ); matrix6 = ( tmp15 ); matrix7 = ( tmp21 ); matrix7 = ( tmp28 ); . . . matrix15 = ( tmp105 );Matrix Data Structures 49 Lowertriangular adjacency matrix The only thing that we would have to do is ensure that we always put the larger number first: void insert( int i, int j, double w ) if ( j i ) matrixij = w; else matrixji = w; Matrix Data Structures 50 Lowertriangular adjacency matrix • A slightly less efficient way of writing this would be: void insert( int i, int j, double w ) matrixmax(i,j)min(i,j) = w; • The benefits (from the pointofview of clarity) are much more significant...Matrix Data Structures 51 Lowertriangular adjacency matrix • Our example graph is stored using this representation as shown here • Notice that we do not store any 0’s, nor do we store any duplicate entries • The second array has only 15 entries, versus 36Matrix Data Structures 52 Lowertriangular adjacency matrix • To determine the weight of the edge connecting vertices 1 and 4, we must look up the entry matrix41Matrix Data Structures 53 Beyond Array Bounds • Until now, some of you may have gone beyond array bounds accidentally • Recall that int array = new int10; allocates 40 bytes (4 bytes/int) and the entries are accessed with array0 through array9Matrix Data Structures 54 Beyond Array Bounds • If you try to access either array10 or array1, you are accessing memory which has not been allocated for this arrayMatrix Data Structures 55 Beyond Array Bounds • This memory may be used: – for different local variables, or – by some other process • In the first case, you will have a bug which is very difficult to track down – e.g., a variable will appear to change its value without an explicit assignment • In the second case, the OS will terminate your process (segmentation fault)Matrix Data Structures 56 Beyond Array Bounds • Now we have a very explicit example of what happens if you go outside your expected array bounds • Notice that the value stored at matrix41 is 0.46 • We can also access it using either: matrix34 matrix53Matrix Data Structures 57 Beyond Array Bounds • Thus, if you wanted to find the distance between vertices 3 and 4, if you access matrix43, you get is 0.24 • If, however, you access matrix34, you get 0.46Matrix Data Structures 58 Beyond Array Bounds • Similarly, if you wanted to find the distance between vertices 2 and 3, if you access matrix32, you get is 0.39 • If, however, you access matrix23, you get 0.72Matrix Data Structures 59 Sparse Matrices • Finally we will consider the problem with sparse matrices and we will look at one implementation using linked listsMatrix Data Structures 60 Sparse Matrices • The memory required for creating an n × n matrix using an arrayof arrays is: 2 2 4 bytes + 4n bytes + 8n bytes = Q(n ) bytes • This could potentially waste a significant amount of memory: – consider all intersections in Canada as vertices and streets as edges – how could we estimate the number of intersections in CanadaMatrix Data Structures 61 Sparse Matrices • The population of Canada is 33 million • Suppose we have one intersection per 10 houses and four occupants per house • Therefore, there are roughly 33 million / 10 / 4 ≈ 800 000 intersections in Canada which would require 4.66 TiB of memoryMatrix Data Structures 62 Sparse Matrices • Assume that each intersection connects, on average, four other intersections • Therefore, less than 0.0005 of the entries of the matrix are used to store connections – the rest are storing the value infinityMatrix Data Structures 63 Sparse Matrices • Matrices where less than 5 of the entries are not the default value (either infinity or 0, or perhaps some other default value) are said to be sparse • Matrices where most entries (25 or more) are not the default value are said to be dense • Clearly, these are not hard limitsMatrix Data Structures 64 Sparse Matrices • We will look at a very efficient sparsematrix implementation with the last topic • Here, we will consider a simpler implementation: – use an array of linked lists to store edges • Note, however, that each node in a linked list must store two items of information: – the connecting vertex and the weightMatrix Data Structures 65 Sparse Matrices • One possible solution: – modify the SingleNode data structure to store both an integer and a double: class SingleNode private: int adacentvertex; double edgeweight; SingleNode nextnode; public: SingleNode( int, double SingleNode = 0 ); double weight() const; int vertex() const; SingleNode next() const; ; – exceptionally stupid and inefficient Matrix Data Structures 66 Sparse Matrices A better solution is to create a new class which stores a vertexedge pair class Pair private: double edgeweight; int adacentvertex; public: Pair( int, double ); double weight() const; int vertex() const; ; Now create an array of linkedlists storing these pairs Matrix Data Structures 67 Sparse Matrices Thus, we define and create the array: SingleListPair array; array = new SingleListPair16;Matrix Data Structures 68 Sparse Matrices As before, to reduce redundancy, we would only insert the entry into the entry corresponding with the larger vertex void insert( int i, int j, double w ) if ( i j ) arrayj.pushfront( Pair(i, w) ); else arrayi.pushfront( Pair(j, w) ); Matrix Data Structures 69 Sparse Matrices For example, the graph shown below would be stored asMatrix Data Structures 70 Sparse Matrices Later, we will see an even more efficient implementation – The old an new Yale sparse matrix formatsMatrix Data Structures 71 Summary • In this laboratory, we have looked at a number of graph representations • C++ lacks a matrix data structure – must use array of arrays • The possible factors affecting your choice of data structure are: – weighted or unweighted graphs – directed or undirected graphs – dense or sparse graphsMatrix Data Structures 72 References Wikipedia, http://en.wikipedia.org/wiki/Adjacencymatrix http://en.wikipedia.org/wiki/Adjacencylist These slides are provided for the ECE 250 Algorithms and Data Structures course. The material in it reflects Douglas W. Harder’s best judgment in light of the information available to him at the time of preparation. Any reliance on these course slides by any party for any other purpose are the responsibility of such parties. Douglas W. Harder accepts no responsibility for damages, if any, suffered by any party as a result of decisions made or actions based on these course slides for any other purpose than that for which it was intended.