Distributed Database Design

Distributed Database Design
Dr.JakeFinlay Profile Pic
Dr.JakeFinlay,Germany,Teacher
Published Date:22-07-2017
Your Website URL(Optional)
Comment
Chapter 3: Distributed Database Design • Design problem • Design strategies(top-down, bottom-up) • Fragmentation • Allocation and replication of fragments, optimality, heuristics Acknowledgements: I am indebted to Arturas Mazeika for providing me his slides of this course. DDB 2008/09 J. Gamper Page 1Design Problem • Design problem of distributed systems: Making decisions about the placement of data and programs across the sites of a computer network as well as possibly designing the network itself. • In DDBMS, the distribution of applications involves – Distribution of the DDBMS software – Distribution of applications that run on the database • Distribution of applications will not be considered in the following; instead the distribution of data is studied. DDB 2008/09 J. Gamper Page 2Framework of Distribution • Dimension for the analysis of distributed systems – Level of sharing: no sharing, data sharing, data + program sharing – Behavior of access patterns: static, dynamic – Level of knowledge on access pattern behavior: no information, partial information, complete information • Distributed database design should be considered within this general framework. DDB 2008/09 J. Gamper Page 3Design Strategies • Top-down approach – Designing systems from scratch – Homogeneous systems • Bottom-up approach – The databases already exist at a number of sites – The databases should be connected to solve common tasks DDB 2008/09 J. Gamper Page 4Design Strategies . . . • Top-down design strategy DDB 2008/09 J. Gamper Page 5Design Strategies . . . • Distribution design is the central part of the design in DDBMSs (the other tasks are similar to traditional databases) – Objective: Design the LCSs by distributing the entities (relations) over the sites – Two main aspects have to be designed carefully ∗ Fragmentation · Relation may be divided into a number of sub-relations, which are distributed ∗ Allocation and replication · Each fragment is stored at site with ”optimal” distribution · Copy of fragment may be maintained at several sites • In this chapter we mainly concentrate on these two aspects • Distribution design issues – Why fragment at all? – How to fragment? – How much to fragment? – How to test correctness? – How to allocate? DDB 2008/09 J. Gamper Page 6Design Strategies . . . • Bottom-up design strategy DDB 2008/09 J. Gamper Page 7Fragmentation • What is a reasonable unit of distribution? Relation or fragment of relation? • Relations as unit of distribution: – If the relation is not replicated, we get a high volume of remote data accesses. – If the relation is replicated, we get unnecessary replications, which cause problems in executing updates and waste disk space – Might be an Ok solution, if queries need all the data in the relation and data stays at the only sites that uses the data • Fragments of relationas as unit of distribution: – Application views are usually subsets of relations – Thus, locality of accesses of applications is defined on subsets of relations – Permits a number of transactions to execute concurrently, since they will access different portions of a relation – Parallel execution of a single query (intra-query concurrency) – However, semantic data control (especially integrity enforcement) is more difficult ⇒ Fragments of relations are (usually) the appropriate unit of distribution. DDB 2008/09 J. Gamper Page 8Fragmentation . . . • Fragmentation aims to improve: – Reliability – Performance – Balanced storage capacity and costs – Communication costs – Security • The following information is used to decide fragmentation: – Quantitative information: frequency of queries, site, where query is run, selectivity of the queries, etc. – Qualitative information: types of access of data, read/write, etc. DDB 2008/09 J. Gamper Page 9Fragmentation . . . • Types of Fragmentation – Horizontal: partitions a relation along its tuples – Vertical: partitions a relation along its attributes – Mixed/hybrid: a combination of horizontal and vertical fragmentation (a) Horizontal Fragmentation (b) Vertical Fragmentation (c) Mixed Fragmentation DDB 2008/09 J. Gamper Page 10Fragmentation . . . • Exampe Data E-R Diagram DDB 2008/09 J. Gamper Page 11Fragmentation . . . • Example (contd.): Horizontal fragmentation of PROJ relation – PROJ1: projects with budgets less than200,000 – PROJ2: projects with budgets greater than or equal to200,000 DDB 2008/09 J. Gamper Page 12Fragmentation . . . • Example (contd.): Vertical fragmentation of PROJ relation – PROJ1: information about project budgets – PROJ2: information about project names and locations DDB 2008/09 J. Gamper Page 13Correctness Rules of Fragmentation • Completeness – Decomposition of relationR into fragmentsR ,R ,...,R is complete iff each 1 2 n data item inR can also be found in someR . i • Reconstruction – If relationR is decomposed into fragmentsR ,R ,...,R , then there should exist 1 2 n some relational operator∇ that reconstructsR from its fragments, i.e., R =R ∇...∇R 1 n ∗ Union to combine horizontal fragments ∗ Join to combine vertical fragments • Disjointness – If relationR is decomposed into fragmentsR ,R ,...,R and data itemd 1 2 n i appears in fragmentR , thend should not appear in any other fragmentR ,k =j j i k (exception: primary key attribute for vertical fragmentation) ∗ For horizontal fragmentation, data item is a tuple ∗ For vertical fragmentation, data item is an attribute DDB 2008/09 J. Gamper Page 14 6Horizontal Fragmentation • Intuition behind horizontal fragmentation – Every site should hold all information that is used to query at the site – The information at the site should be fragmented so the queries of the site run faster • Horizontal fragmentation is defined as selection operation,σ (R) p • Example: σ (PROJ) BUDGET200000 σ (PROJ) BUDGET≥200000 DDB 2008/09 J. Gamper Page 15Horizontal Fragmentation . . . • Computing horizontal fragmentation (idea) – Compute the frequency of the individual queries of the siteq ,...,q 1 Q – Rewrite the queries of the site in the conjunctive normal form (disjunction of conjunctions); the conjunctions are called minterms. – Compute the selectivity of the minterms – Find the minimal and complete set of minterms (predicates) ∗ The set of predicates is complete if and only if any two tuples in the same fragment are referenced with the same probability by any application ∗ The set of predicates is minimal if and only if there is at least one query that accesses the fragment – There is an algorithm how to find these fragments algorithmically (the algorithm CON MIN andPHORIZONTAL (pp 120-122) of the textbook of the course) DDB 2008/09 J. Gamper Page 16Horizontal Fragmentation . . . • Example: Fragmentation of thePROJ relation – Consider the following query: Find the name and budget of projects given their PNO. – The query is issued at all three sites – Fragmentation based on LOC, using the set of predicates/minterms ′ ′ ′ ′ ′ ′ LOC = Montreal,LOC = NewYork,LOC = Paris PROJ =σ ′ ′(PROJ) 2 LOC= NewYork ′ ′ PROJ =σ (PROJ) 1 LOC= Montreal PNO PNAME BUDGET LOC PNO PNAME BUDGET LOC P2 Database Develop. 135000 New York P1 Instrumentation 150000 Montreal P3 CAD/CAM 250000 New York ′ ′ PROJ =σ (PROJ) 3 LOC= Paris PNO PNAME BUDGET LOC P4 Maintenance 310000 Paris • If access is only according to the location, the above set of predicates is complete – i.e., each tuple of each fragmentPROJ has the same probability of being accessed i • If there is a second query/application to access only those project tuples where the budget is less than 200000, the set of predicates is not complete. – P2 inPROJ has higher probability to be accessed 2 DDB 2008/09 J. Gamper Page 17Horizontal Fragmentation . . . • Example (contd.): – AddBUDGET ≤ 200000 andBUDGET 200000 to the set of predicates to make it complete. ′ ′ ′ ′ ′ ′ ⇒LOC = Montreal,LOC = NewYork,LOC = Paris, BUDGET ≥ 200000,BUDGET 200000 is a complete set – Minterms to fragment the relation are given as follows: ′ ′ (LOC = Montreal )∧(BUDGET ≤ 200000) ′ ′ (LOC = Montreal )∧(BUDGET 200000) ′ ′ (LOC = NewYork )∧(BUDGET ≤ 200000) ′ ′ (LOC = NewYork )∧(BUDGET 200000) ′ ′ (LOC = Paris)∧(BUDGET ≤ 200000) ′ ′ (LOC = Paris)∧(BUDGET 200000) DDB 2008/09 J. Gamper Page 18Horizontal Fragmentation . . . • Example (contd.): Now,PROJ will be split in two fragments 2 ′ ′ ′ ′ PROJ =σ (PROJ) PROJ =σ (PROJ) 1 2 LOC= Montreal LOC= NY ∧BUDGET200000 PNO PNAME BUDGET LOC PNO PNAME BUDGET LOC P1 Instrumentation 150000 Montreal P2 Database Develop. 135000 New York ′ ′ ′ PROJ =σ (PROJ) PROJ =σ ′ ′ (PROJ) 3 LOC= Paris 2 LOC= NY ∧BUDGET≥200000 PNO PNAME BUDGET LOC PNO PNAME BUDGET LOC P4 Maintenance 310000 Paris P3 CAD/CAM 250000 New York – PROJ andPROJ would have been split in a similar way if tuples with budgets 1 2 smaller and greater than 200.000 would be stored DDB 2008/09 J. Gamper Page 19Horizontal Fragmentation . . . • In most cases intuition can be used to build horizontal partitions. Lett ,t ,t , 1 2 3 t ,t , andt ,t ,t ,t be query results. Then tuples would be fragmented in the 4 5 2 3 4 5 following way: t t t t t 1 2 3 4 5 DDB 2008/09 J. Gamper Page 20