Query Decomposition and Data Localization

Query Decomposition and Data Localization
Chapter 6: Query Decomposition and Data Localization • Query Decomposition • Data Localization Acknowledgements: I am indebted to Arturas Mazeika for providing me his slides of this course. DDB 2008/09 J. Gamper Page 1Query Decomposition • Query decomposition: Mapping of calcu lus query (SQL) to algebra operations (select, project, join, rename) • Both input and output queries refer to global re lations, without knowledge of the distribution of data. • The output query is semantically correct and good in the sense that redundant work is avoided. • Query decomposistion consists of 4 steps: 1. Normalization: Transform query to a normalized form 2. Analysis: Detect and reject ”incorrect” queries; possible only for a subset of relational calculus 3. Elimination of redundancy: Eliminate redundant predicates 4. Rewriting: Transform query to RA and optimize query DDB 2008/09 J. Gamper Page 2Query Decomposition – Normalization • Normalization: Transform the query to a normalized form to facilitate further processing. Consists mainly of two steps. 1. Lexical and syntactic analysis – Check validity (similar to compilers) – Check for attributes and relations – Type checking on the qualiﬁcation 2. Put into normal form – With SQL, the query qualiﬁcation (WHERE clause) is the most difﬁcult part as it might be an arbitrary complex predicate preceeded by quantiﬁers (∃,∀) – Conjunctive normal form (p ∨p ∨···∨p )∧···∧(p ∨p ∨···∨p ) 11 12 1n m1 m2 mn – Disjunctive normal form (p ∧p ∧···∧p )∨···∨(p ∧p ∧···∧p ) 11 12 1n m1 m2 mn – In the disjunctive normal form, the query can be processed as independent conjunctive subqueries linked by unions (corresponding to the disjunction) DDB 2008/09 J. Gamper Page 3Query Decomposition – Normalization . . . • Example: Consider the following query: Find the names of employees who have been working on project P1 for 12 or 24 months • The query in SQL: SELECT ENAME FROM EMP, ASG WHERE EMP.ENO = ASG.ENO AND ASG.PNO = ‘‘P1’’ AND DUR = 12 OR DUR = 24 • The qualiﬁcation in conjunctive normal form: EMP.ENO =ASG.ENO∧ASG.PNO = ”P1”∧(DUR = 12∨DUR = 24) • The qualiﬁcation in disjunctive normal form: (EMP.ENO =ASG.ENO∧ASG.PNO = ”P1”∧DUR = 12)∨ (EMP.ENO =ASG.ENO∧ASG.PNO = ”P1”∧DUR = 24) DDB 2008/09 J. Gamper Page 4Query Decomposition – Analysis • Analysis: Identify and reject type incorrect or semantically incorrect queries • Type incorrect – Checks whether the attributes and relation names of a query are deﬁned in the global schema – Checks whether the operations on attributes do not conﬂict with the types of the attributes, e.g., a comparison operation with an attribute of type string • Semantically incorrect – Checks whether the components contribute in any way to the generation of the result – Only a subset of relational calculus queries can be tested for correctness, i.e., those that do not contain disjunction and negation – Typical data structures used to detect the semantically incorrect queries are: ∗ Connection graph (query graph) ∗ Join graph DDB 2008/09 J. Gamper Page 5Query Decomposition – Analysis . . . • Example: Consider a query: SELECT ENAME,RESP FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO AND PNAME = "CAD/CAM" AND DUR ≥ 36 AND TITLE = "Programmer" • Query/connection graph – Nodes represent operand or result relation – Edge represents a join if both connected nodes represent an operand relation, oth erwise it is a projection • Join graph – a subgraph of the query graph that consid ers only the joins • Since the query graph is connected, the query is semantically correct DDB 2008/09 J. Gamper Page 6Query Decomposition – Analysis . . . • Example: Consider the following query and its query graph: SELECT ENAME,RESP FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND PNAME = "CAD/CAM" AND DUR ≥ 36 AND TITLE = "Programmer" • Since the graph is not connected, the query is semantically incorrect. • 3 possible solutions: – Reject the query – Assume an implicit Cartesian Product between ASG and PROJ – Infer from the schema the missing join predicate ASG.PNO = PROJ.PNO DDB 2008/09 J. Gamper Page 7Query Decomposition – Elimination of Redundancy • Elimination of redundancy: Simplify the query by eliminate redundancies, e.g., redundant predicates – Redundancies are often due to semantic integrity constraints expressed in the query language – e.g., queries on views are expanded into queries on relations that satiesfy certain integrity and security constraints • Transformation rules are used, e.g., – p∧p ⇐⇒ p – p∨p ⇐⇒ p – p∧true ⇐⇒ p – p∨false ⇐⇒ p – p∧false ⇐⇒ false – p∨true ⇐⇒ true – p∧¬p ⇐⇒ false – p∨¬p ⇐⇒ true – p ∧(p ∨p ) ⇐⇒ p 1 1 2 1 – p ∨(p ∧p ) ⇐⇒ p 1 1 2 1 DDB 2008/09 J. Gamper Page 8Query Decomposition – Elimination of Redundancy . . . • Example: Consider the following query: SELECT TITLE FROM EMP WHERE EMP.ENAME = "J. Doe" OR (NOT(EMP.TITLE = "Programmer") AND ( EMP.TITLE = "Elect. Eng." OR EMP.TITLE = "Programmer" ) AND NOT(EMP.TITLE = "Elect. Eng.")) • Letp be ENAME = ”J. Doe”,p be TITLE = ”Programmer” andp be TITLE = ”Elect. 1 2 3 Eng.” • Then the qualiﬁcation can be written asp ∨(¬p ∧(p ∨p )∧¬p ) 1 2 2 3 3 and then be transformed intop 1 • Simpliﬁed query: SELECT TITLE FROM EMP WHERE EMP.ENAME = "J. Doe" DDB 2008/09 J. Gamper Page 9Query Decomposition – Rewriting • Rewriting: Convert relational calculus query to relational algebra query and ﬁnd an efﬁcient expression. • Example: Find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years. • SELECT ENAME FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO AND ENAME = "J. Doe" AND PNAME = "CAD/CAM" AND (DUR = 12 OR DUR = 24) • A query tree represents the RAexpression – Relations are leaves (FROM clause) – Result attributes are root (SELECT clause) – Intermediate leaves should give a result from the leaves to the root DDB 2008/09 J. Gamper Page 10 6Query Decomposition – Rewriting . . . • By applying transformation rules, many different trees/expressions may be found that are equivalent to the original tree/expression, but might be more efﬁcient. • In the following we assume relationsR(A ,...,A ),S(B ,...,B ), andT which is 1 n 1 n unioncompatible toR. • Commutativity of binary operations – R×S =S×R – R⋊⋉S =S⋊⋉R – R∪S =S∪R • Associativity of binary operations – (R×S)×T =R×(S×T) – (R⋊⋉S)⋊⋉T =R⋊⋉ (S⋊⋉T) • Idempotence of unary operations – Π (Π (R)) = Π (R) A A A – σ (σ (R)) =σ (R) p1(A1) p2(A2) p1(A1)∧p2(A2) DDB 2008/09 J. Gamper Page 11Query Decomposition – Rewriting . . . • Commuting selection with binary operations – σ (R×S) ⇐⇒ σ (R)×S p(A) p(A) – σ (R⋊⋉ S) ⇐⇒ σ (R)⋊⋉ S p(A ) p(A ,B ) p(A ) p(A ,B ) 1 2 2 1 2 2 – σ (R∪T) ⇐⇒ σ (R)∪σ (T) p(A) p(A) p(A) ∗ (A belongs toR andT ) ′ ′ • Commuting projection with binary operations (assumeC =A ∪B , ′ ′ A ⊆A,B ⊆B) ′ ′ – Π (R×S) ⇐⇒ Π (R)×Π (S) C A B ′ ′ – Π (R⋊⋉ ′ ′ S) ⇐⇒ Π (R)⋊⋉ ′ ′ Π (S) C A B p(A,B ) p(A,B ) – Π (R∪S) ⇐⇒ Π (R)∪Π (S) C C C DDB 2008/09 J. Gamper Page 12Query Decomposition – Rewriting . . . • Example: Two equivalent query trees for the previous example – Recall the schemas: EMP(ENO, ENAME, TITLE) PROJ(PNO, PNAME, BUDGET) ASG(ENO, PNO, RESP, DUR) DDB 2008/09 J. Gamper Page 13Query Decomposition – Rewriting . . . • Example (contd.): Another equivalent query tree, which allows a more efﬁcient query evaluation, since the most selective operations are applied ﬁrst. DDB 2008/09 J. Gamper Page 14Data Localization • Data localization – Input: Algebraic query on global conceptual schema – Purpose: ∗ Apply data distribution information to the algebra operations and determine which fragments are involved ∗ Substitute global query with queries on fragments ∗ Optimize the global query DDB 2008/09 J. Gamper Page 15Data Localization . . . • Example: – Assume EMP is horizontally fragmented into EMP1, EMP2, EMP3 as follows: ∗ EMP1 =σ (EMP) ENO≤”E3” ∗ EMP2 =σ (EMP) ”E3”ENO≤”E6” ∗ EMP3 =σ (EMP) ENO”E6” – ASG fragmented into ASG1 and ASG2 as follows: ∗ ASG1 =σ (ASG) ENO≤”E3” ∗ ASG2 =σ (ASG) ENO”E3” • Simple approach: Replace in all queries – EMP by (EMP1∪EMP2∪ EMP3) – ASG by (ASG1∪ASG2) – Result is also called generic query • In general, the generic query is inefﬁcient since important restructurings and simpliﬁcations can be done. DDB 2008/09 J. Gamper Page 16Data Localization . . . • Example (contd.): Parallelsim in the evaluation is often possible – Depending on the horizontal fragmentation, the fragments can be joined in parallel followed by the union of the intermediate results. DDB 2008/09 J. Gamper Page 17Data Localization . . . • Example (contd.): Unnecessary work can be eliminated – e.g.,EMP ⋊⋉ASG gives an empty result 3 1 ∗ EMP3 =σ (EMP) ENO”E6” ∗ ASG1 =σ (ASG) ENO≤”E3” DDB 2008/09 J. Gamper Page 18Data Localizations Issues • Various more advanced reduction techniques are possible to generate simpler and optimized queries. • Reduction of horizontal fragmentation (HF) – Reduction with selection – Reduction with join • Reduction of vertical fragmentation (VF) – Find empty relations DDB 2008/09 J. Gamper Page 19Data Localizations Issues – Reduction of HF • Reduction with selection for HF – Consider relationR with horizontal fragmentationF =R ,R ,...,R , where 1 2 k R =σ (R) i p i – Rule1: Selections on fragments,σ (R ), that have a qualiﬁcation contradicting the p i j qualiﬁcation of the fragmentation generate empty relations, i.e., σ (R ) =∅ ⇐⇒ ∀x∈R(p (x)∧p (x) =false) p i i j j – Can be applied if fragmentation predicate is inconsistent with the query selection predicate. • Example: Consider the query: SELECT FROM EMP WHERE ENO=”E5” After commuting the selec tion with the union operation, it is easy to detect that the selection predicate contra dicts the predicates of EMP 1 and EMP . 3 DDB 2008/09 J. Gamper Page 20Data Localizations Issues – Reduction for HF . . . • Reduction with join for HF – Joins on horizontally fragmented relations can be simpliﬁed when the joined relations are fragmented according to the join attributes. – Distribute join over union (R ∪R )⋊⋉S ⇐⇒ (R ⋊⋉S)∪(R ⋊⋉S) 1 2 1 2 – Rule 2: Useless joins of fragments,R =σ (R) andR =σ (R), can be i p j p i j determined when the qualiﬁcations of the joined fragments are contradicting, i.e., R ⋊⋉R =∅ ⇐⇒ ∀x∈R ,∀y∈R (p (x)∧p (y) =false) i j i j i j ∪ ⋊⋉ ∪ R ⋊⋉ ⋊⋉ ⋊⋉ i p ,p p ,p p ,p i 1 i 2 i 3 R R R R R R R R R 1 2 3 i 1 i 2 i 3 DDB 2008/09 J. Gamper Page 21Data Localizations Issues – Reduction for HF . . . • Example: Consider the following query and fragmentation: – Query: SELECT FROM EMP, ASG WHERE EMP.ENO=ASG.ENO – Horizontal fragmentation: ∗ EMP1 =σ (EMP) ENO≤”E3” ∗ ASG1 =σ (ASG) ENO≤”E3” ∗ EMP2 =σ (EMP) ”E3”ENO≤”E6” ∗ ASG2 =σ (ASG) ENO”E3” ∗ EMP3 =σ (EMP) ENO”E6” – Generic query – The query reduced by distribut ing joins over unions and apply ing rule 2 can be implemented as a union of three partial joins that can be done in parallel. DDB 2008/09 J. Gamper Page 22Data Localizations Issues – Reduction for HF . . . • Reduction with join for derived HF – The horizontal fragmentation of one relation is derived from the horizontal fragmentation of another relation by using semijoins. • If the fragmentation is not on the same predicate as the join (as in the previous example), derived horizontal fragmentation can be applied in order to make efﬁcient join processing possible. • Example: Assume the following query and fragmentation of the EMP relation: – Query: SELECT FROM EMP, ASG WHERE EMP.ENO=ASG.ENO – Fragmentation (not on the join attribute): ∗ EMP1 =σ (EMP) TITLE=“Prgrammer” ∗ EMP2 =σ (EMP) TITLE=“Prgrammer” – To achieve efﬁcient joins ASG can be fragmented as follows: ∗ ASG1= ASG⊲ EMP1 ENO ∗ ASG2= ASG⊲ EMP2 ENO – The fragmentation of ASG is derived from the fragmentation of EMP – Queries on derived fragments can be reduced, e.g.,ASG ⋊⋉EMP =∅ 1 2 DDB 2008/09 J. Gamper Page 23 6Data Localizations Issues – Reduction for VF • Reduction for Vertical Fragmentation – Recall, VF distributes a relation based on projection, and the reconstruction operator is the join. – Similar to HF, it is possible to identify useless intermediate relations, i.e., fragments that do not contribute to the result. – Assume a relationR(A) withA =A ,...,A , which is vertically fragmented as 1 n ′ ′ R =π (R), whereA ⊆A. i A i i ′ – Rule 3:π (R ) is useless if the set of projection attributesD is not inA andK D,K i i is the key attribute. – Note that the result is not empty, but it is useless, as it contains only the key attribute. DDB 2008/09 J. Gamper Page 24Data Localizations Issues – Reduction for VF . . . • Example: Consider the following query and vertical fragmentation: – Query: SELECT ENAME FROM EMP – Fragmentation: ∗ EMP1 = Π (EMP) ENO,ENAME ∗ EMP2 = Π (EMP) ENO,TITLE • Generic query • Reduced query – By commuting the projection with the join (i.e., pro jecting on ENO, ENAME), we can see that the pro jection on EMP is useless because ENAME is not 2 in EMP . 2 DDB 2008/09 J. Gamper Page 25Conclusion • Query decomposition and data localization maps calculus query into algebra operations and applies data distribution information to the algebra operations. • Query decomposition consists of normalization, analysis, elimination of redundancy, and rewriting. • Data localization reduces horizontal fragmentation with join and selection, and vertical fragmentation with joins, and aims to ﬁnd empty relations. DDB 2008/09 J. Gamper Page 26
Website URL
Comment
Presentations
Free
Category:
Presentations
User Name:
Dr.JakeFinlay
User Type:
Teacher
Country:
Germany