Question? Leave a message!




Query Decomposition and Data Localization

Query Decomposition and Data Localization
Dr.JakeFinlay Profile Pic
Dr.JakeFinlay,Germany,Teacher
Published Date:22-07-2017
Website URL
Comment
Chapter 6: Query Decomposition and Data Localization • Query Decomposition • Data Localization Acknowledgements: I am indebted to Arturas Mazeika for providing me his slides of this course. DDB 2008/09 J. Gamper Page 1Query Decomposition • Query decomposition: Mapping of calcu- lus query (SQL) to algebra operations (select, project, join, rename) • Both input and output queries refer to global re- lations, without knowledge of the distribution of data. • The output query is semantically correct and good in the sense that redundant work is avoided. • Query decomposistion consists of 4 steps: 1. Normalization: Transform query to a normalized form 2. Analysis: Detect and reject ”incorrect” queries; possible only for a subset of relational calculus 3. Elimination of redundancy: Eliminate redundant predicates 4. Rewriting: Transform query to RA and optimize query DDB 2008/09 J. Gamper Page 2Query Decomposition – Normalization • Normalization: Transform the query to a normalized form to facilitate further processing. Consists mainly of two steps. 1. Lexical and syntactic analysis – Check validity (similar to compilers) – Check for attributes and relations – Type checking on the qualification 2. Put into normal form – With SQL, the query qualification (WHERE clause) is the most difficult part as it might be an arbitrary complex predicate preceeded by quantifiers (∃,∀) – Conjunctive normal form (p ∨p ∨···∨p )∧···∧(p ∨p ∨···∨p ) 11 12 1n m1 m2 mn – Disjunctive normal form (p ∧p ∧···∧p )∨···∨(p ∧p ∧···∧p ) 11 12 1n m1 m2 mn – In the disjunctive normal form, the query can be processed as independent conjunctive subqueries linked by unions (corresponding to the disjunction) DDB 2008/09 J. Gamper Page 3Query Decomposition – Normalization . . . • Example: Consider the following query: Find the names of employees who have been working on project P1 for 12 or 24 months? • The query in SQL: SELECT ENAME FROM EMP, ASG WHERE EMP.ENO = ASG.ENO AND ASG.PNO = ‘‘P1’’ AND DUR = 12 OR DUR = 24 • The qualification in conjunctive normal form: EMP.ENO =ASG.ENO∧ASG.PNO = ”P1”∧(DUR = 12∨DUR = 24) • The qualification in disjunctive normal form: (EMP.ENO =ASG.ENO∧ASG.PNO = ”P1”∧DUR = 12)∨ (EMP.ENO =ASG.ENO∧ASG.PNO = ”P1”∧DUR = 24) DDB 2008/09 J. Gamper Page 4Query Decomposition – Analysis • Analysis: Identify and reject type incorrect or semantically incorrect queries • Type incorrect – Checks whether the attributes and relation names of a query are defined in the global schema – Checks whether the operations on attributes do not conflict with the types of the attributes, e.g., a comparison operation with an attribute of type string • Semantically incorrect – Checks whether the components contribute in any way to the generation of the result – Only a subset of relational calculus queries can be tested for correctness, i.e., those that do not contain disjunction and negation – Typical data structures used to detect the semantically incorrect queries are: ∗ Connection graph (query graph) ∗ Join graph DDB 2008/09 J. Gamper Page 5Query Decomposition – Analysis . . . • Example: Consider a query: SELECT ENAME,RESP FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO AND PNAME = "CAD/CAM" AND DUR ≥ 36 AND TITLE = "Programmer" • Query/connection graph – Nodes represent operand or result relation – Edge represents a join if both connected nodes represent an operand relation, oth- erwise it is a projection • Join graph – a subgraph of the query graph that consid- ers only the joins • Since the query graph is connected, the query is semantically correct DDB 2008/09 J. Gamper Page 6Query Decomposition – Analysis . . . • Example: Consider the following query and its query graph: SELECT ENAME,RESP FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND PNAME = "CAD/CAM" AND DUR ≥ 36 AND TITLE = "Programmer" • Since the graph is not connected, the query is semantically incorrect. • 3 possible solutions: – Reject the query – Assume an implicit Cartesian Product between ASG and PROJ – Infer from the schema the missing join predicate ASG.PNO = PROJ.PNO DDB 2008/09 J. Gamper Page 7Query Decomposition – Elimination of Redundancy • Elimination of redundancy: Simplify the query by eliminate redundancies, e.g., redundant predicates – Redundancies are often due to semantic integrity constraints expressed in the query language – e.g., queries on views are expanded into queries on relations that satiesfy certain integrity and security constraints • Transformation rules are used, e.g., – p∧p ⇐⇒ p – p∨p ⇐⇒ p – p∧true ⇐⇒ p – p∨false ⇐⇒ p – p∧false ⇐⇒ false – p∨true ⇐⇒ true – p∧¬p ⇐⇒ false – p∨¬p ⇐⇒ true – p ∧(p ∨p ) ⇐⇒ p 1 1 2 1 – p ∨(p ∧p ) ⇐⇒ p 1 1 2 1 DDB 2008/09 J. Gamper Page 8Query Decomposition – Elimination of Redundancy . . . • Example: Consider the following query: SELECT TITLE FROM EMP WHERE EMP.ENAME = "J. Doe" OR (NOT(EMP.TITLE = "Programmer") AND ( EMP.TITLE = "Elect. Eng." OR EMP.TITLE = "Programmer" ) AND NOT(EMP.TITLE = "Elect. Eng.")) • Letp be ENAME = ”J. Doe”,p be TITLE = ”Programmer” andp be TITLE = ”Elect. 1 2 3 Eng.” • Then the qualification can be written asp ∨(¬p ∧(p ∨p )∧¬p ) 1 2 2 3 3 and then be transformed intop 1 • Simplified query: SELECT TITLE FROM EMP WHERE EMP.ENAME = "J. Doe" DDB 2008/09 J. Gamper Page 9Query Decomposition – Rewriting • Rewriting: Convert relational calculus query to relational algebra query and find an efficient expression. • Example: Find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years. • SELECT ENAME FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO AND ENAME = "J. Doe" AND PNAME = "CAD/CAM" AND (DUR = 12 OR DUR = 24) • A query tree represents the RA-expression – Relations are leaves (FROM clause) – Result attributes are root (SELECT clause) – Intermediate leaves should give a result from the leaves to the root DDB 2008/09 J. Gamper Page 10 6Query Decomposition – Rewriting . . . • By applying transformation rules, many different trees/expressions may be found that are equivalent to the original tree/expression, but might be more efficient. • In the following we assume relationsR(A ,...,A ),S(B ,...,B ), andT which is 1 n 1 n union-compatible toR. • Commutativity of binary operations – R×S =S×R – R⋊⋉S =S⋊⋉R – R∪S =S∪R • Associativity of binary operations – (R×S)×T =R×(S×T) – (R⋊⋉S)⋊⋉T =R⋊⋉ (S⋊⋉T) • Idempotence of unary operations – Π (Π (R)) = Π (R) A A A – σ (σ (R)) =σ (R) p1(A1) p2(A2) p1(A1)∧p2(A2) DDB 2008/09 J. Gamper Page 11Query Decomposition – Rewriting . . . • Commuting selection with binary operations – σ (R×S) ⇐⇒ σ (R)×S p(A) p(A) – σ (R⋊⋉ S) ⇐⇒ σ (R)⋊⋉ S p(A ) p(A ,B ) p(A ) p(A ,B ) 1 2 2 1 2 2 – σ (R∪T) ⇐⇒ σ (R)∪σ (T) p(A) p(A) p(A) ∗ (A belongs toR andT ) ′ ′ • Commuting projection with binary operations (assumeC =A ∪B , ′ ′ A ⊆A,B ⊆B) ′ ′ – Π (R×S) ⇐⇒ Π (R)×Π (S) C A B ′ ′ – Π (R⋊⋉ ′ ′ S) ⇐⇒ Π (R)⋊⋉ ′ ′ Π (S) C A B p(A,B ) p(A,B ) – Π (R∪S) ⇐⇒ Π (R)∪Π (S) C C C DDB 2008/09 J. Gamper Page 12Query Decomposition – Rewriting . . . • Example: Two equivalent query trees for the previous example – Recall the schemas: EMP(ENO, ENAME, TITLE) PROJ(PNO, PNAME, BUDGET) ASG(ENO, PNO, RESP, DUR) DDB 2008/09 J. Gamper Page 13Query Decomposition – Rewriting . . . • Example (contd.): Another equivalent query tree, which allows a more efficient query evaluation, since the most selective operations are applied first. DDB 2008/09 J. Gamper Page 14Data Localization • Data localization – Input: Algebraic query on global conceptual schema – Purpose: ∗ Apply data distribution information to the algebra operations and determine which fragments are involved ∗ Substitute global query with queries on fragments ∗ Optimize the global query DDB 2008/09 J. Gamper Page 15Data Localization . . . • Example: – Assume EMP is horizontally fragmented into EMP1, EMP2, EMP3 as follows: ∗ EMP1 =σ (EMP) ENO≤”E3” ∗ EMP2 =σ (EMP) ”E3”ENO≤”E6” ∗ EMP3 =σ (EMP) ENO”E6” – ASG fragmented into ASG1 and ASG2 as follows: ∗ ASG1 =σ (ASG) ENO≤”E3” ∗ ASG2 =σ (ASG) ENO”E3” • Simple approach: Replace in all queries – EMP by (EMP1∪EMP2∪ EMP3) – ASG by (ASG1∪ASG2) – Result is also called generic query • In general, the generic query is inefficient since important restructurings and simplifications can be done. DDB 2008/09 J. Gamper Page 16Data Localization . . . • Example (contd.): Parallelsim in the evaluation is often possible – Depending on the horizontal fragmentation, the fragments can be joined in parallel followed by the union of the intermediate results. DDB 2008/09 J. Gamper Page 17Data Localization . . . • Example (contd.): Unnecessary work can be eliminated – e.g.,EMP ⋊⋉ASG gives an empty result 3 1 ∗ EMP3 =σ (EMP) ENO”E6” ∗ ASG1 =σ (ASG) ENO≤”E3” DDB 2008/09 J. Gamper Page 18Data Localizations Issues • Various more advanced reduction techniques are possible to generate simpler and optimized queries. • Reduction of horizontal fragmentation (HF) – Reduction with selection – Reduction with join • Reduction of vertical fragmentation (VF) – Find empty relations DDB 2008/09 J. Gamper Page 19Data Localizations Issues – Reduction of HF • Reduction with selection for HF – Consider relationR with horizontal fragmentationF =R ,R ,...,R , where 1 2 k R =σ (R) i p i – Rule1: Selections on fragments,σ (R ), that have a qualification contradicting the p i j qualification of the fragmentation generate empty relations, i.e., σ (R ) =∅ ⇐⇒ ∀x∈R(p (x)∧p (x) =false) p i i j j – Can be applied if fragmentation predicate is inconsistent with the query selection predicate. • Example: Consider the query: SELECT FROM EMP WHERE ENO=”E5” After commuting the selec- tion with the union operation, it is easy to detect that the selection predicate contra- dicts the predicates of EMP 1 and EMP . 3 DDB 2008/09 J. Gamper Page 20