Concurrency Control

Concurrency Control
Dr.JakeFinlay Profile Pic
Dr.JakeFinlay,Germany,Teacher
Published Date:22-07-2017
Your Website URL(Optional)
Comment
Chapter 9: Concurrency Control • Concurrency, Conflicts, and Schedules • Locking Based Algorithms • Timestamp Ordering Algorithms • Deadlock Management Acknowledgements: I am indebted to Arturas Mazeika for providing me his slides of this course. DDB 2008/09 J. Gamper Page 1Concurrency • Concurrency control is the problem of synchronizing concurrent transactions (i.e., order the operations of concurrent transactions) such that the following two properties are achieved: – the consistency of the DB is maintained – the maximum degree of concurrency of operations is achieved • Obviously, the serial execution of a set of transaction achieves consistency, if each single transaction is consistent DDB 2008/09 J. Gamper Page 2Conflicts • Conflicting operations: Two operationsO (x) andO (x) of transactionsT andT ij kl i k are in conflict iff at least one of the operations is a write, i.e., – O =read(x) andO =write(x) ij kl – O =write(x) andO =read(x) ij kl – O =write(x) andO =write(x) ij kl • Intuitively, a conflict between two operations indicates that their order of execution is important. • Read operations do not conflict with each other, hence the ordering of read operations does not matter. • Example: Consider the following two transactions T : Read(x) T : Read(x) 1 2 x←x+1 x←x+1 Write(x) Write(x) Commit Commit – To preserve DB consistency, it is important that theread(x) of one transaction is not betweenread(x) andwrite(x) of the other transaction. DDB 2008/09 J. Gamper Page 3Schedules • A schedule (history) specifies a possibly interleaved order of execution of the operations O of a set of transactionsT =T ,T ,...,T , whereT is specified by a partial 1 2 n i order(Σ ,≺ ). A schedule can be specified as a partial order overO, where i i S n – Σ = Σ T i i=1 S n –≺ ⊇ ≺ T i i=1 – For any two conflicting operationsO ,O ∈Σ , eitherO ≺ O or ij kl T ij T kl O ≺ O kl T ij DDB 2008/09 J. Gamper Page 4Schedules . . . • Example: Consider the following two transactions T : Read(x) T : Read(x) 1 2 x←x+1 x←x+1 Write(x) Write(x) Commit Commit – A possible schedule overT =T ,T can be written as the partial order 1 2 S =Σ ,≺ , where T T Σ =R (x),W (x),C ,R (x),W (x),C T 1 1 1 2 2 2 ≺ =(R ,W ),(R ,C ),(W ,C ), T 1 1 1 1 1 1 (R ,W ),(R ,C ),(W ,C ), 2 2 2 2 2 2 (R ,W ),(W ,W ),... 2 1 1 2 DDB 2008/09 J. Gamper Page 5Schedules . . . • A schedule is serial if all transactions inT are executed serially. • Example: Consider the following two transactions T : Read(x) T : Read(x) 1 2 x←x+1 x←x+1 Write(x) Write(x) Commit Commit – The two serial schedules areS =Σ ,≺ andS =Σ ,≺, where 1 1 1 2 2 2 Σ =Σ =R (x),W (x),C ,R (x),W (x),C 1 2 1 1 1 2 2 2 ≺ =(R ,W ),(R ,C ),(W ,C ),(R ,W ),(R ,C ),(W ,C ), 1 1 1 1 1 1 1 2 2 2 2 2 2 (C ,R ),... 1 2 ≺ =(R ,W ),(R ,C ),(W ,C ),(R ,W ),(R ,C ),(W ,C ), 2 1 1 1 1 1 1 2 2 2 2 2 2 (C ,R ),... 2 1 • We will also use the following notation: –T ,T=R (x),W (x),C ,R (x),W (x),C 1 2 1 1 1 2 2 2 –T ,T=R (x),W (x),C ,R (x),W (x),C 2 1 2 2 2 1 1 1 DDB 2008/09 J. Gamper Page 6Serializability • Two schedules are said to be equivalent if they have the same effect on the DB. • Conflict equivalence: Two schedulesS andS defined over the same set of 1 2 transactionsT =T ,T ,...,T are said to be conflict equivalent if for each pair 1 2 n of conflicting operationsO andO , wheneverO O thenO O . ij ij 1 ij 2 kl kl kl – i.e., conflicting operations must be executed in the same order in both transactions. • A concurrent schedule is said to be (conflict-)serializable iff it is conflict equivalent to a serial schedule • A conflict-serializable schedule can be transformed into a serial schedule by swapping non-conflicting operations • Example: Consider the following two schedules T : Read(x) 1 T : Read(x) 2 x←x+1 x←x+1 Write(x) Write(x) Write(z) Commit Commit – The scheduleR (x),W (x),R (x),W (x),W (z),C ,C is 1 1 2 2 1 2 1 conflict-equivalent toT ,T but not toT ,T 1 2 2 1 DDB 2008/09 J. Gamper Page 7Serializability . . . • The primary function of a concurrency controller is to generate a serializable schedule for the execution of pending transactions. • In a DDBMS two schedules must be considered – Local schedule – Global schedule (i.e., the union of the local schedules) • Serializability in DDBMS – Extends in a straightforward manner to a DDBMS if data is not replicated – Requires more care if data is replicated: It is possible that the local schedules are serializable, but the mutual consistency of the DB is not guaranteed. ∗ Mutual consistency: All the values of all replicated data items are identical • Therefore, a serializable global schedule must meet the following conditions: – Local schedules are serializable – Two conflicting operations should be in the same relative order in all of the local schedules they appear ∗ Transaction needs to be run on each site with the replicated data item DDB 2008/09 J. Gamper Page 8Serializability . . . • Example: Consider two sites and a data itemx which is replicated at both sites. T : Read(x) T : Read(x) 1 2 x←x+5 x←x∗10 Write(x) Write(x) – Both transactions need to run on both sites – The following two schedules might have been produced at both sites (the order is implicitly given): ∗ Site1: S =R (x),W (x),R (x),W (x) 1 1 1 2 2 ∗ Site2: S =R (x),W (x),R (x),W (x) 2 2 2 1 1 – Both schedules are (trivially) serializable, thus are correct in the local context – But they produce different results, thus violate the mutual consistency DDB 2008/09 J. Gamper Page 9Concurrency Control Algorithms • Taxonomy of concurrency control algorithms – Pessimistic methods assume that many transactions will conflict, thus the concurrent execution of transactions is synchronized early in their execution life cycle ∗ Two-Phase Locking (2PL) · Centralized (primary site) 2PL · Primary copy 2PL · Distributed 2PL ∗ Timestamp Ordering (TO) · Basic TO · Multiversion TO · Conservative TO ∗ Hybrid algorithms – Optimistic methods assume that not too many transactions will conflict, thus delay the synchronization of transactions until their termination ∗ Locking-based ∗ Timestamp ordering-based DDB 2008/09 J. Gamper Page 10Locking Based Algorithms • Locking-based concurrency algorithms ensure that data items shared by conflicting operations are accessed in a mutually exclusive way. This is accomplished by associating a “lock” with each such data item. • Two types of locks (lock modes) – read lock (rl) – also called shared lock – write lock (wl) – also called exclusive lock • Compatibility matrix of locks rl (x) wl (x) i i rl (x) compatible not compatible j wl (x) not compatible not compatible j • General locking algorithm 1. Before using a data itemx, transaction requests lock forx from the lock manager 2. Ifx is already locked and the existing lock is incompatible with the requested lock, the transaction is delayed 3. Otherwise, the lock is granted DDB 2008/09 J. Gamper Page 11Locking Based Algorithms • Example: Consider the following two transactions T : Read(x) T : Read(x) 1 2 x←x+1 x←x∗2 Write(x) Write(x) Read(y) Read(y) y←y−1 y←y∗2 Write(y) Write(y) – The following schedule is a valid locking-based schedule (lr (x) indicates the i release of a lock onx): S =wl (x),R (x),W (x),lr (x) 1 1 1 1 wl (x),R (x),W (x),lr (x) 2 2 2 2 wl (y),R (y),W (y),lr (y) 2 2 2 2 wl (y),R (y),W (y),lr (y) 1 1 1 1 – However,S is not serializable ∗ S cannot be transformed into a serial schedule by using only non-conflicting swaps ∗ The result is different from the result of any serial execution DDB 2008/09 J. Gamper Page 12Two-Phase Locking (2PL) • Two-phase locking protocol – Each transaction is executed in two phases ∗ Growing phase: the transaction obtains locks ∗ Shrinking phase: the transaction releases locks – The lock point is the moment when transitioning from the growing phase to the shrinking phase DDB 2008/09 J. Gamper Page 13Two-Phase Locking (2PL) . . . • Properties of the 2PL protocol – Generates conflict-serializable schedules – But schedules may cause cascading aborts ∗ If a transaction aborts after it releases a lock, it may cause other transactions that have accessed the unlocked data item to abort as well • Strict 2PL locking protocol – Holds the locks till the end of the transaction – Cascading aborts are avoided DDB 2008/09 J. Gamper Page 14Two-Phase Locking (2PL) . . . • Example: The scheduleS of the previous example is not valid in the 2PL protocol: S =wl (x),R (x),W (x),lr (x) 1 1 1 1 wl (x),R (x),W (x),lr (x) 2 2 2 2 wl (y),R (y),W (y),lr (y) 2 2 2 2 wl (y),R (y),W (y),lr (y) 1 1 1 1 – e.g., afterlr (x) (in line 1) transactionT cannot request the lockwl (y) (in line 4). 1 1 1 – Valid schedule in the 2PL protocol S =wl (x),R (x),W (x), 1 1 1 wl (y),R (y),W (y),lr (x),lr (y) 1 1 1 1 1 wl (x),R (x),W (x), 2 2 2 wl (y),R (y),W (y),lr (x),lr (y) 2 2 2 2 2 DDB 2008/09 J. Gamper Page 152PL for DDBMS • Various extensions of the 2PL to DDBMS • Centralized 2PL – A single site is responsible for the lock management, i.e., one lock manager for the whole DDBMS – Lock requests are issued to the lock manager – Coordinating transaction manager (TM at site where the transaction is initiated) can make all locking requests on behalf of local transaction managers • Advantage: Easy to implement • Disadvantages: Bottlenecks and lower reliability • Replica control protocol is addi- tionally needed if data are repli- cated (see also primary copy 2PL) DDB 2008/09 J. Gamper Page 162PL for DDBMS . . . • Primary copy 2PL – Several lock managers are distributed to a number of sites – Each lock manager is responsible for managing the locks for a set of data items – For replicated data items, one copy is chosen as primary copy, others are slave copies – Only the primary copy of a data item that is updated needs to be write-locked – Once primary copy has been updated, the change is propagated to the slaves • Advantages – Lower communication costs and better performance than the centralized 2PL • Disadvantages – Deadlock handling is more complex DDB 2008/09 J. Gamper Page 172PL for DDBMS . . . • Distributed 2PL – Lock managers are distributed to all sites – Each lock manager responsible for locks for data at that site – If data is not replicated, it is equivalent to primary copy 2PL – If data is replicated, the Read-One-Write-All (ROWA) replica control protocol is implemented ∗ Read(x): Any copy of a replicated itemx can be read by obtaining a read lock on the copy ∗ Write(x): All copies ofx must be write-locked beforex can be updated • Disadvantages – Deadlock handling more complex – Communication costs higher than primary copy 2PL DDB 2008/09 J. Gamper Page 182PL for DDBMS . . . • Communication structure of the distributed 2PL – The coordinating TM sends the lock request to the lock managers of all participating sites – The LMs pass the operations to the data processors – The end of the operation is signaled to the coordinating TM DDB 2008/09 J. Gamper Page 19Timestamp Ordering • Timestamp-ordering based algorithms do not maintain serializability by mutual exclusion, but select (a priori) a serialization order and execute transactions accordingly. – TransactionT is assigned a globally unique timestampts(T ) i i – Conflicting operationsO andO are resolved by timestamp order, i.e.,O is ij ij kl executed beforeO iffts(T )ts(T ). i kl k • To allow for the scheduler to check whether operations arrive in correct order, each data item is assigned a write timestamp (wts) and a read timestamp (rts): – rts(x): largest timestamp of any read onx – wts(x): largest timestamp of any write onx • Then the scheduler has to perform the following checks: – Read operation,R (x): i ∗ Ifts(T )wts(x): T attempts to read overwritten data; abortT i i i ∗ Ifts(T )≥wts(x): the operation is allowed andrts(x) is updated i – Write operations,W (x): i ∗ Ifts(T )rts(x): x was needed before by other transaction; abortT i i ∗ Ifts(T )wts(x): T writes an obsolete value; abortT i i i ∗ Otherwise, executeW (x) i DDB 2008/09 J. Gamper Page 20