Distributed DBMS Reliability

reliability of distributed dbms and distributed dbms reliability ppt
Dr.JakeFinlay Profile Pic
Published Date:22-07-2017
Your Website URL(Optional)
Chapter 10: Distributed DBMS Reliability • Definitions and Basic Concepts • Local Recovery Management • In-place update, out-of-place update • Distributed Reliability Protocols • Two phase commit protocol • Three phase commit protocol Acknowledgements: I am indebted to Arturas Mazeika for providing me his slides of this course. DDB 2008/09 J. Gamper Page 1Reliability • A reliable DDBMS is one that can continue to process user requests even when the underlying system is unreliable, i.e., failures occur • Failures – Transaction failures – System (site) failures, e.g., system crash, power supply failure – Media failures, e.g., hard disk failures – Communication failures, e.g., lost/undeliverable messages • Reliability is closely related to the problem of how to maintain the atomicity and durability properties of transactions DDB 2008/09 J. Gamper Page 2Reliability . . . • Recovery system: Ensures atomicity and durability of transactions in the presence of failures (and concurrent transactions) • Recovery algorithms have two parts 1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures 2. Actions taken after a failure to recover the DB contents to a state that ensures atomicity, consistency and durability DDB 2008/09 J. Gamper Page 3Local Recovery Management • The local recovery manager (LRM) maintains the atomicity and durability properties of local transactions at each site. • Architecture – Volatile storage: The main memory of the computer system (RAM) – Stable storage ∗ A storage that “never” looses its contents ∗ In reality this can only be approximated by a combination of hardware (non-volatile storage) and software (stable-write, stable-read, clean-up) components DDB 2008/09 J. Gamper Page 4Local Recovery Management . . . • Two ways for the LRM to deal with update/write operations – In-place update ∗ Physically changes the value of the data item in the stable database ∗ As a result, previous values are lost ∗ Mostly used in databases – Out-of-place update ∗ The new value(s) of updated data item(s) are stored separately from the old value(s) ∗ Periodically, the updated values have to be integrated into the stable DB DDB 2008/09 J. Gamper Page 5In-Place Update • Since in-place updates cause previous values of the affected data items to be lost, it is necessary to keep enough information about the DB updates in order to allow recovery in the case of failures • Thus, every action of a transaction must not only perform the action, but must also write a log record to an append-only log file DDB 2008/09 J. Gamper Page 6In-Place Update . . . • A log is the most popular structure for recording DB modifications on stable storage • Consists of a sequence of log records that record all the update activities in the DB • Each log record describes a significant event during transaction processing • Types of log records – T ,start: if transactionT has started i i – T ,X ,V ,V : beforeT executes awrite(X ), whereV is the old value i j 1 2 i j 1 before the write andV is the new value after the write 2 – T ,commit: ifT has committed i i – T ,abort: ifT has aborted i i – checkpoint • With the information in the log file the recovery manager can restore the consistency of the DB in case of a failure. DDB 2008/09 J. Gamper Page 7In-Place Update . . . • Assume the following situation when a system crash occurs • Upon recovery: – All effects of transactionT should be reflected in the database (⇒ REDO) 1 – None of the effects of transactionT should be reflected in the database (⇒ UNDO) 2 DDB 2008/09 J. Gamper Page 8In-Place Update . . . • REDO Protocol – REDO’ing an action means performing it again – The REDO operation uses the log information and performs the action that might have been done before, or not done due to failures – The REDO operation generates the new image DDB 2008/09 J. Gamper Page 9In-Place Update . . . • UNDO Protocol – UNDO’ing an action means to restore the object to its image before the transaction has started – The UNDO operation uses the log information and restores the old value of the object DDB 2008/09 J. Gamper Page 10In-Place Update . . . • Example: Consider the transactionsT andT (T executes beforeT ) and the 0 1 0 1 following initial values:A = 1000,B = 2000, andC = 700 T : read(A) 0 A =A−50 T : read(C) 1 write(A) C =C−100 read(B) write(C) B =B +50 write(B) – Possible order of actual outputs to the log file and the DB: Log DB T ,start 0 T ,A,1000,950 0 T ,B,2000,2050 0 T ,commit 0 A = 950 B = 2050 T ,start 1 T ,C,700,600 1 T ,commit 1 C = 600 DDB 2008/09 J. Gamper Page 11In-Place Update . . . • Example (contd.): Consider the log after some system crashes and the corresponding recovery actions (a)T ,start (b)T ,start (c)T ,start 0 0 0 T ,A,1000,950 T ,A,1000,950 T ,A,1000,950 0 0 0 T ,B,2000,2050 T ,B,2000,2050 T ,B,2000,2050 0 0 0 T ,commit T ,commit 0 0 T ,start T ,start 1 1 T ,C,700,600 T ,C,700,600 1 1 T ,commit 1 (a) undo(T0): B is restored to 2000 and A to 1000 (b) undo(T1) and redo(T0): C is restored to 700, and then A and B are set to 950 and 2050, respectively (c) redo(T0) and redo(T1): A and B are set to 950 and 2050, respectively; then C is set to 600 DDB 2008/09 J. Gamper Page 12In-Place Update . . . • Logging Interface • Log pages/buffers can be written to stable storage in two ways: – synchronously ∗ The addition of each log record requires that the log is written to stable storage ∗ When the log is written synchronoously, the executtion of the transaction is supended until the write is complete→ delay in response time – asynchronously ∗ Log is moved to stable storage either at periodic intervals or when the buffer fills up. DDB 2008/09 J. Gamper Page 13In-Place Update . . . • When to write log records into stable storage? • Assume a transactionT updates a pageP • Fortunate case – System writesP in stable database – System updates stable log for this update – SYSTEM FAILURE OCCURS... (beforeT commits) – We can recover (undo) by restoringP to its old state by using the log • Unfortunate case – System writesP in stable database – SYSTEM FAILURE OCCURS... (before stable log is updated) – We cannot recover from this failure because there is no log record to restore the old value • Solution: Write-Ahead Log (WAL) protocol DDB 2008/09 J. Gamper Page 14In-Place Update . . . • Notice: – If a system crashes before a transaction is committed, then all the operations must be undone. We need only the before images (undo portion of the log) – Once a transaction is committed, some of its actions might have to be redone. We need the after images (redo portion of the log) • Write-Ahead-Log (WAL) Protocol – Before a stable database is updated, the undo portion of the log should be written to the stable log – When a transaction commits, the redo portion of the log must be written to stable log prior to the updating of the stable database DDB 2008/09 J. Gamper Page 15Out-of-Place Update • Two out-of-place strategies are shadowing and differential files • Shadowing – When an update occurs, don’t change the old page, but create a shadow page with the new values and write it into the stable database – Update the access paths so that subsequent accesses are to the new shadow page – The old page is retained for recovery • Differential files – For each DB fileF maintain ∗ a read-only partFR + − ∗ a differential file consisting of insertions part (DF ) and deletions part (DF ) + − – Thus,F = (FR∪DF )−DF DDB 2008/09 J. Gamper Page 16Distributed Reliability Protocols • As with local reliability protocols, the distributed versions aim to maintain the atomicity and durability of distributed transactions • Most problematic issues in a distributed transaction are commit, termination, and recovery – Commit protocols ∗ How to execute a commit command for distributed transactions ∗ How to ensure atomicity (and durability)? – Termination protocols ∗ If a failure occurs at a site, how can the other operational sites deal with it ∗ Non-blocking: the occurrence of failures should not force the sites to wait until the failure is repaired to terminate the transaction – Recovery protocols ∗ When a failure occurs, how do the sites where the failure occurred deal with it ∗ Independent: a failed site can determine the outcome of a transaction without having to obtain remote information DDB 2008/09 J. Gamper Page 17Commit Protocols • Primary requirement of commit protocols is that they maintain the atomicity of distributed transactions (atomic commitment) – i.e., even though the exectution of the distributed transaction involves multiple sites, some of which might fail while executing, the effects of the transaction on the distributed DB is all-or-nothing. • In the following we distinguish two roles – Coordinator: The process at the site where the transaction originates and which controls the execution – Participant: The process at the other sites that participate in executing the transaction DDB 2008/09 J. Gamper Page 18Centralized Two Phase Commit Protocol (2PC) • Very simple protocol that ensures the atomic commitment of distributed transactions. • Phase 1: The coordinator gets the participants ready to write the results into the database • Phase 2: Everybody writes the results into the database • Global Commit Rule – The coordinator aborts a transaction if and only if at least one participant votes to abort it – The coordinator commits a transaction if and only if all of the participants vote to commit it • Centralized since communication is only between coordinator and the participants DDB 2008/09 J. Gamper Page 19Centralized Two Phase Commit Protocol (2PC) . . . DDB 2008/09 J. Gamper Page 20