Distributed Databases

Published Date: 22-07-2017
Distributed Databases
Chapter 1: Introduction
Johann Gamper

• Syllabus
• Data Independence and Distributed Data Processing
• Definition of Distributed Databases
• Promises of Distributed Databases
• Technical Problems to be Studied
• Conclusion

Acknowledgements: I am indebted to Arturas Mazeika for providing me with his slides for this course.

DDB 2008/09 J. Gamper

Syllabus

• Introduction
• Distributed DBMS Architecture
• Distributed Database Design
• Query Processing
• Transaction Management
• Distributed Concurrency Control
• Distributed DBMS Reliability
• Parallel Database Systems

Data Independence

• In the old days, programs stored data in regular files
• Each program had to maintain its own data
  – huge overhead
  – error-prone
• The development of DBMSs helped to fully achieve data independence (transparency)
• A DBMS provides centralized and controlled data maintenance and access
• The application is immune to changes in the physical and logical file organization
• A distributed database system is the union of what appear to be two diametrically opposed approaches to data processing: database systems and computer networks
  – Computer networks promote a mode of work that goes against centralization
• Key issues to understand this combination
  – The most important objective of DB technology is integration, not centralization
  – Integration is possible without centralization, i.e., integration of databases and networking does not mean centralization (in fact quite the opposite)
• Goal of distributed database systems: achieve data integration and data distribution transparency

Distributed Computing/Data Processing

• A distributed computing system is a collection of autonomous processing elements that are interconnected by a computer network. The elements cooperate in order to perform the assigned task.
• The term “distributed” is very broadly used. The exact meaning of the word depends on the context.
• Synonymous terms:
  – distributed function
  – distributed data processing
  – multiprocessors/multicomputers
  – satellite processing
  – back-end processing
  – dedicated/special purpose computers
  – timeshared systems
  – functionally modular systems
• What can be distributed?
  – Processing logic
  – Functions
  – Data
  – Control
• Classification of distributed systems with respect to various criteria
  – Degree of coupling, i.e., how closely the processing elements are connected
    ∗ e.g., measured as the ratio of the amount of data exchanged to the amount of local processing
    ∗ weak coupling, strong coupling
  – Interconnection structure
    ∗ point-to-point connection between processing elements
    ∗ common interconnection channel
  – Synchronization
    ∗ synchronous
    ∗ asynchronous

Definition of DDB and DDBMS

• A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network
• A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users
• The terms DDBMS and DDBS are often used interchangeably
• Implicit assumptions
  – Data is stored at a number of sites; each site logically consists of a single processor
  – Processors at different sites are interconnected by a computer network (we do not consider multiprocessors in DDBMS, cf. parallel systems)
  – A DDBS is a database, not a collection of files (cf. relational data model). Placement and querying of data are impacted by the access patterns of the users
  – A DDBMS is a collection of DBMSs (not a remote file system)
• Example: The database consists of 3 relations, employees, projects, and assignment, which are partitioned and stored at different sites (fragmentation).
• What are the problems with queries, transactions, concurrency, and reliability?

What is not a DDBS?

• The following systems are parallel database systems and are quite different from (though related to) distributed DB systems

[Figure panels: Shared Memory, Shared Disk, Shared Nothing, Central Databases]

Applications

• Manufacturing, especially multi-plant manufacturing
• Military command and control
• Airlines
• Hotel chains
• Any organization with a decentralized organizational structure

Promises of DDBSs

Distributed Database Systems deliver the following advantages:
• Higher reliability
• Improved performance
• Easier system expansion
• Transparency of distributed and replicated data

Higher reliability

• Replication of components
• No single points of failure
• e.g., a broken communication link or processing element does not bring down the entire system
• Distributed transaction processing guarantees the consistency of the database and concurrency

Improved performance

• Proximity of data to its points of use
  – Reduces remote access delays
  – Requires some support for fragmentation and replication
• Parallelism in execution
  – Inter-query parallelism
  – Intra-query parallelism
• The mix of update and read-only queries influences the design of DDBSs substantially
  – If mostly read-only access is required, as much of the data as possible should be replicated
  – Writing becomes more complicated with replicated data
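
The effect of replication on read-only versus update access can be illustrated with a small sketch. It assumes a read-one/write-all policy, which the slides do not prescribe; the class `ReplicatedItem`, the site names, and the message counter are hypothetical and serve only to make the cost difference visible.

```python
class ReplicatedItem:
    """One data item replicated at several sites (hypothetical model)."""

    def __init__(self, sites, value):
        # Every site starts with an identical copy of the item.
        self.copies = {site: value for site in sites}
        self.messages = 0  # counts accesses to remote copies

    def read(self, local_site):
        # Read-one: a local copy answers without any network traffic.
        if local_site in self.copies:
            return self.copies[local_site]
        self.messages += 1  # no local copy: fetch from a remote site
        return next(iter(self.copies.values()))

    def write(self, local_site, value):
        # Write-all: every copy must be updated to keep them consistent.
        for site in self.copies:
            if site != local_site:
                self.messages += 1
            self.copies[site] = value


item = ReplicatedItem(["S1", "S2", "S3"], value=100)
item.read("S1")                      # served by the local copy
read_cost = item.messages
item.write("S1", 150)                # must also reach S2 and S3
write_cost = item.messages - read_cost
print(read_cost, write_cost)         # prints: 0 2
```

Under this assumed policy a read costs nothing once a local copy exists, while every write costs one remote update per additional replica, which is why a read-mostly workload favours heavy replication.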
Easier system expansion

• Issue is database scaling
• Emergence of microprocessor and workstation technologies
  – A network of workstations is much cheaper than a single mainframe computer
• Data communication cost versus telecommunication cost
• Increasing database size

Transparency

• Refers to the separation of the higher-level semantics of the system from the lower-level implementation issues
• A transparent system “hides” the implementation details from the users.
• A fully transparent DBMS provides high-level support for the development of complex applications.

[Figure: (a) User wants to see one database; (b) Programmer sees many databases]

Various forms of transparency can be distinguished for DDBMSs:
• Network transparency (also called distribution transparency)
  – Location transparency
  – Naming transparency
• Replication transparency
• Fragmentation transparency
• Transaction transparency
  – Concurrency transparency
  – Failure transparency
• Performance transparency

Network/Distribution transparency

• Network/Distribution transparency allows a user to perceive a DDBS as a single, logical entity
• The user is protected from the operational details of the network (or may not even know that the network exists)
• The user does not need to know the location of data items; the command used to perform a task is independent of both the location of the data and the site at which the task is performed (location transparency)
• A unique name is provided for each object in the database (naming transparency)
  – In the absence of this, users are required to embed the location name as part of an identifier
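
Location transparency, as described above, can be sketched in a few lines: the user issues the same command wherever the data lives, and the system resolves the site. The catalog layout, the site names, the sample rows, and the function `select_all` are assumptions for illustration only and do not correspond to any real DDBMS interface.

```python
# Hypothetical per-site storage: site name -> {relation name -> rows}.
SITES = {
    "S1": {"employees": [("E1", "Smith")]},
    "S2": {"projects": [("P1", "CAD")]},
}

# Hypothetical global catalog: relation name -> site that stores it.
CATALOG = {"employees": "S1", "projects": "S2"}


def select_all(relation):
    """Return all rows of a relation; the caller never names a site."""
    site = CATALOG[relation]      # the system, not the user, finds the site
    return SITES[site][relation]


# The same command works for data stored at S1 and at S2.
print(select_all("employees"))
print(select_all("projects"))

# Moving a relation only changes the catalog, not the user's command.
SITES["S1"]["projects"] = SITES["S2"].pop("projects")
CATALOG["projects"] = "S1"
print(select_all("projects"))
```

Because callers go through the catalog, relocating `projects` from one site to another leaves every existing call unchanged.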
Different ways to ensure naming transparency:
• Solution 1: Create a central name server; however, this results in
  – loss of some local autonomy
  – central site may become a bottleneck
  – low availability (if the central site fails, the remaining sites cannot create new objects)
• Solution 2: Prefix each object with the identifier of the site that created it
  – e.g., a branch relation created at site S1 might be named S1.BRANCH
  – Also need to identify each fragment and its copies
  – e.g., copy 2 of fragment 3 of Branch created at site S1 might be referred to as S1.BRANCH.F3.C2
• An approach that resolves these problems uses aliases for each database object
  – Thus, S1.BRANCH.F3.C2 might be known as local branch by the user at site S1
  – The DDBMS has the task of mapping an alias to the appropriate database object
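
The site-prefix naming scheme and the alias mapping can be sketched as follows. The helper names `global_name` and `resolve`, the per-site alias table, and the identifier spelling `local_branch` are assumptions made for this sketch; the slides only specify the form of names such as S1.BRANCH.F3.C2.

```python
def global_name(site, relation, fragment=None, copy=None):
    """Build a globally unique name such as S1.BRANCH.F3.C2."""
    parts = [site, relation.upper()]
    if fragment is not None:
        parts.append(f"F{fragment}")
    if copy is not None:
        parts.append(f"C{copy}")
    return ".".join(parts)


# Hypothetical per-site alias tables: site -> {alias -> global name}.
ALIASES = {
    "S1": {"local_branch": global_name("S1", "Branch", fragment=3, copy=2)},
}


def resolve(site, name):
    """Map a user-visible alias to its global name; pass other names through."""
    return ALIASES.get(site, {}).get(name, name)


print(global_name("S1", "Branch"))      # prints: S1.BRANCH
print(resolve("S1", "local_branch"))    # prints: S1.BRANCH.F3.C2
```

Keeping the alias table per site means no central name server is needed for lookups, which avoids the bottleneck and availability problems listed under Solution 1.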