Chapter 1: Introduction
• Data Independence and Distributed Data Processing
• Definition of Distributed Databases
• Promises of Distributed Databases
• Technical Problems to be Studied
Acknowledgements: I am indebted to Arturas Mazeika for providing me with his slides for this course.
DDB 2008/09 J. Gamper
Syllabus
• Distributed DBMS Architecture
• Distributed Database Design
• Query Processing
• Transaction Management
• Distributed Concurrency Control
• Distributed DBMS Reliability
• Parallel Database Systems
Data Independence
• In the old days, programs stored data in regular files
• Each program had to maintain its own data
– huge overhead
Data Independence . . .
• The development of DBMSs helped to fully achieve data independence (transparency)
• Provide centralized and controlled data maintenance and access
• Applications are immune to changes in the physical and logical file organization
Data Independence . . .
• A distributed database system is the union of what appear to be two diametrically opposed
approaches to data processing: database systems and computer networks
– Computer networks promote a mode of work that goes against centralization
• Key issues to understand this combination
– The most important objective of DB technology is integration, not centralization
– Integration is possible without centralization, i.e., integration of databases and
networking does not mean centralization (in fact, quite the opposite)
• Goal of distributed database systems: achieve data integration and data distribution
Distributed Computing/Data Processing
• A distributed computing system is a collection of autonomous processing elements
that are interconnected by a computer network. The elements cooperate in order to
perform the assigned task.
• The term “distributed” is very broadly used. The exact meaning of the word depends on
the context.
• Synonymous terms:
– distributed function
– distributed data processing
– satellite processing
– back-end processing
– dedicated/special purpose computers
– timeshared systems
– functionally modular systems
Distributed Computing/Data Processing . . .
• What can be distributed?
– Processing logic
• Classification of distributed systems with respect to various criteria
– Degree of coupling, i.e., how closely the processing elements are connected
∗ e.g., measured as ratio of amount of data exchanged to amount of local processing
∗ weak coupling, strong coupling
– Interconnection structure
∗ point-to-point connection between processing elements
∗ common interconnection channel
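The coupling measure above can be illustrated with a minimal Python sketch. The function name `coupling_ratio`, the units (bytes and CPU seconds), and the reading that a higher ratio means stronger coupling are assumptions made for illustration only; they are not part of the course material.

```python
def coupling_ratio(bytes_exchanged: float, local_cpu_seconds: float) -> float:
    """Ratio of data exchanged to local processing (hypothetical units)."""
    if local_cpu_seconds <= 0:
        raise ValueError("local processing must be positive")
    return bytes_exchanged / local_cpu_seconds


# Two hypothetical systems; under the stated assumption, the higher
# ratio indicates the more strongly coupled system.
weak = coupling_ratio(bytes_exchanged=1_000, local_cpu_seconds=100.0)
strong = coupling_ratio(bytes_exchanged=1_000_000, local_cpu_seconds=10.0)
```

The ratio is only one possible measure; where the line between weak and strong coupling falls depends on the system being classified.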
Definition of DDB and DDBMS
• A distributed database (DDB) is a collection of multiple, logically interrelated databases
distributed over a computer network
• A distributed database management system (DDBMS) is the software that manages
the DDB and provides an access mechanism that makes this distribution transparent to
the user
• The terms DDBMS and DDBS are often used interchangeably
• Implicit assumptions
– Data is stored at a number of sites; each site logically consists of a single processor
– Processors at different sites are interconnected by a computer network (we do not
consider multiprocessors in DDBMS, cf. parallel systems)
– A DDBS is a database, not a collection of files (cf. relational data model). Placement
and querying of data are affected by the access patterns of the users
– A DDBMS is a collection of DBMSs (not a remote file system)
Definition of DDB and DDBMS . . .
• Example: The database consists of three relations, employees, projects, and
assignment, which are partitioned and stored at different sites (fragmentation).
• What are the problems with queries, transactions, concurrency, and reliability?
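To make the fragmentation example concrete, here is a minimal Python sketch. It assumes horizontal fragmentation of the employees relation by a site attribute; the rows, the attribute layout, and the helper names `fragment_by_site` and `reconstruct` are hypothetical and chosen only for illustration.

```python
# Hypothetical rows of the employees relation: (emp_no, name, site).
employees = [
    (1, "Ada", "S1"),
    (2, "Bob", "S2"),
    (3, "Eve", "S1"),
    (4, "Dan", "S3"),
]


def fragment_by_site(rows):
    """Horizontally fragment rows by their site attribute (an assumption)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[2], []).append(row)
    return fragments


def reconstruct(fragments):
    """Rebuild the global relation as the union of all fragments."""
    return sorted(row for frag in fragments.values() for row in frag)


fragments = fragment_by_site(employees)
all_rows = reconstruct(fragments)
```

A query over all employees has to visit every site that holds a fragment, which hints at why queries, transactions, concurrency, and reliability become harder than in a centralized database.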
What is not a DDBS?
• The following systems are parallel database systems and are quite different from (though
related to) distributed DB systems
[Figure panels: Shared Memory, Shared Disk, Shared Nothing, Central Databases]
Applications
• Manufacturing, especially multi-plant manufacturing
• Military command and control
• Hotel chains
• Any organization which has a decentralized organization structure
Promises of DDBSs
Distributed Database Systems deliver the following advantages:
• Higher reliability
• Improved performance
• Easier system expansion
• Transparency of distributed and replicated data
Promises of DDBSs . . .
Higher reliability
• Replication of components
• No single points of failure
• e.g., a broken communication link or processing element does not bring down the entire
system
• Distributed transaction processing guarantees the consistency of the database and
Promises of DDBSs . . .
Improved performance
• Proximity of data to its points of use
– Reduces remote access delays
– Requires some support for fragmentation and replication
• Parallelism in execution
– Inter-query parallelism
– Intra-query parallelism
• Update and read-only queries influence the design of DDBSs substantially
– If mostly read-only access is required, as much as possible of the data should be
replicated
– Writing becomes more complicated with replicated data
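The trade-off between reads and writes on replicated data can be sketched as follows. The class `ReplicatedItem` and its read-one/write-all policy are assumptions for illustration; the slides do not prescribe a particular replica control protocol.

```python
class ReplicatedItem:
    """A data item copied to several sites (hypothetical model)."""

    def __init__(self, value, sites):
        self.copies = {site: value for site in sites}

    def read(self, local_site):
        """Read the local copy if present, otherwise some remote copy.

        Returns (value, number_of_remote_accesses).
        """
        if local_site in self.copies:
            return self.copies[local_site], 0
        any_site = next(iter(self.copies))
        return self.copies[any_site], 1

    def write(self, local_site, value):
        """Update every copy; returns the number of remote accesses."""
        for site in self.copies:
            self.copies[site] = value
        return sum(1 for site in self.copies if site != local_site)


item = ReplicatedItem(10, ["S1", "S2", "S3"])
value, read_cost = item.read("S1")   # served by the local copy
write_cost = item.write("S1", 11)    # must reach the copies at S2 and S3
```

Under this assumed policy, each additional copy makes local reads more likely but adds one more remote access to every write.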
Promises of DDBSs . . .
Easier system expansion
• Issue is database scaling
• Emergence of microprocessor and workstation technologies
– Network of workstations much cheaper than a single mainframe computer
• Data communication cost versus telecommunication cost
• Increasing database size
Promises of DDBSs . . .
Transparency
• Refers to the separation of the higher-level semantics of the system from the lower-level
implementation issues
• A transparent system “hides” the implementation details from the users.
• A fully transparent DBMS provides high-level support for the development of complex
applications
[Figure: (a) User wants to see one database; (b) Programmer sees many databases]
Promises of DDBSs . . .
Various forms of transparency can be distinguished for DDBMSs:
• Network transparency (also called distribution transparency)
– Location transparency
– Naming transparency
• Replication transparency
• Fragmentation transparency
• Transaction transparency
– Concurrency transparency
– Failure transparency
• Performance transparency
Promises of DDBSs . . .
• Network/Distribution transparency allows a user to perceive a DDBS as a single,
logical database
• The user is protected from the operational details of the network (and may not even know
about the existence of the network)
• The user does not need to know the location of data items, and a command used to
perform a task is independent of both the location of the data and the site at which the
task is performed (location transparency)
• A unique name is provided for each object in the database (naming transparency)
– In the absence of this, users are required to embed the location name as part of an
object's name
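Location transparency can be illustrated with a small Python sketch in which the caller names only a relation and never a site. The catalog, the site contents, and the functions `scan` and `move` are hypothetical and serve only as an illustration of the idea.

```python
# Hypothetical catalog: which site stores which relation.
CATALOG = {"employees": "S1", "projects": "S2"}

# Hypothetical per-site storage.
SITES = {
    "S1": {"employees": [("E1", "Ada")]},
    "S2": {"projects": [("P1", "Database")]},
}


def scan(relation):
    """Return all rows of a relation; the caller never names a site."""
    site = CATALOG[relation]  # the location lookup is hidden from the user
    return SITES[site][relation]


def move(relation, new_site):
    """Relocate a relation; user-visible commands stay unchanged."""
    old_site = CATALOG[relation]
    SITES.setdefault(new_site, {})[relation] = SITES[old_site].pop(relation)
    CATALOG[relation] = new_site


before = scan("employees")
move("employees", "S3")
after = scan("employees")  # same command, same result, different site
```

The same `scan("employees")` call works before and after the relation is moved, which is the property that location transparency asks for.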
Promises of DDBSs . . .
Different ways to ensure naming transparency:
• Solution 1: Create a central name server; however, this results in
– loss of some local autonomy
– central site may become a bottleneck
– low availability (if the central site fails, the remaining sites cannot create new objects)
• Solution 2: Prefix each object with the identifier of the site that created it
– e.g., branch created at site S1 might be named S1.BRANCH
– Also need to identify each fragment and its copies
– e.g., copy 2 of fragment 3 of Branch created at site S1 might be referred to as
S1.BRANCH.F3.C2
• An approach that resolves these problems uses aliases for each database object
– Thus, S1.BRANCH.F3.C2 might be known as local branch to a user at site S1
– The DDBMS has the task of mapping an alias to the appropriate database object
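The site-prefix naming scheme and the alias mapping can be sketched in a few lines of Python. The function `global_name`, the class `AliasTable`, and the alias spelling `local_branch` are hypothetical; only the name format S1.BRANCH.F3.C2 is taken from the example above.

```python
def global_name(site, relation, fragment=None, copy=None):
    """Build a site-prefixed name such as S1.BRANCH.F3.C2."""
    parts = [site, relation]
    if fragment is not None:
        parts.append(f"F{fragment}")
    if copy is not None:
        parts.append(f"C{copy}")
    return ".".join(parts)


class AliasTable:
    """Per-site mapping from user-facing aliases to global names."""

    def __init__(self):
        self._aliases = {}

    def define(self, alias, name):
        self._aliases[alias] = name

    def resolve(self, alias):
        return self._aliases[alias]


name = global_name("S1", "BRANCH", fragment=3, copy=2)
aliases_at_s1 = AliasTable()
aliases_at_s1.define("local_branch", name)
resolved = aliases_at_s1.resolve("local_branch")
```

Because each site prefixes names with its own identifier, no central name server is needed to keep names unique, and the alias table keeps the prefix out of the user's view.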