Parallel & Distributed Systems Lecture Notes

Parallel & Distributed Systems
MU CBGS 2015-2016, B.E. Computer Engineering (CPC803)
Anuradha Bhatia, M.E. Computer Engineering

Table of Contents
1. Introduction
2. Pipeline Processing
3. Synchronous Parallel Processing
4. Introduction to Distributed Systems
5. Communication
6. Resource and Process Management
7. Synchronization
8. Consistency and Replication

Disclaimer
The content of this book is the copyright property of the author and is to be used by students as a reference for the subject "Parallel and Distributed Systems", CPC803, Eighth Semester, Final Year Computer Engineering, Mumbai University. The complete e-book is available on the author's website, www.anuradhabhatia.com, and students may download it from there free of charge. The author gains no monetary benefit from it; it is developed and designed only to help the teaching and student fraternity enhance their knowledge with respect to the curriculum prescribed by Mumbai University.

1. Introduction

CONTENTS
1.1 Parallel Computing
1.2 Parallel Architecture
1.3 Architectural Classification Scheme
1.4 Performance of Parallel Computers
1.5 Performance Metrics for Processors
1.6 Parallel Programming Models, Parallel Algorithms

1.1 Basics of Parallel Distributed Systems and Parallel Computing

1. What is Parallel Computing?
i. Traditionally, software has been written for serial computation:
   - To be run on a single computer having a single Central Processing Unit (CPU).
   - A problem is broken into a discrete series of instructions.
   - Instructions are executed one after another.
   - Only one instruction may execute at any moment in time.

Figure 1.1: Parallel Computing

ii. In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
   - To be run using multiple CPUs.
   - A problem is broken into discrete parts that can be solved concurrently.
   - Each part is further broken down into a series of instructions.
   - Instructions from each part execute simultaneously on different CPUs.

Figure 1.2: Multiple Compute

iii. The compute resources might be:
   - A single computer with multiple processors;
   - An arbitrary number of computers connected by a network;
   - A combination of both.

iv. The computational problem should be able to:
   - Be broken apart into discrete pieces of work that can be solved simultaneously;
   - Execute multiple program instructions at any moment in time;
   - Be solved in less time with multiple compute resources than with a single compute resource.
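The decomposition described above can be made concrete with a short program. The following is a minimal sketch (not part of the original notes) in Python, assuming a multi-core machine and using the standard multiprocessing module; the worker function partial_sum, the four-way split and the data size are illustrative choices only, not a prescribed method.

    # Minimal sketch: serial vs. parallel summation of a large list.
    # The chunking scheme and the worker function partial_sum are
    # illustrative assumptions, not part of the original notes.
    from multiprocessing import Pool

    def partial_sum(chunk):
        # Each worker executes the same instructions on its own piece of data.
        return sum(chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))

        # Serial computation: one instruction stream, one data item at a time.
        serial_result = sum(data)

        # Parallel computation: the problem is broken into discrete parts
        # that are solved concurrently by a pool of worker processes.
        n_workers = 4
        chunk_size = len(data) // n_workers
        chunks = [data[i * chunk_size:(i + 1) * chunk_size] for i in range(n_workers)]

        with Pool(processes=n_workers) as pool:
            parallel_result = sum(pool.map(partial_sum, chunks))

        assert serial_result == parallel_result

With at least four cores available, the parallel version can approach a four-fold reduction in computation time, minus the task start-up and communication overhead discussed later in this chapter.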
1.2 The Universe is Parallel

i. Parallel computing is an evolution of serial computing that attempts to emulate what has always been the state of affairs in the natural world: many complex, interrelated events happening at the same time, yet within a temporal sequence. For example:

Figure 1.3: Universe of Parallel Computing

1.3 Uses for Parallel Computing

i. Science and Engineering: Historically, parallel computing has been considered "the high end of computing" and has been used to model difficult problems in many areas of science and engineering:
   - Atmosphere, Earth, Environment
   - Physics - applied, nuclear, particle, condensed matter, high pressure, fusion, photonics
   - Bioscience, Biotechnology, Genetics
   - Chemistry, Molecular Sciences
   - Geology, Seismology
   - Mechanical Engineering - from prosthetics to spacecraft
   - Electrical Engineering, Circuit Design, Microelectronics
   - Computer Science, Mathematics
   Table 1.1

ii. Industrial and Commercial: Today, commercial applications provide an equal or greater driving force in the development of faster computers. These applications require the processing of large amounts of data in sophisticated ways. For example:
   - Databases, data mining
   - Oil exploration
   - Web search engines, web based business services
   - Medical imaging and diagnosis
   - Pharmaceutical design
   - Financial and economic modelling
   - Management of national and multi-national corporations
   - Advanced graphics and virtual reality, particularly in the entertainment industry
   - Networked video and multi-media technologies
   - Collaborative work environments
   Table 1.2

1.4 Why Use Parallel Computing?

i. Save time and/or money: In theory, throwing more resources at a task will shorten its time to completion, with potential cost savings. Parallel computers can be built from cheap, commodity components.

ii. Solve larger problems: Many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory.

iii. Provide concurrency: A single compute resource can only do one thing at a time. Multiple computing resources can be doing many things simultaneously.

iv. Use of non-local resources: Compute resources on a wide area network, or even the Internet, can be used when local compute resources are scarce.

v. Limits to serial computing: Both physical and practical reasons pose significant constraints to simply building ever faster serial computers:
   - Transmission speeds - the speed of a serial computer is directly dependent upon how fast data can move through hardware.
   - Absolute limits are the speed of light (30 cm/nanosecond) and the transmission limit of copper wire (9 cm/nanosecond).
   - Increasing speeds necessitate increasing proximity of processing elements.
   - Limits to miniaturization - processor technology allows an increasing number of transistors to be placed on a chip. However, even with molecular or atomic-level components, a limit will be reached on how small components can be.
   - Economic limitations - it is increasingly expensive to make a single processor faster. Using a larger number of moderately fast commodity processors to achieve the same (or better) performance is less expensive.
   - Current computer architectures are increasingly relying upon hardware-level parallelism to improve performance:
     - Multiple execution units
     - Pipelined instructions
     - Multi-core

Figure 1.4: Core Layout

1.5 Concepts and Terminology

1. von Neumann Architecture
i. Named after the Hungarian mathematician John von Neumann, who first authored the general requirements for an electronic computer in his 1945 papers.
ii. Since then, virtually all computers have followed this basic design, differing from earlier computers which were programmed through "hard wiring".

Figure 1.5: von Neumann

iii. Comprised of four main components:
   - Memory
   - Control Unit
   - Arithmetic Logic Unit
   - Input/Output
iv. Read/write, random access memory is used to store both program instructions and data:
   - Program instructions are coded data which tell the computer to do something.
   - Data is simply information to be used by the program.
v. The control unit fetches instructions/data from memory, decodes the instructions and then sequentially coordinates operations to accomplish the programmed task.
vi. The arithmetic unit performs basic arithmetic operations.
vii. Input/Output is the interface to the human operator.

2. Flynn's Classical Taxonomy
i. One of the more widely used classifications, in use since 1966, is called Flynn's Taxonomy.
ii. Flynn's taxonomy distinguishes multi-processor computer architectures according to how they can be classified along the two independent dimensions of Instruction and Data. Each of these dimensions can have only one of two possible states: Single or Multiple.
iii. The four possible classifications according to Flynn are:
   - SISD: Single Instruction, Single Data
   - SIMD: Single Instruction, Multiple Data
   - MISD: Multiple Instruction, Single Data
   - MIMD: Multiple Instruction, Multiple Data

Figure 1.6: Flynn's Classical Taxonomy

A. Single Instruction, Single Data (SISD)
   - A serial (non-parallel) computer.
   - Single Instruction: only one instruction stream is being acted on by the CPU during any one clock cycle.
   - Single Data: only one data stream is being used as input during any one clock cycle.
   - Deterministic execution.
   - This is the oldest and, even today, the most common type of computer.
   - Examples: older generation mainframes, minicomputers and workstations; most modern-day PCs.

Figure 1.7: SISD

B. Single Instruction, Multiple Data (SIMD)
   - Single Instruction: all processing units execute the same instruction at any given clock cycle.
   - Multiple Data: each processing unit can operate on a different data element.
   - Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing.
   - Synchronous (lockstep) and deterministic execution.
   - Two varieties: Processor Arrays and Vector Pipelines.
   - Examples:
     - Processor Arrays: Connection Machine CM-2, MasPar MP-1 & MP-2, ILLIAC IV
     - Vector Pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10
   - Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units. (A small data-parallel sketch follows at the end of this classification.)

Figure 1.8: SIMD

C. Multiple Instruction, Single Data (MISD)
   - Multiple Instruction: each processing unit operates on the data independently via separate instruction streams.
   - Single Data: a single data stream is fed into multiple processing units.
   - Few actual examples of this class of parallel computer have ever existed. One is the experimental Carnegie-Mellon C.mmp computer (1971).
   - Some conceivable uses might be:
     - multiple frequency filters operating on a single signal stream;
     - multiple cryptography algorithms attempting to crack a single coded message.

Figure 1.9: MISD

D. Multiple Instruction, Multiple Data (MIMD)
   - Multiple Instruction: every processor may be executing a different instruction stream.
   - Multiple Data: every processor may be working with a different data stream.
   - Execution can be synchronous or asynchronous, deterministic or non-deterministic.
   - Currently, the most common type of parallel computer - most modern supercomputers fall into this category.
   - Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs.
   - Note: many MIMD architectures also include SIMD execution sub-components.

Figure 1.10: MIMD
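As a loose, software-level illustration of the SIMD idea referred to in subsection B above, the sketch below (not from the original notes) contrasts an element-by-element loop with a single vectorized expression. NumPy is an assumption here; its array operations are typically compiled to code that can use SIMD execution units, which is the point of the comparison.

    # Loose software-level illustration of SIMD: one operation applied
    # element-wise to many data items at once. NumPy is an illustrative
    # assumption, not something used in the original notes.
    import numpy as np

    a = np.arange(100_000, dtype=np.float64)
    b = np.arange(100_000, dtype=np.float64)

    # SISD-style: one instruction stream, one data element per step.
    c_scalar = np.empty_like(a)
    for i in range(len(a)):
        c_scalar[i] = a[i] * 2.0 + b[i]

    # SIMD-style: the same multiply-add applied to all elements
    # of the arrays in a single vectorized expression.
    c_vector = a * 2.0 + b

    assert np.allclose(c_scalar, c_vector)

The vectorized form is usually much faster precisely because one operation is applied across many data elements at once, which is the regularity that SIMD hardware exploits.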
1.6 General Parallel Terminology

1. Supercomputing / High Performance Computing (HPC): Using the world's fastest and largest computers to solve large problems.

2. Node: A standalone "computer in a box", usually comprised of multiple CPUs/processors/cores. Nodes are networked together to comprise a supercomputer.

3. CPU / Socket / Processor / Core: This varies, depending upon who you talk to. In the past, a CPU (Central Processing Unit) was a singular execution component for a computer. Then, multiple CPUs were incorporated into a node. Then, individual CPUs were subdivided into multiple "cores", each being a unique execution unit. CPUs with multiple cores are sometimes called "sockets" - vendor dependent. The result is a node with multiple CPUs, each containing multiple cores. The nomenclature is confused at times.

Figure 1.11: CPU

4. Task: A logically discrete section of computational work. A task is typically a program or program-like set of instructions that is executed by a processor. A parallel program consists of multiple tasks running on multiple processors.

5. Pipelining: Breaking a task into steps performed by different processor units, with inputs streaming through, much like an assembly line; a type of parallel computing.

6. Shared Memory: From a strictly hardware point of view, describes a computer architecture where all processors have direct (usually bus based) access to common physical memory. In a programming sense, it describes a model where parallel tasks all have the same "picture" of memory and can directly address and access the same logical memory locations regardless of where the physical memory actually exists.

7. Symmetric Multi-Processor (SMP): Hardware architecture where multiple processors share a single address space and access to all resources; shared memory computing.

8. Distributed Memory: In hardware, refers to network based memory access for physical memory that is not common. As a programming model, tasks can only logically "see" local machine memory and must use communications to access memory on other machines where other tasks are executing.

9. Communications: Parallel tasks typically need to exchange data. There are several ways this can be accomplished, such as through a shared memory bus or over a network; the actual event of data exchange is commonly referred to as communications regardless of the method employed. (A small message-passing sketch follows at the end of this list.)

10. Synchronization: The coordination of parallel tasks in real time, very often associated with communications. Often implemented by establishing a synchronization point within an application where a task may not proceed further until another task (or tasks) reaches the same or logically equivalent point. Synchronization usually involves waiting by at least one task, and can therefore cause a parallel application's wall-clock execution time to increase.

11. Granularity: In parallel computing, granularity is a qualitative measure of the ratio of computation to communication.

12. Coarse: Relatively large amounts of computational work are done between communication events.

13. Fine: Relatively small amounts of computational work are done between communication events.

14. Observed Speedup: Observed speedup of a code which has been parallelized, defined as:

   speedup = wall-clock time of serial execution / wall-clock time of parallel execution

   This is one of the simplest and most widely used indicators of a parallel program's performance.

15. Parallel Overhead: The amount of time required to coordinate parallel tasks, as opposed to doing useful work. Parallel overhead can include factors such as:
   - Task start-up time
   - Synchronizations
   - Data communications
   - Software overhead imposed by parallel compilers, libraries, tools, operating system, etc.
   - Task termination time

16. Massively Parallel: Refers to the hardware that comprises a given parallel system - having many processors. The meaning of "many" keeps increasing, but currently the largest parallel computers comprise processors numbering in the hundreds of thousands.

17. Embarrassingly Parallel: Solving many similar, but independent, tasks simultaneously; little to no need for coordination between the tasks.

18. Scalability: Refers to a parallel system's (hardware and/or software) ability to demonstrate a proportionate increase in parallel speedup with the addition of more processors. Factors that contribute to scalability include:
   - Hardware - particularly memory-CPU bandwidths and network communications
   - Application algorithm
   - Parallel overhead related
   - Characteristics of your specific application and coding
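To make the distributed memory and communications terms above concrete (items 8 and 9), here is a minimal sketch, again not from the original notes, in which two processes share no variables and exchange data only through an explicit message. Python's multiprocessing.Pipe stands in for a network link; the worker function and the data are illustrative assumptions.

    # Minimal sketch of the distributed-memory model: each process "sees"
    # only its own memory and must communicate explicitly to exchange data.
    # multiprocessing.Pipe stands in here for a network link; this is an
    # illustrative choice, not something prescribed by the notes.
    from multiprocessing import Process, Pipe

    def worker(conn, numbers):
        # This process has its own private copy of `numbers`; the only way
        # to return the result is an explicit communication (send).
        conn.send(sum(numbers))
        conn.close()

    if __name__ == "__main__":
        parent_conn, child_conn = Pipe()
        p = Process(target=worker, args=(child_conn, list(range(1000))))
        p.start()

        # The parent blocks here until the child's message arrives:
        # a communication that also acts as a synchronization point.
        result = parent_conn.recv()
        p.join()
        print("partial sum received from worker:", result)

Under the shared-memory model of items 6 and 7, the same result could simply be read from a common variable; the explicit send and receive is what characterizes the distributed-memory model, and the blocking recv() also illustrates synchronization (item 10).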
1.7 Performance Attributes

1. The performance of a system depends upon:
   i. Hardware technology
   ii. Architectural features
   iii. Efficient resource management
   iv. Algorithm design
   v. Data structures
   vi. Language efficiency
   vii. Programmer skill
   viii. Compiler technology

2. By the performance of a computer system we describe how quickly a given system can execute a program or programs. Thus we are interested in knowing the turnaround time, which depends on:
   i. Disk and memory accesses
   ii. Input and output
   iii. Compilation time
   iv. Operating system overhead
   v. CPU time

3. An ideal performance of a computer system means a perfect match between the machine capability and program behaviour.

4. The machine capability can be improved by using better hardware technology and efficient resource management.

5. But as far as program behaviour is concerned, it depends on the code used, the compiler used and other run-time conditions. A machine's performance may also vary from program to program.

6. Because there are too many programs, and it is impractical to test a CPU's speed on all of them, benchmarks were developed. Computer architects have come up with a variety of metrics to describe computer performance (a short worked example of these metrics follows at the end of this list):

   i. Clock rate and CPI / IPC: Since I/O and system overhead frequently overlap processing by other programs, it is fair to consider only the CPU time used by a program, and the user CPU time is the most important factor. The CPU is driven by a clock with a constant cycle time τ (usually measured in nanoseconds), which controls the rate of internal operations in the CPU. The inverse of the cycle time is the clock rate (f = 1/τ, measured in megahertz). The size of a program is determined by its instruction count Ic, the number of machine instructions to be executed by the program. Different machine instructions require different numbers of clock cycles to execute, so CPI (cycles per instruction) is an important parameter.

   ii. MIPS: Millions of instructions per second, calculated by dividing the number of instructions executed by a running program by the time required to run the program. The MIPS rate is directly proportional to the clock rate and inversely proportional to the CPI. All four system attributes (instruction set, compiler, processor, and memory technologies) affect the MIPS rate, which also varies from program to program. MIPS has not proved to be an effective measure, as it does not account for the fact that different systems often require different numbers of instructions to implement the same program; it says nothing about how many instructions are required to perform a given task. With the variation in instruction styles, internal organisation, and number of processors per system, it is almost meaningless for comparing two systems.

   iii. MFLOPS (pronounced "megaflops") stands for "millions of floating point operations per second". This is often used as a "bottom-line" figure. If one knows ahead of time how many operations a program needs to perform, one can divide the number of operations by the execution time to come up with a MFLOPS rating. For example, the standard algorithm for multiplying n × n matrices requires 2n³ − n² operations (n² inner products, with n multiplications and n − 1 additions in each product). Suppose you compute the product of two 100 × 100 matrices in 0.35 seconds. Then the computer achieves (2(100)³ − 100²) / 0.35 ≈ 5.7 million operations per second, i.e. about 5.7 MFLOPS.

   iv. Throughput rate: Another important factor on which a system's performance is measured is the throughput of the system, which is basically how many programs the system can execute per unit time, denoted Ws.
   In multiprogramming, the system throughput is often lower than the CPU throughput Wp, which is defined as Wp = f / (Ic × CPI); the unit of Wp is programs/second.

   v. Speed or Throughput (W/Tn): the execution rate on an n-processor system, measured in FLOPs per unit time or instructions per unit time.

   vi. Speedup (Sn = T1/Tn): how much faster a program runs on n processors than on a single processor, where T1 is the execution time on one processor and Tn is the execution time on n processors.

   vii. Efficiency (En = Sn/n): the fraction of the theoretical maximum speedup achieved by n processors.

   viii. Degree of Parallelism (DOP): for a given piece of the workload, the number of processors that can be kept busy sharing that piece of the workload.
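Pulling the metrics of this section together, the short calculation below (not from the original notes; every input value is an invented, illustrative number) evaluates CPU time, the MIPS rate, the CPU throughput Wp, speedup Sn and efficiency En from an assumed clock rate, instruction count, CPI and measured run times.

    # Worked example of the performance metrics defined above.
    # All input values are made-up illustrative assumptions.

    f = 500e6          # clock rate in Hz (500 MHz), so cycle time tau = 1/f
    Ic = 200e6         # instruction count of the program
    CPI = 2.5          # average cycles per instruction

    cpu_time = Ic * CPI / f         # seconds = instructions * (cycles/instr) / (cycles/sec)
    mips = Ic / (cpu_time * 1e6)    # millions of instructions per second
    Wp = f / (Ic * CPI)             # CPU throughput in programs/second

    print(f"CPU time  = {cpu_time:.2f} s")
    print(f"MIPS rate = {mips:.1f}")
    print(f"Wp        = {Wp:.2f} programs/s")

    # Speedup and efficiency from measured wall-clock times.
    T1 = 120.0         # seconds on 1 processor (assumed measurement)
    Tn = 20.0          # seconds on n processors (assumed measurement)
    n = 8

    Sn = T1 / Tn       # speedup
    En = Sn / n        # efficiency (fraction of ideal n-fold speedup)
    print(f"Speedup    S{n} = {Sn:.1f}")
    print(f"Efficiency E{n} = {En:.2f}")

For these numbers the MIPS rate works out to f / (CPI × 10^6) = 200, consistent with the statement above that MIPS is proportional to the clock rate and inversely proportional to CPI, and the efficiency of 0.75 shows the 8-processor run achieving 75% of the ideal eight-fold speedup.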
