Multiprocessor architecture ppt

Uploaded by Dr. Alden Cutts, United Kingdom, Teacher
Published: 23-07-2017
Lecture 18: Introduction to Multiprocessors
Prepared and presented by Kurt Keutzer, with thanks for materials from Kunle Olukotun (Stanford) and David Patterson (UC Berkeley)

Why Multiprocessors?
Needs
• Relentless demand for higher performance
  » Servers
  » Networks
• Commercial desire for product differentiation
Opportunities
• Silicon capability
• Ubiquitous computers

Exploiting (Program) Parallelism
Levels of parallelism: process, thread, loop, instruction.
[Chart: each level of parallelism plotted against grain size, from 1 to 1M instructions]

Exploiting (Program) Parallelism - 2
Adds bit-level parallelism below the instruction level.
[Chart: process, thread, loop, instruction, and bit levels vs. grain size, 1 to 1M instructions]

Need for Parallel Computing
• Diminishing returns from ILP
  » Limited ILP in programs
  » ILP increasingly expensive to exploit
• Peak performance increases linearly with more processors
  » Amdahl's law applies
• Adding processors is inexpensive
  » But most people add memory also
[Chart: performance vs. die area for P+M, 2P+M, and 2P+2M configurations]

What to do with a billion transistors?
• Technology changes the cost and performance of computer elements in a non-uniform manner
  » logic and arithmetic are becoming plentiful and cheap
  » wires are becoming slow and scarce
• This changes the tradeoffs between alternative architectures
  » superscalar doesn't scale well
    – global control and data
• So what will the architectures of the future be?
[Chart: from 1998 to 2007 (via 2001 and 2004), die area grows 64x and logic gets 4x faster, but wires get slower — a cross-chip signal goes from 1 clock to 3 (10, 16, 20?) clocks]

Elements of a multiprocessing system
• General purpose/special purpose
• Granularity - capability of a basic module
• Topology - interconnection/communication geometry
• Nature of coupling - loose to tight
• Control-data mechanisms
• Task allocation and routing methodology
• Reconfigurable
  » Computation
  » Interconnect
• Programmer's model/language support/models of computation
• Implementation - IC, board, multiboard, networked
• Performance measures and objectives
(After E. V. Krishnamurty, Chapter 5)

Use, Granularity
General purpose - attempting to improve general-purpose computation (e.g. SPEC benchmarks) by means of multiprocessing
Special purpose - attempting to improve a specific application or class of applications by means of multiprocessing
Granularity - scope and capability of a processing element (PE):
• NAND gate
• ALU with registers
• Execution unit with local memory
• RISC R1000 processor

Topology
Topology - method of interconnection of processors
• Bus
• Full-crossbar switch
• Mesh
• N-cube
• Torus
• Perfect shuffle, m-shuffle
• Cube-connected components
• Fat-trees

Coupling
Relationship of communication among processors
• Shared clock (pipelined)
• Shared registers (VLIW)
• Shared memory (SMM)
• Shared network

Control/Data
Way in which data and control are organized
Control - how the instruction stream is managed (e.g. sequential instruction fetch)
Data - how the data is accessed (e.g. numbered memory addresses)
• Multithreaded control flow - explicit constructs (fork, join, wait) control program flow; central controller
• Dataflow model - instructions execute as soon as operands are ready; the program structures the flow of data; decentralized control

Task allocation and routing
Way in which tasks are scheduled and managed
Static - allocation of tasks onto processing elements predetermined before runtime
Dynamic - hardware/software support for allocating tasks to processors at runtime

Reconfiguration
Computational - restructuring of computational elements
» reconfigurable - reconfiguration at compile time
» dynamically reconfigurable - restructuring of computational elements at runtime
Interconnection scheme
» switching network - software controlled
» reconfigurable fabric

Programmer's model
How is parallelism expressed by the user?
Expressive power
• Process-level parallelism
  » Shared memory
  » Message passing
• Operator-level parallelism
• Bit-level parallelism
Formal guarantees
• Deadlock-free
• Livelock-free
Support for other real-time notions
• Exception handling

Parallel Programming Models
Message Passing
» Fork thread
  – Typically one per node
» Explicit communication
  – Send messages: send(tid, tag, message), receive(tid, tag, message)
» Synchronization
  – Block on messages (implicit sync)
  – Barriers
Shared Memory (address space)
» Fork thread
  – Typically one per node
» Implicit communication
  – Using shared address space: loads and stores
» Synchronization
  – Atomic memory operators
  – Barriers

Message Passing Multicomputers
• Computers (nodes) connected by a network
  » Fast network interface
    – Send, receive, barrier
  » Nodes no different from a regular PC or workstation
• Cluster of conventional workstations or PCs with a fast network
  » cluster computing
  » Berkeley NOW
  » IBM SP2
[Figure: nodes, each pairing a processor P with memory M, connected by a network]

Shared-Memory Multiprocessors
• Several processors share one address space
  » conceptually a shared memory
  » often implemented just like a multicomputer - address space distributed over private memories
• Communication is implicit
  » read and write accesses to shared memory locations
• Synchronization
  » via shared memory locations - spin waiting for non-zero
  » barriers
[Figure: conceptual model - processors share one memory over a network; actual implementation - each processor has a private memory, connected by a network]

Cache Coherence - A Quick Overview
• With caches, action is required to prevent access to stale data
  » Processor 1 may read old data from its cache instead of new data in memory, or
  » Processor 3 may read old data from memory rather than new data in Processor 2's cache
• Example trace (memory initially holds A: 3): P1: Rd(A); P2: Wr(A, 5); P3: Rd(A); P1: Rd(A)
• Solutions
  » no caching of shared data
    – Cray T3D, T3E, IBM RP3, BBN Butterfly
  » cache coherence protocol
    – keep track of copies
    – notify (update or invalidate) on writes

Implementation issues
Underlying hardware implementation
• Bit-slice
• Board assembly
• Integration in an integrated circuit
Exploitation of new technologies
• DRAM integration on IC
• Low-swing chip-level interconnect

Performance objectives
Objectives
• Speed
• Power
• Cost
• Ease of programming / time to market / time to money
• In-field flexibility
Methods of measurement
• Modeling
• Emulation
• Simulation
  » Transaction
  » Instruction-set
  » Hardware
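The Amdahl's-law point on the "Need for Parallel Computing" slide can be made concrete with a small calculation. This is an illustrative sketch; the function name and the 90%-parallel figure are assumptions of mine, not from the slides:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: overall speedup is capped by the serial fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# Even with 90% of the work parallelizable, 16 processors give only ~6.4x,
# and no processor count can beat the 1 / 0.1 = 10x ceiling.
print(round(amdahl_speedup(0.9, 16), 2))    # 6.4
print(round(amdahl_speedup(0.9, 1024), 2))  # 9.91
```

This is why adding processors alone gives diminishing returns: the serial fraction, not the processor count, dominates once N is large.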
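Of the topologies listed, the n-cube (hypercube) is easy to characterize in code: each node is an n-bit address, and its neighbors are the nodes whose addresses differ in exactly one bit. A minimal sketch (the function name is my own):

```python
def hypercube_neighbors(node, n):
    """Neighbors of a node in an n-cube: flip each of the n address bits."""
    return [node ^ (1 << bit) for bit in range(n)]

# In a 3-cube, node 000 connects to 001, 010, and 100:
print(hypercube_neighbors(0b000, 3))  # [1, 2, 4]
```

Every node thus has exactly n links, and any two nodes are at most n hops apart, which is what makes the n-cube attractive relative to a bus or mesh.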
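The message-passing model (fork a thread per node, explicit send/receive, blocking as implicit synchronization) can be sketched with Python's multiprocessing queues standing in for the slides' send(tid, tag, message)/receive(tid, tag, message) primitives; the names and message format here are assumptions, not an actual multicomputer API:

```python
from multiprocessing import Process, Queue

def worker(rank, inbox, outbox):
    # receive(...): block until a message arrives -- blocking is the implicit sync
    tag, payload = inbox.get()
    # send(...): explicit communication back to the parent
    outbox.put((rank, tag, payload * payload))

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    node = Process(target=worker, args=(0, inbox, outbox))  # fork one worker per node
    node.start()
    inbox.put(("square", 7))   # send(tid, tag, message) analogue
    print(outbox.get())        # receive(tid, tag, message) analogue
    node.join()
```

Note that all communication is through explicit messages; the two processes share no memory, just as the nodes of a multicomputer share only the network.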
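The shared-memory model (implicit communication via loads and stores, synchronization via atomic operators and barriers) can be sketched with Python threads; here a Lock stands in for an atomic memory operator and threading.Barrier for the slides' barrier, which is an analogy rather than real multiprocessor hardware:

```python
import threading

N_THREADS, N_INCREMENTS = 4, 1000
counter = 0                             # shared memory location
lock = threading.Lock()                 # stands in for an atomic memory operator
barrier = threading.Barrier(N_THREADS)  # no thread passes until all arrive

def worker():
    global counter
    for _ in range(N_INCREMENTS):
        with lock:                      # guarded read-modify-write of the shared location
            counter += 1
    barrier.wait()                      # rendezvous once this thread's work is done

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000 -- without the lock, concurrent updates could be lost
```

Communication here is implicit: the threads never exchange messages, they simply load and store the same variable, which is exactly the contrast the "Parallel Programming Models" slide draws.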
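The cache-coherence trace (memory holds A = 3; P1 reads, P2 writes 5, P3 then P1 read) can be replayed with a toy write-invalidate simulation. The class names and write-through choice are mine; real protocols such as MSI/MESI also track a state per cache line, which this sketch omits:

```python
class Bus:
    """Shared bus: one memory plus every cache attached to it."""
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.data = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.data:       # miss: fetch the current value from memory
            self.data[addr] = self.bus.memory[addr]
        return self.data[addr]

    def write(self, addr, value):
        for other in self.bus.caches:   # notify: invalidate every other copy
            if other is not self:
                other.data.pop(addr, None)
        self.data[addr] = value
        self.bus.memory[addr] = value   # write-through keeps memory current

bus = Bus(memory={"A": 3})
p1, p2, p3 = Cache(bus), Cache(bus), Cache(bus)
print(p1.read("A"))   # 3 -- P1 now holds a cached copy
p2.write("A", 5)      # invalidates P1's copy and updates memory
print(p3.read("A"))   # 5 -- P3 sees the new value from memory
print(p1.read("A"))   # 5 -- P1's copy was invalidated, so it refetches
```

Without the invalidation loop in write(), the final read would return the stale 3 from P1's cache, which is precisely the hazard the slide describes.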
