Parallel and distributed computing lecture notes

parallel and distributed computing projects, parallel and distributed computing numerical methods, parallel and distributed computing tutorial, what is difference between parallel and distributed computing pdf free download
Dr.JakeFinlay Profile Pic
Dr.JakeFinlay,Germany,Teacher
Published Date:22-07-2017
Your Website URL(Optional)
Comment
I Parallel and Distributed Computing Fault tolerance of programmable devices1 Fault tolerance of programmable devices 01 Fault tolerance of programmable devicesMinoru Watanabe MinoruWatanabe Shizuoka University Japan 1. Introduction Currently, we are frequently facing demands for automation of many systems. In particular, demands for cars and robots are increasing daily. For such applications, high-performance embedded systems are necessary to execute real-time operations. For example, image pro- cessing and image recognition are heavy operations that tax current microprocessor units. Parallel computation on high-capacity hardware is expected to be one means to alleviate the burdensimposedbysuchheavyoperations. Toimplementsuchlarge-scaleparallelcomputationontoaVLSIchip,thedemandforalarge- dieVLSIchipisincreasingdaily. However,consideringtheratioofnon-defectivechipsunder currentfabrications,diesizescannotbeincreased(1),(2). Ifalargesystemmustbeintegrated ontoalargedieVLSIchiporasanextremecase,awafer-sizeVLSI,theuseofaVLSIincluding defectivepartsmustbeaccomplished. Intheearliestuseoffieldprogrammablegatearrays(FPGAs)(3)–(5),FPGAswereanticipated asdefect-tolerantdevicesthataccommodateinclusionofdefectiveareasonthegatearraybe- causeoftheirprogrammablecapability. However,thathopewaspartlyshatteredbecausede- fectsofaserialconfigurationlinecausedsevereimpairmentsthatpreventedprogrammingof theentiregatearray. Ofcourse,asparerowmethodsuchasthatusedformemories(DRAMs) reducestheratioofdiscardedchips(6),(7),inwhichsparerowsofagatearrayareusedinstead ofdefectiverowsbyswappingthemwithalaserbeammachine. However,suchmethodsre- quire hardware redundancy. Moreover, they are not perfect. To use a gate array perfectly and not produce any discarded VLSI chips, a perfectly parallel programmable capability is necessary: onewhichusesnoserialtransfer. Currently, optically reconfigurable gate arrays (ORGAs) that support parallel programming capability and which never use any serial transfer have been developed (8)–(15). An ORGA comprises a holographic memory, a laser array, and a gate-array VLSI. Although the ORGA construction is slightly more complex than that of currently available FPGAs, the parallel programmable gate array VLSI supports perfect avoidance of its faulty areas; it instead uses the remaining area. Therefore, the architecture enables the use of a large-die VLSI chip and even entire wafers, including fault areas. As a result, the architecture can realize extremely high-gate-countVLSIsandcansupportlarge-scaleparallelcomputation. This chapter introduces an ORGA architecture as a high defect tolerance device, describes how to use an optically reconfigurable gate array including defective areas, and clarifies its highfaulttolerance. TheORGAarchitecturehassomeweakpointsinmakingalargeVLSI,as2Parallel and Distributed Computing Fig.1. OverviewofanORGA. (a) (b) do FPGAs. Therefore, this chapter also presents discussion of more reliable design methods toavoidweakpoints. 2. Optically Reconfigurable Gate Array (ORGA) TheORGAarchitecturehasthefollowingfeatures: numerousreconfigurationcontexts,rapid reconfiguration, and large die size VLSIs or wafer-scale VLSIs. A large die size VLSI can producelargephysicalgatesthatincreasetheperformanceoflargeparallelcomputation. Fur- thermore,numerousreconfigurationcontextsachievehugevirtualgateswithcontextsseveral times more numerous than those of the physical gates. For that reason, such huge virtual (c) (d) gates can be reconfigured dynamically on the physical gates so that huge operations can be integratedontoasingleORGA-VLSI.ThefollowingsectionsdescribetheORGAarchitecture, Fig. 2. Gate-array structure of a fabricated ORGA. Panels (a), (b), (c), and (d) respectively whichpresentssuchadvantages. depict block diagrams of a gate array, an optically reconfigurable logic block, an optically reconfigurableswitchingmatrix,andanopticallyreconfigurableI/Obit. 2.1 Overall construction An overview of an Optically Reconfigurable Gate Array (ORGA) is portrayed in Fig. 1. An ORGA comprises a gate-array VLSI (ORGA-VLSI), a holographic memory, and a laser diode 2.2.1 Prototype ORGA-VLSI chip array. The holographic memory stores reconfiguration contexts. A laser array is mounted on ThebasicfunctionalityofanORGA-VLSIisfundamentallyidenticaltothatofcurrentlyavail- the top of the holographic memory for use in addressing the reconfiguration contexts in the able field programmable gate arrays (FPGAs). Therefore, ORGA-VLSI takes an island-style holographic memory. One laser corresponds to a configuration context. Turning one laser gate array or a fine-grain gate array. Figure 2 depicts the gate array structure of a first pro- on, the laser beam propagates into a certain corresponding area on the holographic memory totype ORGA-VLSI chip. The ORGA-VLSI chip was fabricated using a 0.35 μm triple-metal at a certain angle so that the holographic memory generates a certain diffraction pattern. A CMOSprocess(8). ThephotographofaboardisportrayedinFig. 3. Table1presentsthespec- photodiode-array of a programmable gate array on an ORGA-VLSI can receive it as a recon- ifications. The ORGA-VLSI chip consists of 4 optically reconfigurable logic blocks (ORLB), 5 figuration context. Then, the ORGA-VLSI functions as the circuit of the configuration con- opticallyreconfigurableswitchingmatrices(ORSM),and12opticallyreconfigurableI/Obits text. ThereconfigurationtimeofsuchORGAarchitecturereachesnanosecond-order(14),(15). (ORIOB) portrayed in Fig. 2(a). Each optically reconfigurable logic block is surrounded by Therefore,very-high-speedcontextswitchingispossible. Sincethestoragecapacityofaholo- wiring channels. In this chip, one wiring channel has four connections. Switching matrices graphicmemoryisextremelyhigh,numerousconfigurationcontextscanbeusedwithaholo- are located on the corners of optically reconfigurable logic blocks. Each connection of the graphic memory. Therefore, the ORGA architecture can dynamically treat huge virtual gate switching matrices is connected to a wiring channel. The ORGA-VLSI has 340 photodiodes countsthatarelargerthanthephysicalgatecountonanORGA-VLSI. to program its gate array. The ORGA-VLSI can be reconfigured perfectly in parallel. In this fabrication, the distance between each photodiode was designed as 90 μm. The photodiode 2 2.2 Gate array structure sizewassetas25.5×25.5μm toeasetheopticalalignment. Thephotodiodewasconstructed This section introduces a design example of a fabricated ORGA-VLSI chip. Based on it, a betweentheN-welllayerandP-substrate. Thegatearray’sgatecountis68. Itwasconfirmed generalizedgatearraystructureofORGA-VLSIsisdiscussed. experimentallythattheORGA-VLSIitselfisreconfigurablewithinananosecond-orderperiodFault tolerance of programmable devices3 Fig.1. OverviewofanORGA. (a) (b) do FPGAs. Therefore, this chapter also presents discussion of more reliable design methods toavoidweakpoints. 2. Optically Reconfigurable Gate Array (ORGA) TheORGAarchitecturehasthefollowingfeatures: numerousreconfigurationcontexts,rapid reconfiguration, and large die size VLSIs or wafer-scale VLSIs. A large die size VLSI can producelargephysicalgatesthatincreasetheperformanceoflargeparallelcomputation. Fur- thermore,numerousreconfigurationcontextsachievehugevirtualgateswithcontextsseveral times more numerous than those of the physical gates. For that reason, such huge virtual (c) (d) gates can be reconfigured dynamically on the physical gates so that huge operations can be integratedontoasingleORGA-VLSI.ThefollowingsectionsdescribetheORGAarchitecture, Fig. 2. Gate-array structure of a fabricated ORGA. Panels (a), (b), (c), and (d) respectively whichpresentssuchadvantages. depict block diagrams of a gate array, an optically reconfigurable logic block, an optically reconfigurableswitchingmatrix,andanopticallyreconfigurableI/Obit. 2.1 Overall construction An overview of an Optically Reconfigurable Gate Array (ORGA) is portrayed in Fig. 1. An ORGA comprises a gate-array VLSI (ORGA-VLSI), a holographic memory, and a laser diode 2.2.1 Prototype ORGA-VLSI chip array. The holographic memory stores reconfiguration contexts. A laser array is mounted on ThebasicfunctionalityofanORGA-VLSIisfundamentallyidenticaltothatofcurrentlyavail- the top of the holographic memory for use in addressing the reconfiguration contexts in the able field programmable gate arrays (FPGAs). Therefore, ORGA-VLSI takes an island-style holographic memory. One laser corresponds to a configuration context. Turning one laser gate array or a fine-grain gate array. Figure 2 depicts the gate array structure of a first pro- on, the laser beam propagates into a certain corresponding area on the holographic memory totype ORGA-VLSI chip. The ORGA-VLSI chip was fabricated using a 0.35 μm triple-metal at a certain angle so that the holographic memory generates a certain diffraction pattern. A CMOSprocess(8). ThephotographofaboardisportrayedinFig. 3. Table1presentsthespec- photodiode-array of a programmable gate array on an ORGA-VLSI can receive it as a recon- ifications. The ORGA-VLSI chip consists of 4 optically reconfigurable logic blocks (ORLB), 5 figuration context. Then, the ORGA-VLSI functions as the circuit of the configuration con- opticallyreconfigurableswitchingmatrices(ORSM),and12opticallyreconfigurableI/Obits text. ThereconfigurationtimeofsuchORGAarchitecturereachesnanosecond-order(14),(15). (ORIOB) portrayed in Fig. 2(a). Each optically reconfigurable logic block is surrounded by Therefore,very-high-speedcontextswitchingispossible. Sincethestoragecapacityofaholo- wiring channels. In this chip, one wiring channel has four connections. Switching matrices graphicmemoryisextremelyhigh,numerousconfigurationcontextscanbeusedwithaholo- are located on the corners of optically reconfigurable logic blocks. Each connection of the graphic memory. Therefore, the ORGA architecture can dynamically treat huge virtual gate switching matrices is connected to a wiring channel. The ORGA-VLSI has 340 photodiodes countsthatarelargerthanthephysicalgatecountonanORGA-VLSI. to program its gate array. The ORGA-VLSI can be reconfigured perfectly in parallel. In this fabrication, the distance between each photodiode was designed as 90 μm. The photodiode 2 2.2 Gate array structure sizewassetas25.5×25.5μm toeasetheopticalalignment. Thephotodiodewasconstructed This section introduces a design example of a fabricated ORGA-VLSI chip. Based on it, a betweentheN-welllayerandP-substrate. Thegatearray’sgatecountis68. Itwasconfirmed generalizedgatearraystructureofORGA-VLSIsisdiscussed. experimentallythattheORGA-VLSIitselfisreconfigurablewithinananosecond-orderperiod4Parallel and Distributed Computing Technology 0.35μmdouble-poly triple-metalCMOSprocess Chipsize 4.9mm×4.9mm Photodiodesize 25.5μm×25.5μm Distancebetweenphotodiodes 90μm Numberofphotodiodes 340 Gatecount 68 Table1. ORGA-VLSISpecifications. 2.2.3 Opticallyreconfigurableswitchingmatrix Similarly, optically reconfigurable switching matrices are optically reconfigurable. The block diagram of the optically reconfigurable switching matrix is portrayed in Fig. 2(c). The basic constructionisthesameasthatusedbyXilinxInc. Onefour-directionalwith24transmission Fig.3. PhotographofanORGA-VLSIboardwithafabricatedORGA-VLSIchip. TheORGA- 2 gatesand4three-directionalswitchingmatriceswith12transmissiongateswereimplemented VLSIwasfabricatedusinga0.35μmthree-metal4.9×4.9mm CMOSprocesschip. Thegate in the gate array. Each transmission gate can be considered as a bi-directional switch. A countofagatearrayonthechipis68. Inall,340photodiodesareusedforopticalconfigura- photodiode is connected to each transmission gate; it controls whether the transmission gate tions. isclosedornot. Basedonthatcapability,four-directionandthree-directionswitchingmatrices canbeprogrammed,respectively,as24and12opticalconnections. (14),(15). Althoughthegatecountofthechipistoosmall,thegatecountoffutureORGAswas 2.2.4 OpticallyreconfigurableI/Oblock alreadyestimated(12). FutureORGAswillachievegatecountsofoveramillion,whichissim- Optically reconfigurable gate arrays are assumed to be reconfigured frequently. For that rea- ilartogatecountsofFPGAs. son, an optical reconfiguration capability must be implemented for optically reconfigurable logic blocks and optically reconfigurable switching matrices. However, the I/O block might 2.2.2 Optically reconfigurable logic block not always be reconfigured under such dynamic reconfiguration applications because such The block diagram of an optically reconfigurable logic block of the prototype ORGA-VLSI a dynamic reconfiguration arises inside the device and each mode of Input, Output, or In- chip is presented in Fig. 2(b). Each optically reconfigurable logic block consists of a four- put/Output,andeachpinlocationoftheI/Oblockmustalwaysbefixedduetolimitationsof input one-output look-up table (LUT), six multiplexers, four transmission gates, and a delay theexternalenvironment. However,theORGA-VLSIsupportsopticalreconfigurationforI/O type flip-flop with a reset function. The input signals from the wiring channel, which are blocksbecausereconfigurationinformationisprovidedopticallyfromaholographicmemory applied through some switching matrices and wiring channels from optically reconfigurable inORGA.Consequently,electricallyconfigurableI/OblocksareunsuitableforORGAs. Here, I/O blocks, are transferred to a look-up table through four multiplexers. The look-up table eachI/Oblockisalsocontrolledusingnineopticalconnections. Always,theopticallyrecon- is used for implementing Boolean functions. The outputs of the look-up table and of a delay figurableI/Oblockconfigurationisexecutedonlyinitially. typeflip-flopconnectedtothelook-uptableareconnectedtoamultiplexer. Acombinational circuitandsequentialcircuitcanbechosenbychangingthemultiplexer,asinFPGAs. Finally, 3. DefecttolerancedesignoftheORGAarchitecture an output of the multiplexer is connected to the wiring channel again through transmission gates. Thelastmultiplexercontrolstheresetfunctionofthedelay-typeflip-flop. Suchafour- 3.1 Holographicmemorypart input one-output look-up table, each multiplexer, and each transmission gate respectively Holographic memories are well known to have a high defect tolerance. Since each bit of a have 16 photodiodes, 2 photodiodes, and 1 photodiode. In all, 32 photodiodes are used for reconfigurationcontextcanbegeneratedfromtheentireholographicmemory,thedamageof programming an optically reconfigurable logic block. Therefore, the optically reconfigurable some fraction rarely affects its diffraction pattern or a reconfiguration context. Even though logicblockcanbereconfiguredperfectlyinparallel. Inthisprototypechip,sincethegatearray a holographic memory device includes small defect areas, holographic memories can cor- istoosmall, aCLKforeachflip-flopisprovidedthroughasingleCLKbuffertree. However, rectly record configuration contexts and can correctly generate configuration contexts. Such for a large gate array, CLKs of flip-flops are applied through multiple CLK buffer trees as mechanismscanbeconsideredasthoseforwhichmajorityvotingisexecutedfromaninfinite programmableCLKs,aswellasthatofFPGAs. numberofdiffractionbeamsforeachconfigurationbit. Forasemiconductormemory,single- bitinformationisstoredinasingle-bitmemorycircuit. Incontrast,inholographicmemory,a single bit of a reconfiguration context is stored in the entire holographic memory. Therefore,Fault tolerance of programmable devices5 Technology 0.35μmdouble-poly triple-metalCMOSprocess Chipsize 4.9mm×4.9mm Photodiodesize 25.5μm×25.5μm Distancebetweenphotodiodes 90μm Numberofphotodiodes 340 Gatecount 68 Table1. ORGA-VLSISpecifications. 2.2.3 Opticallyreconfigurableswitchingmatrix Similarly, optically reconfigurable switching matrices are optically reconfigurable. The block diagram of the optically reconfigurable switching matrix is portrayed in Fig. 2(c). The basic constructionisthesameasthatusedbyXilinxInc. Onefour-directionalwith24transmission Fig.3. PhotographofanORGA-VLSIboardwithafabricatedORGA-VLSIchip. TheORGA- 2 gatesand4three-directionalswitchingmatriceswith12transmissiongateswereimplemented VLSIwasfabricatedusinga0.35μmthree-metal4.9×4.9mm CMOSprocesschip. Thegate in the gate array. Each transmission gate can be considered as a bi-directional switch. A countofagatearrayonthechipis68. Inall,340photodiodesareusedforopticalconfigura- photodiode is connected to each transmission gate; it controls whether the transmission gate tions. isclosedornot. Basedonthatcapability,four-directionandthree-directionswitchingmatrices canbeprogrammed,respectively,as24and12opticalconnections. (14),(15). Althoughthegatecountofthechipistoosmall,thegatecountoffutureORGAswas 2.2.4 OpticallyreconfigurableI/Oblock alreadyestimated(12). FutureORGAswillachievegatecountsofoveramillion,whichissim- Optically reconfigurable gate arrays are assumed to be reconfigured frequently. For that rea- ilartogatecountsofFPGAs. son, an optical reconfiguration capability must be implemented for optically reconfigurable logic blocks and optically reconfigurable switching matrices. However, the I/O block might 2.2.2 Optically reconfigurable logic block not always be reconfigured under such dynamic reconfiguration applications because such The block diagram of an optically reconfigurable logic block of the prototype ORGA-VLSI a dynamic reconfiguration arises inside the device and each mode of Input, Output, or In- chip is presented in Fig. 2(b). Each optically reconfigurable logic block consists of a four- put/Output,andeachpinlocationoftheI/Oblockmustalwaysbefixedduetolimitationsof input one-output look-up table (LUT), six multiplexers, four transmission gates, and a delay theexternalenvironment. However,theORGA-VLSIsupportsopticalreconfigurationforI/O type flip-flop with a reset function. The input signals from the wiring channel, which are blocksbecausereconfigurationinformationisprovidedopticallyfromaholographicmemory applied through some switching matrices and wiring channels from optically reconfigurable inORGA.Consequently,electricallyconfigurableI/OblocksareunsuitableforORGAs. Here, I/O blocks, are transferred to a look-up table through four multiplexers. The look-up table eachI/Oblockisalsocontrolledusingnineopticalconnections. Always,theopticallyrecon- is used for implementing Boolean functions. The outputs of the look-up table and of a delay figurableI/Oblockconfigurationisexecutedonlyinitially. typeflip-flopconnectedtothelook-uptableareconnectedtoamultiplexer. Acombinational circuitandsequentialcircuitcanbechosenbychangingthemultiplexer,asinFPGAs. Finally, 3. DefecttolerancedesignoftheORGAarchitecture an output of the multiplexer is connected to the wiring channel again through transmission gates. Thelastmultiplexercontrolstheresetfunctionofthedelay-typeflip-flop. Suchafour- 3.1 Holographicmemorypart input one-output look-up table, each multiplexer, and each transmission gate respectively Holographic memories are well known to have a high defect tolerance. Since each bit of a have 16 photodiodes, 2 photodiodes, and 1 photodiode. In all, 32 photodiodes are used for reconfigurationcontextcanbegeneratedfromtheentireholographicmemory,thedamageof programming an optically reconfigurable logic block. Therefore, the optically reconfigurable some fraction rarely affects its diffraction pattern or a reconfiguration context. Even though logicblockcanbereconfiguredperfectlyinparallel. Inthisprototypechip,sincethegatearray a holographic memory device includes small defect areas, holographic memories can cor- istoosmall, aCLKforeachflip-flopisprovidedthroughasingleCLKbuffertree. However, rectly record configuration contexts and can correctly generate configuration contexts. Such for a large gate array, CLKs of flip-flops are applied through multiple CLK buffer trees as mechanismscanbeconsideredasthoseforwhichmajorityvotingisexecutedfromaninfinite programmableCLKs,aswellasthatofFPGAs. numberofdiffractionbeamsforeachconfigurationbit. Forasemiconductormemory,single- bitinformationisstoredinasingle-bitmemorycircuit. Incontrast,inholographicmemory,a single bit of a reconfiguration context is stored in the entire holographic memory. Therefore,6Parallel and Distributed Computing REFRESH theholographicmemory’sinformationisrobustwhile,inthesemiconductormemory,thede- VCC VCC VCC VCC fect of a transistor always erases information of a single bit or multiple bits. Earlier studies have shown experimentally that a holographic memory is robust (13). In the experiments, 1000 impulse noises and 10% Gaussian noise were applied to a holographic memory. Then T Q T Q T Q T Q the holographic memory was assembled to an ORGA architecture. All configuration experi- RST RST RST RST ments were successful. Therefore, defects of a holographic memory device on the ORGA are beyondconsideration. GND GND GND GND CLOCK RESET 3.2 Laser array part In an ORGA, a laser array is a basic component for addressing a configuration memory or CS1 CS2 CS3 CSn a holographic memory. Although configuration context information stored on a holographic Configuration signals for memory is robust, if the laser array becomes defective, then the execution of each config- Logic Blocks, Switching Matrix, and I/O Blocks uration becomes impossible. Therefore, the defect modes arising on a laser array must be analyzed. In an ORGA, many discrete semiconductor lasers are used for switching configu- Fig.4. Circuitdiagramofreconfigurationcircuit. ration contexts. Each laser corresponds to one holographic area including one configuration context. Onelaseraddressesoneconfigurationcontext. Thedefectmodesofacertainlaserare categorizable as a turn-ON defect mode and a full-time turn-ON defect mode or a turn-OFF defect mode. The turn-ON defect mode means that a certain laser cannot be turned on. The full-time turn-ON defect mode means the state in which a certain laser is constantly turned ONandcannotbeturnedOFF. 3.2.1 Turn-ON defect mode A laser might have a Turn-ON defect. However, laser source defects can be avoided easily by not using the defective lasers, and not using holographic memory areas corresponding to the lasers. An ORGA has numerous reconfiguration contexts. A slight reduction of reconfig- uration contexts is therefore negligible. Programmers need only to avoid the defective parts whenprogrammingreconfigurationcontextsforaholographicmemory. Therefore,theORGA architectureallowsTurn-ONdefectmodeforlasers. Fig. 5. Defective area avoidance method on a gate array. Here, it is assumed that a defective opticallyreconfigurablelogicblock(ORLB)exists,asportrayedintheupperareaofthefigure. 3.2.2 Turn-OFF defect mode Inthiscase,thedefectiveareaisavoidedperfectlyusingparallelprogrammingwiththeother Furthermore,alasermighthaveaTurn-OFFdefectmode. Thistroublelevelisslightlyhigher components,aspresentedinthelowerareaofthefigure. than that of the Turn-ON defect mode. The corresponding holographic memory information is constantly superimposed to the other configuration context under normal reconfiguration procedureifonelaserhasTurn-OFFdefectmodeandturnsonconstantly. Therefore,theTurn- 3.3 ORGA-VLSI part OFFdefectmodeoflaserspresentsthepossibilitythatallnormalconfigurationproceduresare In the ORGA-VLSIs, serial transfers were perfectly removed and optical reconfiguration cir- impossible. Therefore,ifsuchTurn-OFFdefectmodearisesonanORGA,aphysicalactionto cuits including static memory functions and photodiodes were placed near and directly con- cut the corresponding wires or driver units is required. The action is easy and can perfectly nectedtoprogrammingelementsofaprogrammablegatearrayVLSI.Figure4showsthatthe removethedefectmode. toggleflip-flopsareusedfortemporarilystoringonecontextandrealizingabit-by-bitconfig- uration. Using this architecture, the optical configuration procedure for a gate array can be 3.2.3 Defect mode for matrix addressing executedperfectlyinparallel. Thereby,theVLSIpartcanachieveaperfectlyparallelbit-by-bit Suchlaserarraysarealwaysarrangedintheformofatwo-dimensionalmatrixandaddressed configuration. asthematrix. Insuchmatriximplementation,thedefectofonedrivercausesalllasersonthe addressing line to be defective. To avoid simultaneous defects of many lasers, a spare row 3.3.1 Simple method to avoid defective areas method like that used for memories (DRAMs) is useful (6)(7). By introducing the spare row Using configuration, a damaged gate array can be restored as shown in Fig. 5. The structure method,thedefectmodecanberemovedperfectly. andfunctionofanopticallyreconfigurablelogicblockandopticallyreconfigurableswitching matricesonagatearrayaremutuallysimilar. Ifapartisdefectiveorfails, thesamefunction canbeimplementedontotheotherpart. Here,theupperpartofFig. 5showsthatitisassumedFault tolerance of programmable devices7 REFRESH theholographicmemory’sinformationisrobustwhile,inthesemiconductormemory,thede- VCC VCC VCC VCC fect of a transistor always erases information of a single bit or multiple bits. Earlier studies have shown experimentally that a holographic memory is robust (13). In the experiments, 1000 impulse noises and 10% Gaussian noise were applied to a holographic memory. Then T Q T Q T Q T Q the holographic memory was assembled to an ORGA architecture. All configuration experi- RST RST RST RST ments were successful. Therefore, defects of a holographic memory device on the ORGA are beyondconsideration. GND GND GND GND CLOCK RESET 3.2 Laser array part In an ORGA, a laser array is a basic component for addressing a configuration memory or CS1 CS2 CS3 CSn a holographic memory. Although configuration context information stored on a holographic Configuration signals for memory is robust, if the laser array becomes defective, then the execution of each config- Logic Blocks, Switching Matrix, and I/O Blocks uration becomes impossible. Therefore, the defect modes arising on a laser array must be analyzed. In an ORGA, many discrete semiconductor lasers are used for switching configu- Fig.4. Circuitdiagramofreconfigurationcircuit. ration contexts. Each laser corresponds to one holographic area including one configuration context. Onelaseraddressesoneconfigurationcontext. Thedefectmodesofacertainlaserare categorizable as a turn-ON defect mode and a full-time turn-ON defect mode or a turn-OFF defect mode. The turn-ON defect mode means that a certain laser cannot be turned on. The full-time turn-ON defect mode means the state in which a certain laser is constantly turned ONandcannotbeturnedOFF. 3.2.1 Turn-ON defect mode A laser might have a Turn-ON defect. However, laser source defects can be avoided easily by not using the defective lasers, and not using holographic memory areas corresponding to the lasers. An ORGA has numerous reconfiguration contexts. A slight reduction of reconfig- uration contexts is therefore negligible. Programmers need only to avoid the defective parts whenprogrammingreconfigurationcontextsforaholographicmemory. Therefore,theORGA architectureallowsTurn-ONdefectmodeforlasers. Fig. 5. Defective area avoidance method on a gate array. Here, it is assumed that a defective opticallyreconfigurablelogicblock(ORLB)exists,asportrayedintheupperareaofthefigure. 3.2.2 Turn-OFF defect mode Inthiscase,thedefectiveareaisavoidedperfectlyusingparallelprogrammingwiththeother Furthermore,alasermighthaveaTurn-OFFdefectmode. Thistroublelevelisslightlyhigher components,aspresentedinthelowerareaofthefigure. than that of the Turn-ON defect mode. The corresponding holographic memory information is constantly superimposed to the other configuration context under normal reconfiguration procedureifonelaserhasTurn-OFFdefectmodeandturnsonconstantly. Therefore,theTurn- 3.3 ORGA-VLSI part OFFdefectmodeoflaserspresentsthepossibilitythatallnormalconfigurationproceduresare In the ORGA-VLSIs, serial transfers were perfectly removed and optical reconfiguration cir- impossible. Therefore,ifsuchTurn-OFFdefectmodearisesonanORGA,aphysicalactionto cuits including static memory functions and photodiodes were placed near and directly con- cut the corresponding wires or driver units is required. The action is easy and can perfectly nectedtoprogrammingelementsofaprogrammablegatearrayVLSI.Figure4showsthatthe removethedefectmode. toggleflip-flopsareusedfortemporarilystoringonecontextandrealizingabit-by-bitconfig- uration. Using this architecture, the optical configuration procedure for a gate array can be 3.2.3 Defect mode for matrix addressing executedperfectlyinparallel. Thereby,theVLSIpartcanachieveaperfectlyparallelbit-by-bit Suchlaserarraysarealwaysarrangedintheformofatwo-dimensionalmatrixandaddressed configuration. asthematrix. Insuchmatriximplementation,thedefectofonedrivercausesalllasersonthe addressing line to be defective. To avoid simultaneous defects of many lasers, a spare row 3.3.1 Simple method to avoid defective areas method like that used for memories (DRAMs) is useful (6)(7). By introducing the spare row Using configuration, a damaged gate array can be restored as shown in Fig. 5. The structure method,thedefectmodecanberemovedperfectly. andfunctionofanopticallyreconfigurablelogicblockandopticallyreconfigurableswitching matricesonagatearrayaremutuallysimilar. Ifapartisdefectiveorfails, thesamefunction canbeimplementedontotheotherpart. Here,theupperpartofFig. 5showsthatitisassumed8Parallel and Distributed Computing thatadefectiveopticallyreconfigurablelogicblock(ORLB)existsinagatearray. Inthatcase, 4. Conclusion thelowerpartofFig. 5showsthatanotherimplementationisavailable. Byreconfiguringthe Optically reconfigurable gate arrays have perfectly parallel programmable capability. Even gate array VLSI, the defective area can be avoided perfectly and its functions can be realized if a gate array VLSI and a laser array include defective parts, their perfectly parallel pro- using other blocks. For this example, we assumed a defective area of only one optically re- grammablecapabilityenablesperfectavoidanceofdefectiveareas. Instead,itusestheremain- configurable logic block. For the other cells, for optically reconfigurable switching matrices, ingareaofagatearrayVLSI,remaininglaserresources,andremainingholographicmemory andforopticallyreconfigurableI/Oblocks,asimilaravoidancemethodcanbeadopted. Such resources. Therefore, the architecture enables fabrication of large-die VLSI chips and wafer- a replacement method can be adopted onto FPGAs; however, such a replacement method is scale integrations using the latest processes, even those chips with a high defect fraction. Fi- basedontheconditionthattheconfigurationispossible. RegardingFPGAs,thedefectorfail- nally, we conclude that the architecture has a high defect tolerance. In the future, optically ure probability of configuration circuits is very high because of the serial configuration. On reconfigurablegatearrayswillbeatypeofnext-generationthree-dimensional(3D)VLSIchip the other hand, the ORGA architecture configuration is very robust because of the parallel withanextremelyhighgatecountandwithahighmanufacturing-defecttolerance. configuration. Forthatreason,theORGAarchitecturehashighdefectandfaulttolerance. 5. References 3.3.2 Weak point However, a weak point exists on the ORGA-VLSI design. It is a common clock signal line. 1 C.Hess,L.H.Weiland,”Waferleveldefectdensitydistributionusingcheckerboardtest When using a single common clock signal line to distribute a clock for all delay-type flip- structures,” International Conference on Microelectronic Test Structures, pp. 101–106, flops, damage to one clock tree renders all delay-type flip-flops useless. Therefore, the clock 1998. line must be programmable with many buffer trees when a large gate count VLSI or a wafer 2 C. Hess, L. H. Weiland, ”Extraction of wafer-level defect density distributions to im- scale VLSI is made. In currently available FPGAs, each clock line of delay-type flip-flops prove yield prediction,” IEEE Transactions on Semiconductor Manufacturing, Vol. 12, has already been programmable with several clock trees. To reduce the probability of the Issue2,pp.175-183,1999. clockdeathtrouble,sufficientprogrammableclocktreesshouldbeprepared. Ifso,alongwith 3 AlteraCorporation,”AlteraDevices,”http://www.altera.com. FPGA,defectsforclocktreesinORGAarchitecturecanbebeyondconsideration. 4 XilinxInc.,”XilinxProductDataSheets,”http://www.xilinx.com. 3.3.3 Critical weak points 5 Lattice Semiconductor Corporation, ”LatticeECP and EC Family Data Sheet,” Figure4showsthatmorecriticalweakpointsintheORGA-VLSIsarearefreshsignal,areset http://www.latticesemi.co.jp/products,2005. signal, and a configuration CLK signal of configuration circuits to support optical configura- 6 A.J.Yu,G.G.Lemieux,”FPGADefectTolerance: ImpactofGranularity,”IEEEInterna- tionprocedures. ThesesignalsarecommonsignalsonVLSIchipandcannotbeprogrammable tionalConferenceonField-ProgrammableTechnology,pp.189–196,2005. since the signals are necessary for programming itself. Therefore, along with the laser array, a physical action or a spare method is required in addition to enforcing the wire and buffer 7 A. Doumar, H. Ito, ”Detecting, diagnosing, and tolerating faults in SRAM-based field treesfordefectssothatcriticalweakpointscanberemoved. programmable gate arrays: a survey,” IEEE Transactions on Very Large Scale Integra- tion(VLSI)Systems,Vol.11,Issue3,pp.386–405,2003. 3.4 Possibility of greater than tera-gate capacity 8 M.Watanabe,F.Kobayashi,”DynamicOpticallyReconfigurableGateArray,”Japanese In ORGA architecture, a holographic memory is a very robust device. For that reason, defect JournalofAppliedPhysics,Vol.45,No.4B,pp.3510-3515,2006. analysis is done only for an ORGA-VLSI and a laser array. In ORGA-VLSI part, even if de- fect parts are included on the ORGA-VLSI chip, almost all defect parts can be avoided using 9 N. Yamaguchi, M. Watanabe, ”Liquid crystal holographic configurations for ORGAs,” parallelprogrammingcapability. Theonlyremainingconcernisthecommonsignalsusedfor AppliedOptics,Vol.47,No.28,pp.4692-4700,2008. controlling configuration circuits. For those common signals, spare hardware or redundant 10 D.Seto, M.Watanabe, ”Adynamicopticallyreconfigurable gatearray-perfectemula- hardware must be used. On the other hand, in a laser array part, only a spare row method tion,”IEEEJournalofQuantumElectronics,Vol.44,Issue5,pp.493-500,2008. mustbeappliedtomatrixdrivercircuits. Theotherdefectsarenegligible. 11 M. Watanabe, M. Nakajima, S. Kato, ”An inversion/non-inversion dynamic optically Therefore,exploitingthedefecttoleranceandusingmethodsofORGAarchitecturedescribed reconfigurable gate array VLSI,” World Scientific and Engineering Academy and Soci- above,averylargediesizeVLSIispossible. Atthattime,accordingtoanearlierpaper(12),if etyTransactionsonCircuitsandSystems,Issue1,Vol.8,pp.11-20,2009. itisassumedthatanORGA-VLSIisbuiltona0.18μmprocess8inchwaferandthat1million configuration contexts are stored on a corresponding holographic memory, then greater than 12 M.Watanabe,T.Shiki,F.Kobayashi,”Scalingprospectofopticallydifferentialreconfig- 10-tera-gate VLSIs will be realized. Currently, although this remains only a distant objective, urablegatearrayVLSIs,”AnalogIntegratedCircuitsandSignalProcessing,Vol.60,pp. optoelectronicdevicesmightpresentanewVLSIparadigm. 137-143,2009. 13 M. Watanabe, F. Kobayashi, ”Manufacturing-defect tolerance analysis of optically re- configurable gate arrays,” World Scientific and Engineering Academy and Society TransactionsonSignalProcessing,Issue11,Vol.2,pp.1457-1464,2006.Fault tolerance of programmable devices9 thatadefectiveopticallyreconfigurablelogicblock(ORLB)existsinagatearray. Inthatcase, 4. Conclusion thelowerpartofFig. 5showsthatanotherimplementationisavailable. Byreconfiguringthe Optically reconfigurable gate arrays have perfectly parallel programmable capability. Even gate array VLSI, the defective area can be avoided perfectly and its functions can be realized if a gate array VLSI and a laser array include defective parts, their perfectly parallel pro- using other blocks. For this example, we assumed a defective area of only one optically re- grammablecapabilityenablesperfectavoidanceofdefectiveareas. Instead,itusestheremain- configurable logic block. For the other cells, for optically reconfigurable switching matrices, ingareaofagatearrayVLSI,remaininglaserresources,andremainingholographicmemory andforopticallyreconfigurableI/Oblocks,asimilaravoidancemethodcanbeadopted. Such resources. Therefore, the architecture enables fabrication of large-die VLSI chips and wafer- a replacement method can be adopted onto FPGAs; however, such a replacement method is scale integrations using the latest processes, even those chips with a high defect fraction. Fi- basedontheconditionthattheconfigurationispossible. RegardingFPGAs,thedefectorfail- nally, we conclude that the architecture has a high defect tolerance. In the future, optically ure probability of configuration circuits is very high because of the serial configuration. On reconfigurablegatearrayswillbeatypeofnext-generationthree-dimensional(3D)VLSIchip the other hand, the ORGA architecture configuration is very robust because of the parallel withanextremelyhighgatecountandwithahighmanufacturing-defecttolerance. configuration. Forthatreason,theORGAarchitecturehashighdefectandfaulttolerance. 5. References 3.3.2 Weak point However, a weak point exists on the ORGA-VLSI design. It is a common clock signal line. 1 C.Hess,L.H.Weiland,”Waferleveldefectdensitydistributionusingcheckerboardtest When using a single common clock signal line to distribute a clock for all delay-type flip- structures,” International Conference on Microelectronic Test Structures, pp. 101–106, flops, damage to one clock tree renders all delay-type flip-flops useless. Therefore, the clock 1998. line must be programmable with many buffer trees when a large gate count VLSI or a wafer 2 C. Hess, L. H. Weiland, ”Extraction of wafer-level defect density distributions to im- scale VLSI is made. In currently available FPGAs, each clock line of delay-type flip-flops prove yield prediction,” IEEE Transactions on Semiconductor Manufacturing, Vol. 12, has already been programmable with several clock trees. To reduce the probability of the Issue2,pp.175-183,1999. clockdeathtrouble,sufficientprogrammableclocktreesshouldbeprepared. Ifso,alongwith 3 AlteraCorporation,”AlteraDevices,”http://www.altera.com. FPGA,defectsforclocktreesinORGAarchitecturecanbebeyondconsideration. 4 XilinxInc.,”XilinxProductDataSheets,”http://www.xilinx.com. 3.3.3 Critical weak points 5 Lattice Semiconductor Corporation, ”LatticeECP and EC Family Data Sheet,” Figure4showsthatmorecriticalweakpointsintheORGA-VLSIsarearefreshsignal,areset http://www.latticesemi.co.jp/products,2005. signal, and a configuration CLK signal of configuration circuits to support optical configura- 6 A.J.Yu,G.G.Lemieux,”FPGADefectTolerance: ImpactofGranularity,”IEEEInterna- tionprocedures. ThesesignalsarecommonsignalsonVLSIchipandcannotbeprogrammable tionalConferenceonField-ProgrammableTechnology,pp.189–196,2005. since the signals are necessary for programming itself. Therefore, along with the laser array, a physical action or a spare method is required in addition to enforcing the wire and buffer 7 A. Doumar, H. Ito, ”Detecting, diagnosing, and tolerating faults in SRAM-based field treesfordefectssothatcriticalweakpointscanberemoved. programmable gate arrays: a survey,” IEEE Transactions on Very Large Scale Integra- tion(VLSI)Systems,Vol.11,Issue3,pp.386–405,2003. 3.4 Possibility of greater than tera-gate capacity 8 M.Watanabe,F.Kobayashi,”DynamicOpticallyReconfigurableGateArray,”Japanese In ORGA architecture, a holographic memory is a very robust device. For that reason, defect JournalofAppliedPhysics,Vol.45,No.4B,pp.3510-3515,2006. analysis is done only for an ORGA-VLSI and a laser array. In ORGA-VLSI part, even if de- fect parts are included on the ORGA-VLSI chip, almost all defect parts can be avoided using 9 N. Yamaguchi, M. Watanabe, ”Liquid crystal holographic configurations for ORGAs,” parallelprogrammingcapability. Theonlyremainingconcernisthecommonsignalsusedfor AppliedOptics,Vol.47,No.28,pp.4692-4700,2008. controlling configuration circuits. For those common signals, spare hardware or redundant 10 D.Seto, M.Watanabe, ”Adynamicopticallyreconfigurable gatearray-perfectemula- hardware must be used. On the other hand, in a laser array part, only a spare row method tion,”IEEEJournalofQuantumElectronics,Vol.44,Issue5,pp.493-500,2008. mustbeappliedtomatrixdrivercircuits. Theotherdefectsarenegligible. 11 M. Watanabe, M. Nakajima, S. Kato, ”An inversion/non-inversion dynamic optically Therefore,exploitingthedefecttoleranceandusingmethodsofORGAarchitecturedescribed reconfigurable gate array VLSI,” World Scientific and Engineering Academy and Soci- above,averylargediesizeVLSIispossible. Atthattime,accordingtoanearlierpaper(12),if etyTransactionsonCircuitsandSystems,Issue1,Vol.8,pp.11-20,2009. itisassumedthatanORGA-VLSIisbuiltona0.18μmprocess8inchwaferandthat1million configuration contexts are stored on a corresponding holographic memory, then greater than 12 M.Watanabe,T.Shiki,F.Kobayashi,”Scalingprospectofopticallydifferentialreconfig- 10-tera-gate VLSIs will be realized. Currently, although this remains only a distant objective, urablegatearrayVLSIs,”AnalogIntegratedCircuitsandSignalProcessing,Vol.60,pp. optoelectronicdevicesmightpresentanewVLSIparadigm. 137-143,2009. 13 M. Watanabe, F. Kobayashi, ”Manufacturing-defect tolerance analysis of optically re- configurable gate arrays,” World Scientific and Engineering Academy and Society TransactionsonSignalProcessing,Issue11,Vol.2,pp.1457-1464,2006.10Parallel and Distributed Computing 14 M. Miyano, M. Watanabe, F. Kobayashi, ”Optically Differential Reconfigurable Gate Array,”ElectronicsandComputersinJapan,PartII,Issue11,vol.90,pp.132-139,2007. 15 M. Nakajima, M. Watanabe, ”A four-context optically differential reconfigurable gate array,”IEEE/OSAJournalofLightwaveTechnology,Vol.27,No.24,2009.Fragmentation management for HW multitasking in 2D Reconfigurable Devices: Metrics and Defragmentation Heuristics11 Fragmentation management for HW multitasking in 2D Reconfigurable 2 Devices: Metrics and Defragmentation Heuristics x Julio Septién, Hortensia Mecha, Daniel Mozos and Jesus Tabero Fragmentation management for HW multitasking in 2D Reconfigurable Devices: Metrics and Defragmentation Heuristics Julio Septién, Hortensia Mecha, Daniel Mozos and Jesus Tabero University Complutense de Madrid Spain 1. Introduction Hardware multitasking has become a real possibility as a consequence of FPGA advances along the last decade, such as partial run-time reconfiguration capability and increased FPGA size. Partial reconfiguration times are small enough, and FPGA sizes large enough, to consider reconfigurable environments where a single FPGA managed by an extended operating system can store and run simultaneously several whole tasks, even belonging to different users. The problem of HW multitasking management involves decisions such as the structure used to keep track of the free FPGA resources, the allocation of FPGA resources for each incoming task, the scheduling of the task execution at a certain time instant, where its time constraints are satisfied, and others that have been studied in detail in (Wigley & Kearney, 2002a). The tasks enter and leave the FPGA dynamically, and thus FPGA reuse due to hardware multitasking leads to fragmentation. When a task finishes execution and has to leave the FPGA, it leaves a hole that has to be incorporated to the FPGA free area. It becomes unavoidable that such process, repeated once and again, generates an external fragmentation that can lead to difficult situations where new tasks are unable to find room in the FPGA though there are free resources enough. The FPGA free area has become fragmented and it can not be used to accommodate future incoming tasks due to the way the free resources are spread along the FPGA. For 1D-reconfiguration architectures such as that of commercial Xilinx Virtex or Virtex II (only column-programmable, though they consist of 2D block arrays), simple management techniques based, for example, on several fixed-sized partitions or even arbitrary-sized partitions, are used, and fragmentation can be easily detected and managed (Steiger et al., 2004) (Ahmadinia et al., 2003). It is a linear problem alike to that of memory fragmentation in SW multitasking environments. The main problem for such architectures is not the management of the fragmented free area, but how defragmentation is accomplished by performing task relocation (Brebner & Diessel, 2001). Some systems even propose a 2D management of the 1D-reconfigurable, Virtex-type, architecture (Hübner et al., 2006) (van der Veen et al., 2005). 12Parallel and Distributed Computing For 2D-reconfigurable architectures such as Virtex 4 (Xilinx, Inc “Virtex-4 Configuration have been used, but for 2D a nice amount of interesting research has been done, and in this Guide) and 5 (Xilinx, Inc “Virtex-5 Configuration User Guide), more sophisticated section we’ll focus on such work. techniques must be used to keep track of the available free area, in order to get an efficient FPGA resource management (Bazargan et al., 2000) (Walder et al., 2003) (Diessel et al., 2000) 2.1 Fragmentation estimation (Ahmadinia et al., 2004) (Handa & Vemuri, 2004a) (Tabero et al., 2004). For such Fragmentation has been considered in the existing literature as an aspect of the area architectures the estimation of the FPGA fragmentation status through an accurate metric is management problem in HW multitasking, and thus most fragmentation metrics have been an important issue, and some researchers have proposed estimation metrics as in (Handa & proposed as part of different management techniques, most of them rectangle-based. Vemuri, 2004b), (Ejnioui & DeMara, 2005) and (Septien et al., 2008). What the 2D metric Bazargan presented in (Bazargan et al., 2000) a free area management and task allocation must estimate is how idoneous is the geometry of the free FPGA area to accommodate a heuristic that is broadly referenced. Such heuristic is based on MERs, maximum empty new task. rectangles. Bazargan´s allocator keeps track, with a high complexity algorithm, of all the A reliable fragmentation metric can be used in different ways: first, as a cost function when MERs (which can overlap) available in the free FPGA area. Such approach is optimal, in the the allocation decisions are being taken (Tabero et al., 2004). The use of a fragmentation sense that if there is free room enough for an incoming task, it is contained in one of the metric as cost function would guarantee future FPGA status with lower fragmentation (for available MERs. To select one of the MERs, Bazargan uses several techniques: First-Fit, the same FPGA occupation level), that would give a better probability of finding a location Worst-Fit, Best-fit… Though Bazargan does not estimate fragmentation directly, the for the next task. availability of large MERs at a given time is an indirect measure of the fragmentation status It can be used, also, as an alarm in order to trigger defragmentation measures as preventive of a given FPGA situation. actions or in extreme situations, that lead to relocation of one o more of the currently The MER approach, though, is so expensive in terms of update and search time that running tasks (van der Veen et al., 2005), (Diessel et al., 2000), (Septien et al., 2006) and Bazargan finally opted for a non-optimal approach to area management, by dividing the (Fekete et al., 2008). free area into a set of non-overlapping rectangles. In this work, we are going to review the fragmentation metrics proposed in the literature to Wigley proposes in (Wigley & Kearney, 2002b) a metric that must keep track of all the estimate the fragmentation of the FPGA resources, and we’ll present two fragmentation available MERs. Thus what we have just stated about the MER approach applies also to this metrics of our own, one of them based on the number and shape of the FPGA free holes, and metric. It considers fragmentation then as the average size of the maximal squares fitting another based on the relative quadrature of the free area perimeter. Then we´ll show into the more relevant set of MERs. Moreover, this metric does not discriminate enough, examples of how these metrics behave in different situations, with one or several free holes giving the same values for very different fragmentation situations. and also with islands (isolated tasks). We’ll also show how they can be used as cost Walder makes in (Walder & Platzner, 2002) an estimation of the free area fragmentation, functions in a location selection heuristic, each time a task is loaded into the FPGA. using non-overlapping rectangles similar to those of Bazargan. It considers the number of Experimental results show that though they maintain a low complexity, these metrics, rectangles with a given size. It uses a normalized, device-independent formula, to compute specially the quadrature-based one, behave better than most of the previous ones, the free area. Its main problem comes from the complexity of the technique needed to keep discarding a lower amount of computing volume when the FPGA supports a heavy task track of such rectangles. load. Handa in (Handa & Vemuri, 2004b) computes fragmentation referred to the average task We will review also the different approaches to FPGA defragmentation considered in the size. Holes with a size two times such value or more are not considered for the metric. literature, and we’ll propose a set of FPGA defragmentation techniques. Two basic Fragmentation then has not an absolute value for a given FPGA situation, but depends on techniques will be presented: preventive and on-demand defragmentation. Preventive the incoming task. It gives in general very low fragmentation values, even for situations measures will try to anticipate to possible allocation problems due to fragmentation. These with very disperse tasks and holes not too large compared to the total free area. measures will be triggered by a high fragmentation metric value. When fired, the system Ejnoui in (Ejnioui & DeMara, 2005) proposes a fragmentation metric that depends only on performs an immediate global or partial defragmentation, or a delayed global one the free area and the number of holes, and not on the shape of the holes. It can be considered depending on the time constraints of the involved tasks. On-demand measures try an urgent then a measure of the FPGA occupation, more than of FPGA fragmentation. There is a move of a single candidate task, the one with the highest relative adjacency with the hole fragmentation value of 0 only for an empty chip. When the FPGA is heavily loaded the border. Such battery of defragmentation measures can help avoiding most problems metric approaches to 1 quickly, independently from the hole shape. produced by fragmentation in HW multitasking on 2D reconfigurable devices. Cui in (Cui et al., 2007) computes fragmentation for all the MERs of the free area. For each MER this fragmentation is based on the probable size of the arriving task, and involves computations for each basic cell inside the MER. Thus the technique presents a heavy 2. Previous work complexity order that, as for other MER-based techniques, makes it difficult to use in a real The problems of fragmentation estimation and defragmentation are very different when environment. FPGAs managed in one or two dimensions are considered. For 1D, a few simple solutions All that has been explained above allows to make some assertions. The main feature of a good fragmentation metric should be its ability to detect when the free FPGA area is more or Fragmentation management for HW multitasking in 2D Reconfigurable Devices: Metrics and Defragmentation Heuristics13 For 2D-reconfigurable architectures such as Virtex 4 (Xilinx, Inc “Virtex-4 Configuration have been used, but for 2D a nice amount of interesting research has been done, and in this Guide) and 5 (Xilinx, Inc “Virtex-5 Configuration User Guide), more sophisticated section we’ll focus on such work. techniques must be used to keep track of the available free area, in order to get an efficient FPGA resource management (Bazargan et al., 2000) (Walder et al., 2003) (Diessel et al., 2000) 2.1 Fragmentation estimation (Ahmadinia et al., 2004) (Handa & Vemuri, 2004a) (Tabero et al., 2004). For such Fragmentation has been considered in the existing literature as an aspect of the area architectures the estimation of the FPGA fragmentation status through an accurate metric is management problem in HW multitasking, and thus most fragmentation metrics have been an important issue, and some researchers have proposed estimation metrics as in (Handa & proposed as part of different management techniques, most of them rectangle-based. Vemuri, 2004b), (Ejnioui & DeMara, 2005) and (Septien et al., 2008). What the 2D metric Bazargan presented in (Bazargan et al., 2000) a free area management and task allocation must estimate is how idoneous is the geometry of the free FPGA area to accommodate a heuristic that is broadly referenced. Such heuristic is based on MERs, maximum empty new task. rectangles. Bazargan´s allocator keeps track, with a high complexity algorithm, of all the A reliable fragmentation metric can be used in different ways: first, as a cost function when MERs (which can overlap) available in the free FPGA area. Such approach is optimal, in the the allocation decisions are being taken (Tabero et al., 2004). The use of a fragmentation sense that if there is free room enough for an incoming task, it is contained in one of the metric as cost function would guarantee future FPGA status with lower fragmentation (for available MERs. To select one of the MERs, Bazargan uses several techniques: First-Fit, the same FPGA occupation level), that would give a better probability of finding a location Worst-Fit, Best-fit… Though Bazargan does not estimate fragmentation directly, the for the next task. availability of large MERs at a given time is an indirect measure of the fragmentation status It can be used, also, as an alarm in order to trigger defragmentation measures as preventive of a given FPGA situation. actions or in extreme situations, that lead to relocation of one o more of the currently The MER approach, though, is so expensive in terms of update and search time that running tasks (van der Veen et al., 2005), (Diessel et al., 2000), (Septien et al., 2006) and Bazargan finally opted for a non-optimal approach to area management, by dividing the (Fekete et al., 2008). free area into a set of non-overlapping rectangles. In this work, we are going to review the fragmentation metrics proposed in the literature to Wigley proposes in (Wigley & Kearney, 2002b) a metric that must keep track of all the estimate the fragmentation of the FPGA resources, and we’ll present two fragmentation available MERs. Thus what we have just stated about the MER approach applies also to this metrics of our own, one of them based on the number and shape of the FPGA free holes, and metric. It considers fragmentation then as the average size of the maximal squares fitting another based on the relative quadrature of the free area perimeter. Then we´ll show into the more relevant set of MERs. Moreover, this metric does not discriminate enough, examples of how these metrics behave in different situations, with one or several free holes giving the same values for very different fragmentation situations. and also with islands (isolated tasks). We’ll also show how they can be used as cost Walder makes in (Walder & Platzner, 2002) an estimation of the free area fragmentation, functions in a location selection heuristic, each time a task is loaded into the FPGA. using non-overlapping rectangles similar to those of Bazargan. It considers the number of Experimental results show that though they maintain a low complexity, these metrics, rectangles with a given size. It uses a normalized, device-independent formula, to compute specially the quadrature-based one, behave better than most of the previous ones, the free area. Its main problem comes from the complexity of the technique needed to keep discarding a lower amount of computing volume when the FPGA supports a heavy task track of such rectangles. load. Handa in (Handa & Vemuri, 2004b) computes fragmentation referred to the average task We will review also the different approaches to FPGA defragmentation considered in the size. Holes with a size two times such value or more are not considered for the metric. literature, and we’ll propose a set of FPGA defragmentation techniques. Two basic Fragmentation then has not an absolute value for a given FPGA situation, but depends on techniques will be presented: preventive and on-demand defragmentation. Preventive the incoming task. It gives in general very low fragmentation values, even for situations measures will try to anticipate to possible allocation problems due to fragmentation. These with very disperse tasks and holes not too large compared to the total free area. measures will be triggered by a high fragmentation metric value. When fired, the system Ejnoui in (Ejnioui & DeMara, 2005) proposes a fragmentation metric that depends only on performs an immediate global or partial defragmentation, or a delayed global one the free area and the number of holes, and not on the shape of the holes. It can be considered depending on the time constraints of the involved tasks. On-demand measures try an urgent then a measure of the FPGA occupation, more than of FPGA fragmentation. There is a move of a single candidate task, the one with the highest relative adjacency with the hole fragmentation value of 0 only for an empty chip. When the FPGA is heavily loaded the border. Such battery of defragmentation measures can help avoiding most problems metric approaches to 1 quickly, independently from the hole shape. produced by fragmentation in HW multitasking on 2D reconfigurable devices. Cui in (Cui et al., 2007) computes fragmentation for all the MERs of the free area. For each MER this fragmentation is based on the probable size of the arriving task, and involves computations for each basic cell inside the MER. Thus the technique presents a heavy 2. Previous work complexity order that, as for other MER-based techniques, makes it difficult to use in a real The problems of fragmentation estimation and defragmentation are very different when environment. FPGAs managed in one or two dimensions are considered. For 1D, a few simple solutions All that has been explained above allows to make some assertions. The main feature of a good fragmentation metric should be its ability to detect when the free FPGA area is more or 14Parallel and Distributed Computing less apt to accommodate future incoming taks, that is, it must detect if it is efficiently or isolated islands appearing inside the hole defined by the VL, every time a new event inefficiently organized, and give a value to such organization. It must separate the happens. fragmentation estimation from the occupation degree, or the amount of available free area. As Figure 1 shows, we suppose a 2D-managed FPGA, with rectangular relocatable tasks For example, an FPGA status with a high occupation but with all the free area concentred in made of a number of basic reconfigurable basic blocks, each block includes processing a single, almost-square, rectangle, cannot be considered as fragmented as some of the elements and is able to access to a global interconnection network through a standard metrics previously presented do. Also, the metric must be computationally simple, and that interface, not depicted in the figure. suggests the inconvenience of the MER-based approach of some of the metrics reviewed. 2.2 Defragmentation techniques WaitingTasks Queue Running Tasks List HW manager Qw Lr As it was previously stated, the problem of defragmentation is different for 1D or 2D t t t 1 2 3 FPGAs. For FPGAs allowing reconfiguration in a single dimension, Compton (Compton et al., 2002), Brebner (Brebner & Diessel, 2001) or Koch (Koch et al., 2004) have proposed architectural features to perform defragmentation through relocation of complete columns or rows. For 2D-reconfigurable FPGAs, though many researchers estimate fragmentation, and even Vertex Vertex List T Task N use metrics to help their allocation algorithms to choose locations for the arriving tasks, as Selector Updater Scheduler section 2.1 has shown, only a few perform explicit defragmentation processes. Gericota proposes in (Gericota et al., 2003) architectural changes to a classical 2D FPGA to Defragmentation manager permit task relocation by replication of CLBs, in order to solve fragmentation problems. But Task Loader/Extractor they do not solve the problems of how to choose a new location or how to decide when this Fragmentation relocation must be performed. Metric Ejnioui (Ejnioui & DeMara, 2005) has proposed a fragmentation metric adapted from the one shown in (Tabero et al., 2003). They propose to use this estimation to schedule a Vertex List Vertex List Analyzer defragmentation process if a given threshold is reached. They comment several possible ways of defining such threshold, though they do not seem to choose any of them. Though they suggest several methodologies, they do not give experimental results that validate their approach. VL Finally, Van der Veen in (van der Veen et al., 2005) and (Fekete et al., 2008) uses a branch- FPGA and bound approach with constraints, in order to accomplish a global defragmentation process that searches for an optimal module layout. It is aimed to 2D FPGAs, though column-reconfigurable as current Virtex FPGAs. This process seems to be quite time- Fig. 1. HW management environment. consuming, of an order of magnitude of seconds. The authors do not give any information about how to insert such defragmentation process in a HW management system. Each incoming task T is originally defined by the tuple of parameters: i T = w , h , t_ex , t_arr , t_max iiiiii 3. HW management environment Our approach to reconfigurable HW management is summarized in Figure 1. Our where w times h indicates the task size in terms of basic reconfigurable blocks, t_ex is the iii environment is an extension of the operating system that consists of several modules. The task execution time, t_arr the task arrival time and t_max the maximum time allowed for i i Task Scheduler controls the tasks currently running in the FPGA and accepts new incoming the task to finish execution. These parameters are characteristic for each incoming task. tasks. Tasks can arrive anytime and must be processed on-line. The Vertex-List Updater If a suitable location is found, task T is finally allocated and scheduled for execution at an i keeps track of the available FPGA free area with a Vertex-List (VL) structure that has been instant t_start . If not, the task goes to the queue Qw, and it is reconsidered again at each i described in detail in (Tabero et al., 2003), updating it whenever a new event happens. Such task-end event or after defragmentation. We call the current time t_curr. All the times but structure can be travelled with different heuristics ((Tabero et al., 2003), (Tabero et al., 2006), t_ex are absolute (referred to the same time origin). We estimate t_conf , the time needed to ii and (Walder & Platzner, 2002)) by the Vertex Selector in order to choose the vertex where load the configuration of the task, proportional to its size: t_conf = k w h . iii each arriving task will be placed. Finally, a permanent checking of the FPGA status is made by the Free Area Analyzer. Such module estimates the FPGA fragmentation and checks for Fragmentation management for HW multitasking in 2D Reconfigurable Devices: Metrics and Defragmentation Heuristics15 less apt to accommodate future incoming taks, that is, it must detect if it is efficiently or isolated islands appearing inside the hole defined by the VL, every time a new event inefficiently organized, and give a value to such organization. It must separate the happens. fragmentation estimation from the occupation degree, or the amount of available free area. As Figure 1 shows, we suppose a 2D-managed FPGA, with rectangular relocatable tasks For example, an FPGA status with a high occupation but with all the free area concentred in made of a number of basic reconfigurable basic blocks, each block includes processing a single, almost-square, rectangle, cannot be considered as fragmented as some of the elements and is able to access to a global interconnection network through a standard metrics previously presented do. Also, the metric must be computationally simple, and that interface, not depicted in the figure. suggests the inconvenience of the MER-based approach of some of the metrics reviewed. 2.2 Defragmentation techniques WaitingTasks Queue Running Tasks List HW manager Qw Lr As it was previously stated, the problem of defragmentation is different for 1D or 2D t t t 1 2 3 FPGAs. For FPGAs allowing reconfiguration in a single dimension, Compton (Compton et al., 2002), Brebner (Brebner & Diessel, 2001) or Koch (Koch et al., 2004) have proposed architectural features to perform defragmentation through relocation of complete columns or rows. For 2D-reconfigurable FPGAs, though many researchers estimate fragmentation, and even Vertex Vertex List T Task N use metrics to help their allocation algorithms to choose locations for the arriving tasks, as Selector Updater Scheduler section 2.1 has shown, only a few perform explicit defragmentation processes. Gericota proposes in (Gericota et al., 2003) architectural changes to a classical 2D FPGA to Defragmentation manager permit task relocation by replication of CLBs, in order to solve fragmentation problems. But Task Loader/Extractor they do not solve the problems of how to choose a new location or how to decide when this Fragmentation relocation must be performed. Metric Ejnioui (Ejnioui & DeMara, 2005) has proposed a fragmentation metric adapted from the one shown in (Tabero et al., 2003). They propose to use this estimation to schedule a Vertex List Vertex List Analyzer defragmentation process if a given threshold is reached. They comment several possible ways of defining such threshold, though they do not seem to choose any of them. Though they suggest several methodologies, they do not give experimental results that validate their approach. VL Finally, Van der Veen in (van der Veen et al., 2005) and (Fekete et al., 2008) uses a branch- FPGA and bound approach with constraints, in order to accomplish a global defragmentation process that searches for an optimal module layout. It is aimed to 2D FPGAs, though column-reconfigurable as current Virtex FPGAs. This process seems to be quite time- Fig. 1. HW management environment. consuming, of an order of magnitude of seconds. The authors do not give any information about how to insert such defragmentation process in a HW management system. Each incoming task T is originally defined by the tuple of parameters: i T = w , h , t_ex , t_arr , t_max iiiiii 3. HW management environment Our approach to reconfigurable HW management is summarized in Figure 1. Our where w times h indicates the task size in terms of basic reconfigurable blocks, t_ex is the iii environment is an extension of the operating system that consists of several modules. The task execution time, t_arr the task arrival time and t_max the maximum time allowed for i i Task Scheduler controls the tasks currently running in the FPGA and accepts new incoming the task to finish execution. These parameters are characteristic for each incoming task. tasks. Tasks can arrive anytime and must be processed on-line. The Vertex-List Updater If a suitable location is found, task T is finally allocated and scheduled for execution at an i keeps track of the available FPGA free area with a Vertex-List (VL) structure that has been instant t_start . If not, the task goes to the queue Qw, and it is reconsidered again at each i described in detail in (Tabero et al., 2003), updating it whenever a new event happens. Such task-end event or after defragmentation. We call the current time t_curr. All the times but structure can be travelled with different heuristics ((Tabero et al., 2003), (Tabero et al., 2006), t_ex are absolute (referred to the same time origin). We estimate t_conf , the time needed to ii and (Walder & Platzner, 2002)) by the Vertex Selector in order to choose the vertex where load the configuration of the task, proportional to its size: t_conf = k w h . iii each arriving task will be placed. Finally, a permanent checking of the FPGA status is made by the Free Area Analyzer. Such module estimates the FPGA fragmentation and checks for 16Parallel and Distributed Computing We also define t_marg , as the time margin each task is allowed to delay its completion, the during a defragmentation process. The HF estimation can be used to help in the vertex i time interval between the task scheduled finishing instant and its time-out (defined by selection process, as it is done in (Tabero et al., 2004), (Tabero et al., 2006) and (Tabero et al., t_max ). If the task has been scheduled at time t_start it must be computed as: 2008), or to check the FPGA status in order to fire a defragmentation process when needed i i (Septién et al. 2006). In the next sections we will focus in how we accomplish defragmentation. t_marg = t_max – (t_start + t_conf + t_ex ) (1) i i i i i But if the task has not been allocated yet, and is waiting at Qw, t_curr should be used instead of t_start . In this case, t_marg value decreases at each time cycle as t_curr advances. i i b) a) c) When t_marg reaches a value of 0 the task must be definitively rejected and deleted from i Qw. 4. Fragmentation analysis As explained in section 1, we will present two different techniques to estimate the FPGA fragmentation status: a hole-based metric and a quadrature-based one. HF = 0,6 HF = 0,6 HF = 0,6 4.1 Hole-based fragmentation metric e) d) f) The fragmentation status of the free FPGA area is directly related to the possibility of being able to find a suitable location for an arriving task. We have identified a fragmentation situation by the occurrence of several circumstances. First, proliferation of the number of independent free area holes, each one represented in our system by a different VL. And second, increasing complexity of the hole shape, that we relate with the number of vertices. A particular instance of a complex hole is created when it contains an occupied island inside, made of one of several tasks isolated from the rest. This ideas lead to the following metric HF, very similar to the one we presented in (Tabero HF = 0,89 HF = 0.99 HF = 0.67 et al., 2004): Fig. 2. Different FPGA situations and fragmentation values given by the HF metric. n HF = 1 -  (4/V ) (A /A ) (2) h H H F_FPGA Where the term between brackets represents a kind of “suitability” for a given hole H, with 4.2 Perimeter quadrature-based metric area A and V vertices: H H The HF metric presented in section 4.1 gives adequate fragmentation values for many n  (4/V ) represents the suitability of the shape of hole H to accommodate rectangular H situations, but does not handle well a few, particular ones. The main problem for such tasks. Notice that any hole with four vertices has the best suitability. For most of our vertex-based metric is that sometimes a hole with a complex boundary with many vertices experiments we employ n=1, but we can use higher or lower values if we want to can contain a significantly usable portion of free area. Also, the metric does not discriminate penalize more or less the occurrence of holes with complex shapes and thus difficult among holes with different shapes but the same number of vertices, as in Figures 2.a, 2.b to use. and 2.c. Moreover, as Figure 2.f shows the metric is not too sensible to islands. Finally,  (A /A ) represents the relative normalized hole area. A stands for the H F_FPGA F_FPGA another drawback is that the occurrence of several holes as in Figures 2.d and 2.e is severely whole free area in the FPGA. That is A = ∑ A . F_FPGA H penalized with very high (close to 1) fragmentation values. We will try to solve this problem with a new metric, derived form a different approach. This HF metric penalizes the proliferation of independent holes in the FPGA, as well as the occurrence of holes with complex shapes and small sizes. Figure 2 shows several A) Quadrature fragmentation metric basics fragmentation situations in an example FPGA of 20x20 basic blocks, and the fragmentation The new metric starts from a simple idea: we do consider the ideal free hole H as such one values estimated by the formula in (2). able to accommodate most of the incoming tasks with a variety of shapes and a total task A new estimation is done every time a new event occurs, that is, when a new task is placed area similar or smaller than the size of the hole H. The assumption we make is that such in the FPGA, when a finishing task leaves the FPGA, or when relocation decisions are taken ideal free hole should have a perfect square shape. Such hole would be able to accommodate Fragmentation management for HW multitasking in 2D Reconfigurable Devices: Metrics and Defragmentation Heuristics17 We also define t_marg , as the time margin each task is allowed to delay its completion, the during a defragmentation process. The HF estimation can be used to help in the vertex i time interval between the task scheduled finishing instant and its time-out (defined by selection process, as it is done in (Tabero et al., 2004), (Tabero et al., 2006) and (Tabero et al., t_max ). If the task has been scheduled at time t_start it must be computed as: 2008), or to check the FPGA status in order to fire a defragmentation process when needed i i (Septién et al. 2006). In the next sections we will focus in how we accomplish defragmentation. t_marg = t_max – (t_start + t_conf + t_ex ) (1) i i i i i But if the task has not been allocated yet, and is waiting at Qw, t_curr should be used instead of t_start . In this case, t_marg value decreases at each time cycle as t_curr advances. i i b) a) c) When t_marg reaches a value of 0 the task must be definitively rejected and deleted from i Qw. 4. Fragmentation analysis As explained in section 1, we will present two different techniques to estimate the FPGA fragmentation status: a hole-based metric and a quadrature-based one. HF = 0,6 HF = 0,6 HF = 0,6 4.1 Hole-based fragmentation metric e) d) f) The fragmentation status of the free FPGA area is directly related to the possibility of being able to find a suitable location for an arriving task. We have identified a fragmentation situation by the occurrence of several circumstances. First, proliferation of the number of independent free area holes, each one represented in our system by a different VL. And second, increasing complexity of the hole shape, that we relate with the number of vertices. A particular instance of a complex hole is created when it contains an occupied island inside, made of one of several tasks isolated from the rest. This ideas lead to the following metric HF, very similar to the one we presented in (Tabero HF = 0,89 HF = 0.99 HF = 0.67 et al., 2004): Fig. 2. Different FPGA situations and fragmentation values given by the HF metric. n HF = 1 -  (4/V ) (A /A ) (2) h H H F_FPGA Where the term between brackets represents a kind of “suitability” for a given hole H, with 4.2 Perimeter quadrature-based metric area A and V vertices: H H The HF metric presented in section 4.1 gives adequate fragmentation values for many n  (4/V ) represents the suitability of the shape of hole H to accommodate rectangular H situations, but does not handle well a few, particular ones. The main problem for such tasks. Notice that any hole with four vertices has the best suitability. For most of our vertex-based metric is that sometimes a hole with a complex boundary with many vertices experiments we employ n=1, but we can use higher or lower values if we want to can contain a significantly usable portion of free area. Also, the metric does not discriminate penalize more or less the occurrence of holes with complex shapes and thus difficult among holes with different shapes but the same number of vertices, as in Figures 2.a, 2.b to use. and 2.c. Moreover, as Figure 2.f shows the metric is not too sensible to islands. Finally,  (A /A ) represents the relative normalized hole area. A stands for the H F_FPGA F_FPGA another drawback is that the occurrence of several holes as in Figures 2.d and 2.e is severely whole free area in the FPGA. That is A = ∑ A . F_FPGA H penalized with very high (close to 1) fragmentation values. We will try to solve this problem with a new metric, derived form a different approach. This HF metric penalizes the proliferation of independent holes in the FPGA, as well as the occurrence of holes with complex shapes and small sizes. Figure 2 shows several A) Quadrature fragmentation metric basics fragmentation situations in an example FPGA of 20x20 basic blocks, and the fragmentation The new metric starts from a simple idea: we do consider the ideal free hole H as such one values estimated by the formula in (2). able to accommodate most of the incoming tasks with a variety of shapes and a total task A new estimation is done every time a new event occurs, that is, when a new task is placed area similar or smaller than the size of the hole H. The assumption we make is that such in the FPGA, when a finishing task leaves the FPGA, or when relocation decisions are taken ideal free hole should have a perfect square shape. Such hole would be able to accommodate 18Parallel and Distributed Computing B) QF metric for multiple holes most incoming tasks. One of the advantages of a square shape task would be that the longest interconnections inside the task would be shorter than for irregular shape tasks with The QF metric can be easily extended to a more complex free area made of several holes, by the same area, or even rectangular ones. considering the whole boundary between the free and the occupied area as a single For any hole H with an area A a perimeter P and a non-square shape, we define its perimeter. Then P and A values would be used computed as: H H relative quadrature Q as “how its shape is near from being a perfect square”. We estimate such magnitude dividing its actual area A by the area A of a perfect square with the same H Q and (6) PP i perimeter P . A that is computed as: H Q i 2 A = (P / 4) (3) Q H (7) AA i i where P / 4 would be the length of each one of the square sides. Then the relative H quadrature is: And the global fragmentation is computed as: Q = A /A (4) H Q and thus fragmentation is: 2 QF = 1 – A/(P / 4) (8) QF = 1 – Q (5) The global fragmentation value given by QF would be, then, a measure of how far from It can be seen that our quadrature-based metric QF will consider that fragmentation for a being an ideal single hole is the whole available free area delimited by P. given hole H is minimal (0) when it has a square shape. On the contrary, the longest Figure 4 shows several situations for the same 20x20 FPGA and five running tasks than perimeter gives a higher fragmentation value. In Figure 3 we can see a set of five running tasks in a 20x20 FPGA, placed at different Figure 3. Now the tasks are located at different positions, and the free area A is divided into locations. The free area is of 169 basic area units for all of them. But the perimeter P an thus two (Figures 4.a and 4.b) or even three (Figure 4.c) independent holes. The figure shows how our metric does not need to take into account the number of holes to estimate the the A and Q values are different for each one, as the figure shows. Thus the fragmentation Q quality of the different FPGA situations. QF differs, and is smaller for the FPGA situation with a free area shape more apt to accommodate future incoming tasks, supposedly Figure 3.f. It can be noticed, also, how the QF metric, in contrast with the HF metric, gives different fragmentation values for holes c) a) b) with the same number of vertices (10 in all the cases) but different shapes, as in Figures 3.a, 3.e or 3.f. b) c) a) P = 82 Q = 0,40 P = 78 Q = 0,44 P = 90 Q = 0,33 AQ = 420,25 QF = 0,6 A = 380,25 QF = 0,56 A = 506,25 QF = 0,67 A = 169 Q = 1 Q Q = 0.73 P = 52 QF = 0 T 5 l = P/4 = 13 P = 62 Q = 0,70 P = 70 Q = 0,55 P = 62 Q = 0,70 Fig. 4. QF metric values for different tasks locations and multiple holes A = 306,25 QF = 0,45 AQ = 240,25 QF = 0,30 Q AQ = 240,25 QF = 0,30 C) QF metric for islands d) e) f) A situation that our metric deals with automatically is the occurrence of islands. Islands are high fragmentation, undesirable situations that can happen as some tasks finish and leave the FPGA, while others remain. It is important that a fragmentation metric is able to deal with such situations. Our metric deals with it automatically, because in our representation of the free area perimeter (a vertex list), the island is connected to the rest of the perimeter with virtual edges, as depicted in Figure 5. These virtual edges are considered as part of the perimeter P = 68 Q = 0,58 P = 76 Q = 0,47 P = 56 Q = 0,86 A = 289 QF = 0,42 A = 361 QF = 0,53 A = 196 QF = 0,14 when P is computed. Thus, an island close to the perimeter will have short virtual edges and Q Q Q the P value will be lower than when the island is more distant. As an island, even a small one, can be quite annoying when it is located in the middle of a large hole, virtual edges can Fig. 3. QF metric values for different task locations and a single hole. Fragmentation management for HW multitasking in 2D Reconfigurable Devices: Metrics and Defragmentation Heuristics19 B) QF metric for multiple holes most incoming tasks. One of the advantages of a square shape task would be that the longest interconnections inside the task would be shorter than for irregular shape tasks with The QF metric can be easily extended to a more complex free area made of several holes, by the same area, or even rectangular ones. considering the whole boundary between the free and the occupied area as a single For any hole H with an area A a perimeter P and a non-square shape, we define its perimeter. Then P and A values would be used computed as: H H relative quadrature Q as “how its shape is near from being a perfect square”. We estimate such magnitude dividing its actual area A by the area A of a perfect square with the same H Q and (6) PP i perimeter P . A that is computed as: H Q i 2 A = (P / 4) (3) Q H (7) AA i i where P / 4 would be the length of each one of the square sides. Then the relative H quadrature is: And the global fragmentation is computed as: Q = A /A (4) H Q and thus fragmentation is: 2 QF = 1 – A/(P / 4) (8) QF = 1 – Q (5) The global fragmentation value given by QF would be, then, a measure of how far from It can be seen that our quadrature-based metric QF will consider that fragmentation for a being an ideal single hole is the whole available free area delimited by P. given hole H is minimal (0) when it has a square shape. On the contrary, the longest Figure 4 shows several situations for the same 20x20 FPGA and five running tasks than perimeter gives a higher fragmentation value. In Figure 3 we can see a set of five running tasks in a 20x20 FPGA, placed at different Figure 3. Now the tasks are located at different positions, and the free area A is divided into locations. The free area is of 169 basic area units for all of them. But the perimeter P an thus two (Figures 4.a and 4.b) or even three (Figure 4.c) independent holes. The figure shows how our metric does not need to take into account the number of holes to estimate the the A and Q values are different for each one, as the figure shows. Thus the fragmentation Q quality of the different FPGA situations. QF differs, and is smaller for the FPGA situation with a free area shape more apt to accommodate future incoming tasks, supposedly Figure 3.f. It can be noticed, also, how the QF metric, in contrast with the HF metric, gives different fragmentation values for holes c) a) b) with the same number of vertices (10 in all the cases) but different shapes, as in Figures 3.a, 3.e or 3.f. b) c) a) P = 82 Q = 0,40 P = 78 Q = 0,44 P = 90 Q = 0,33 AQ = 420,25 QF = 0,6 A = 380,25 QF = 0,56 A = 506,25 QF = 0,67 A = 169 Q = 1 Q Q = 0.73 P = 52 QF = 0 T 5 l = P/4 = 13 P = 62 Q = 0,70 P = 70 Q = 0,55 P = 62 Q = 0,70 Fig. 4. QF metric values for different tasks locations and multiple holes A = 306,25 QF = 0,45 AQ = 240,25 QF = 0,30 Q AQ = 240,25 QF = 0,30 C) QF metric for islands d) e) f) A situation that our metric deals with automatically is the occurrence of islands. Islands are high fragmentation, undesirable situations that can happen as some tasks finish and leave the FPGA, while others remain. It is important that a fragmentation metric is able to deal with such situations. Our metric deals with it automatically, because in our representation of the free area perimeter (a vertex list), the island is connected to the rest of the perimeter with virtual edges, as depicted in Figure 5. These virtual edges are considered as part of the perimeter P = 68 Q = 0,58 P = 76 Q = 0,47 P = 56 Q = 0,86 A = 289 QF = 0,42 A = 361 QF = 0,53 A = 196 QF = 0,14 when P is computed. Thus, an island close to the perimeter will have short virtual edges and Q Q Q the P value will be lower than when the island is more distant. As an island, even a small one, can be quite annoying when it is located in the middle of a large hole, virtual edges can Fig. 3. QF metric values for different task locations and a single hole.

Advise: Why You Wasting Money in Costly SEO Tools, Use World's Best Free SEO Tool Ubersuggest.