There are 128kB memory per workgroup processor split up into 64 banks of dword-wide RAMs. These 64 banks are further sub-divided into two sets of 32-banks each where 32 of the banks are affiliated with a pair of SIMD32’s, and the other 32 banks are affiliated with the other pair of SIMD32’s within the WGP. Each bank is a 512x32 two-port RAM (1R/1W per clock cycle). Dwords are placed in the banks serially, but all banks can execute a store or load simultaneously. One work-group can request up to 64kB memory
The high bandwidth of the LDS
memory is achieved not only through its proximity to the ALUs, but also through simultaneous access to its memory banks. Thus, it is possible to concurrently execute 32 write or read instructions, each nominally 32-bits; extended instructions, read2/write2, can be 64-bits each. If, however, more than one access attempt is made to the same bank at the same time, a bank conflict occurs. In this case, for indexed and atomic operations, hardware prevents the attempted concurrent accesses to the same bank by turning them into serial accesses. This decreases the effective bandwidth of the LDS. For increased throughput (optimal efficiency), therefore, it is important to avoid bank conflicts. A knowledge of request scheduling and address mapping is key to achieving this.
Dataflow in Memory Hierarchy
The figure below is a conceptual diagram of the dataflow within the memory structure.Data can be loaded into LDS either by transferring it from VGPRs to LDS using "DS" instructions, or by loading in from memory. When loading from memory, the data may be loaded into VGPRs first or for some types of loads it may be loaded directly into LDS from memory. To store data from LDS to global memory, data is read from LDS and placed into the workitem’s VGPRs, then written out to global memory. To make effective use of the LDS, a kernel must perform many operations on what is transferred between global memory and LDS. LDS atomics are performed in the LDS hardware. (Thus, although ALUs are not directly used for these operations, latency is incurred by the LDS executing this function.)
Our superior products
ABB -- AC 800M controller, Bailey, PM866 controller, IGCT silicon controlled 5SHY 3BHB01 3BHEO0 3HNA00 DSOC series
BENTLY --- 3500 system/proximitor, front and rear cards, sensors, power modules, probes, cables
Emerson -- modbus card, power panel, controller, power supply, base, power module, switch
EPRO --- Data acquisition module, probe, speed sensor, vibration sensor, shaft vibration transmitter, proximitor
FOXBORO - thermal resistance input/output module, power module, communication module, cable, controller, switch
GE --- module, air switch, I/O module, display, CPU module, power module, converter, CPU board, Ethernet module, integrated protection device, power module, gas turbine card
HIMA --- DI module, processor module, AI card, pulse encoder
Honeywell --- Secure digital output card, program module, analog input card, CPU module, FIM card
MOOG - servo valve, controller, module, power module
NI --- Information acquisition card, PXI module, card, chassis multi-channel control card
WESTINGHOUSE --- RTD thermal resistance input module, AI/AO/DI/DO module, power module, control module, base module
Woodward - Regulator, module, controller, governor
YOKOGAWA - Servo module, control cabinet node unit