FLAT instructions can use zero to four consecutive Dwords of data in VGPRs and/or memory. The DATA field determines which VGPR(s) supply source data (if any), and the VDST VGPRs hold return data (if any). No data-format conversion is done. "D16" instructions use only 16 bits of the VGPR instead of the full 32 bits: "D16_HI" instructions read or write only the high 16 bits, while "D16" instructions use the low 16 bits. Scratch and Global D16 load instructions with LDS=1 write the entire 32 bits of LDS.
FLAT_SCRATCH is a 64-bit byte address. The shader composes the value by adding together two separate values: the base address, which can be passed in via an initialized SGPR or through a constant buffer, and the per-wave allocation offset (also initialized in an SGPR).
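The D16 and D16_HI behavior described above can be illustrated with a small bit-level model. This is only a sketch, not hardware-accurate code: the function names are hypothetical, and it assumes that the half of the VGPR a D16 load does not write keeps its previous contents, which the text implies but does not state outright.

```python
MASK16 = 0xFFFF

def load_d16(vgpr: int, value16: int) -> int:
    """Model a D16 load: write the low 16 bits of a 32-bit VGPR.
    Assumption: the high 16 bits are left unchanged."""
    return (vgpr & 0xFFFF0000) | (value16 & MASK16)

def load_d16_hi(vgpr: int, value16: int) -> int:
    """Model a D16_HI load: write the high 16 bits of a 32-bit VGPR.
    Assumption: the low 16 bits are left unchanged."""
    return (vgpr & MASK16) | ((value16 & MASK16) << 16)

def store_d16(vgpr: int) -> int:
    """Model a D16 store: only the low 16 bits are read as source data."""
    return vgpr & MASK16

def store_d16_hi(vgpr: int) -> int:
    """Model a D16_HI store: only the high 16 bits are read as source data."""
    return (vgpr >> 16) & MASK16

vgpr = 0xAAAABBBB
print(hex(load_d16(vgpr, 0x1234)))     # 0xaaaa1234
print(hex(load_d16_hi(vgpr, 0x1234)))  # 0x1234bbbb
print(hex(store_d16(vgpr)))            # 0xbbbb
print(hex(store_d16_hi(vgpr)))         # 0xaaaa
```

No data-format conversion appears anywhere in the model: the 16-bit value is moved as raw bits, matching the statement that FLAT instructions do no conversion.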
Scratch Space (Private)
Scratch (thread-private memory) is an area of memory defined by the aperture registers. When an address falls in scratch space, the hardware automatically performs additional address computation. The kernel must provide the information needed for this computation in the FLAT_SCRATCH register. The wavefront must supply the scratch size and offset (for the space allocated to this wave) with every FLAT request. Before issuing any FLAT or Scratch instructions, the shader program must initialize the FLAT_SCRATCH register with the base address of the scratch space allocated to this wave.
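The composition of FLAT_SCRATCH from a base address and a per-wave offset is plain 64-bit addition, which the following sketch spells out. The function names and the example values are hypothetical. Splitting the result into two 32-bit halves is an illustrative assumption based on SGPRs being 32 bits wide, not a description of the exact register layout.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def compose_flat_scratch(base_address: int, wave_offset: int) -> int:
    """Add the scratch base address and the per-wave allocation offset,
    wrapping to 64 bits as a 64-bit byte address would."""
    return (base_address + wave_offset) & MASK64

def split_lo_hi(value64: int) -> tuple[int, int]:
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return value64 & MASK32, (value64 >> 32) & MASK32

# Hypothetical values: a scratch base and the offset of this wave's allocation.
base = 0x00007F0010000000
offset = 0x4000

flat_scratch = compose_flat_scratch(base, offset)
lo, hi = split_lo_hi(flat_scratch)
print(hex(flat_scratch))  # 0x7f0010004000
print(hex(lo), hex(hi))   # 0x10004000 0x7f00
```

In a real shader this addition would be done with scalar instructions before the first FLAT or Scratch instruction is issued, as the paragraph above requires.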
Data Share Operations
Local data share (LDS) is a very low-latency RAM scratchpad for temporary data, with at least one order of magnitude higher effective bandwidth than direct, uncached global memory. It permits sharing of data between work-items in a work-group, and also holds parameters for pixel-shader parameter interpolation. Unlike read-only caches, the LDS permits high-speed write-to-read reuse of the memory space (gather/read/load and scatter/write/store operations). The figure below shows the conceptual framework of how the LDS is integrated into the memory of AMD GPUs using OpenCL. Physically located on-chip, directly adjacent to the ALUs, the LDS is approximately one order of magnitude faster than global memory (assuming no bank conflicts).
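The "assuming no bank conflicts" caveat can be made concrete with a rough model of how addresses map to LDS banks. The text above does not give the bank organization, so the bank count (32) and bank width (4 bytes) below are assumptions, as is the rule that lanes reading the same Dword do not conflict. The function name is hypothetical.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: not stated in the text above
BANK_WIDTH = 4   # assumption: one Dword (4 bytes) per bank

def conflict_degree(byte_addresses):
    """Return the largest number of distinct Dwords that map to one bank.
    1 means conflict-free; N means the access is serialized N ways
    under the assumptions of this model."""
    banks = defaultdict(set)
    for addr in byte_addresses:
        dword = addr // BANK_WIDTH
        banks[dword % NUM_BANKS].add(dword)
    return max((len(dwords) for dwords in banks.values()), default=0)

lanes = range(32)
print(conflict_degree([lane * 4 for lane in lanes]))       # 1: consecutive Dwords
print(conflict_degree([lane * 8 for lane in lanes]))       # 2: stride of two Dwords
print(conflict_degree([lane * 4 * 32 for lane in lanes]))  # 32: every lane hits bank 0
print(conflict_degree([0 for _ in lanes]))                 # 1: all lanes read one Dword
```

Under this model, consecutive Dword accesses are the pattern that achieves the order-of-magnitude advantage quoted above, while strides that are a multiple of the bank count lose most of it.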