IBM z990 IEEE Floating Point, Translation Lookaside Buffer, Instruction fetching and instruction decode, Instruction grouping, Extended Translation Facility

Chapter 2. System structure and design 45

Draft Document for Review April 7, 2004 6:15 pm 6947ch02.fm

IEEE Floating Point

Support for the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754-1985) was added
to S/390 to further enhance the value of the platform for this type of calculation. The
initial implementation had 121 floating-point instructions over prior S/390 CMOS models
(Hexadecimal Floating Point had 54 instructions). Later, with the introduction of the 64-bit
architecture, 12 instructions were added for conversion between IEEE Binary Floating Point
and 64-bit integers.
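
To illustrate what a conversion between 64-bit integers and IEEE binary floating point involves, the following minimal Python sketch converts a signed 64-bit integer to the IEEE 754 64-bit (long) binary format and checks whether the value survives the round trip. It shows the general IEEE 754 behavior only. The function names are hypothetical and the code does not model the z/Architecture conversion instructions themselves.

```python
import struct


def int64_to_binary64_bits(n: int) -> int:
    """Return the raw IEEE 754 binary64 bit pattern for a signed 64-bit integer."""
    if not -(2**63) <= n < 2**63:
        raise ValueError("not a signed 64-bit integer")
    # float() rounds to nearest, ties to even (the IEEE 754 default rounding mode).
    return struct.unpack(">Q", struct.pack(">d", float(n)))[0]


def converts_exactly(n: int) -> bool:
    """True if the integer is representable in binary64 without rounding."""
    return int(float(n)) == n


if __name__ == "__main__":
    print(hex(int64_to_binary64_bits(1)))   # bit pattern of 1.0
    print(converts_exactly(2**53))          # 53-bit significand: still exact
    print(converts_exactly(2**53 + 1))      # needs 54 bits: rounded
```

Because the 64-bit binary format carries a 53-bit significand, integers of larger magnitude may be rounded during conversion.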

The key point is that Java and C/C++ applications tend to use IEEE Binary Floating Point

operations more frequently than legacy applications. This means that the better the hardware

implementation of this set of instructions, the better the performance of e-business

applications will be.

On earlier systems, the emphasis was on traditional hexadecimal floating point arithmetic.
The z990 has a Binary Floating Point unit that matches the performance of the traditional
hexadecimal floating point unit by halving the number of cycles previously required.
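
The difference between the two formats can be made concrete with a small sketch. The Python code below decodes a 32-bit (short) value in each format: the hexadecimal format uses a 7-bit excess-64 exponent of base 16 with a 24-bit fraction, while the binary format is the IEEE 754 single format. The function names are hypothetical, and the code is meant only as an illustration of the data formats, not of how the z990 hardware units operate.

```python
import struct


def decode_hfp_short(word: int) -> float:
    """Decode a 32-bit hexadecimal floating point (HFP) short value."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    exponent = (word >> 24) & 0x7F   # 7-bit characteristic, excess-64, base 16
    fraction = word & 0x00FFFFFF     # 24-bit fraction (six hexadecimal digits)
    return sign * (fraction / float(1 << 24)) * 16.0 ** (exponent - 64)


def decode_bfp_short(word: int) -> float:
    """Decode a 32-bit IEEE 754 binary floating point (BFP) short value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


if __name__ == "__main__":
    # The same number, -118.625, has a different bit pattern in each format.
    print(decode_hfp_short(0xC276A000))
    print(decode_bfp_short(0xC2ED4000))
```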

Translation Lookaside Buffer

The Translation Lookaside Buffers (TLBs) in the Instruction and Data L1 caches are now backed
by a secondary TLB to enhance performance. In addition, a translator unit has been added to
perform the translation when a lookup misses in the secondary TLB.
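
The lookup order described above can be sketched as a small simulation. The following Python code is a minimal model under stated assumptions: the table sizes, the 4 KB page size, the LRU replacement policy, and the use of a plain dictionary in place of the translator unit are all illustrative choices and are not taken from the z990 design. All class and field names are hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for the illustration


class LruTable:
    """Small LRU-replaced map from virtual page number to frame number."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, vpn):
        if vpn not in self.entries:
            return None
        self.entries.move_to_end(vpn)  # mark as most recently used
        return self.entries[vpn]

    def insert(self, vpn, frame):
        self.entries[vpn] = frame
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class TwoLevelTlb:
    def __init__(self, page_table, primary_size=4, secondary_size=16):
        self.page_table = page_table  # stands in for the translator unit
        self.primary = LruTable(primary_size)
        self.secondary = LruTable(secondary_size)
        self.stats = {"primary_hit": 0, "secondary_hit": 0, "translator": 0}

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        frame = self.primary.lookup(vpn)
        if frame is not None:
            self.stats["primary_hit"] += 1
        else:
            frame = self.secondary.lookup(vpn)
            if frame is not None:
                self.stats["secondary_hit"] += 1
            else:
                # Miss in both levels: perform the full translation.
                frame = self.page_table[vpn]
                self.stats["translator"] += 1
                self.secondary.insert(vpn, frame)
            self.primary.insert(vpn, frame)
        return frame * PAGE_SIZE + offset
```

In this model a primary miss that hits in the secondary TLB avoids the full translation, which is the performance benefit a second level is intended to provide.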

Instruction fetching and instruction decode

The superscalar design of the z990 microprocessor allows up to two instructions to be decoded
and up to three instructions to be executed per cycle. Execution takes place in order, but
storage accesses for instruction and operand fetching may occur out of sequence.

Instruction fetch in non-z990 models tries to get as far ahead of instruction decode and

execution as possible because of the relatively large instruction buffers available. In the z990

microprocessor, smaller instruction buffers are used. The operation code is fetched from the

I-cache and put in instruction buffers that hold pre-fetched data awaiting decode.

The processor can decode one or two instructions per cycle. The result of the decoding

process is queued and subsequently used to form a group.

Instruction grouping

From the instruction queue, one simple branch instruction and up to two general instructions
can be issued every cycle. The instructions are taken from the queue and assembled into a
group according to the instruction grouping rules. A complete description of the rules is
beyond the scope of this redbook.
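
The issue limit stated above (one simple branch plus up to two general instructions per cycle, taken in order) can be sketched as follows. This Python code is a deliberately simplified model: it applies only that one limit and ignores the actual grouping rules, such as any constraints on instruction position or dependencies, which are not described here. The function name and the "branch"/"general" labels are hypothetical.

```python
def form_groups(queue):
    """Split an in-order instruction queue into per-cycle groups.

    Each group holds at most one "branch" and at most two "general"
    instructions. A group is closed at the first instruction that does
    not fit, so program order is preserved.
    """
    for kind in queue:
        if kind not in ("branch", "general"):
            raise ValueError(f"unknown instruction kind: {kind!r}")

    groups = []
    i = 0
    while i < len(queue):
        group, branches, generals = [], 0, 0
        while i < len(queue):
            kind = queue[i]
            if kind == "branch" and branches < 1:
                branches += 1
            elif kind == "general" and generals < 2:
                generals += 1
            else:
                break  # does not fit: start a new group in the next cycle
            group.append(kind)
            i += 1
        groups.append(group)
    return groups


if __name__ == "__main__":
    stream = ["general", "general", "general", "branch", "general"]
    for cycle, group in enumerate(form_groups(stream)):
        print(cycle, group)
```

Even in this simplified model, the order of instructions in the stream determines how full each group is.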

It is the compiler’s responsibility to select instructions that best fit with the z990 superscalar

microprocessor and abide by the grouping rules to create code that best exploits the

superscalar implementation.

Extended Translation Facility

The Extended Translation Facility adds 10 instructions to the zSeries instruction set. They
improve the performance of conversion operations on data encoded in Unicode, making
applications enabled for Unicode and/or globalization more efficient. These data

encoding formats are used in Web Services, Grid, and on demand environments where XML,