Chapter 2. System structure and design 45
Draft Document for Review April 7, 2004 6:15 pm 6947ch02.fm
IEEE Floating Point
Support for the IEEE Standard for Binary Floating Point Arithmetic (IEEE 754-1985) was added
to S/390 to further enhance the value of this platform for floating-point calculations. The
initial implementation had 121 floating-point instructions over prior S/390 CMOS models
(Hexadecimal Floating Point had 54 instructions). Later, with the introduction of the 64-bit
architecture, 12 instructions were added for conversion between IEEE Binary Floating Point
values and 64-bit integers.
The key point is that Java and C/C++ applications tend to use IEEE Binary Floating Point
operations more frequently than legacy applications. This means that the better the hardware
implementation of this set of instructions, the better the performance of e-business
applications will be.
Earlier systems emphasized traditional hexadecimal floating point arithmetic. The z990 has a
Binary Floating Point unit that matches the performance of the traditional hexadecimal
floating point unit by halving the number of cycles previously required.
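The difference between the two number formats can be illustrated with a minimal sketch that decodes the same value, 100.0, from its 32-bit hexadecimal floating point (HFP) encoding and from its 32-bit IEEE binary floating point (BFP) encoding. The function names are hypothetical and chosen only for this illustration; the sketch shows the data formats, not how the z990 floating point units are implemented.

```python
import struct


def decode_hfp_short(word: int) -> float:
    """Decode a 32-bit hexadecimal floating point (HFP) short value."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    characteristic = (word >> 24) & 0x7F   # 7 bits, excess-64, base 16
    fraction = word & 0x00FFFFFF           # 24-bit fraction, no hidden bit
    return sign * (fraction / float(1 << 24)) * 16.0 ** (characteristic - 64)


def decode_bfp_short(word: int) -> float:
    """Decode a 32-bit IEEE 754 binary (BFP) short value."""
    # Pack the raw bits big-endian, then reinterpret them as an IEEE single.
    return struct.unpack(">f", struct.pack(">I", word))[0]


# The same value, 100.0, has different bit patterns in the two formats.
hfp_100 = decode_hfp_short(0x42640000)
bfp_100 = decode_bfp_short(0x42C80000)
print(hfp_100, bfp_100)
```

Because the HFP exponent is a power of 16 and the BFP exponent is a power of 2, data exchanged between the two formats has to be converted, not just copied.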
Translation Lookaside Buffer
The Translation Lookaside Buffers (TLBs) in the Instruction and Data L1 caches now have a
secondary TLB to enhance performance. In addition, a translator unit is added to translate
misses in the secondary TLB.
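The lookup order described above (primary TLB, then secondary TLB, then translator unit) can be sketched as a toy software model. Everything specific here is an assumption made for illustration: the class name, the entry counts, the 4 KB page size, the first-in first-out replacement, and the use of a plain dictionary as a stand-in for the tables the translator unit walks. None of it reflects the actual z990 organization.

```python
PAGE_SHIFT = 12  # assumed 4 KB pages


class TwoLevelTlb:
    """Toy model: small primary TLB backed by a larger secondary TLB."""

    def __init__(self, page_table, l1_entries=4, l2_entries=16):
        self.page_table = page_table   # stand-in for the translator unit
        self.l1 = {}                   # primary TLB: page number -> frame
        self.l2 = {}                   # secondary TLB: page number -> frame
        self.l1_entries = l1_entries
        self.l2_entries = l2_entries
        self.stats = {"l1_hit": 0, "l2_hit": 0, "translator": 0}

    @staticmethod
    def _install(tlb, capacity, vpn, frame):
        # Dicts keep insertion order, so the first key is the oldest entry.
        if vpn not in tlb and len(tlb) >= capacity:
            tlb.pop(next(iter(tlb)))
        tlb[vpn] = frame

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.l1:
            self.stats["l1_hit"] += 1
            frame = self.l1[vpn]
        elif vpn in self.l2:
            # Primary miss, secondary hit: refill the primary TLB.
            self.stats["l2_hit"] += 1
            frame = self.l2[vpn]
            self._install(self.l1, self.l1_entries, vpn, frame)
        else:
            # Miss in both levels: fall back to the translator.
            self.stats["translator"] += 1
            frame = self.page_table[vpn]
            self._install(self.l2, self.l2_entries, vpn, frame)
            self._install(self.l1, self.l1_entries, vpn, frame)
        return (frame << PAGE_SHIFT) | offset


page_table = {vpn: vpn + 100 for vpn in range(32)}
tlb = TwoLevelTlb(page_table)
for vpn in list(range(6)) + [0]:
    tlb.translate(vpn << PAGE_SHIFT)
print(tlb.stats)
```

In this run, page 0 is evicted from the small primary TLB by the time it is touched again, but it is still found in the secondary TLB, so the slower translator path is avoided.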
Instruction fetching and instruction decode
The superscalar design of the z990 microprocessor allows for the decoding of up to two
instructions per cycle and the execution of three instructions per cycle. Execution takes place
in order, but storage accesses for instruction and operand fetching may occur out of
sequence.

Instruction fetching

Instruction fetch in non-z990 models tries to get as far ahead of instruction decode and
execution as possible because of the relatively large instruction buffers available. In the z990
microprocessor, smaller instruction buffers are used. The operation code is fetched from the
I-cache and put in instruction buffers that hold pre-fetched data awaiting decode.

Instruction decoding

The processor can decode one or two instructions per cycle. The result of the decoding
process is queued and subsequently used to form a group.
Instruction grouping
From the instruction queue, one simple branch instruction and up to two general instructions
can be issued every cycle. The instructions are taken from the instruction queue and
assembled into groups according to instruction grouping rules. A complete description of the
rules is beyond the scope of this redbook.
It is the compiler’s responsibility to select instructions that best fit with the z990 superscalar
microprocessor and abide by the grouping rules to create code that best exploits the
superscalar implementation.
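The slot limits stated above (at most one simple branch and at most two general instructions per group, taken in order) can be sketched as follows. This is only an illustration: the function name, the instruction labels, and the two-way "branch"/"general" classification are hypothetical, and the actual z990 grouping rules, which this redbook does not describe, are not modeled.

```python
def form_groups(instructions):
    """Pack an in-order instruction stream into issue groups.

    Each instruction is a (name, kind) pair, where kind is "branch" or
    "general". Only the slot limits are modeled: at most one branch and
    at most two general instructions per group.
    """
    groups, current = [], []
    branches = generals = 0
    for name, kind in instructions:
        if kind not in ("branch", "general"):
            raise ValueError(f"unknown instruction kind: {kind}")
        slot_full = (kind == "branch" and branches == 1) or (
            kind == "general" and generals == 2
        )
        if slot_full:
            # No free slot of this kind: close the group, start a new one.
            groups.append(current)
            current, branches, generals = [], 0, 0
        current.append(name)
        if kind == "branch":
            branches += 1
        else:
            generals += 1
    if current:
        groups.append(current)
    return groups


stream = [
    ("g1", "general"), ("g2", "general"), ("b1", "branch"),
    ("g3", "general"), ("b2", "branch"), ("b3", "branch"),
    ("g4", "general"),
]
groups = form_groups(stream)
print(groups)
```

Even in this simplified model, the order in which branches and general instructions appear determines how full each group is, which is why instruction selection and ordering by the compiler matters.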
Extended Translation Facility
The Extended Translation Facility adds 10 instructions to the zSeries instruction set. They
enhance the performance for data conversion operations for data encoded in Unicode,
making applications enabled for Unicode and/or Globalization more efficient. These data
encoding formats are used in Web Services, Grid, and on demand environments where XML,