Chapter 2. System structure and design 41
Draft Document for Review April 7, 2004 6:15 pm 6947ch02.fm
Data transfer between the CEC memory and attached I/O devices or CPCs is done through
the Memory Bus Adapter. The physical path includes the Channel card (except for
STI-connected CPCs), the Self-Timed Interconnect bus and possibly an STI extender card, the
Storage Control, and the Storage Data chips.
More detailed information about I/O connectivity and channel types can be found in “I/O
subsystem” on page 71.

Dual External Time Reference

The optional ETR connections, although not part of the book design, are found adjacent to
the books on the opposite side of the CEC board. The z990 servers implement an Enhanced
ETR Attachment Facility (EEAF) designed to provide a dual External Time Reference (ETR)
attachment facility. Two ETR cards are automatically shipped when Coupling Links are
ordered and provide a dual path interface to the IBM Sysplex Timers, which are used for
timing synchronization between systems in a Sysplex environment. This allows continued
operation even if a single ETR card fails. This redundant design also allows concurrent
maintenance.

2.2.3 Processor Unit design

Each PU is optimized to meet the demands of new e-business workloads, without
compromising the performance characteristics of traditional workloads. The PUs in the z990
have a superscalar design.

Superscalar processor

A scalar processor is based on a single-issue architecture, which means that only a single
instruction is executed at a time. A superscalar processor is based on a multi-issue
architecture: it allows concurrent execution of instructions by adding resources to the
microprocessor, creating multiple pipelines that each work on their own set of instructions.
Because multiple instructions can be executed in each cycle, a higher level of complexity is
reached: an operation in one pipeline may depend on data in another pipeline. A superscalar
design therefore demands careful consideration of which instruction sequences can
successfully operate in a multi-pipeline environment.
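
The effect of such dependencies can be illustrated with a toy model. The following Python sketch does not describe the actual z990 pipeline. It assumes a hypothetical in-order machine in which every instruction completes in one cycle and an instruction cannot issue in the same cycle as the instruction that produces one of its operands. The function name cycles_needed, the parameter issue_width, and the register labels are invented for illustration.

```python
from typing import List, Tuple

# Each instruction is (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def cycles_needed(program: List[Instr], issue_width: int) -> int:
    """Count cycles for a toy in-order machine (an assumption, not the z990).

    Up to issue_width instructions issue per cycle. Only read-after-write
    dependencies inside the same cycle are modeled: an instruction that
    reads a register written earlier in the same cycle must wait.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written_this_cycle = set()
        issued = 0
        while i < len(program) and issued < issue_width:
            dest, sources = program[i]
            if any(src in written_this_cycle for src in sources):
                break  # operand not yet available, issue in a later cycle
            written_this_cycle.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


# Four instructions that do not depend on each other.
independent = [("r1", ("a",)), ("r2", ("b",)), ("r3", ("c",)), ("r4", ("d",))]
# Four instructions where each one reads the result of the previous one.
dependent = [("r1", ("a",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
```

In this model, cycles_needed(independent, 2) is half of cycles_needed(independent, 1), while the dependent chain takes the same number of cycles at either width. A second pipeline helps only when the instruction sequence leaves independent work for it.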
As an example, consider the following: if the branch prediction logic of the microprocessor
makes the wrong prediction, it might be necessary to remove all instructions in the parallel
pipelines as well (refer to “Processor Branch History Table (BHT)” on page 44 for more details).
There are challenges in creating an efficient superscalar processor. The superscalar design
of the z990 PU has made big strides in avoiding address generation interlock situations.
Instructions that need information from memory locations may suffer multi-cycle delays
while the memory content is fetched. The superscalar design of the z990 PU tries to overcome
these delays by continuing to execute (single-cycle) instructions that do not cause delays. The
technique used is called “out-of-order operand fetching”. This means that some instructions in
the instruction stream are already underway while earlier instructions in the stream, which
are delayed by storage references, take longer. Eventually the delayed instructions catch up
with the already fetched instructions and all are executed in the designated order. The z990
PU gets much of its superscalar performance benefit from avoiding address generation
interlocks.
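
The benefit of letting later instructions proceed while an earlier storage reference is outstanding can be sketched with a simplified cycle count. This is an illustrative model only, not the z990 implementation. It assumes that one instruction starts its operand fetch per cycle in program order, that results complete in program order, and that any number of instructions may complete in the same cycle. The function names and the latency values are hypothetical.

```python
from typing import List


def blocking_cycles(fetch_latencies: List[int]) -> int:
    """Every instruction waits for the previous one to finish completely,
    so a multi-cycle storage reference stalls everything behind it."""
    return sum(fetch_latencies)


def overlapped_cycles(fetch_latencies: List[int]) -> int:
    """Later instructions start while an earlier fetch is still outstanding.

    Instruction i starts its operand fetch in cycle i and has its operands
    after fetch_latencies[i] cycles. Completion stays in program order, so
    an instruction never finishes before its predecessors (toy assumption).
    """
    finished = 0
    for start_cycle, latency in enumerate(fetch_latencies):
        operands_ready = start_cycle + latency
        finished = max(finished, operands_ready)
    return finished


# One storage reference taking 3 cycles, then three single-cycle instructions.
stream = [3, 1, 1, 1]
```

For this stream the blocking model needs 6 cycles and the overlapped model needs 4, because the single-cycle instructions are already underway when the delayed instruction catches up.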
It is not only the processor that contributes to the successful execution of instructions in
parallel. Given a superscalar design, compilers and interpreters must create code that
benefits optimally from the particular superscalar processor implementation. Work is