
User’s Manual

IBM PowerPC 750GX and 750GL RISC Microprocessor

6.4.5 Load/Store Unit Execution Timing

The execution of most load-and-store instructions is pipelined. The LSU has two pipeline stages. The first is for effective address calculation and MMU translation, and the second is for accessing data in the cache. Load-and-store instructions have a 2-cycle latency and 1-cycle throughput. For instructions that store FPR values (Store Floating-Point Double [stfd], Store Floating-Point Single [stfs], and their variations), the data to be stored is prefetched from the source register during the first pipeline stage. In cases where this register is updated that same cycle, the instruction will stall to get the correct data, resulting in one additional cycle of latency.
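The timing described above can be sketched as a small cycle-count model. This is only an illustration under stated assumptions: the function name lsu_cycles and its parameters are hypothetical, all accesses are assumed to be aligned cache hits issued back to back, and each floating-point store stall is assumed to delay the pipeline by exactly one cycle.

```c
#include <assert.h>

/* Figures taken from the text: 2-cycle latency, 1-cycle throughput. */
enum {
    LSU_LATENCY_CYCLES = 2,    /* stage 1: EA calculation + MMU translation,
                                  stage 2: data cache access */
    LSU_THROUGHPUT_CYCLES = 1  /* one load/store can enter the LSU per cycle */
};

/*
 * Hypothetical helper: estimated cycles for n back-to-back load/store
 * instructions that all hit in the cache and are aligned.
 * fp_store_stalls counts the stfd/stfs-type stores whose source FPR is
 * updated in the same cycle as their first pipeline stage; each one is
 * ASSUMED to add a single cycle to the total.
 */
unsigned lsu_cycles(unsigned n, unsigned fp_store_stalls)
{
    assert(fp_store_stalls <= n);
    if (n == 0) {
        return 0;
    }
    return LSU_LATENCY_CYCLES
         + (n - 1) * LSU_THROUGHPUT_CYCLES
         + fp_store_stalls;
}
```

Under these assumptions, a single load takes 2 cycles, four pipelined loads take 5 cycles, and the same four instructions take 6 cycles if one of them is a floating-point store that stalls on its source register.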

If operands are misaligned, additional latency might be required either for an alignment exception to be taken or for additional bus accesses. Load instructions that miss in the cache block require subsequent cache accesses during the cache-line refill. Table 6-9 on page 244 gives load-and-store instruction execution latencies.

6.4.6 Effect of Operand Placement on Performance

The PowerPC virtual environment architecture (VEA) states that the placement (location and alignment) of operands in memory might affect the relative performance of memory accesses, and in some cases affect it significantly. The effects memory operand placement has on performance are shown in Table 6-2.

The best performance is guaranteed if memory operands are aligned on natural boundaries. For the best performance across the widest range of implementations, the programmer should assume the performance model described in Chapter 3, “Operand Conventions” in the PowerPC Microprocessor Family: The Programming Environments Manual.
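As a minimal sketch of what "aligned on natural boundaries" means in practice, the check below tests whether an operand's address is a multiple of its size. The function name is_naturally_aligned is hypothetical, and the sketch assumes operand sizes that are powers of two (1, 2, 4, or 8 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if an operand of `size` bytes located at
 * `addr` sits on its natural boundary (addr is a multiple of size),
 * otherwise 0. `size` must be a nonzero power of two.
 */
int is_naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    /* For a power-of-two size, the low bits of addr must all be zero. */
    return (addr & (size - 1)) == 0;
}
```

For example, a 4-byte operand at address 0x1000 is naturally aligned, while the same operand at 0x1002 is not, although a 2-byte operand at 0x1002 is.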

The effect of misalignment on memory-access latency is the same for big-endian and little-endian addressing modes, except for load/store multiple and string operations, which cause an alignment exception in little-endian mode.

Table 6-2. Performance Effects of Memory Operand Placement (Page 1 of 2)

Operand                                           Boundary Crossing
Size                             Byte Alignment   None          8 Byte     Cache Block   Protection Boundary

Integer
4 byte                           4                Optimal [1]   —          —             —
                                 < 4              Optimal       Good [2]   Good          Good
2 byte                           2                Optimal       —          —             —
                                 < 2              Optimal       Good       Good          Good
1 byte                           1                Optimal       —          —             —
Load Multiple Word (lmw),        4                Good          Good       Good          Good
Store Multiple Word (stmw) [3]   < 4              Poor [4]      Poor       Poor          Poor
String [3]                       —                Good          Good       Good          Good

Notes:
1. Optimal means one EA calculation occurs.
2. Good means multiple EA calculations occur that might cause additional bus activities with multiple bus transfers.
3. Not supported in little-endian mode; causes an alignment exception.
4. Poor means that an alignment exception occurs.
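
The integer rows of Table 6-2 can be read as a simple rule: a 1-, 2-, or 4-byte access is Optimal unless it crosses an 8-byte boundary, in which case it is Good. The sketch below encodes that reading. All names are hypothetical. The 32-byte cache block and 4 KB page used for the cache-block and protection boundaries are assumptions made for illustration; a protection boundary could also be defined by other translation mechanisms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED boundary sizes for illustration only. */
enum {
    DOUBLEWORD_BYTES  = 8,
    CACHE_BLOCK_BYTES = 32,   /* assumption: 32-byte cache block */
    PAGE_BYTES        = 4096  /* assumption: 4 KB protection granule */
};

typedef enum { RATING_OPTIMAL, RATING_GOOD } rating_t;

/* Nonzero if the bytes [addr, addr + size) span a multiple of `boundary`. */
int crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(size != 0 && boundary != 0);
    return (addr / boundary) != ((addr + size - 1) / boundary);
}

/*
 * Rating for a 1-, 2-, or 4-byte integer load or store, following the
 * integer rows of the table. Cache-block and page sizes are multiples of
 * 8 bytes, so crossing either of them also crosses an 8-byte boundary;
 * testing the 8-byte boundary alone therefore covers every "Good" column.
 */
rating_t integer_access_rating(uintptr_t addr, size_t size)
{
    assert(size == 1 || size == 2 || size == 4);
    return crosses_boundary(addr, size, DOUBLEWORD_BYTES)
         ? RATING_GOOD
         : RATING_OPTIMAL;
}
```

With these definitions, a misaligned 4-byte access at 0x1001 stays inside one doubleword and is rated Optimal, while the same access at 0x1006 crosses an 8-byte boundary and is rated Good. The lmw, stmw, and string rows are deliberately left out, since their ratings depend on the instruction type rather than on boundary crossing.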


gx_06.fm.(1.2)						Instruction Timing
March 27, 2006						Page 233 of 377