tion addresses differ, and if they do not, inserting up to 8K bytes of padding between the arrays. This rule will avoid thrashing in direct-mapped TBs and in some large direct-mapped data caches with total sizes of 32 pages (256 KB) or more.

Usually, this padding will mean zero extra bytes in the executable image, just a skip in virtual address space to the next-higher page boundary.
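For example, the following C sketch (not taken from the handbook) separates two co-accessed arrays with one 8 KB page of padding so that corresponding elements no longer map to the same index of a large direct-mapped cache or of a direct-mapped TB; the array length, element type, and cache size are assumptions chosen for illustration.

    /* Illustrative sketch (not from the handbook): pad two co-accessed
     * arrays with one 8 KB page so their starting addresses are no longer
     * equal modulo the size of a large direct-mapped cache.  The array
     * length and the 256 KB cache size mentioned below are assumptions. */
    #include <stddef.h>

    #define N         65536            /* 64 K doubles = 512 KB per array */
    #define PAGE_SIZE 8192             /* one 8 KB page of padding        */

    static struct {
        double a[N];
        char   pad[PAGE_SIZE];         /* shifts b by one page, so &b[i] - &a[i]
                                          is no longer a multiple of a 256 KB
                                          (or larger) direct-mapped cache size  */
        double b[N];
    } arrays;

    double dot_product(void)
    {
        double s = 0.0;
        for (size_t i = 0; i < N; i++)
            s += arrays.a[i] * arrays.b[i];   /* a[i] and b[i] now use
                                                 different cache blocks  */
        return s;
    }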

For large caches, the rule above should be applied to the I-stream, in addition to all the D-stream references. Some implementations will have combined I-stream/D-stream large caches.

Both of the rules above can be satisfied simultaneously, thus often eliminating thrashing in all anticipated direct-mapped cache/TB implementations.

A.3.4 Sequential Read/Write — Factor of 1

All other things being equal, sequences of consecutive reads or writes should use ascending (rather than descending) memory addresses. Where possible, the memory address for a block of 2**K bytes should be placed on a 2**K boundary, since this minimizes the number of different cache blocks used and the number of partially written cache blocks.
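A minimal C sketch of this rule, assuming a C11 environment and an illustrative block size of 2**12 bytes: aligned_alloc places the block on a 2**K boundary, and the loop then writes it with ascending quadword stores, so every cache block it touches is fully written.

    /* Illustrative sketch (assumes C11): a 2**K-byte buffer placed on a
     * 2**K boundary and filled with ascending quadword stores.  K = 12
     * (4 KB) is an arbitrary example size. */
    #include <stdint.h>
    #include <stdlib.h>

    #define K     12
    #define BYTES ((size_t)1 << K)

    int fill_ascending(uint64_t value)
    {
        uint64_t *buf = aligned_alloc(BYTES, BYTES);  /* start on a 2**K boundary */
        if (buf == NULL)
            return -1;

        for (size_t i = 0; i < BYTES / sizeof *buf; i++)
            buf[i] = value;                           /* ascending addresses      */

        free(buf);
        return 0;
    }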

To avoid overrunning memory bandwidth, sequences of more than eight quadword load or store instructions should be broken up with intervening instructions (if there is any useful work to be done).
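The sketch below shows one way source code can encourage this at the instruction level: a 128-byte copy is split into two groups of eight quadwords, so the stores of the first group serve as the intervening work between the two runs of loads. The function name and unroll factor are choices made for this example, and the final instruction schedule is, of course, up to the compiler.

    /* Illustrative sketch: an unrolled 128-byte copy arranged so that no
     * more than eight quadword loads or eight quadword stores appear in a
     * row in the source; the first group of stores breaks up the loads. */
    #include <stdint.h>

    void copy128(uint64_t *restrict dst, const uint64_t *restrict src)
    {
        uint64_t t0 = src[0], t1 = src[1], t2 = src[2], t3 = src[3];
        uint64_t t4 = src[4], t5 = src[5], t6 = src[6], t7 = src[7];

        dst[0] = t0; dst[1] = t1; dst[2] = t2; dst[3] = t3;
        dst[4] = t4; dst[5] = t5; dst[6] = t6; dst[7] = t7;

        t0 = src[8];  t1 = src[9];  t2 = src[10]; t3 = src[11];
        t4 = src[12]; t5 = src[13]; t6 = src[14]; t7 = src[15];

        dst[8]  = t0; dst[9]  = t1; dst[10] = t2; dst[11] = t3;
        dst[12] = t4; dst[13] = t5; dst[14] = t6; dst[15] = t7;
    }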

For consecutive reads, implementors should give first priority to prefetching ascending cache blocks and second priority to absorbing up to eight consecutive quadword load instructions (aligned on a 64-byte boundary) without stalling.

For consecutive writes, implementors should give first priority to avoiding read overhead for fully written aligned cache blocks and second priority to absorbing up to eight consecutive quadword store instructions (aligned on a 64-byte boundary) without stalling.

A.3.5 Prefetching — Factor of 3

Prefetching can be directed toward a cache block (a cache line) in the primary cache.

Alpha hardware, beginning with the 21164 (EV5), supports cache block prefetching. Cache block prefetching is performed by the following load operations to the R31 or F31 register:

Table A–1: Cache Block Prefetching

Type                 Instructions          Operation

Normal Prefetch      LDL R31, xxx(Rn)      If the load operation hits in the Dcache, the
                                           instruction is dismissed; otherwise, the
                                           addressed cache block is allocated into the
                                           Dcache.
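As an illustration of the normal-prefetch form in Table A–1, the C sketch below requests the cache block one 64-byte line ahead of the element being summed. __builtin_prefetch is a GCC builtin, not part of the handbook; on Alpha targets it is typically emitted as a load to R31, and on other targets it maps to that machine's prefetch hint. The 64-byte block size is an assumption.

    /* Illustrative sketch: prefetch the cache block one 64-byte line ahead
     * while summing an array.  A hit is simply dismissed; a miss allocates
     * the block into the Dcache, as described in Table A-1.  Prefetching
     * past the end of the array is harmless because prefetches never fault. */
    #include <stddef.h>
    #include <stdint.h>

    int64_t sum(const int64_t *p, size_t n)
    {
        int64_t s = 0;
        for (size_t i = 0; i < n; i++) {
            __builtin_prefetch(&p[i + 8], 0);   /* 8 quadwords = 64 bytes ahead */
            s += p[i];
        }
        return s;
    }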
