data:image/s3,"s3://crabby-images/93ae7/93ae7fc8c97c31350c5cc520f45d57e2557efd7b" alt=""
In a frequently executed loop, compilers could allocate the data items accessed from memory so that, on each loop iteration, all of the memory addresses accessed are either in exactly the same aligned
Example:
REAL*4 A(1000),B(1000) DO 60 i=1,1000
60 A( i ) = f(B( i ))
Figures A–3, A–4, and A–5 show bad, better, and best allocation in cache, respectively.
BAD allocation (A and B thrash in 8 KB direct-mapped cache):
Figure A–3: Bad Allocation in Cache
A
B
0 | 4K | 8K | 12K | 16K |
BETTER allocation (A and B offset by 64 mod 2 KB, so 16 elements of A and 16 of B can be in cache simultaneously):
Figure A–4: Better Allocation in Cache
A
0
B
4K | 8K+64 | 12K | 16K |
BEST allocation (A and B offset by 64 mod 2 KB, so 16 elements of A and 16 of B can be in cache simultaneously, and both arrays fit entirely in 8 KB or bigger cache):
Figure A–5: Best Allocation in Cache
A
0
B
8K | 12K | 16K |
In a frequently executed loop, compilers could allocate the data items accessed from memory so that, on each loop iteration, all of the memory addresses accessed are either in exactly the same 8 KB page, or differ in bits VA<17:13>. For loops that go through arrays in a common direction with a common stride, this requires allocating the arrays, checking that the
Software Considerations