Compaq ECQD2KCTE manual Figure A-3 Bad Allocation in Cache

Models: ECQD2KCTE

In a frequently executed loop, compilers could allocate the data items accessed from memory so that, on each loop iteration, all of the memory addresses accessed are either in exactly the same aligned 64-byte block or differ in bits VA<10:6>. For loops that go through arrays in a common direction with a common stride, this requires allocating the arrays, checking that the first-iteration addresses differ, and if not, inserting up to 64 bytes of padding between the arrays. This rule will avoid thrashing in small direct-mapped data caches with block sizes up to 64 bytes and total sizes of 2K bytes or more.
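The check described above can be sketched in a few lines of Python. This is only an illustration of the address arithmetic, not compiler code; the function names (va_10_6, satisfies_rule, padding_between) are hypothetical, and always padding by the full 64 bytes is a simplifying assumption, since the text says only "up to 64 bytes".

```python
BLOCK_BITS = 6      # aligned 64-byte blocks: the low 6 bits are the byte offset
FIELD_MASK = 0x1F   # five bits, VA<10:6>


def va_10_6(va: int) -> int:
    """Return the bit field VA<10:6> of a virtual address."""
    return (va >> BLOCK_BITS) & FIELD_MASK


def satisfies_rule(a: int, b: int) -> bool:
    """True if a and b lie in the same aligned 64-byte block
    or differ somewhere in bits VA<10:6>."""
    same_block = (a >> BLOCK_BITS) == (b >> BLOCK_BITS)
    return same_block or va_10_6(a) != va_10_6(b)


def padding_between(first_a: int, first_b: int) -> int:
    """Padding (in bytes) to insert before the second array, given the
    first-iteration addresses. Assumption: pad by a full 64 bytes whenever
    the rule is violated."""
    return 0 if satisfies_rule(first_a, first_b) else 64


if __name__ == "__main__":
    print(satisfies_rule(0, 8192))        # False: different blocks, same VA<10:6>
    print(satisfies_rule(0, 8192 + 64))   # True: VA<10:6> differs
    print(padding_between(0, 8192))       # 64
```

Because both arrays advance with the same stride, an offset that separates the first-iteration addresses keeps every later pair of addresses separated as well.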

Example:

      REAL*4 A(1000),B(1000)
      DO 60 i=1,1000
60    A(i) = f(B(i))

Figures A–3, A–4, and A–5 show bad, better, and best allocation in cache, respectively.

BAD allocation (A and B thrash in 8 KB direct-mapped cache):

Figure A–3: Bad Allocation in Cache
(Diagram: arrays A and B placed along an address axis marked 0, 4K, 8K, 12K, 16K.)
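To see why the bad layout thrashes, the example loop can be replayed against a toy model of a direct-mapped cache. The sketch below makes several assumptions not stated in the manual: A starts at address 0, B starts at 8K in the bad case, the cache uses 64-byte blocks, and every miss (read or write) allocates the block. The function name count_misses is hypothetical.

```python
def count_misses(base_a, base_b, n=1000, elem=4, cache_bytes=8192, block=64):
    """Count misses for the loop  A(i) = f(B(i)), i = 1..n,
    in a direct-mapped cache.

    Simplifying assumption: any miss, read or write, allocates the block.
    """
    lines = cache_bytes // block
    resident = [None] * lines          # block number held by each cache line
    misses = 0
    for j in range(n):                 # j = i - 1
        # Each iteration reads B(i), then writes A(i).
        for addr in (base_b + elem * j, base_a + elem * j):
            blk = addr // block
            idx = blk % lines          # direct-mapped: one possible line
            if resident[idx] != blk:
                resident[idx] = blk
                misses += 1
    return misses


if __name__ == "__main__":
    print(count_misses(0, 8192))       # 2000: every access misses
    print(count_misses(0, 8192 + 64))  # 126: one miss per 64-byte block touched
```

Under these assumptions, shifting B by a single 64-byte block cuts the miss count from one per access to one per block, since 16 REAL*4 elements share each block.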

BETTER allocation (A and B offset by 64 mod 2 KB, so 16 elements of A and 16 of B can be in cache simultaneously):

Figure A–4: Better Allocation in Cache
(Diagram: arrays A and B placed along an address axis marked 0, 4K, 8K+64, 12K, 16K.)

BEST allocation (A and B offset by 64 mod 2 KB, so 16 elements of A and 16 of B can be in cache simultaneously, and both arrays fit entirely in an 8 KB or larger cache):

Figure A–5: Best Allocation in Cache
(Diagram: arrays A and B placed along an address axis marked 0, 4K-64, 8K, 12K, 16K.)
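The arithmetic behind the three layouts can be checked with a short sketch. The base addresses used here (A at 0; B at 8K, 8K+64, and 4K-64) are read from the figure labels and are partly an assumption, as is the helper name summarize. The "fits" test is a conservative one: it asks whether the whole span from the start of the first array to the end of the second covers no more 64-byte blocks than an 8 KB cache holds.

```python
ELEM_BYTES = 4                  # REAL*4
N = 1000                        # elements per array
BLOCK = 64                      # bytes per cache block
CACHE = 8 * 1024                # 8 KB direct-mapped cache
ELEMS_PER_BLOCK = BLOCK // ELEM_BYTES


def summarize(base_a, base_b):
    """Return (offset distance mod 2 KB, blocks spanned, fits in cache)."""
    off = (base_b - base_a) % 2048
    distance = min(off, 2048 - off)        # treats -64 mod 2 KB like +64
    lo = min(base_a, base_b)
    hi = max(base_a, base_b) + N * ELEM_BYTES
    blocks = (hi - 1) // BLOCK - lo // BLOCK + 1
    return distance, blocks, blocks <= CACHE // BLOCK


if __name__ == "__main__":
    print(ELEMS_PER_BLOCK)            # 16
    print(summarize(0, 8192))         # (0, 191, False)   bad
    print(summarize(0, 8192 + 64))    # (64, 192, False)  better
    print(summarize(0, 4096 - 64))    # (64, 126, True)   best
```

Only the last layout both separates the arrays by 64 bytes mod 2 KB and keeps the combined footprint within the 128 blocks of an 8 KB cache.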

In a frequently executed loop, compilers could allocate the data items accessed from memory so that, on each loop iteration, all of the memory addresses accessed are either in exactly the same 8 KB page, or differ in bits VA<17:13>. For loops that go through arrays in a common direction with a common stride, this requires allocating the arrays, checking that the first-itera-

