Intel® IXP42X product line and IXC1100 control plane processors—Intel XScale® Processor
DM September 2006
Order Number: 252480-006US
Note that the prefetches are issued in the reverse of the order in which the data is
used. If a cache conflict occurs and data is evicted from the cache, only the data from
the first prefetch is lost.
3.10.4.4.9 Loop Interchange
As mentioned earlier, the sequence in which data is accessed affects cache thrashing.
Usually, it is best to access data in a spatially contiguous address range. However,
arrays of data may have been laid out such that indexed elements are not physically
next to each other. Consider the following C code, which traverses an array stored in
row-major order:

    for(j=0; j<NMAX; j++)
        for(i=0; i<NMAX; i++)
        {
            prefetch(A[i+1][j]);
            sum += A[i][j];
        }

In the above example, A[i][j] and A[i+1][j] are not adjacent in memory; they are one
full row (NMAX elements) apart. This situation causes an increase in bus traffic when
prefetching loop data. In some cases where the loop mathematics are unaffected, the
problem can be resolved by induction variable interchange. The above example becomes:

    for(i=0; i<NMAX; i++)
        for(j=0; j<NMAX; j++)
        {
            prefetch(A[i][j+1]);
            sum += A[i][j];
        }

3.10.4.4.10 Loop Fusion
Loop fusion is the process of combining multiple loops that reuse the same data into
one loop. The advantage is that the reused data is immediately accessible from the
data cache. Consider the following example:

    for(i=0; i<NMAX; i++)
    {
        prefetch(A[i+1], b[i+1], c[i+1]);
        A[i] = b[i] + c[i];
    }
    for(i=0; i<NMAX; i++)
    {
        prefetch(D[i+1], c[i+1], A[i+1]);
        D[i] = A[i] + c[i];
    }

The second loop reuses the data elements A[i] and c[i]. Fusing the loops together
produces:

    for(i=0; i<NMAX; i++)
    {
        prefetch(D[i+1], A[i+1], c[i+1], b[i+1]);
        ai = b[i] + c[i];   /* ai holds the sum so A[i] is not reloaded */
        A[i] = ai;
        D[i] = ai + c[i];
    }
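The loop interchange and loop fusion transformations described in this section can be tried out with a short, self-contained C sketch. It is illustrative only and is not taken from the manual. The function names, the NMAX value of 64, and the PREFETCH macro are assumptions made for this example. PREFETCH maps the prefetch() pseudo-call onto the GCC/Clang `__builtin_prefetch` builtin where available and onto a no-op elsewhere. The arrays are assumed not to overlap.

```c
#include <assert.h>
#include <string.h>

#define NMAX 64   /* assumed array dimension, chosen only for this sketch */

/* Assumption: prefetch() in the text is a pure hint. Map it to the GCC/Clang
   builtin when available and to a no-op otherwise. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch((const void *)(addr))
#else
#define PREFETCH(addr) ((void)(addr))
#endif

/* Before interchange: the inner loop walks down a column (stride of NMAX ints). */
long sum_column_order(int A[NMAX][NMAX])
{
    long sum = 0;
    for (int j = 0; j < NMAX; j++)
        for (int i = 0; i < NMAX; i++) {
            if (i + 1 < NMAX)              /* keep the address expression in bounds */
                PREFETCH(&A[i + 1][j]);
            sum += A[i][j];
        }
    return sum;
}

/* After interchange: the inner loop walks along a row (stride of one int). */
long sum_row_order(int A[NMAX][NMAX])
{
    long sum = 0;
    for (int i = 0; i < NMAX; i++)
        for (int j = 0; j < NMAX; j++) {
            PREFETCH(&A[i][j + 1]);        /* at most one past the end of row i */
            sum += A[i][j];
        }
    return sum;
}

/* Before fusion: A[i] and c[i] are read again in a second pass. */
void add_separate(int *A, int *D, const int *b, const int *c)
{
    for (int i = 0; i < NMAX; i++)
        A[i] = b[i] + c[i];
    for (int i = 0; i < NMAX; i++)
        D[i] = A[i] + c[i];
}

/* After fusion: one pass, with the sum kept in a local instead of reloading A[i]. */
void add_fused(int *A, int *D, const int *b, const int *c)
{
    for (int i = 0; i < NMAX; i++) {
        PREFETCH(&D[i + 1]);
        PREFETCH(&A[i + 1]);
        PREFETCH(&c[i + 1]);
        PREFETCH(&b[i + 1]);
        int ai = b[i] + c[i];
        A[i] = ai;
        D[i] = ai + c[i];
    }
}

/* Aborts via assert() if either transformation changes the computed results. */
void check_transforms(void)
{
    static int M[NMAX][NMAX];
    int b[NMAX], c[NMAX], A1[NMAX], D1[NMAX], A2[NMAX], D2[NMAX];

    for (int i = 0; i < NMAX; i++) {
        b[i] = i;
        c[i] = 2 * i + 1;
        for (int j = 0; j < NMAX; j++)
            M[i][j] = i * NMAX + j;
    }
    add_separate(A1, D1, b, c);
    add_fused(A2, D2, b, c);

    assert(sum_column_order(M) == sum_row_order(M));
    assert(memcmp(A1, A2, sizeof A1) == 0);
    assert(memcmp(D1, D2, sizeof D1) == 0);
}
```

Calling check_transforms() from a test harness confirms only that the transformed loops compute the same values as the originals. Any cache or bus-traffic benefit depends on the target and would need to be measured there.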