25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Btr_ptr -= 32;

//The addresses Aptr0, Aptr1, Aptr2, and Aptr3 need to be
//incremented to the next block of 4 rows of A to be dotted
//upon B's column. 4 rows of A are 128 doubles in size, and
//in the n-loop above they were incremented by 32 already, so they
//must be incremented an additional 96 to point to the
//next 4 rows of A to be dotted.

Aptr0 += 96; Aptr1 += 96; Aptr2 += 96; Aptr3 += 96;

}

//Pointer to B-transpose is incremented by a row so as to point
//to the next row of B-transpose (the next column of B) by which
//matrix A needs to be multiplied.
Btr_ptr += 32;

}

}
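
The pointer bookkeeping described in the comments above can be checked with a plain scalar version of the same loop nest. The following is only a minimal sketch, not the guide's SIMD routine: the function name matmul_btr_scalar, the constant DIM, and the use of integer offsets (a0 to a3 and b) in place of the raw pointers Aptr0 to Aptr3 and Btr_ptr are all assumptions made for illustration. It assumes 32 x 32 row-major matrices of doubles, with Btr holding the transpose of B.

```c
#include <assert.h>
#include <stddef.h>

enum { DIM = 32 };  /* assumed: 32 x 32 matrices, so one row is 32 doubles */

/* Hypothetical scalar reference, not the guide's SIMD code.
   Computes C = A * B for row-major DIM x DIM matrices, where Btr
   holds the transpose of B. Offsets stand in for the pointers. */
void matmul_btr_scalar(const double *A, const double *Btr, double *C)
{
    size_t b = 0;                 /* plays the role of Btr_ptr */

    assert(DIM % 4 == 0);         /* rows of A are consumed 4 at a time */

    for (size_t col = 0; col < DIM; col++) {
        /* Offsets of four consecutive rows of A (Aptr0..Aptr3). */
        size_t a0 = 0, a1 = DIM, a2 = 2 * DIM, a3 = 3 * DIM;

        for (size_t row = 0; row < DIM; row += 4) {
            double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;

            /* n-loop: every offset advances by one full row (32). */
            for (size_t n = 0; n < DIM; n++) {
                double bv = Btr[b++];
                s0 += A[a0++] * bv;
                s1 += A[a1++] * bv;
                s2 += A[a2++] * bv;
                s3 += A[a3++] * bv;
            }

            C[(row + 0) * DIM + col] = s0;
            C[(row + 1) * DIM + col] = s1;
            C[(row + 2) * DIM + col] = s2;
            C[(row + 3) * DIM + col] = s3;

            b -= DIM;             /* Btr_ptr -= 32: reuse the same row of Btr */

            /* A block of 4 rows is 4 * 32 = 128 doubles; 32 were already
               stepped in the n-loop, so 96 more reach the next block. */
            a0 += 3 * DIM; a1 += 3 * DIM; a2 += 3 * DIM; a3 += 3 * DIM;
        }

        b += DIM;                 /* Btr_ptr += 32: next row of Btr */
    }
}
```

Offsets are used here instead of pointers so that stepping past the final block of rows never forms an out-of-range pointer; the increments themselves (32 in the n-loop, then 96, then the rewind and advance of the B-transpose position) mirror the ones in the listing.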

Chapter 9

Optimizing with SIMD Instructions
