25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Each of the products of the components of a and b (for example, a1 * b1) is a 64-bit value made up of an upper 32 bits and a lower 32 bits. It is convenient to represent these individual products as d, e, f, and g, as follows:

a0 * b0 = d1:d0 = d1 * 2^32 + d0

a1 * b0 = e1:e0 = e1 * 2^32 + e0

a0 * b1 = f1:f0 = f1 * 2^32 + f0

a1 * b1 = g1:g0 = g1 * 2^32 + g0
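
As an illustration, the four partial products can be sketched in C. This is a minimal sketch, not code from this guide: the names partial_products, split_multiply, hi32, and lo32 are hypothetical, and the operands a and b are assumed to be unsigned 64-bit values split into 32-bit halves a1:a0 and b1:b0.

```c
#include <stdint.h>

/* Hypothetical container for the four 64-bit partial products. */
typedef struct {
    uint64_t d; /* a0 * b0 = d1:d0 */
    uint64_t e; /* a1 * b0 = e1:e0 */
    uint64_t f; /* a0 * b1 = f1:f0 */
    uint64_t g; /* a1 * b1 = g1:g0 */
} partial_products;

/* Upper and lower 32-bit halves of a 64-bit value. */
static uint32_t hi32(uint64_t x) { return (uint32_t)(x >> 32); }
static uint32_t lo32(uint64_t x) { return (uint32_t)x; }

/* Split a and b into 32-bit halves and form the four 32x32 -> 64 products. */
static partial_products split_multiply(uint64_t a, uint64_t b)
{
    /* Hold each half in a 64-bit variable so the multiplies are 64 bits wide. */
    uint64_t a0 = lo32(a), a1 = hi32(a);
    uint64_t b0 = lo32(b), b1 = hi32(b);

    partial_products p;
    p.d = a0 * b0;
    p.e = a1 * b0;
    p.f = a0 * b1;
    p.g = a1 * b1;
    return p;
}
```

Because each half is less than 2^32, every product is less than 2^64 and fits in a uint64_t without overflow; hi32(p.d) and lo32(p.d) then correspond to d1 and d0, and likewise for e, f, and g.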

Substitution yields the following equation:

c = (g1 * 2^32 + g0) * 2^64 + (e1 * 2^32 + e0 + f1 * 2^32 + f0) * 2^32 + (d1 * 2^32 + d0)

Simplifying yields this equation:

c = g1 * 2^96 + (e1 + f1 + g0) * 2^64 + (d1 + e0 + f0) * 2^32 + d0

It is convenient to represent the terms that are multiplied by each power of 2 as c3, c2, c1, and c0, as follows:

g1 = c3

e1 + f1 + g0 = c2

d1 + e0 + f0 = c1

d0 = c0

Substituting again yields:

c = c3 * 2^96 + c2 * 2^64 + c1 * 2^32 + c0
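
The derivation above can be sketched in C as follows. This is a hedged illustration rather than the guide's own implementation: the function name mul_64x64_to_128 and the output layout (four 32-bit words, least significant first) are assumptions. Note that the sums for c1 and c2 can exceed 32 bits, so the sketch keeps only the low 32 bits of each sum in the result word and carries the excess into the next-higher word.

```c
#include <stdint.h>

/* Hypothetical helper: writes the 128-bit product a * b into c[0..3],
   least-significant 32-bit word first. */
static void mul_64x64_to_128(uint64_t a, uint64_t b, uint32_t c[4])
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t d = a0 * b0; /* d1:d0 */
    uint64_t e = a1 * b0; /* e1:e0 */
    uint64_t f = a0 * b1; /* f1:f0 */
    uint64_t g = a1 * b1; /* g1:g0 */

    /* c0 = d0 */
    c[0] = (uint32_t)d;

    /* c1 = d1 + e0 + f0; the sum of three 32-bit values fits in 64 bits. */
    uint64_t s1 = (d >> 32) + (uint32_t)e + (uint32_t)f;
    c[1] = (uint32_t)s1;

    /* c2 = e1 + f1 + g0, plus the carry out of c1. */
    uint64_t s2 = (e >> 32) + (f >> 32) + (uint32_t)g + (s1 >> 32);
    c[2] = (uint32_t)s2;

    /* c3 = g1, plus the carry out of c2. */
    c[3] = (uint32_t)((g >> 32) + (s2 >> 32));
}
```

For example, with a = 0x0000000200000003 and b = 0x0000000500000007, the partial products are d = 21, e = 14, f = 15, and g = 10, so the result words are c0 = 21, c1 = 14 + 15 = 29, c2 = 10, and c3 = 0.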

Chapter 3

General 64-Bit Optimizations
