Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

3.264-Bit Arithmetic and Large-Integer Multiplication

Optimization

Use 64-bit arithmetic for integer multiplication that produces 128-bit or larger products.

Background

Large-number multiplications (those involving 128-bit or larger products) are utilized in cryptography algorithms, which figure importantly in e-commerce applications and secure transactions on the Internet. Processors cannot perform large-number multiplication natively; they must break the operation into chunks that are permitted by their architecture (32-bit or 64-bit additions and multiplications).

Rationale

Using 64-bit rather than 32-bit integer operations dramatically reduces the number of additions and multiplications required to compute large products. For example, computing a 1024-bit product using 64-bit arithmetic requires fewer than one quarter the number of instructions required when using

32-bit operations:

Comparing...

32-bit arithmetic

64-bit arithmetic

 

 

 

Number of multiplications

256

64

 

 

 

Number of additions with carry

509

125

 

 

 

Number of additions

255

63

 

 

 

In addition, the processor performs 64-bit additions just as fast as it performs 32-bit additions, and the latency of 64-bit multiplications is only slightly higher than for 32-bit multiplications. (The processor is capable of performing a 64-bit addition each clock cycle and a 64-bit multiplication every other clock cycle.)

Example

Consider the multiplication of two unsigned 64-bit numbers a and b, represented in terms of 32-bit numbers a1:a0 and b1:b0.

a = a1 * 232 + a0 b = b1 * 232 + b0

The product of a and b, c, can be expressed in terms of products of the 32-bit components, as follows:

c= (a1 * b1) * 264 + (a1 * b0 + a0 * b1) * 232 + (a0 * b0)

62

General 64-Bit Optimizations

Chapter 3

Page 78
Image 78
AMD 250 manual Bit Arithmetic and Large-Integer Multiplication, Background