22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

which might inhibit certain optimizations with some compilers—for example, aggressive inlining.

Dynamic Memory Allocation Consideration

Dynamic memory allocation (‘malloc’ in the C language) should always return a pointer that is suitably aligned for the largest base type (quadword alignment). Where this alignment cannot be guaranteed, use the technique shown in the following code to align the pointer to a quadword boundary. This code assumes that a pointer can be cast to a long.

Example:

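double* p;   /* pointer returned by malloc; keep it for free() */
double* np;  /* quadword-aligned pointer used to access the data */

p  = (double *)malloc(sizeof(double)*number_of_doubles + 7L);
np = (double *)((((long)(p)) + 7L) & (-8L));  /* round p up to a multiple of 8 */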

Then use ‘np’ instead of ‘p’ to access the data. ‘p’ is still needed in order to deallocate the storage.

Introduce Explicit Parallelism into Code

Where possible, long dependency chains should be broken into several independent dependency chains, which can then be executed in parallel, exploiting the pipelined execution units. This is especially important for floating-point code, whether it is mapped to x87 or 3DNow! instructions, because of the longer latency of floating-point operations. Since most languages, including ANSI C, guarantee that floating-point expressions are not reordered, compilers usually cannot perform such optimizations unless they offer a switch that allows ANSI-noncompliant reordering of floating-point expressions according to algebraic rules.

Note that reordered code that is algebraically identical to the original code does not necessarily deliver identical computational results due to the lack of associativity of floating-point operations. There are well-known numerical considerations in applying these optimizations (consult a book on numerical analysis). In some cases, these optimizations may
