25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Chapter 5 Cache and Memory Optimizations

The optimizations in this chapter take advantage of the large L1 caches and high-bandwidth buses of the AMD Athlon™ 64 and AMD Opteron™ processors.

This chapter covers the following topics:

Topic                                                   Page

Memory-Size Mismatches                                  92
Natural Alignment of Data Objects                       95
Cache-Coherent Nonuniform Memory Access (ccNUMA)        96
Multiprocessor Considerations                           99
Store-to-Load Forwarding Restrictions                   100
Prefetch Instructions                                   104
Write-combining                                         113
L1 Data Cache Bank Conflicts                            114
Placing Code and Data in the Same 64-Byte Cache Line    116
Sorting and Padding C and C++ Structures                117
Sorting Local Variables                                 119
Memory Copy                                             120
Stack Considerations                                    122
