
Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

7.3	Inline Functions	149
7.4	Address-Generation Interlocks	151
7.5	MOVZX and MOVSX	153
7.6	Pointer Arithmetic in Loops	154
7.7	Pushing Memory Data Directly onto the Stack	157
Chapter 8	Integer Optimizations	159
8.1	Replacing Division with Multiplication	160
8.2	Alternative Code for Multiplying by a Constant	164
8.3	Repeated String Instructions	167
8.4	Using XOR to Clear Integer Registers	169
8.5	Efficient 64-Bit Integer Arithmetic in 32-Bit Mode	170
8.6	Efficient Implementation of Population-Count Function in 32-Bit Mode	179
8.7	Efficient Binary-to-ASCII Decimal Conversion	181
8.8	Derivation of Algorithm, Multiplier, and Shift Factor for Integer
	Division by Constants	186
8.9	Optimizing Integer Division	192
Chapter 9	Optimizing with SIMD Instructions	193
9.1	Ensure All Packed Floating-Point Data are Aligned	195

9.2	Improving Scalar SSE and SSE2 Floating-Point Performance with MOVLPD and
	MOVLPS When Loading Data from Memory	196
9.3	Use MOVLPx/MOVHPx Instructions for Unaligned Data Access	198
9.4	Use MOVAPD and MOVAPS Instead of MOVUPD and MOVUPS	199
9.5	Structuring Code with Prefetch Instructions to Hide Memory Latency	200
9.6	Avoid Moving Data Directly Between General-Purpose and MMX™ Registers	206
9.7	Use MMX™ Instructions to Construct Fast Block-Copy Routines in 32-Bit Mode	207
9.8	Passing Data between MMX™ and 3DNow!™ Instructions	208
9.9	Storing Floating-Point Data in MMX™ Registers	209
9.10	EMMS and FEMMS Usage	210

