25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Appendix E SSE and SSE2 Optimizations

This appendix describes specific optimizations that improve the performance of code using SSE and SSE2 instructions on AMD Athlon™ 64 and AMD Opteron™ processors.

Types of XMM-Register Data

The XMM registers (used by the SSE and SSE2 instructions) can hold the following three types of data:

Floating-point single-precision (FPS)

Floating-point double-precision (FPD)

Integer (INT)

Types of SSE and SSE2 Instructions

Most SSE and SSE2 instructions can be divided into five types according to the type of data they produce and therefore expect to consume:

Floating-point single-precision (FPS)

Floating-point double-precision (FPD)

Integer (INT)

Load (produces data of type FPS, FPD, or INT)

Store (can consume a register with data of any type)

This appendix covers the following topics:

Topic                                             Page
Half-Register Operations                          356
Zeroing Out an XMM Register                       357
Reuse of Dead Registers                           359
Moving Data Between XMM Registers and GPRs        360
Saving and Restoring Registers of Unknown Format  361
SSE and SSE2 Copy Loops                           362
Data Conversion                                   364

 

 
