22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

The following code fragment uses the 3DNow! PAVGUSB instruction to perform averaging between the source macroblock and destination macroblock:

Example 2 (Preferred):

        MOV     EAX, DWORD PTR Src_MB
        MOV     EDI, DWORD PTR Dst_MB
        MOV     EDX, DWORD PTR SrcStride
        MOV     EBX, DWORD PTR DstStride
        MOV     ECX, 16
L1:
        MOVQ    MM0, [EAX]        ;MM0=QWORD1
        MOVQ    MM1, [EAX+8]      ;MM1=QWORD2
        PAVGUSB MM0, [EDI]        ;(QWORD1 + QWORD3)/2 with adjustment
        PAVGUSB MM1, [EDI+8]      ;(QWORD2 + QWORD4)/2 with adjustment
        ADD     EAX, EDX
        MOVQ    [EDI], MM0
        MOVQ    [EDI+8], MM1
        ADD     EDI, EBX
        LOOP    L1
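
As a cross-check for the listing above, the following is a minimal scalar sketch in C of the same 16x16 averaging. It assumes the "adjustment" in the comments is the per-byte rounding that PAVGUSB applies, that is (a + b + 1) >> 1, and that both strides are given in bytes. The names avg_round_u8 and average_macroblock_16x16 are hypothetical and do not come from the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded average of two unsigned bytes: (a + b + 1) >> 1.
   The sum is formed in a wider type so it cannot overflow. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Hypothetical helper: averages a 16x16 source macroblock into the
   destination macroblock in place. Strides are assumed to be in bytes. */
static void average_macroblock_16x16(const uint8_t *src, size_t src_stride,
                                     uint8_t *dst, size_t dst_stride)
{
    assert(src != NULL && dst != NULL);
    for (int row = 0; row < 16; ++row) {       /* ECX = 16 rows          */
        for (int col = 0; col < 16; ++col) {   /* two QWORDs = 16 bytes  */
            dst[col] = avg_round_u8(src[col], dst[col]);
        }
        src += src_stride;                     /* ADD EAX, EDX           */
        dst += dst_stride;                     /* ADD EDI, EBX           */
    }
}
```

A scalar routine of this kind is useful mainly as a reference for validating the output of the 3DNow! loop byte for byte; it is not intended as a replacement for it.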

 

 

Stream of Packed Unsigned Bytes

The following code is an example of how to process a stream of packed unsigned bytes (like RGBA information) with faster 3DNow! instructions.

Example:

outside loop:
        PXOR      MM0, MM0

inside loop:
        MOVD      MM1, [VAR]    ;      0           v[3],v[2],v[1],v[0]
        PUNPCKLBW MM1, MM0      ;0,v[3],0,v[2]     0,v[1],0,v[0]
        MOVQ      MM2, MM1      ;0,v[3],0,v[2]     0,v[1],0,v[0]
        PUNPCKLWD MM1, MM0      ;0,0,0,v[1]        0,0,0,v[0]
        PUNPCKHWD MM2, MM0      ;0,0,0,v[3]        0,0,0,v[2]
        PI2FD     MM1, MM1      ;float(v[1])       float(v[0])
        PI2FD     MM2, MM2      ;float(v[3])       float(v[2])
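
The effect of the unpack-and-convert sequence above can be sketched in scalar C as follows: each unsigned byte is zero-extended to a 32-bit integer (the role of the PUNPCKLBW and PUNPCKLWD/PUNPCKHWD steps against the zeroed MM0) and then converted to single-precision floating point (the role of PI2FD). The function name bytes_to_floats and its parameters are hypothetical and are used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts `count` packed unsigned bytes
   (for example RGBA components) to single-precision floats. */
static void bytes_to_floats(const uint8_t *v, float *out, size_t count)
{
    assert(v != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i) {
        /* Zero-extend the byte to a 32-bit integer, as the unpack
           instructions do by interleaving with a zeroed register. */
        int32_t widened = (int32_t)v[i];
        /* Integer-to-float conversion, as PI2FD does per DWORD. */
        out[i] = (float)widened;
    }
}
```

Every value in the range 0 to 255 is exactly representable as a float, so the scalar result should match the 3DNow! result exactly and can serve as a reference when testing the loop.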
