Cycle Timings and Interlock Behavior

14.6 Sum of Absolute Differences (SAD)

Table 14-7 shows the SAD instructions and gives their cycle timing behavior.

Table 14-7 Sum of absolute differences instruction timing behavior

Instructions    Cycles    Early Reg     Result latency

USAD8           1         <Rn>, <Rm>    2 (a)
USADA8          1         <Rn>, <Rm>    2 (a)

a. Result latency is one fewer if the destination is the accumulate for a subsequent USADA8.
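For readers unfamiliar with these instructions, the following is a minimal functional sketch of what USAD8 and USADA8 compute, based on their architectural definition: the sum of the absolute differences of the four unsigned bytes in each operand, with USADA8 adding an accumulate value. The function names usad8 and usada8 are illustrative only, and the sketch models the arithmetic result, not the timing in Table 14-7.

```python
def usad8(rn: int, rm: int) -> int:
    """Sum of absolute differences of the four unsigned bytes of rn and rm."""
    total = 0
    for shift in (0, 8, 16, 24):
        a = (rn >> shift) & 0xFF   # byte lane from the first operand
        b = (rm >> shift) & 0xFF   # matching byte lane from the second operand
        total += abs(a - b)
    return total                   # at most 4 * 255, so it always fits in 32 bits


def usada8(rn: int, rm: int, ra: int) -> int:
    """USAD8 result added to the accumulate value ra, wrapped to 32 bits."""
    return (usad8(rn, rm) + ra) & 0xFFFFFFFF
```

For example, usad8(0x01020304, 0x04030201) sums the lane differences 3, 1, 1 and 3.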

14.6.1 Example interlocks

Table 14-8 shows interlock examples using USAD8 and USADA8 instructions.

 

Table 14-8 Example interlocks

Instruction sequence    Behavior

USAD8 R1,R2,R3          Takes three cycles because USAD8 has a Result Latency of two,
ADD R5,R6,R1            and the ADD requires the result of the USAD8 instruction.

USAD8 R1,R2,R3          Takes three cycles. The MOV instruction is scheduled during
MOV R9,R9               the Result Latency of the USAD8 instruction.
ADD R5,R6,R1

USAD8 R1,R2,R3          Takes two cycles. The Result Latency is one less because the
USADA8 R1,R4,R5,R1      result is used as the accumulate for a subsequent USADA8
                        instruction.
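The cycle counts in these examples can be reproduced with a small scoreboard model. This is a simplified sketch, not a description of the actual pipeline: it assumes strictly in-order single issue, ignores Early Reg requirements, and assumes that ADD and MOV each take one cycle with a Result Latency of one. The Instr class and total_cycles function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    name: str
    dest: str
    sources: Tuple[str, ...]          # ordinary source registers
    accumulate: Optional[str] = None  # accumulate register (USADA8 only)
    cycles: int = 1                   # issue cycles
    result_latency: int = 1           # assumed 1 for ADD and MOV


def total_cycles(seq: List[Instr]) -> int:
    """Cycles taken by seq under the simplified in-order model."""
    ready: Dict[str, Tuple[int, str]] = {}  # reg -> (cycle available, producer)
    next_issue = 0
    for ins in seq:
        issue = next_issue
        for reg in ins.sources:
            if reg in ready:
                issue = max(issue, ready[reg][0])
        if ins.accumulate in ready:
            avail, producer = ready[ins.accumulate]
            # Footnote a: latency is one fewer when a SAD result feeds
            # the accumulate of a subsequent USADA8.
            if ins.name == "USADA8" and producer in ("USAD8", "USADA8"):
                avail -= 1
            issue = max(issue, avail)
        ready[ins.dest] = (issue + ins.result_latency, ins.name)
        next_issue = issue + ins.cycles
    return next_issue


usad8 = Instr("USAD8", "R1", ("R2", "R3"), result_latency=2)
add = Instr("ADD", "R5", ("R6", "R1"))
mov = Instr("MOV", "R9", ("R9",))
usada8 = Instr("USADA8", "R1", ("R4", "R5"), accumulate="R1", result_latency=2)

example_1 = total_cycles([usad8, add])       # USAD8 then dependent ADD
example_2 = total_cycles([usad8, mov, add])  # MOV fills the latency slot
example_3 = total_cycles([usad8, usada8])    # result used as accumulate
```

Under these assumptions the three sequences come out at three, three and two cycles, matching the Behavior column of Table 14-8.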

 

 

ARM DDI 0363E

Copyright © 2009 ARM Limited. All rights reserved.

14-11

ID013010

Non-Confidential, Unrestricted Access

 
