Cycle Timings and Interlock Behavior

14.21 Floating-point single-precision data processing instructions

This section describes the cycle timing behavior for all single-precision VFP CDP instructions. These include arithmetic instructions such as VMUL.F32; data and immediate move instructions such as “VMOV.F32 <Sd>, #<imm>”, VABS.F32, VNEG.F32, and “VMOV <Sd>, <Sm>”; and comparison and conversion instructions.

Table 14-26 shows the cycle timing behavior of the floating-point single-precision data processing instructions.

Table 14-26 Floating-point single-precision data processing instructions cycle timing behavior

Example instruction                        Cycles   Early Reg     Result latency
-----------------------------------------  -------  ------------  --------------
VMLA.F32 <Sd>, <Sn>, <Sm> [a]              1 [b]    <Sn>, <Sm>    5 [c]
VADD.F32 <Sd>, <Sn>, <Sm> [d]              1        <Sn>, <Sm>    2
VDIV.F32 <Sd>, <Sn>, <Sm>                  2        <Sn>, <Sm>    16
VSQRT.F32 <Sd>, <Sm>                       2        <Sm>          16
VMOV.F32 <Sd>, #<imm>                      1        -             1
VMOV.F32 <Sd>, <Sm> [e]                    1        -             1
VCMP.F32 <Sd>, <Sm> [f]                    1        <Sd>, <Sm>    -
VCMPE.F32 <Sd>, #0.0 [f]                   1        <Sd>          -
VCVT.F32.U32 <Sd>, <Sm> [g]                1        <Sm>          2
VCVT.F32.U32 <Sd>, <Sd>, #<fbits> [h]      1        <Sd>          2
VCVTR.U32.F32 <Sd>, <Sm> [i]               1        <Sm>          2
VCVT.U32.F32 <Sd>, <Sd>, #<fbits> [j]      1        <Sd>          2
VCVT.F64.F32 <Dd>, <Sm>                    3        <Sm>          5

a. Also VMLS.F32, VNMLS.F32, and VNMLA.F32.

b. VMLA.F32 completes out-of-order, and can take an extra cycle (two in total) if an add instruction (VADD) or certain dual-issued instruction pairs are in the iss-stage when the instruction completes.

c. Except when the instruction dependent on the result <Sd> is another VMLA.F32 instruction, and the dependent operand is the accumulate operand, <Sd>. In this case, the result latency is reduced to 3 cycles.

d. Also VSUB.F32, VMUL.F32, and VNMUL.F32.

e. Also VABS.F32 and VNEG.F32.

f. Also VCMPE.F32.

g. Also VCVT.F32.S32.

h. Also VCVT.F32.U16, VCVT.F32.S32, and VCVT.F32.S16.

i. Also VCVT.U32.F32, VCVTR.S32.F32, and VCVT.S32.F32.

j. Also VCVT.U16.F32, VCVT.S32.F32, and VCVT.S16.F32.
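As an illustration of how the result latencies in Table 14-26 and the exception in footnote c might be used, the following Python sketch looks up the latency between a producer and a dependent instruction, and sums the latencies along a dependency chain as a rough critical-path estimate. The names TIMINGS, result_latency, and chain_latency are illustrative and are not part of any ARM tool. The sketch assumes that summing result latencies approximates the length of a dependency chain; it ignores the Cycles column, Early Reg requirements, dual issue, and the extra cycle described in footnote b, and it lists only the example mnemonics from the table, not the equivalents named in the footnotes.

```python
# Timings transcribed from Table 14-26 as (cycles, result latency).
# None marks the compare instructions, which have no result latency entry.
TIMINGS = {
    "VMLA.F32": (1, 5),
    "VADD.F32": (1, 2),
    "VDIV.F32": (2, 16),
    "VSQRT.F32": (2, 16),
    "VMOV.F32": (1, 1),
    "VCMP.F32": (1, None),
    "VCVT.F32.U32": (1, 2),
    "VCVTR.U32.F32": (1, 2),
    "VCVT.U32.F32": (1, 2),
    "VCVT.F64.F32": (3, 5),
}


def result_latency(producer, consumer=None, via_accumulator=False):
    """Return the result latency of `producer` as seen by `consumer`.

    Footnote c: when a VMLA.F32 result feeds the accumulate operand of
    another VMLA.F32, the latency is 3 cycles instead of 5.
    """
    _cycles, latency = TIMINGS[producer]
    if latency is None:
        raise ValueError(f"{producer} has no result latency in Table 14-26")
    if producer == "VMLA.F32" and consumer == "VMLA.F32" and via_accumulator:
        return 3
    return latency


def chain_latency(ops, accumulate_links=()):
    """Sum result latencies along a chain where each op feeds the next.

    `accumulate_links` holds the indices i for which the result of ops[i]
    is consumed as the accumulate operand of ops[i + 1].
    Assumption: this is only a rough estimate, not a pipeline model.
    """
    total = 0
    for i, op in enumerate(ops):
        consumer = ops[i + 1] if i + 1 < len(ops) else None
        total += result_latency(op, consumer, via_accumulator=i in accumulate_links)
    return total


if __name__ == "__main__":
    # Four VMLA.F32 instructions accumulating into the same <Sd>.
    print(chain_latency(["VMLA.F32"] * 4, accumulate_links={0, 1, 2}))
```

Under these assumptions, four back-to-back VMLA.F32 instructions that accumulate into the same register contribute 3 + 3 + 3 + 5 cycles of latency, whereas the same chain without the footnote c reduction would contribute 5 cycles per instruction.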

ARM DDI 0363E

Copyright © 2009 ARM Limited. All rights reserved.

