Cycle Timings and Interlock Behavior

14.13.2 Load Multiples, where the PC is in the register list

The processor includes a 4-entry return stack that can predict procedure returns. Any LDM to the PC that does not restore the SPSR to the CPSR is predicted as a procedure return.
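The prediction rule above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `ReturnStack`, the helper `ldm_pc_uses_return_stack`, the idea that calls push their return address, and the policy of discarding the oldest entry on overflow are all hypothetical and are not taken from this manual. Only the depth of four entries and the SPSR rule come from the text.

```python
from collections import deque


class ReturnStack:
    """Toy model of a 4-entry return stack (not the hardware implementation)."""

    def __init__(self, depth=4):
        # ASSUMPTION: on overflow the oldest entry is discarded.
        # deque(maxlen=...) drops items from the opposite end automatically.
        self._entries = deque(maxlen=depth)

    def push_call(self, return_address):
        # ASSUMPTION: a procedure call records its return address here.
        self._entries.append(return_address)

    def predict_return(self):
        # Pop the most recent entry; None means no prediction is available.
        return self._entries.pop() if self._entries else None


def ldm_pc_uses_return_stack(restores_spsr_to_cpsr):
    # From the text: an LDM to the PC is predicted as a procedure return
    # unless it also restores the SPSR to the CPSR.
    return not restores_spsr_to_cpsr
```

In this model, a fifth nested call overwrites the oldest return address, so the outermost return of a five-deep call chain has no valid prediction.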

In all cases the base register, <Rn>, is a Very Early Reg.

Table 14-18 shows the cycle timing behavior of Load Multiples, where the PC is in the register list.

Table 14-18 Cycle timing behavior of Load Multiples, with PC in the register list (64-bit aligned)

Example instruction	Cycles	Memory cycles	Result latency	Comments
LDMIA <Rn>,{...,pc}	m [a]	n [b]	2,…	Correct return stack prediction
LDMIA <Rn>,{...,pc}	m [a] + 8	n [b]	2,…	Incorrect return stack prediction
LDMIA <cond> <Rn>,{...,pc}	m [a]	n [b]	2,…	Correct condition prediction and correct return stack prediction
LDMIA <cond> <Rn>,{...,pc}	m [a] + 7	n [b]	2,…	Incorrect condition prediction
LDMIA <cond> <Rn>,{...,pc}	m [a] + 8	n [b]	2,…	Correct condition prediction and incorrect return stack prediction

a. Where m is the number of cycles for this instruction if the PC were treated as a normal register.

b. Where n is the number of memory cycles for this instruction if the PC were treated as a normal register.

Note

The cycle timing behavior that Table 14-18 shows also covers PUSH and POP instructions that behave like store and load multiple instructions with base register writeback.
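The rows of Table 14-18 can be restated as a small function. This is a sketch for illustration only: the function name `ldm_pc_cycles` and its parameters are hypothetical, and the value of m used in any call is an assumed input, since m depends on the register list. The penalties of 7 and 8 cycles are taken from the table. In every row the memory cycle count stays at n.

```python
def ldm_pc_cycles(m, conditional=False, condition_predicted=True,
                  return_predicted=True):
    """Issue cycles for an LDM with the PC in the list, per Table 14-18.

    m is the cycle count the instruction would take if the PC were
    treated as a normal register (footnote a of the table).
    """
    if conditional and not condition_predicted:
        # Incorrect condition prediction.
        return m + 7
    if not return_predicted:
        # Incorrect return stack prediction (condition, if any, was correct).
        return m + 8
    # All predictions correct: no penalty.
    return m
```

For example, with an assumed m of 4, a mispredicted return costs 12 cycles and a mispredicted condition costs 11.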

14.13.3 Example Interlocks

The following sequence that has an LDM instruction takes six cycles to execute, because R7 has a result latency of five cycles:

LDMIA R0, {R1-R7}

ADD R10, R10, R7

The following sequence that has an STM instruction takes five cycles to execute:

STMIA R0, {R1-R7}

ADD R7, R10, R11

In the following sequence, the result latency of R3 is hidden by the issue cycles of the LDM, so the sequence takes five cycles to execute:

LDMIA R0, {R1-R7}

ADD R10, R10, R3

The following sequence that has a POP instruction takes seven cycles to execute, because R9 has a result latency of six cycles:

POP {R1-R9}

ADD R10, R10, R9
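One way to read the load examples above is that the dependent ADD cannot issue until the load multiple has finished issuing and the register it needs is available. The sketch below encodes that reading. It is a simplification, not a pipeline model: the function name `sequence_cycles` is hypothetical, the ADD is assumed to take one cycle, and the issue cycle counts passed in are assumptions (four cycles for LDMIA R0, {R1-R7} is inferred from the five-cycle total of the hidden-latency example, not stated in this section).

```python
def sequence_cycles(producer_issue_cycles, result_latency, consumer_cycles=1):
    """Cycles for a load multiple followed by one dependent instruction.

    The consumer waits for whichever is longer: the producer's issue
    cycles or the result latency of the register it reads.
    """
    return max(producer_issue_cycles, result_latency) + consumer_cycles


# LDMIA R0, {R1-R7} then ADD using R7: result latency 5 (from the text),
# issue cycles 4 (assumed) gives six cycles.
ldm_r7 = sequence_cycles(4, 5)

# POP {R1-R9} then ADD using R9: result latency 6 (from the text).
# The issue count of 5 is an assumption; any value up to 6 gives seven cycles.
pop_r9 = sequence_cycles(5, 6)
```

Any result latency no greater than the issue cycle count is hidden, which matches the five-cycle example that reads R3.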

The following sequence that has a PUSH instruction takes five cycles to execute:

ARM DDI 0363E	Copyright © 2009 ARM Limited. All rights reserved.	14-22
ID013010	Non-Confidential, Unrestricted Access
