Writing ARM and Thumb Assembly Language

 

 

 

Table 2-3 All instructions conditional

r0: a   r1: b   Instruction       Cycles (ARM7)
1       2       CMP r0,r1         1
1       2       SUBGT r0,r0,r1    1 (not executed)
1       1       SUBLT r1,r1,r0    1
1       1       BNE gcd           3
1       1       CMP r0,r1         1
1       1       SUBGT r0,r0,r1    1 (not executed)
1       1       SUBLT r1,r1,r0    1 (not executed)
1       1       BNE gcd           1 (not executed)
                                  Total = 10
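
The trace in Table 2-3 corresponds to a four-instruction ARM loop of roughly the following shape. This is a sketch reconstructed from the Instruction column of the table, assuming that the label gcd sits on the CMP (as the BNE gcd target implies) and that a and b are already in r0 and r1:

```
gcd     CMP     r0, r1          ; set flags from a - b
        SUBGT   r0, r0, r1      ; if a > b, a = a - b
        SUBLT   r1, r1, r0      ; if a < b, b = b - a
        BNE     gcd             ; repeat until a == b (flags still from CMP)
```

Neither SUBGT nor SUBLT has an S suffix, so the flags set by CMP are still valid when BNE tests them. An instruction that fails its condition still costs one cycle, which is why the rows marked "not executed" count toward the total of 10.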

 

 

 

 

Converting to Thumb

Because B is the only Thumb instruction that can be executed conditionally, the gcd algorithm must be written with conditional branches in Thumb code.

Like the ARM conditional branch implementation, the Thumb code requires seven instructions. However, because Thumb instructions are only 16 bits long, the overall code size is 14 bytes, compared to 16 bytes for the smaller ARM implementation (the four-instruction version that uses conditional execution).
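
A seven-instruction Thumb version might look like the following sketch. It assumes a and b are in r0 and r1, that the code is assembled as Thumb (for example after a CODE16 directive), and it uses the illustrative labels less and done, which are not taken from the text above:

```
gcd     CMP     r0, r1          ; set flags from a - b
        BEQ     done            ; a == b: finished
        BLT     less            ; a < b: go and reduce b
        SUB     r0, r0, r1      ; a > b: a = a - b
        B       gcd
less    SUB     r1, r1, r0      ; b = b - a
        B       gcd
done
```

Seven 16-bit instructions occupy 7 x 2 = 14 bytes, which matches the code size quoted above.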

In addition, on a system using 16-bit memory, the Thumb version runs faster than the second ARM implementation because each Thumb instruction requires only one memory access, whereas each 32-bit ARM instruction requires two fetches.

Branch prediction and caches

To optimize code for execution speed, you need detailed knowledge of the instruction timings, branch prediction logic, and cache behavior of your target system. Refer to the ARM Architecture Reference Manual and the technical reference manuals for individual processors for full information.


Copyright © 2000, 2001 ARM Limited. All rights reserved.

ARM DUI 0068B
