8Alpha Architecture Handbook | Compaq ECQD2KCTE manual

•Both conditional branches are forward branches, so they are properly predicted not to be taken (to match the common case of no contention for the lock).

•The OR writes its result to a second register; this allows the OR and the BLBS to be interchanged if that would give a faster instruction schedule.

•Other operate instructions (from the critical section) may be scheduled into the LDQ_L..STQ_C sequence, so long as they do not fault or trap and they give correct results if repeated; other memory or operate instructions may be scheduled between the STQ_C and BEQ.

•The memory barrier instructions are discussed in Section 5.5.4. It is correct to substitute WMB for the second MB only if:

–All data locations that are read or written in the critical section are accessed only after acquiring a software lock by using lock_variable (and before releasing the software lock).

–For each read u of shared data in the critical section, there is a write v such that:

1. v is BEFORE the WMB

2. v follows u in processor issue sequence (see Section 5.6.1.1)

3. v either depends on u (see Section 5.6.1.7) or overlaps u (see Section 5.6.1), or both.

–Both lock_variable and all the shared data are in memory-like regions (or lock_variable and all the shared data are in non-memory-like regions). If the lock_variable is in a non-memory-like region, the atomic lock protocol must use some implementation-specific hardware support.

Generally, the substitution of a WMB for the second MB increases performance.

•An ordinary STQ instruction is used to clear the lock_variable.
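The acquire/release sequence described by these notes can be sketched in portable Rust. This is a minimal sketch, not the handbook's code, and it rests on several assumptions. `SpinLock`, `lock` and `unlock` are hypothetical names. Rust does not expose LDQ_L/STQ_C directly, so a relaxed load followed by `compare_exchange_weak` (which, like STQ_C, may fail spuriously) stands in for the load-locked/store-conditional pair. `fence(Acquire)` and `fence(Release)` stand in for the first and second MB. These fences are weaker than a full MB but sufficient for lock semantics. Rust has no fence that orders only stores against stores, so the WMB substitution has no direct counterpart here.

```rust
use std::hint;
use std::sync::atomic::{fence, AtomicU64, Ordering};

/// Hypothetical wrapper around the handbook's `lock_variable`:
/// bit 0 set means "locked", 0 means "free".
pub struct SpinLock {
    lock_variable: AtomicU64,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { lock_variable: AtomicU64::new(0) }
    }

    /// Acquire, loosely following LDQ_L / BLBS / OR / STQ_C / BEQ / MB.
    pub fn lock(&self) {
        loop {
            // LDQ_L analogue: read the current lock value.
            let current = self.lock_variable.load(Ordering::Relaxed);
            if current & 1 != 0 {
                // BLBS analogue: the lock is held. Retry without writing,
                // so this spin path performs reads only.
                hint::spin_loop();
                continue;
            }
            // OR + STQ_C + BEQ analogue: try to set bit 0. Like STQ_C,
            // compare_exchange_weak may fail spuriously; on failure, retry.
            if self
                .lock_variable
                .compare_exchange_weak(current, current | 1, Ordering::Relaxed, Ordering::Relaxed)
                .is_ok()
            {
                break;
            }
        }
        // First MB analogue: keep critical-section accesses from being
        // performed before the lock is observed as acquired.
        fence(Ordering::Acquire);
    }

    /// Release: the second MB, then an ordinary store that clears the lock.
    pub fn unlock(&self) {
        // Second MB analogue: critical-section accesses are ordered
        // before the store that clears the lock.
        fence(Ordering::Release);
        // Ordinary STQ analogue: a plain store of 0, not a store-conditional.
        self.lock_variable.store(0, Ordering::Relaxed);
    }
}
```

Usage: each thread brackets its critical section with `lock()` and `unlock()`. The release is a plain store rather than a conditional one because only the lock holder ever clears the lock, so no other processor's write can race with it.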

It would be a performance mistake to spin-wait by repeating the full LDQ_L..STQ_C sequence (that is, by moving the BLBS after the BEQ), because that sequence may repeatedly change the software lock_variable from "locked" to "locked," with each write causing extra access delays in all other caches that contain the lock_variable. In the extreme, spin-waits that contain writes may deadlock as follows:

If, while one processor spins with writes, another processor is modifying (not changing) the lock_variable, then the writes on the first processor may cause every STx_C of the modify sequence on the second processor to fail.

This deadlock situation is avoided by:

•Having only one processor execute a store (no STx_C), or

•Having no write in the spin loop, or

•Doing a write only if the shared variable actually changes state (1 → 1 does not change state).
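The contrast between a spin-wait that writes and one that does not can be sketched in Rust. This is a minimal illustration under stated assumptions. `spin_with_writes` and `spin_read_then_write` are hypothetical helpers, and `compare_exchange_weak` stands in for an LDx_L..STx_C pair. How these operations are lowered to machine instructions depends on the target and is not something this sketch controls. The first helper writes on every iteration, rewriting "locked" to "locked" while the lock is held. The second spins with reads only and attempts a write only when that write would change the state (0 → 1).

```rust
use std::hint;
use std::sync::atomic::{AtomicU64, Ordering};

/// Anti-pattern (hypothetical helper): every iteration performs a write.
/// While the lock is held, swap keeps rewriting 1 -> 1, which costs
/// cache traffic and can interfere with other processors' store-conditionals.
pub fn spin_with_writes(lock_variable: &AtomicU64) {
    while lock_variable.swap(1, Ordering::Acquire) & 1 != 0 {
        hint::spin_loop();
    }
}

/// Preferred pattern (hypothetical helper): no write inside the spin loop,
/// and a write is attempted only when it changes state (0 -> 1).
pub fn spin_read_then_write(lock_variable: &AtomicU64) {
    loop {
        // Read-only spin: wait until the lock looks free.
        while lock_variable.load(Ordering::Relaxed) & 1 != 0 {
            hint::spin_loop();
        }
        // Attempt the state-changing write; retry the read-only spin on failure.
        if lock_variable
            .compare_exchange_weak(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
    }
}
```

In both helpers the holder releases the lock with an ordinary store of 0 using `Ordering::Release`. Only `spin_read_then_write` follows the guidance above of having no write in the spin loop.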
