22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Efficient Implementation of Population Count Function

 

Population count is an operation that determines the number of

 

set bits in a bit string. For example, this can be used to

 

determine the cardinality of a set. The following example code

 

shows how to efficiently implement a population count

 

operation for 32-bit operands. The example is written for the

 

inline assembler of Microsoft Visual C.

 

Function popcount() implements a branchless computation of

 

the population count. It is based on a O(log(n)) algorithm that

 

successively groups the bits into groups of 2, 4, 8, 16, and 32,

 

while maintaining a count of the set bits in each group. The

 

algorithms consist of the following steps:

Step 1

Partition the integer into groups of two bits. Compute the

 

population count for each 2-bit group and store the result in the

 

2-bit group. This calls for the following transformation to be

 

performed for each 2-bit group:

 

00b -> 00b

 

01b -> 01b

 

10b -> 01b

 

11b -> 10b

 

If the original value of a 2-bit group is v, then the new value will

 

be v - (v >> 1). In order to handle all 2-bit groups simultaneously,

 

it is necessary to mask appropriately to prevent spilling from

 

one bit group to the next lower bit group. Thus:

 

w = v - ((v >> 1) & 0x55555555)

Step 2

Add the population count of adjacent 2-bit group and store the

 

sum to the 4-bit group resulting from merging these adjacent

 

2-bit groups. To do this simultaneously to all groups, mask out

 

the odd numbered groups, mask out the even numbered groups,

 

and then add the odd numbered groups to the even numbered

 

groups:

 

x = (w & 0x33333333) + ((w >> 2) & 0x33333333)

 

Each 4-bit field now has value 0000b, 0001b, 0010b, 0011b, or

 

0100b.

Efficient Implementation of Population Count Function

91

Page 107
Image 107
AMD x86 manual Efficient Implementation of Population Count Function, Step