Efficient Implementation of Population Count Function

22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Efficient Implementation of Population Count Function

	Population count is an operation that determines the number of
	set bits in a bit string. For example, this can be used to
	determine the cardinality of a set. The following example code
	shows how to efficiently implement a population count
	operation for 32-bit operands. The example is written for the
	inline assembler of Microsoft Visual C.
	Function popcount() implements a branchless computation of
	the population count. It is based on a O(log(n)) algorithm that
	successively groups the bits into groups of 2, 4, 8, 16, and 32,
	while maintaining a count of the set bits in each group. The
	algorithms consist of the following steps:
Step 1	Partition the integer into groups of two bits. Compute the
	population count for each 2-bit group and store the result in the
	2-bit group. This calls for the following transformation to be
	performed for each 2-bit group:
	00b -> 00b
	01b -> 01b
	10b -> 01b
	11b -> 10b
	If the original value of a 2-bit group is v, then the new value will
	be v - (v >> 1). In order to handle all 2-bit groups simultaneously,
	it is necessary to mask appropriately to prevent spilling from
	one bit group to the next lower bit group. Thus:
	w = v - ((v >> 1) & 0x55555555)
Step 2	Add the population count of adjacent 2-bit group and store the
	sum to the 4-bit group resulting from merging these adjacent
	2-bit groups. To do this simultaneously to all groups, mask out
	the odd numbered groups, mask out the even numbered groups,
	and then add the odd numbered groups to the even numbered
	groups:
	x = (w & 0x33333333) + ((w >> 2) & 0x33333333)
	Each 4-bit field now has value 0000b, 0001b, 0010b, 0011b, or
	0100b.

Efficient Implementation of Population Count Function

AMD x86 manual Efficient Implementation of Population Count Function, Step

Models: x86

Efficient Implementation of Population Count Function

Step 1