Chapter 4

Pipeline

The C67x DSP pipeline provides flexibility to simplify programming and improve performance. Two factors provide this flexibility:

- Control of the pipeline is simplified by eliminating pipeline interlocks.

- Increased pipelining eliminates traditional architectural bottlenecks in program fetch, data access, and multiply operations. This provides single-cycle throughput.

This chapter starts with a description of the pipeline flow. Highlights are:

- The pipeline can dispatch eight parallel instructions every cycle.

- Parallel instructions proceed simultaneously through each pipeline phase.

- Serial instructions proceed through the pipeline with a fixed relative phase difference between instructions.

- Load and store addresses appear on the CPU boundary during the same pipeline phase, eliminating read-after-write memory conflicts.
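
The flow summarized in these highlights can be pictured with a small simulation. The sketch below is illustrative only. The phase names (PG, PS, PW, PR for fetch; DP, DC for decode; E1 for the first execute phase) come from the pipeline description later in this manual rather than from this introduction, and the names `phase_at` and `timeline` are hypothetical. It assumes that each fetch packet holds exactly one execute packet, that one packet enters the pipeline per cycle, and that no stalls occur.

```python
# Illustrative model of the pipeline flow, not a cycle-accurate simulator.
# Assumed phases: fetch (PG, PS, PW, PR), decode (DP, DC), first execute phase (E1).
PHASES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1"]


def phase_at(packet_index, cycle):
    """Return the phase a packet occupies at a given cycle, or None.

    Assumes packet k enters PG at cycle k and advances one phase per cycle.
    """
    step = cycle - packet_index
    if 0 <= step < len(PHASES):
        return PHASES[step]
    return None


def timeline(num_packets, num_cycles):
    """Build one row of phase names per packet, one column per cycle."""
    return [
        [phase_at(p, c) or "--" for c in range(num_cycles)]
        for p in range(num_packets)
    ]


if __name__ == "__main__":
    for p, row in enumerate(timeline(3, 9)):
        print(f"packet {p}: " + " ".join(row))
```

All instructions in one packet share a row, so they occupy the same phase in every cycle. Successive packets are offset by exactly one column, which is the fixed relative phase difference between serial instructions.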

All instructions require the same number of pipeline phases for fetch and decode, but require a varying number of execute phases. This chapter contains a description of the number of execution phases for each type of instruction.
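
As a rough illustration of this point, the sketch below adds a fixed fetch and decode cost to a per-type execute cost. The phase counts are assumptions made for the example (four fetch phases, two decode phases, and execute-phase counts for a small subset of instruction types); section 4.2 gives the authoritative numbers. The names `EXECUTE_PHASES` and `total_phases` are hypothetical.

```python
# Assumed fixed front end, identical for every instruction.
FETCH_PHASES = 4   # PG, PS, PW, PR
DECODE_PHASES = 2  # DP, DC

# Assumed execute-phase counts for an illustrative subset of instruction types.
EXECUTE_PHASES = {
    "single_cycle": 1,    # completes in E1
    "multiply_16x16": 2,  # E1-E2
    "store": 3,           # E1-E3
    "load": 5,            # E1-E5
}


def total_phases(kind):
    """Return the total number of pipeline phases for an instruction type."""
    return FETCH_PHASES + DECODE_PHASES + EXECUTE_PHASES[kind]


if __name__ == "__main__":
    for kind in EXECUTE_PHASES:
        print(f"{kind}: {total_phases(kind)} phases")
```

Only the last term varies between instruction types, which is why instruction latency differences show up entirely in the execute stage.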

Finally, the chapter contains performance considerations for the pipeline. These considerations include the occurrence of fetch packets that contain multiple execute packets, execute packets that contain multicycle NOPs, and memory considerations for the pipeline. For more information about fully optimizing a program and taking full advantage of the pipeline, see the TMS320C6000 Programmer’s Guide (SPRU198).
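
To make the first of these considerations concrete, the sketch below groups the eight instructions of one fetch packet into execute packets. It relies on two assumptions taken from the wider C6000 architecture description rather than from this introduction: each instruction carries a p-bit, where 1 means the next instruction executes in parallel with it, and an execute packet cannot cross a fetch packet boundary on this device. The function name `split_execute_packets` is hypothetical.

```python
def split_execute_packets(p_bits):
    """Group the eight instruction slots of a fetch packet into execute packets.

    p_bits[i] == 1 means instruction i + 1 executes in parallel with instruction i.
    Returns a list of execute packets, each a list of slot indices.
    """
    if len(p_bits) != 8:
        raise ValueError("a fetch packet holds eight instructions")
    if p_bits[-1] != 0:
        # Assumption: execute packets do not span fetch packets.
        raise ValueError("the last p-bit of a fetch packet must be 0")

    packets, current = [], []
    for slot, p in enumerate(p_bits):
        current.append(slot)
        if p == 0:  # this instruction closes the current execute packet
            packets.append(current)
            current = []
    return packets


if __name__ == "__main__":
    fully_parallel = [1, 1, 1, 1, 1, 1, 1, 0]
    fully_serial = [0, 0, 0, 0, 0, 0, 0, 0]
    mixed = [1, 0, 1, 1, 0, 0, 1, 0]
    for name, bits in [("parallel", fully_parallel),
                       ("serial", fully_serial),
                       ("mixed", mixed)]:
        packets = split_execute_packets(bits)
        print(f"{name}: {len(packets)} execute packet(s) -> {packets}")
```

Under these assumptions, a fetch packet that splits into n execute packets occupies the dispatch stage for n cycles, so a fully serial fetch packet takes eight cycles to dispatch while a fully parallel one takes a single cycle.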

Topic

4.1 Pipeline Operation Overview
4.2 Pipeline Execution of Instruction Types
4.3 Functional Unit Constraints
4.4 Performance Considerations

SPRU733
