Texas Instruments TMS320 DSP manual Runtime APIs, Strong Ordering of DMA Transfer Requests

Page 67

www.ti.com

Runtime APIs

For example, in the table above, the "process" operation is using two logical channels. On logical channel 0, it performs on average 5 data transfers and a maximum of 7 data transfers. The average number of bytes for each transfer is 768, and the maximum number of bytes is 1024.

6.9Runtime APIs

The IDMA2 interface is used to request and grant an algorithm some DMA resources, and also change these resources in real-time. We also need to define runtime APIs that are actually called from within the algorithm to configure the logical channel, start a data transfer and synchronize the data transfer(s).

The following APCY2 APIs are allowed to be called from within an algorithm that has implemented the IDMA2.

Configuration:

ACPY2_configure (), ACPY2_setSrcFrameIndex,ACPY2_setDstFrameIndex, ACPY2_setNumFrames

Synchronization: ACPY2_complete(), ACPY2_wait()

Scheduling: ACPY2_start (), ACPY2_startAligned()

It is important to notice that the algorithm'sclient is free to implement these APIs as appropriate, granted that they satisfy their semantics in the TMS320 DSP Algorithm Standard API Reference (SPRU360).

The IDMA3 interface which is required to be implemented by algorithms that use the C64x+ EDMA3 controller, can be optionally associated with a custom IDMA3 protocol. When a non-null protocol object is provided, the DMA resource manager uses IDMA3_Protocol functions to perform additional memory allocation for the logical DMA channel’s environment field or to call protocol-specific, handle initialization and de-initialization functions. This feature allows frameworks to support custom DMA service function libraries with custom initialization and finalization functions.

The ACPY3 library is an example of such a custom DMA library that is similar to the ACPY2 library in its role and definition. However, it provides a much lower level of abstraction compared to the ACPY2 interface; it is designed to target EDMA3.0/QDMA, while ACPY2 provides a generic DMA abstraction layer. Details of the ACPY3 library can be found in Using DMA with Framework Components for C64x+ (SPRAAG1). Use of the ACPY3 library is not mandatory when using the IDMA3 interfaces; algorithms are free to use their own DMA functions to program the physical DMA resources acquired through the IDMA3 protocol.

6.10 Strong Ordering of DMA Transfer Requests

An important enhancement that was introduced through the ACPY2 APIs over the deprecated ACPY APIs is the strict FIFO ordering property of DMA transfers submitted by an algorithm on a logical DMA channel. Often algorithms need to issue back-to-back DMA transfers from and into the same data region and they can take advantage of the FIFO property. For example, an algorithm can schedule a transfer to copy out the result stored in a buffer used by an in-place computation phase, and immediately schedule a transfer to bring in the next set of input data into the same buffer for the next round of processing. Without the strong ordering property, an ACPY2_wait() synchronization call would be needed prior to submitting the second transfer request. This additional synchronization is needed to prevent the incoming (next round's) input data from corrupting the current output that is potentially still being copied out. The strong ordering guarantee ensures that the second transfer will not start until after the first transfer finishes. This leads to two levels of optimizations.

The extra ACPY2_wait() call/synchronization overhead is eliminated, but even more importantly, the algorithm can now continue to perform other tasks (e.g., process some other buffer, etc.) until it absolutely needs to synchronize with the completion of the second transfer.

Another related ACPY2 enhancement is the introduction of the concept of a serializer (QueueID) property for logical channels. A common QueueID extends the strong FIFO ordering property to all transfers submitted on any of the logical channels assigned the same QueueId by the same algorithm. QueueIDs are assigned by the algorithm and published through its IDMA2 interface.

IDMA3 does not support the queue IDs defined in IDMA2. This means there is no requirement to enforce inter-channel FIFO ordering of submitted DMA transfers. When FIFO ordering is needed, you must use linked transfers.

SPRU352G –June 2005 –Revised February 2007

Use of the DMA Resource

67

Image 67
Contents Rules and Guidelines Users GuideSubmit Documentation Feedback Contents Use of the DMA Resource Urls List of Figures Intended Audience Read This FirstDocument Overview Guideline n Related DocumentationText Conventions Rule nOverview Rules for TMS320C5x Rules for TMS320C6x Scope of the StandardRequirements of the Standard Rules and GuidelinesIntentional Omissions Goals of the StandardFrameworks System ArchitectureCore Run-Time Support AlgorithmsGeneral Programming Guidelines Rule Use of C LanguageThreads and Reentrancy ThreadsReentrancy Preemptive vs. Non-Preemptive MultitaskingExample Data Memory Data MemoryScratch versus Persistent Memory SpacesScratch vs Persistent Memory Allocation Guideline Algorithm versus ApplicationROM-ability Program MemorySection Name Purpose Use of Peripherals Use of PeripheralsInterfaces and Modules Algorithms Packaging Algorithm Component ModelImplementation Fir.h Interfaces and ModulesExternal Identifiers Module Initialization and Finalization Naming ConventionsModule Instance Objects Run-Time Object Creation and Deletion Design-Time Object CreationExample Module Module ConfigurationMultiple Interface Support Description Required Interface InheritanceSummary ElementAlgorithms AlgorithmsObject Code PackagingDebug Verses Release Header FilesModuleversvendorvariant.1arch Data Memory Program Memory Interrupt Latency Execution Time Algorithm Performance CharacterizationExternal Heap MemorySize Static Local and Global Data Memory Stack MemoryData Bss Object files Size Operation Interrupt LatencyExecution Time Mips Is Not EnoughExecution Timeline for Two Periodic Tasks Execution Time Model198000 Process59000 198000 Submit Documentation Feedback DSP-Specific Guidelines Register Types CPU Register TypesData Models Use of Floating PointTMS320C6xxx Rules and Guidelines Endian Byte OrderingCSR Field Use Type Register ConventionsStatus Register Register Use TypeTMS320C54xx Rules and Guidelines Interrupt LatencyProgram Models TMS320C54xx Rules and Guidelines ST0 Field Name Use Type Status RegistersST1 Field Name Use Type Stack Architecture TMS320C55x Rules and GuidelinesPmst Field Name Use Type Example RelocatabilitySSP ST2 Field Name Use Type Status BitsST3 Field Name Use Type Homy General TMS320C24xx GuidelinesTMS320C28x Rules and Guidelines TMS320C28x Rules and GuidelinesXAR0 M0M1MAP Submitting DMA Transfer Requests Use of the DMA ResourceAlgorithm and Framework OverviewLogical Channel Requirements for the Use of the DMA ResourceData Transfer Synchronization Data Transfer PropertiesDMA Guideline Abstract InterfaceDMA Rule Data Transfers bytes Frequency Resource CharacterizationAverage Maximum Strong Ordering of DMA Transfer Requests Runtime APIsDevice Independent DMA Optimization Guideline Submitting DMA Transfer Requests13 C6xxx Specific DMA Rules and Guidelines Cache Coherency Issues for Algorithm Producers14 C55x Specific DMA Rules and Guidelines Supporting Packed/Burst Mode DMA TransfersNon-Preemptive System Minimizing Logical Channel Reconfiguration OverheadAddressing Automatic Endianism Conversion Issues Inter-Algorithm SynchronizationPreemptive System Algorithm B Algorithm a Inter-Algorithm Synchronization Appendix a General Rules DMA Rules Performance Characterization RulesGeneral Guidelines DMA Guidelines Submit Documentation Feedback Core Run-Time APIs DSP/BIOS Run-time Support LibraryTI C-Language Run-Time Support Library DSP/BIOS Run-time Support LibraryBooks BibliographySubmit Documentation Feedback Glossary of Terms GlossaryGlossary of Terms Glossary of Terms Important Notice