Texas Instruments TMS320 DSP manual Submitting DMA Transfer Requests

Page 68

www.ti.com

Submitting DMA Transfer Requests

6.11 Submitting DMA Transfer Requests

The specification of the ACPY2 interface strives to perform a delicate trade-off between allowing high performance and requiring error checking at run time. Optimized algorithms require high speed transfer mechanisms and invariably use aligned addresses and 32 or 16-bit element sizes as their fundamental type of data transfer. At the other end of the spectrum, are algorithms that need a DMA library to perform the transfer of the required number of bytes from any sources address to any destination address without being any more complicated than a simple memory copy (memcpy) function in the C standard library.

The ACPY2 interface provides algorithm developers two interface functions to submit DMA transfer requests: ACPY2_start() and ACPY2_startAligned(). The only operational difference between ACPY2_startAligned() and ACPY2_start() is the additional requirement by ACPY2_startAligned() for its source and destination addresses to be properly aligned with respect to the configured element size. When using 32-bit transfer mode, these addresses must be at least 32-bit aligned. For 16-bit transfers,

16-bit alignment is required. When called with properly aligned addresses, both functions implement an identical behavior. However, in architectures, such as C6000, which permit DMA transfers using 8-bit or 16-bit alignment of source or destination addresses irrespective of the actual transfer element size, the ACPY2_startAligned() function can be optimized to operate more efficiently. On the other hand, certain architectures, such as C55x, may impose device-dependent DMA rules that require stricture alignment of the source and destination addresses for all transfers and therefore may provide the same implementation for both APIs.

ACPY2_start() makes no assumptions on the alignment of the source and destination addresses. It accepts addresses at any alignment and when allowed by the architecture, adjusts the transfer parameters (including element size, number of elements, transfer type) to transparently perform the desired transfer using the given alignment. It is intended to simplify algorithm development in the initial states. ACPY2_start() thus strives to maintain simplicity while maintaining reasonable levels of performance. The ACPY2_startAligned() API, on the other hand, makes no run-time checks on the alignment and performs the transfer using the configured transfer settings of the channel. Passing source or destination addresses with incorrect alignment, with respect to the configured element size of the DMA handle, will result in unspecified behavior. In this respect, the sole aim of ACPY2_startAligned() is to guarantee performance by eliminating run-time checks by a pre-negotiated contract with the algorithm developer.

6.12 Device Independent DMA Optimization Guideline

In this section, we outline a general guideline applicable to all architectures that may result in significant performance optimizations. The basic premise is that configuring a logical channel is an expensive operation in terms of cycles, even when compared to the standard ACPY2 scheduling and synchronization APIs. Therein lies the motivation for the following new guideline:

DMA Guideline 2

All algorithms should minimize channel (re)configuration overhead by requesting a dedicated logical DMA channel for each distinct type of DMA transfer it issues, and avoid calling ACPY2_configure and use the new fast configuration APIs where possible.

DMA Guideline 2 is useful when different types of DMA transfers are needed in a critical loop of an algorithm. By defining different IDMA2 logical channels for each transfer type, ACPY2_configure() can be called on each channel at the beginning of the algorithm code. Then, transfer requests can be rapidly submitted on these preconfigured channels in the critical loop using the new ACPY2_start() or ACPY2_startAligned() function.

In the next two sections, we present additional DMA rules and guidelines specific to C5000 or C6000 architectures.

68

Use of the DMA Resource

SPRU352G –June 2005 –Revised February 2007

Image 68
Contents Users Guide Rules and GuidelinesSubmit Documentation Feedback Contents Use of the DMA Resource Urls List of Figures Document Overview Read This FirstIntended Audience Related Documentation Text ConventionsRule n Guideline nOverview Scope of the Standard Rules for TMS320C5x Rules for TMS320C6xRules and Guidelines Requirements of the StandardGoals of the Standard Intentional OmissionsSystem Architecture FrameworksAlgorithms Core Run-Time SupportGeneral Programming Guidelines Use of C Language Threads and ReentrancyThreads RulePreemptive vs. Non-Preemptive Multitasking ReentrancyExample Data Memory Data MemoryMemory Spaces Scratch versus PersistentScratch vs Persistent Memory Allocation Algorithm versus Application GuidelineSection Name Purpose Program MemoryROM-ability Use of Peripherals Use of PeripheralsAlgorithm Component Model Interfaces and Modules Algorithms PackagingInterfaces and Modules Implementation Fir.hExternal Identifiers Module Instance Objects Naming ConventionsModule Initialization and Finalization Design-Time Object Creation Run-Time Object Creation and DeletionModule Configuration Example ModuleMultiple Interface Support Interface Inheritance SummaryElement Description RequiredAlgorithms AlgorithmsPackaging Object CodeHeader Files Debug Verses ReleaseModuleversvendorvariant.1arch Algorithm Performance Characterization Data Memory Program Memory Interrupt Latency Execution TimeSize Heap MemoryExternal Stack Memory Static Local and Global Data MemoryData Bss Object files Size Interrupt Latency Execution TimeMips Is Not Enough OperationExecution Time Model Execution Timeline for Two Periodic Tasks59000 198000 Process198000 Submit Documentation Feedback DSP-Specific Guidelines CPU Register Types Register TypesUse of Floating Point TMS320C6xxx Rules and GuidelinesEndian Byte Ordering Data ModelsRegister Conventions Status RegisterRegister Use Type CSR Field Use TypeProgram Models Interrupt LatencyTMS320C54xx Rules and Guidelines TMS320C54xx Rules and Guidelines ST1 Field Name Use Type Status RegistersST0 Field Name Use Type Pmst Field Name Use Type TMS320C55x Rules and GuidelinesStack Architecture Relocatability ExampleSSP ST3 Field Name Use Type Status BitsST2 Field Name Use Type Homy TMS320C24xx Guidelines GeneralTMS320C28x Rules and Guidelines TMS320C28x Rules and GuidelinesXAR0 M0M1MAP Use of the DMA Resource Submitting DMA Transfer RequestsOverview Algorithm and FrameworkRequirements for the Use of the DMA Resource Logical ChannelData Transfer Properties Data Transfer Synchronization DMA Rule Abstract Interface DMA Guideline Average Maximum Resource CharacterizationData Transfers bytes Frequency Runtime APIs Strong Ordering of DMA Transfer RequestsSubmitting DMA Transfer Requests Device Independent DMA Optimization GuidelineCache Coherency Issues for Algorithm Producers 13 C6xxx Specific DMA Rules and GuidelinesSupporting Packed/Burst Mode DMA Transfers 14 C55x Specific DMA Rules and GuidelinesMinimizing Logical Channel Reconfiguration Overhead Addressing Automatic Endianism Conversion IssuesInter-Algorithm Synchronization Non-Preemptive SystemPreemptive System Algorithm B Algorithm a Inter-Algorithm Synchronization Appendix a General Rules Performance Characterization Rules DMA RulesGeneral Guidelines DMA Guidelines Submit Documentation Feedback DSP/BIOS Run-time Support Library Core Run-Time APIsDSP/BIOS Run-time Support Library TI C-Language Run-Time Support LibraryBibliography BooksSubmit Documentation Feedback Glossary Glossary of TermsGlossary of Terms Glossary of Terms Important Notice