National Instruments NI MATRIX External Outputs, Copy-Back

Chapter 6 Vectorized Code Generation

© National Instruments Corporation 6-21 AutoCode Reference

You should notice two things in the code shown in Example 6-8. First, the gain block added to merge the data is generated as copies from the respective inputs into the single array. Second, the integrator block is tightly rolled. If the merge were not present, the integrator would have been unrolled, causing a 6-fold increase in the amount of code for that block.

The only reason to introduce a merge block (a unit gain block) is that unrolling the algorithm of your block, in this case the integrator block, costs more than the merge block does. The code shows that the cost of a merge block is a copy into a local array. Because the integrator algorithm is complicated, it is necessary to have the merge so that the integrator is rolled.
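
The trade-off described above can be sketched in C. This is a minimal illustration, not actual AutoCode output: the function names, the split into two 3-wide sources, and the forward-Euler update are all assumptions chosen to keep the example short. Only the total width of 6 comes from the text.

```c
#include <stddef.h>

#define N_MERGED 6  /* total width, matching the 6-fold figure in the text */

/* Hypothetical merge (unit gain) block: copies two 3-wide sources into
 * one contiguous local array so that the next block can be rolled.
 * The 3 + 3 split is an assumption for illustration. */
static void merge_inputs(const double a[3], const double b[3],
                         double merged[N_MERGED])
{
    size_t i;
    for (i = 0; i < 3; i++) {
        merged[i]     = a[i];
        merged[i + 3] = b[i];
    }
}

/* Rolled integrator update over the merged array. Forward Euler is used
 * only as a stand-in; it is not the AutoCode integrator algorithm. */
static void integrate_rolled(double state[N_MERGED],
                             const double in[N_MERGED], double dt)
{
    size_t i;
    for (i = 0; i < N_MERGED; i++) {
        state[i] += dt * in[i];
    }
}
```

The merge costs one loop of copies, while the integrator body appears once inside a loop instead of six times. The more complex the block algorithm, the more that single rolled copy of the body saves.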

AutoCode will not automatically introduce the merge (copy) just to improve vectorization. The reason is that traceability from the code to the model is reduced any time code other than the block algorithm is generated. Also, AutoCode is not able to evaluate the design decision to make one block rolled at the expense of another. Therefore, for optimal vectorization, you might need to change your model.

External Outputs

Another variation of the split-merge problem appears with external outputs. External outputs are represented by the Y-structure, which contains only those signals marked as external outputs. For scalar code generation, AutoCode directly uses the symbol in the Y-structure instead of using local storage. However, when the output of a block is a vector and only a subset of the outputs is connected to external outputs, a conflict of requirements appears between storing the block output into an array and optimizing access to the external output.
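
As a rough sketch of the scalar case, the following C fragment stores block results straight into a Y-structure with no local temporary. The structure layout, member names, and function name are hypothetical and are not taken from generated AutoCode output.

```c
/* Hypothetical Y-structure: holds only the signals marked as
 * external outputs. Member names are illustrative. */
struct ext_outputs_scalar {
    double out_1;
    double out_2;
};

/* Scalar-style gain block: each result is written directly to its
 * Y-structure symbol, so no local storage is needed. */
static void gain_scalar(double u1, double u2, double k,
                        struct ext_outputs_scalar *Y)
{
    Y->out_1 = k * u1;
    Y->out_2 = k * u2;
}
```

This direct access works because every output has its own symbol. Once the outputs must live in one array so the block can be rolled, the array and the Y-structure members can no longer be the same storage, which is the conflict described above.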

Copy-Back

When a split-merge occurs with external outputs, AutoCode must act to preserve the semantics of the model. AutoCode has been designed to preserve the array, and therefore the block vectorization, and to copy back those external outputs from the array into the Y-structure. In the example shown in Figure 6-6, a simple gain block has only two of its five outputs connected to the external output. AutoCode preserves the array for the gain block, but copies the elements of the array that are external outputs into the Y-structure.
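
The copy-back pattern can be sketched in C as follows. This is an illustration under stated assumptions, not the code AutoCode generates for Figure 6-6: the names are hypothetical, and the choice of which two of the five outputs are external (elements 1 and 3 here) is arbitrary because the text does not say.

```c
#include <stddef.h>

#define GAIN_WIDTH 5  /* the gain block in the text has five outputs */

/* Hypothetical Y-structure with the two external outputs. */
struct ext_outputs_cb {
    double y1;
    double y2;
};

/* Vectorized gain block followed by copy-back. The local array keeps
 * the loop rolled; only the external elements are copied out. */
static void gain_with_copy_back(const double u[GAIN_WIDTH], double k,
                                double out[GAIN_WIDTH],
                                struct ext_outputs_cb *Y)
{
    size_t i;

    /* Rolled block algorithm writing into the preserved array. */
    for (i = 0; i < GAIN_WIDTH; i++) {
        out[i] = k * u[i];
    }

    /* Copy-back: indices 1 and 3 are assumed to be the external ones. */
    Y->y1 = out[1];
    Y->y2 = out[3];
}
```

The cost of preserving the array is one extra assignment per external output, while the block itself stays a single loop.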
