Figure A-3 Bad Allocation in Cache | Compaq ECQD2KCTE instruction

In a frequently executed loop, compilers could allocate the data items accessed from memory so that, on each loop iteration, all of the memory addresses accessed are either in exactly the same aligned 64-byte block or differ in bits VA<10:6>. For loops that go through arrays in a common direction with a common stride, this requires allocating the arrays, checking that the first-iteration addresses differ, and if not, inserting up to 64 bytes of padding between the arrays. This rule will avoid thrashing in small direct-mapped data caches with block sizes up to 64 bytes and total sizes of 2K bytes or more.
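As a rough illustration of the rule above, the following Python sketch tests a pair of first-iteration addresses and computes padding for the second array. The function names are hypothetical and do not come from the handbook. The padding policy shown (push the second address to the start of the next 64-byte block) is an assumption: it is one way to satisfy the first-iteration check with at most 64 bytes.

```python
BLOCK_BYTES = 64      # aligned block size named in the rule
INDEX_MASK = 0x7C0    # bits VA<10:6>


def same_block(addr_a, addr_b):
    """True if both addresses lie in the same aligned 64-byte block."""
    return (addr_a // BLOCK_BYTES) == (addr_b // BLOCK_BYTES)


def index_bits(addr):
    """Extract VA<10:6> as an integer in the range 0..31."""
    return (addr & INDEX_MASK) >> 6


def conflict_free(addr_a, addr_b):
    """The rule: same aligned 64-byte block, or different VA<10:6>."""
    return same_block(addr_a, addr_b) or index_bits(addr_a) != index_bits(addr_b)


def padding_needed(first_a, first_b):
    """Bytes to insert before the second array (hypothetical policy).

    Returns 0 when the first-iteration addresses already satisfy the rule.
    Otherwise moves first_b to the start of the next 64-byte block, which
    increments VA<10:6> and never costs more than 64 bytes.
    """
    if conflict_free(first_a, first_b):
        return 0
    return BLOCK_BYTES - (first_b % BLOCK_BYTES)


if __name__ == "__main__":
    for a, b in [(0, 8192), (0, 8192 + 64), (0, 2048 + 16)]:
        pad = padding_needed(a, b)
        print(f"A={a} B={b} padding={pad} ok_after={conflict_free(a, b + pad)}")
```

Two arrays whose first elements sit an exact multiple of 2 KB apart need the full 64 bytes under this policy. Arrays that already differ in VA<10:6> need none.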

Example:

      REAL*4 A(1000),B(1000)
      DO 60 i=1,1000
60    A(i) = f(B(i))

Figures A–3, A–4, and A–5 show bad, better, and best allocation in cache, respectively.

BAD allocation (A and B thrash in 8 KB direct-mapped cache):

Figure A–3: Bad Allocation in Cache

[Figure: arrays A and B on an address axis marked 0, 4K, 8K, 12K, 16K.]

BETTER allocation (A and B offset by 64 bytes mod 2 KB, so 16 elements of A and 16 of B can be in cache simultaneously):

Figure A–4: Better Allocation in Cache

[Figure: arrays A and B on an address axis marked 0, 4K, 8K+64, 12K, 16K.]

BEST allocation (A and B offset by 64 bytes mod 2 KB, so 16 elements of A and 16 of B can be in cache simultaneously, and both arrays fit entirely in a cache of 8 KB or larger):

Figure A–5: Best Allocation in Cache

[Figure: arrays A and B on an address axis marked 0, 4K-64, 8K, 12K, 16K.]
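To make the difference between the three allocations concrete, here is a small Python sketch that counts misses for the example loop in a simulated 8 KB direct-mapped cache with 64-byte blocks. Several things here are assumptions rather than statements from the handbook: the base addresses (A at 0 in all cases; B at 8K, 8K+64, and 4K-64, read from the figure labels), a write-allocate policy for the store to A, and the access order (load B(i), then store A(i)). The name `count_misses` is hypothetical.

```python
def count_misses(base_a, base_b, passes=1, n=1000, elem_bytes=4,
                 cache_bytes=8192, block_bytes=64):
    """Count misses for the loop A(i) = f(B(i)) in a direct-mapped cache.

    Assumes write-allocate: a store to A brings its block into the cache.
    Elements are indexed from 0 here; the Fortran loop runs from 1 to 1000.
    """
    num_lines = cache_bytes // block_bytes
    resident = [None] * num_lines   # block number held by each cache line
    misses = 0
    for _ in range(passes):
        for i in range(n):
            # Load B(i) first, then store A(i).
            for addr in (base_b + i * elem_bytes, base_a + i * elem_bytes):
                block = addr // block_bytes
                line = block % num_lines
                if resident[line] != block:
                    misses += 1
                    resident[line] = block
    return misses


# Assumed base addresses of B for each figure; A starts at address 0.
LAYOUTS = {"bad": 8192, "better": 8192 + 64, "best": 4096 - 64}

if __name__ == "__main__":
    for name, base_b in LAYOUTS.items():
        one = count_misses(0, base_b, passes=1)
        two = count_misses(0, base_b, passes=2)
        print(f"{name}: {one} misses in one pass, {two} in two passes")
```

Under these assumptions, the bad layout misses on every one of the 2000 accesses in a pass, because A(i) and B(i) always map to the same cache line. The better and best layouts each take only the 126 first-touch misses in a single pass. Only the best layout keeps both arrays resident, so a second pass over the loop adds no further misses.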

In a frequently executed loop, compilers could allocate the data items accessed from memory so that, on each loop iteration, all of the memory addresses accessed are either in exactly the same 8 KB page, or differ in bits VA<17:13>. For loops that go through arrays in a common direction with a common stride, this requires allocating the arrays, checking that the first-itera-

