Software Optimization Guide for AMD64 Processors | 25112 Rev. 3.06 September 2005 |
If there are any “empty” doublewords between the last parameter and the top of the cache line, use the SFENCE instruction to flush the
The AGP 3.0 specification specifies that accelerators must be able to buffer at least 128 bytes for the initial data block transferred. Try using
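The 128-byte figure corresponds to eight 16-byte XMM transfers, or 32 doublewords. As a rough illustration only, the same load-then-store pattern can be sketched in C with SSE2 intrinsics. The helper name `send_block_128` is hypothetical and does not appear in this guide, and ordinary 16-byte-aligned memory stands in for the write-combining MMIO region, which a real driver would have to map itself.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_load_si128, _mm_store_si128 */
#include <xmmintrin.h>  /* _mm_sfence */

enum { BLOCK_BYTES = 128, XMM_BYTES = 16, XMM_COUNT = BLOCK_BYTES / XMM_BYTES };

/* Hypothetical helper (not from the guide): copy one 128-byte command
 * block from a shadow structure to a destination buffer.
 * Both pointers must be 16-byte aligned, as MOVDQA requires. */
static void send_block_128(void *fifo, const void *shadow)
{
    const __m128i *src = (const __m128i *)shadow;
    __m128i *dst = (__m128i *)fifo;
    __m128i regs[XMM_COUNT];
    int i;

    /* Load the whole block first, 16 bytes at a time. */
    for (i = 0; i < XMM_COUNT; i++)
        regs[i] = _mm_load_si128(&src[i]);

    /* Then write it out with eight back-to-back 16-byte stores. */
    for (i = 0; i < XMM_COUNT; i++)
        _mm_store_si128(&dst[i], regs[i]);

    /* SFENCE orders the preceding stores; on write-combining memory it
     * is the usual way to force out a partially filled buffer. */
    _mm_sfence();
}
```

In this sketch the destination is plain memory, so the SFENCE has no visible effect; it is included only to show where the flush would sit when the destination is a WC-mapped FIFO.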
Listing 31. Sending
/* Send commands to a graphic accelerator 2D engine. */
/* The shadowed structure contains 32 DWORDs worth of */
/* rendering commands and data parameters. */
/* Send out 128 (80h) bytes to FIFO in WC MMIO space. */
/* First load
mov rdi, OFFSET ShadowRegs_Structure
/* We now have a pointer to the shadowed engine structure. */
/* Grab 16 bytes at a time. */
movdqa xmm0, [rdi]
movdqa xmm1, [rdi + 16]
movdqa xmm2, [rdi + 32]
movdqa xmm3, [rdi + 48]
movdqa xmm4, [rdi + 64]
movdqa xmm5, [rdi + 80]
movdqa xmm6, [rdi + 96]
movdqa xmm7, [rdi + 112]
/* Now get linear pointer to graphic engine mapped in */
/* WC address space. */
mov rax, QWORD PTR [Linear2Dengine_Ptr]
/* Now copy register data to processor’s WC buffer. */
/* It is slightly more optimal if the command FIFO */
/* is at a
/* Write 16 bytes at a time. */
movdqa [rax], xmm0
movdqa [rax + 16], xmm1
movdqa [rax + 32], xmm2
/* The first WC buffer will be sent after the next write */
/* (assuming FIFO is
348 | AGP Considerations | Appendix D |