HPC Magazine February 2014 - An Introduction to Performance Programming - part I.

Listing 10: An example of SSE's reuse of a source operand to store computation results.


Opcode/Instruction            Op/En  64/32-bit  CPUID Feature  Description
                                     Mode       Flag
66 0F 59 /r                   RM     V/V        SSE2           Multiply packed DP floating-point
MULPD xmm1, xmm2/m128                                          values in xmm2/m128 by xmm1.
VEX.NDS.128.66.0F.WIG 59 /r   RVM    V/V        AVX            Multiply packed double-precision floating-point
VMULPD xmm1, xmm2, xmm3/m128                                   values from xmm3/mem to xmm2 and stores
                                                               result in xmm1.
(...)

MULPD (128-bit Legacy SSE version)
DEST[63:0] ← DEST[63:0] * SRC[63:0]
DEST[127:64] ← DEST[127:64] * SRC[127:64]
DEST[VLMAX-1:128] (Unmodified)
VMULPD (VEX.128 encoded version)
DEST[63:0] ← SRC1[63:0] * SRC2[63:0]
DEST[127:64] ← SRC1[127:64] * SRC2[127:64]
DEST[VLMAX-1:128] ← 0