HPC Magazine February 2014 - An Introduction to Performance Programming - part I.
Listing 10: An example of SSE's reuse of a source operand to store computation results.
MULPD xmm1, xmm2/m128
  Opcode:             66 0F 59 /r
  Op/En:              RM
  64/32-bit Mode:     V/V
  CPUID Feature Flag: SSE2
  Description:        Multiply packed DP floating-point values in xmm2/m128 by xmm1.

VMULPD xmm1, xmm2, xmm3/m128
  Opcode:             VEX.NDS.128.66.0F.WIG 59 /r
  Op/En:              RVM
  64/32-bit Mode:     V/V
  CPUID Feature Flag: AVX
  Description:        Multiply packed double-precision floating-point values from xmm3/mem to xmm2 and stores result in xmm1.

(...)

MULPD (128-bit Legacy SSE version)
  DEST[63:0]        ← DEST[63:0] * SRC[63:0]
  DEST[127:64]      ← DEST[127:64] * SRC[127:64]
  DEST[VLMAX-1:128] (Unmodified)

VMULPD (VEX.128 encoded version)
  DEST[63:0]        ← SRC1[63:0] * SRC2[63:0]
  DEST[127:64]      ← SRC1[127:64] * SRC2[127:64]
  DEST[VLMAX-1:128] ← 0
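To make the difference concrete, here is a minimal scalar sketch in C of the two operation forms quoted above. It is a model of the semantics only, not real SIMD code: the type vec2d and the functions mulpd_model, vmulpd_model and example are hypothetical names introduced for illustration, and the handling of the upper register bits (unmodified for MULPD, zeroed for VMULPD) is deliberately left out.

```c
#include <assert.h>

/* Hypothetical stand-in for a 128-bit register holding two doubles. */
typedef struct { double lane[2]; } vec2d;

/* Models MULPD xmm1, xmm2/m128: two operands, and the first one is
 * both a source and the destination, so its old value is overwritten. */
void mulpd_model(vec2d *dest, const vec2d *src)
{
    for (int i = 0; i < 2; i++)
        dest->lane[i] = dest->lane[i] * src->lane[i];
}

/* Models VMULPD xmm1, xmm2, xmm3/m128: three operands, so the result
 * goes to a separate destination and both sources are preserved. */
void vmulpd_model(vec2d *dest, const vec2d *src1, const vec2d *src2)
{
    for (int i = 0; i < 2; i++)
        dest->lane[i] = src1->lane[i] * src2->lane[i];
}

/* Small walk-through of both forms on the same inputs. */
void example(void)
{
    vec2d a = {{1.5, 2.0}};
    vec2d b = {{4.0, 8.0}};
    vec2d c;

    /* AVX-style form: a and b are still intact afterwards. */
    vmulpd_model(&c, &a, &b);
    assert(c.lane[0] == 6.0 && c.lane[1] == 16.0);
    assert(a.lane[0] == 1.5 && a.lane[1] == 2.0);

    /* SSE-style form: to keep the old value of a, it has to be copied
     * first, which corresponds to an extra register move. */
    vec2d saved = a;
    mulpd_model(&a, &b);
    assert(a.lane[0] == 6.0 && a.lane[1] == 16.0);
    assert(saved.lane[0] == 1.5 && saved.lane[1] == 2.0);
}
```

The explicit copy into saved is the point of the sketch: with the two-operand SSE form, keeping a source value alive costs an additional move, whereas the three-operand VEX form does not need one.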