HPC Magazine February 2014 - An Introduction to Performance Programming - part I.

Listing 3: Assembly code of the saxpy function in Listing 1, modified to use the packed instructions mulpd and addpd (each one operates on two double-precision values at once).


..___tag_value_saxpy.1:
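        xorl      %eax, %eax                 # i = 0
        movslq    %edi, %rdi                 # sign-extend n to 64 bits
        testq     %rdi, %rdi
        jle       ..B1.5                     # nothing to do if n <= 0
        unpcklpd  %xmm0, %xmm0               # copy the scalar a into both lanes of xmm0
..B1.3:
        movupd    (%rsi,%rax,8), %xmm1       # load x[i] and x[i+1] (unaligned load)
        mulpd     %xmm0, %xmm1               # a*x[i] and a*x[i+1]
        addpd     (%rdx,%rax,8), %xmm1       # add y[i] and y[i+1]; memory operand must be 16-byte aligned
        movupd    %xmm1, (%rdx,%rax,8)       # store y[i] and y[i+1]
        addq      $2, %rax                   # two elements per iteration (assumes n is even)
        cmpq      %rdi, %rax
        jl        ..B1.3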
..B1.5:
        ret
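
For readers who prefer not to edit assembly by hand, the same packed computation can be sketched in C with SSE2 intrinsics. This is only an illustration under stated assumptions: the function name saxpy_sse2 and its argument order are hypothetical, inferred from the registers used in Listing 3 (n in %edi, a in %xmm0, x in %rsi, y in %rdx) and not taken from Listing 1. It requires an x86 compiler that provides emmintrin.h. Unlike the listing, it loads y with an unaligned load and finishes with a scalar loop, so it should also handle odd n and unaligned arrays.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_mul_pd, ... */

/* Hypothetical name and signature, inferred from the register usage
 * in Listing 3; the original Listing 1 may differ. */
void saxpy_sse2(int n, double a, const double *x, double *y)
{
    /* Broadcast a into both lanes of a 128-bit register. */
    __m128d va = _mm_set1_pd(a);
    int i = 0;

    /* Packed loop: two doubles per iteration (mulpd + addpd). */
    for (; i + 1 < n; i += 2) {
        __m128d vx = _mm_loadu_pd(&x[i]);   /* unaligned load of x[i], x[i+1] */
        __m128d vy = _mm_loadu_pd(&y[i]);   /* unaligned load of y[i], y[i+1] */
        vy = _mm_add_pd(_mm_mul_pd(va, vx), vy);
        _mm_storeu_pd(&y[i], vy);           /* unaligned store */
    }

    /* Scalar remainder: at most one element when n is odd. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Calling saxpy_sse2(n, a, x, y) updates y in place, as the scalar version does. Keeping the remainder loop separate leaves the packed loop free of bounds checks inside its body.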