How much Algebra does C2 Know? Part 2: Distributivity

In part one of this series of posts, I looked at how important associativity and independence are for fast loops. C2 seems to utilise these properties to generate unrolled and pipelined machine code for loops, achieving higher throughput even in cases where the kernel of the loop is 3x slower according to vendor advertised instruction throughputs. C2 has a weird and wonderful relationship with distributivity, and hints from the programmer can both help and hinder the generation of good quality machine code.

Viability and Correctness

Distributivity is the simple notion of factoring out brackets: rewriting ab + ac as a(b + c). Is this, in general, a viable loop rewrite strategy? Applied to a loop, it would transform the method Scale into FactoredScale, both of which perform floating point arithmetic:

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    @Benchmark
    public double Scale(DoubleData state) {
        double value = 0D;
        double[] data = state.data1;
        for (int i = 0; i < data.length; ++i) {
            value += 3.14159 * data[i];
        }
        return value;
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    @Benchmark
    public double FactoredScale(DoubleData state) {
        double value = 0D;
        double[] data = state.data1;
        for (int i = 0; i < data.length; ++i) {
            value += data[i];
        }
        return 3.14159 * value;
    }
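In symbols, the rewrite relies on the identity below, where $c$ stands for the scale factor (3.14159 here), $x_i$ for the array elements and $n$ for the array length. All three symbols are notation introduced for illustration. The identity holds exactly over the real numbers; whether it holds for a given machine number type is the question that matters.

$$\sum_{i=0}^{n-1} c \, x_i = c \sum_{i=0}^{n-1} x_i$$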

Running the project on GitHub with the argument --include .*scale.* suggests there may be a performance gain to be had from this rewrite, but it isn’t clear cut:

| Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
|---|---|---|---|---|---|---|---|
| FactoredScale | thrpt | 1 | 10 | 7.011606 | 0.274742 | ops/ms | 100000 |
| FactoredScale | thrpt | 1 | 10 | 0.621515 | 0.026853 | ops/ms | 1000000 |
| Scale | thrpt | 1 | 10 | 6.962434 | 0.240180 | ops/ms | 100000 |
| Scale | thrpt | 1 | 10 | 0.671042 | 0.011686 | ops/ms | 1000000 |

Over the real numbers $\mathbb{R}$ the rewrite would be completely valid, but floating point arithmetic is not associative. Joseph Darcy explains why in a deep dive on floating point semantics.

Broken associativity of addition entails broken distributivity of any operation over it, so the two loops are not equivalent, and they give different outputs (e.g. 15662.513298516365 vs 15662.51329851632 for one sample input). The rewrite isn’t even correct for floating point data, so it isn’t an optimisation that a compiler could apply in good faith, except in a very small number of cases. You have to rewrite the loop yourself and figure out if the small but inevitable differences are acceptable.
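To make the failure concrete, here is a minimal sketch that is separate from the benchmark project. The class name, the method names and the constants 0.1, 0.2 and 0.3 are all assumptions chosen for illustration, not the benchmark data. It compares the two groupings of a three-term sum, and then a scale factor of 10 applied before and after a two-term sum.

```java
class FloatingPointAlgebra {

    // Groups the sum to the left: (a + b) + c
    static double sumLeft(double a, double b, double c) {
        return (a + b) + c;
    }

    // Groups the sum to the right: a + (b + c)
    static double sumRight(double a, double b, double c) {
        return a + (b + c);
    }

    // Multiplies each term before adding, as the Scale loop does
    static double scaleThenSum(double factor, double a, double b) {
        return factor * a + factor * b;
    }

    // Adds first and multiplies once, as the FactoredScale loop does
    static double sumThenScale(double factor, double a, double b) {
        return factor * (a + b);
    }

    public static void main(String[] args) {
        // Associativity of addition fails for these inputs
        System.out.println(sumLeft(0.1, 0.2, 0.3) == sumRight(0.1, 0.2, 0.3)); // false
        // Distributivity of multiplication over addition fails too
        System.out.println(scaleThenSum(10, 0.1, 0.2) == sumThenScale(10, 0.1, 0.2)); // false
    }
}
```

The differences are in the last bit or so of the result, which is the same order of discrepancy as in the sample output quoted above.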

Counterintuitive Performance

Integer multiplication is distributive over addition, and we can check if C2 does this rewrite by running the same code with 32-bit integer values, for now fixing a scale factor of 10 (which seems like an innocuous value, no?):

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    @Benchmark
    public int Scale_Int(IntData state) {
        int value = 0;
        int[] data = state.data1;
        for (int i = 0; i < data.length; ++i) {
            value += 10 * data[i];
        }
        return value;
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    @Benchmark
    public int FactoredScale_Int(IntData state) {
        int value = 0;
        int[] data = state.data1;
        for (int i = 0; i < data.length; ++i) {
            value += data[i];
        }
        return 10 * value;
    }
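As a sanity check that the two integer loops really are interchangeable, the following sketch strips the JMH annotations and state classes and runs both loops on an input that overflows. The class name, the method signatures and the input values are assumptions made for illustration.

```java
class IntDistributivity {

    // Multiplies inside the loop, like Scale_Int
    static int scale(int[] data, int factor) {
        int value = 0;
        for (int i = 0; i < data.length; ++i) {
            value += factor * data[i];
        }
        return value;
    }

    // Multiplies once after the loop, like FactoredScale_Int
    static int factoredScale(int[] data, int factor) {
        int value = 0;
        for (int i = 0; i < data.length; ++i) {
            value += data[i];
        }
        return factor * value;
    }

    public static void main(String[] args) {
        // Values chosen so that both the products and the running sums overflow
        int[] data = {Integer.MAX_VALUE, Integer.MAX_VALUE - 1, 123_456_789, -7};
        System.out.println(scale(data, 10) == factoredScale(data, 10)); // true
    }
}
```

Java int addition and multiplication wrap around on overflow, which amounts to arithmetic modulo 2^32, and distributivity survives that reduction. The two methods therefore agree on every input, so the rewrite would be safe for a compiler to make here.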

The results are fascinating:

| Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
|---|---|---|---|---|---|---|---|
| FactoredScale_Int | thrpt | 1 | 10 | 28.339699 | 0.608075 | ops/ms | 100000 |
| FactoredScale_Int | thrpt | 1 | 10 | 2.392579 | 0.506413 | ops/ms | 1000000 |
| Scale_Int | thrpt | 1 | 10 | 33.335721 | 0.295334 | ops/ms | 100000 |
| Scale_Int | thrpt | 1 | 10 | 2.838242 | 0.448213 | ops/ms | 1000000 |

The code is doing thousands more multiplications in less time when the multiplication is not factored out of the loop. So what the devil is going on? Inspecting the assembly for the faster loop is revealing:

com/openkappa/simd/scale/Scale.Scale_Int(Lcom/openkappa/simd/state/IntData;)I  [0x000001c89e499320, 0x000001c89e4996f8]  984 bytes
Argument 0 is unknown.RIP: 0x1c89e499320 Code size: 0x000003d8
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000001c8b3701b10} 'Scale_Int' '(Lcom/openkappa/simd/state/IntData;)I' in 'com/openkappa/simd/scale/Scale'
  0x000001c89e499320: int3                      ;...cc

  0x000001c89e499321: nop     word ptr [rax+rax+0h]  ;...66
                                                ;...66
                                                ;...66
                                                ;...0f
                                                ;...1f
                                                ;...84
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001c89e49932c: nop                       ;...66
                                                ;...66
                                                ;...66
                                                ;...90

  0x000001c89e499330: mov     dword ptr [rsp+0ffffffffffff9000h],eax
                                                ;...89
                                                ;...84
                                                ;...24
                                                ;...00
                                                ;...90
                                                ;...ff
                                                ;...ff

  0x000001c89e499337: push    rbp               ;...55

  0x000001c89e499338: sub     rsp,40h           ;...48
                                                ;...83
                                                ;...ec
                                                ;...40

  0x000001c89e49933c: mov     rbp,qword ptr [rdx+8h]  ;...48
                                                ;...8b
                                                ;...6a
                                                ;...08

  0x000001c89e499340: mov     ebx,dword ptr [rdx+10h]  ;...8b
                                                ;...5a
                                                ;...10

  0x000001c89e499343: mov     r13d,dword ptr [rdx]  ;...44
                                                ;...8b
                                                ;...2a

  0x000001c89e499346: mov     rcx,rdx           ;...48
                                                ;...8b
                                                ;...ca

  0x000001c89e499349: vzeroupper                ;...c5
                                                ;...f8
                                                ;...77

  0x000001c89e49934c: mov     r10,51da8d20h     ;...49
                                                ;...ba
                                                ;...20
                                                ;...8d
                                                ;...da
                                                ;...51
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001c89e499356: call indirect r10         ;...41
                                                ;...ff
                                                ;...d2

  0x000001c89e499359: mov     r11d,dword ptr [rbp+8h]  ;...44
                                                ;...8b
                                                ;...5d
                                                ;...08
                                                ; implicit exception: dispatches to 0x000001c89e4996c1
  0x000001c89e49935d: cmp     r11d,0f800016dh   ;...41
                                                ;...81
                                                ;...fb
                                                ;...6d
                                                ;...01
                                                ;...00
                                                ;...f8
                                                ;   {metadata({type array int})}
  0x000001c89e499364: jne     1c89e4996a9h      ;...0f
                                                ;...85
                                                ;...3f
                                                ;...03
                                                ;...00
                                                ;...00
                                                ;*iload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@10 (line 41)

  0x000001c89e49936a: mov     edi,dword ptr [rbp+0ch]  ;...8b
                                                ;...7d
                                                ;...0c
                                                ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@13 (line 41)

  0x000001c89e49936d: cmp     r13d,edi          ;...44
                                                ;...3b
                                                ;...ef

  0x000001c89e499370: jnl     1c89e49967ah      ;...0f
                                                ;...8d
                                                ;...04
                                                ;...03
                                                ;...00
                                                ;...00
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@14 (line 41)

  0x000001c89e499376: mov     r10d,ebp          ;...44
                                                ;...8b
                                                ;...d5

  0x000001c89e499379: mov     r8d,r13d          ;...45
                                                ;...8b
                                                ;...c5

  0x000001c89e49937c: inc     r8d               ;...41
                                                ;...ff
                                                ;...c0

  0x000001c89e49937f: shr     r10d,2h           ;...41
                                                ;...c1
                                                ;...ea
                                                ;...02

  0x000001c89e499383: and     r10d,7h           ;...41
                                                ;...83
                                                ;...e2
                                                ;...07

  0x000001c89e499387: xor     r9d,r9d           ;...45
                                                ;...33
                                                ;...c9

  0x000001c89e49938a: cmp     r8d,r9d           ;...45
                                                ;...3b
                                                ;...c1

  0x000001c89e49938d: cmovl   r8d,r9d           ;...45
                                                ;...0f
                                                ;...4c
                                                ;...c1

  0x000001c89e499391: cmp     r8d,edi           ;...44
                                                ;...3b
                                                ;...c7

  0x000001c89e499394: cmovnle r8d,edi           ;...44
                                                ;...0f
                                                ;...4f
                                                ;...c7

  0x000001c89e499398: add     r10d,r8d          ;...45
                                                ;...03
                                                ;...d0

  0x000001c89e49939b: mov     r11d,4h           ;...41
                                                ;...bb
                                                ;...04
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001c89e4993a1: sub     r11d,r10d         ;...45
                                                ;...2b
                                                ;...da

  0x000001c89e4993a4: and     r11d,7h           ;...41
                                                ;...83
                                                ;...e3
                                                ;...07

  0x000001c89e4993a8: add     r11d,r8d          ;...45
                                                ;...03
                                                ;...d8

  0x000001c89e4993ab: cmp     r11d,edi          ;...44
                                                ;...3b
                                                ;...df

  0x000001c89e4993ae: cmovnle r11d,edi          ;...44
                                                ;...0f
                                                ;...4f
                                                ;...df

  0x000001c89e4993b2: nop                       ;...66
                                                ;...90

  0x000001c89e4993b4: cmp     r13d,edi          ;...44
                                                ;...3b
                                                ;...ef

  0x000001c89e4993b7: jnb     1c89e49968bh      ;...0f
                                                ;...83
                                                ;...ce
                                                ;...02
                                                ;...00
                                                ;...00

  0x000001c89e4993bd: mov     r10d,dword ptr [rbp+r13*4+10h]
                                                ;...46
                                                ;...8b
                                                ;...54
                                                ;...ad
                                                ;...10
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@23 (line 42)

  0x000001c89e4993c2: mov     r9d,r10d          ;...45
                                                ;...8b
                                                ;...ca

  0x000001c89e4993c5: shl     r9d,3h            ;...41
                                                ;...c1
                                                ;...e1
                                                ;...03

  0x000001c89e4993c9: shl     r10d,1h           ;...41
                                                ;...d1
                                                ;...e2

  0x000001c89e4993cc: add     r9d,r10d          ;...45
                                                ;...03
                                                ;...ca

  0x000001c89e4993cf: add     ebx,r9d           ;...41
                                                ;...03
                                                ;...d9
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@25 (line 42)

  0x000001c89e4993d2: inc     r13d              ;...41
                                                ;...ff
                                                ;...c5
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@27 (line 41)

  0x000001c89e4993d5: cmp     r13d,r11d         ;...45
                                                ;...3b
                                                ;...eb

  0x000001c89e4993d8: jl      1c89e4993b4h      ;...7c
                                                ;...da
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@14 (line 41)

  0x000001c89e4993da: mov     r8d,edi           ;...44
                                                ;...8b
                                                ;...c7

  0x000001c89e4993dd: add     r8d,0ffffffc1h    ;...41
                                                ;...83
                                                ;...c0
                                                ;...c1

  0x000001c89e4993e1: mov     ecx,80000000h     ;...b9
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...80

  0x000001c89e4993e6: cmp     edi,r8d           ;...41
                                                ;...3b
                                                ;...f8

  0x000001c89e4993e9: cmovl   r8d,ecx           ;...44
                                                ;...0f
                                                ;...4c
                                                ;...c1

  0x000001c89e4993ed: cmp     r13d,r8d          ;...45
                                                ;...3b
                                                ;...e8

  0x000001c89e4993f0: jnl     1c89e499651h      ;...0f
                                                ;...8d
                                                ;...5b
                                                ;...02
                                                ;...00
                                                ;...00

  0x000001c89e4993f6: nop     word ptr [rax+rax+0h]  ;...66
                                                ;...66
                                                ;...0f
                                                ;...1f
                                                ;...84
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001c89e499400: vmovdqu ymm8,ymmword ptr [rbp+r13*4+10h]
                                                ;...c4
                                                ;...21
                                                ;...7e
                                                ;...6f
                                                ;...44
                                                ;...ad
                                                ;...10

  0x000001c89e499407: movsxd  r10,r13d          ;...4d
                                                ;...63
                                                ;...d5

  0x000001c89e49940a: vmovdqu ymm9,ymmword ptr [rbp+r10*4+30h]
                                                ;...c4
                                                ;...21
                                                ;...7e
                                                ;...6f
                                                ;...4c
                                                ;...95
                                                ;...30

  0x000001c89e499411: vmovdqu ymm13,ymmword ptr [rbp+r10*4+0f0h]
                                                ;...c4
                                                ;...21
                                                ;...7e
                                                ;...6f
                                                ;...ac
                                                ;...95
                                                ;...f0
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001c89e49941b: vmovdqu ymm12,ymmword ptr [rbp+r10*4+50h]
                                                ;...c4
                                                ;...21
                                                ;...7e
                                                ;...6f
                                                ;...64
                                                ;...95
                                                ;...50

  0x000001c89e499422: vmovdqu ymm4,ymmword ptr [rbp+r10*4+70h]
                                                ;...c4
                                                ;...a1
                                                ;...7e
                                                ;...6f
                                                ;...64
                                                ;...95
                                                ;...70

  0x000001c89e499429: vmovdqu ymm3,ymmword ptr [rbp+r10*4+90h]
                                                ;...c4
                                                ;...a1
                                                ;...7e
                                                ;...6f
                                                ;...9c
                                                ;...95
                                                ;...90
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001c89e499433: vmovdqu ymm2,ymmword ptr [rbp+r10*4+0b0h]
                                                ;...c4
                                                ;...a1
                                                ;...7e
                                                ;...6f
                                                ;...94
                                                ;...95
                                                ;...b0
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001c89e49943d: vmovdqu ymm0,ymmword ptr [rbp+r10*4+0d0h]
                                                ;...c4
                                                ;...a1
                                                ;...7e
                                                ;...6f
                                                ;...84
                                                ;...95
                                                ;...d0
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@23 (line 42)

  0x000001c89e499447: vpslld  ymm11,ymm8,1h     ;...c4
                                                ;...c1
                                                ;...25
                                                ;...72
                                                ;...f0
                                                ;...01

  0x000001c89e49944d: vpslld  ymm1,ymm0,1h      ;...c5
                                                ;...f5
                                                ;...72
                                                ;...f0
                                                ;...01

  0x000001c89e499452: vpslld  ymm0,ymm0,3h      ;...c5
                                                ;...fd
                                                ;...72
                                                ;...f0
                                                ;...03

  0x000001c89e499457: vpaddd  ymm5,ymm0,ymm1    ;...c5
                                                ;...fd
                                                ;...fe
                                                ;...e9

  0x000001c89e49945b: vpslld  ymm0,ymm2,3h      ;...c5
                                                ;...fd
                                                ;...72
                                                ;...f2
                                                ;...03

  0x000001c89e499460: vpslld  ymm7,ymm3,3h      ;...c5
                                                ;...c5
                                                ;...72
                                                ;...f3
                                                ;...03

  0x000001c89e499465: vpslld  ymm10,ymm4,3h     ;...c5
                                                ;...ad
                                                ;...72
                                                ;...f4
                                                ;...03

  0x000001c89e49946a: vpslld  ymm15,ymm12,3h    ;...c4
                                                ;...c1
                                                ;...05
                                                ;...72
                                                ;...f4
                                                ;...03

  0x000001c89e499470: vpslld  ymm14,ymm13,3h    ;...c4
                                                ;...c1
                                                ;...0d
                                                ;...72
                                                ;...f5
                                                ;...03

  0x000001c89e499476: vpslld  ymm1,ymm9,3h      ;...c4
                                                ;...c1
                                                ;...75
                                                ;...72
                                                ;...f1
                                                ;...03

  0x000001c89e49947c: vpslld  ymm2,ymm2,1h      ;...c5
                                                ;...ed
                                                ;...72
                                                ;...f2
                                                ;...01

  0x000001c89e499481: vpaddd  ymm6,ymm0,ymm2    ;...c5
                                                ;...fd
                                                ;...fe
                                                ;...f2

  0x000001c89e499485: vpslld  ymm0,ymm3,1h      ;...c5
                                                ;...fd
                                                ;...72
                                                ;...f3
                                                ;...01

  0x000001c89e49948a: vpaddd  ymm7,ymm7,ymm0    ;...c5
                                                ;...c5
                                                ;...fe
                                                ;...f8

  0x000001c89e49948e: vpslld  ymm0,ymm4,1h      ;...c5
                                                ;...fd
                                                ;...72
                                                ;...f4
                                                ;...01

  0x000001c89e499493: vpaddd  ymm10,ymm10,ymm0  ;...c5
                                                ;...2d
                                                ;...fe
                                                ;...d0

  0x000001c89e499497: vpslld  ymm0,ymm12,1h     ;...c4
                                                ;...c1
                                                ;...7d
                                                ;...72
                                                ;...f4
                                                ;...01

  0x000001c89e49949d: vpaddd  ymm12,ymm15,ymm0  ;...c5
                                                ;...05
                                                ;...fe
                                                ;...e0

  0x000001c89e4994a1: vpslld  ymm0,ymm13,1h     ;...c4
                                                ;...c1
                                                ;...7d
                                                ;...72
                                                ;...f5
                                                ;...01

  0x000001c89e4994a7: vpaddd  ymm4,ymm14,ymm0   ;...c5
                                                ;...8d
                                                ;...fe
                                                ;...e0

  0x000001c89e4994ab: vpslld  ymm0,ymm9,1h      ;...c4
                                                ;...c1
                                                ;...7d
                                                ;...72
                                                ;...f1
                                                ;...01

  0x000001c89e4994b1: vpaddd  ymm2,ymm1,ymm0    ;...c5
                                                ;...f5
                                                ;...fe
                                                ;...d0

  0x000001c89e4994b5: vpslld  ymm0,ymm8,3h      ;...c4
                                                ;...c1
                                                ;...7d
                                                ;...72
                                                ;...f0
                                                ;...03

  0x000001c89e4994bb: vpaddd  ymm8,ymm0,ymm11   ;...c4
                                                ;...41
                                                ;...7d
                                                ;...fe
                                                ;...c3

  0x000001c89e4994c0: vphaddd ymm0,ymm8,ymm8    ;...c4
                                                ;...c2
                                                ;...3d
                                                ;...02
                                                ;...c0

  0x000001c89e4994c5: vphaddd ymm0,ymm0,ymm3    ;...c4
                                                ;...e2
                                                ;...7d
                                                ;...02
                                                ;...c3

  0x000001c89e4994ca: vextracti128 xmm3,ymm0,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c3
                                                ;...01

  0x000001c89e4994d0: vpaddd  xmm0,xmm0,xmm3    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c3

  0x000001c89e4994d4: vmovd   xmm3,ebx          ;...c5
                                                ;...f9
                                                ;...6e
                                                ;...db

  0x000001c89e4994d8: vpaddd  xmm3,xmm3,xmm0    ;...c5
                                                ;...e1
                                                ;...fe
                                                ;...d8

  0x000001c89e4994dc: vmovd   r10d,xmm3         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...da

  0x000001c89e4994e1: vphaddd ymm0,ymm2,ymm2    ;...c4
                                                ;...e2
                                                ;...6d
                                                ;...02
                                                ;...c2

  0x000001c89e4994e6: vphaddd ymm0,ymm0,ymm3    ;...c4
                                                ;...e2
                                                ;...7d
                                                ;...02
                                                ;...c3

  0x000001c89e4994eb: vextracti128 xmm3,ymm0,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c3
                                                ;...01

  0x000001c89e4994f1: vpaddd  xmm0,xmm0,xmm3    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c3

  0x000001c89e4994f5: vmovd   xmm3,r10d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...da

  0x000001c89e4994fa: vpaddd  xmm3,xmm3,xmm0    ;...c5
                                                ;...e1
                                                ;...fe
                                                ;...d8

  0x000001c89e4994fe: vmovd   r11d,xmm3         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...db

  0x000001c89e499503: vphaddd ymm2,ymm12,ymm12  ;...c4
                                                ;...c2
                                                ;...1d
                                                ;...02
                                                ;...d4

  0x000001c89e499508: vphaddd ymm2,ymm2,ymm0    ;...c4
                                                ;...e2
                                                ;...6d
                                                ;...02
                                                ;...d0

  0x000001c89e49950d: vextracti128 xmm0,ymm2,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...d0
                                                ;...01

  0x000001c89e499513: vpaddd  xmm2,xmm2,xmm0    ;...c5
                                                ;...e9
                                                ;...fe
                                                ;...d0

  0x000001c89e499517: vmovd   xmm0,r11d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...c3

  0x000001c89e49951c: vpaddd  xmm0,xmm0,xmm2    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c2

  0x000001c89e499520: vmovd   r10d,xmm0         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...c2

  0x000001c89e499525: vphaddd ymm0,ymm10,ymm10  ;...c4
                                                ;...c2
                                                ;...2d
                                                ;...02
                                                ;...c2

  0x000001c89e49952a: vphaddd ymm0,ymm0,ymm3    ;...c4
                                                ;...e2
                                                ;...7d
                                                ;...02
                                                ;...c3

  0x000001c89e49952f: vextracti128 xmm3,ymm0,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c3
                                                ;...01

  0x000001c89e499535: vpaddd  xmm0,xmm0,xmm3    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c3

  0x000001c89e499539: vmovd   xmm3,r10d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...da

  0x000001c89e49953e: vpaddd  xmm3,xmm3,xmm0    ;...c5
                                                ;...e1
                                                ;...fe
                                                ;...d8

  0x000001c89e499542: vmovd   r11d,xmm3         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...db

  0x000001c89e499547: vphaddd ymm2,ymm7,ymm7    ;...c4
                                                ;...e2
                                                ;...45
                                                ;...02
                                                ;...d7

  0x000001c89e49954c: vphaddd ymm2,ymm2,ymm0    ;...c4
                                                ;...e2
                                                ;...6d
                                                ;...02
                                                ;...d0

  0x000001c89e499551: vextracti128 xmm0,ymm2,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...d0
                                                ;...01

  0x000001c89e499557: vpaddd  xmm2,xmm2,xmm0    ;...c5
                                                ;...e9
                                                ;...fe
                                                ;...d0

  0x000001c89e49955b: vmovd   xmm0,r11d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...c3

  0x000001c89e499560: vpaddd  xmm0,xmm0,xmm2    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c2

  0x000001c89e499564: vmovd   r10d,xmm0         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...c2

  0x000001c89e499569: vphaddd ymm0,ymm6,ymm6    ;...c4
                                                ;...e2
                                                ;...4d
                                                ;...02
                                                ;...c6

  0x000001c89e49956e: vphaddd ymm0,ymm0,ymm3    ;...c4
                                                ;...e2
                                                ;...7d
                                                ;...02
                                                ;...c3

  0x000001c89e499573: vextracti128 xmm3,ymm0,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c3
                                                ;...01

  0x000001c89e499579: vpaddd  xmm0,xmm0,xmm3    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c3

  0x000001c89e49957d: vmovd   xmm3,r10d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...da

  0x000001c89e499582: vpaddd  xmm3,xmm3,xmm0    ;...c5
                                                ;...e1
                                                ;...fe
                                                ;...d8

  0x000001c89e499586: vmovd   r11d,xmm3         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...db

  0x000001c89e49958b: vphaddd ymm2,ymm5,ymm5    ;...c4
                                                ;...e2
                                                ;...55
                                                ;...02
                                                ;...d5

  0x000001c89e499590: vphaddd ymm2,ymm2,ymm0    ;...c4
                                                ;...e2
                                                ;...6d
                                                ;...02
                                                ;...d0

  0x000001c89e499595: vextracti128 xmm0,ymm2,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...d0
                                                ;...01

  0x000001c89e49959b: vpaddd  xmm2,xmm2,xmm0    ;...c5
                                                ;...e9
                                                ;...fe
                                                ;...d0

  0x000001c89e49959f: vmovd   xmm0,r11d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...c3

  0x000001c89e4995a4: vpaddd  xmm0,xmm0,xmm2    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c2

  0x000001c89e4995a8: vmovd   r10d,xmm0         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...c2

  0x000001c89e4995ad: vphaddd ymm2,ymm4,ymm4    ;...c4
                                                ;...e2
                                                ;...5d
                                                ;...02
                                                ;...d4

  0x000001c89e4995b2: vphaddd ymm2,ymm2,ymm1    ;...c4
                                                ;...e2
                                                ;...6d
                                                ;...02
                                                ;...d1

  0x000001c89e4995b7: vextracti128 xmm1,ymm2,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...d1
                                                ;...01

  0x000001c89e4995bd: vpaddd  xmm2,xmm2,xmm1    ;...c5
                                                ;...e9
                                                ;...fe
                                                ;...d1

  0x000001c89e4995c1: vmovd   xmm1,r10d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...ca

  0x000001c89e4995c6: vpaddd  xmm1,xmm1,xmm2    ;...c5
                                                ;...f1
                                                ;...fe
                                                ;...ca

  0x000001c89e4995ca: vmovd   ebx,xmm1          ;...c5
                                                ;...f9
                                                ;...7e
                                                ;...cb
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@25 (line 42)

  0x000001c89e4995ce: add     r13d,40h          ;...41
                                                ;...83
                                                ;...c5
                                                ;...40
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@27 (line 41)

  0x000001c89e4995d2: cmp     r13d,r8d          ;...45
                                                ;...3b
                                                ;...e8

  0x000001c89e4995d5: jl      1c89e499400h      ;...0f
                                                ;...8c
                                                ;...25
                                                ;...fe
                                                ;...ff
                                                ;...ff
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@14 (line 41)

  0x000001c89e4995db: mov     r10d,edi          ;...44
                                                ;...8b
                                                ;...d7

  0x000001c89e4995de: add     r10d,0fffffff9h   ;...41
                                                ;...83
                                                ;...c2
                                                ;...f9

  0x000001c89e4995e2: cmp     edi,r10d          ;...41
                                                ;...3b
                                                ;...fa

  0x000001c89e4995e5: cmovl   r10d,ecx          ;...44
                                                ;...0f
                                                ;...4c
                                                ;...d1

  0x000001c89e4995e9: cmp     r13d,r10d         ;...45
                                                ;...3b
                                                ;...ea

  0x000001c89e4995ec: jnl     1c89e49962eh      ;...7d
                                                ;...40

  0x000001c89e4995ee: nop                       ;...66
                                                ;...90

  0x000001c89e4995f0: vmovdqu ymm0,ymmword ptr [rbp+r13*4+10h]
                                                ;...c4
                                                ;...a1
                                                ;...7e
                                                ;...6f
                                                ;...44
                                                ;...ad
                                                ;...10
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@23 (line 42)

  0x000001c89e4995f7: vpslld  ymm1,ymm0,3h      ;...c5
                                                ;...f5
                                                ;...72
                                                ;...f0
                                                ;...03

  0x000001c89e4995fc: vpslld  ymm0,ymm0,1h      ;...c5
                                                ;...fd
                                                ;...72
                                                ;...f0
                                                ;...01

  0x000001c89e499601: vpaddd  ymm0,ymm1,ymm0    ;...c5
                                                ;...f5
                                                ;...fe
                                                ;...c0

  0x000001c89e499605: vphaddd ymm3,ymm0,ymm0    ;...c4
                                                ;...e2
                                                ;...7d
                                                ;...02
                                                ;...d8

  0x000001c89e49960a: vphaddd ymm3,ymm3,ymm1    ;...c4
                                                ;...e2
                                                ;...65
                                                ;...02
                                                ;...d9

  0x000001c89e49960f: vextracti128 xmm1,ymm3,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...d9
                                                ;...01

  0x000001c89e499615: vpaddd  xmm3,xmm3,xmm1    ;...c5
                                                ;...e1
                                                ;...fe
                                                ;...d9

  0x000001c89e499619: vmovd   xmm1,ebx          ;...c5
                                                ;...f9
                                                ;...6e
                                                ;...cb

  0x000001c89e49961d: vpaddd  xmm1,xmm1,xmm3    ;...c5
                                                ;...f1
                                                ;...fe
                                                ;...cb

  0x000001c89e499621: vmovd   ebx,xmm1          ;...c5
                                                ;...f9
                                                ;...7e
                                                ;...cb
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@25 (line 42)

  0x000001c89e499625: add     r13d,8h           ;...41
                                                ;...83
                                                ;...c5
                                                ;...08
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@27 (line 41)

  0x000001c89e499629: cmp     r13d,r10d         ;...45
                                                ;...3b
                                                ;...ea

  0x000001c89e49962c: jl      1c89e4995f0h      ;...7c
                                                ;...c2
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@14 (line 41)

  0x000001c89e49962e: cmp     r13d,edi          ;...44
                                                ;...3b
                                                ;...ef

  0x000001c89e499631: jnl     1c89e499651h      ;...7d
                                                ;...1e

  0x000001c89e499633: nop                       ;...90

  0x000001c89e499634: mov     r11d,dword ptr [rbp+r13*4+10h]
                                                ;...46
                                                ;...8b
                                                ;...5c
                                                ;...ad
                                                ;...10
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@23 (line 42)

  0x000001c89e499639: mov     r10d,r11d         ;...45
                                                ;...8b
                                                ;...d3

  0x000001c89e49963c: shl     r10d,3h           ;...41
                                                ;...c1
                                                ;...e2
                                                ;...03

  0x000001c89e499640: shl     r11d,1h           ;...41
                                                ;...d1
                                                ;...e3

  0x000001c89e499643: add     r10d,r11d         ;...45
                                                ;...03
                                                ;...d3

  0x000001c89e499646: add     ebx,r10d          ;...41
                                                ;...03
                                                ;...da
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@25 (line 42)

  0x000001c89e499649: inc     r13d              ;...41
                                                ;...ff
                                                ;...c5
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@27 (line 41)

  0x000001c89e49964c: cmp     r13d,edi          ;...44
                                                ;...3b
                                                ;...ef

  0x000001c89e49964f: jl      1c89e499634h      ;...7c
                                                ;...e3
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@14 (line 41)

  0x000001c89e499651: cmp     r13d,edi          ;...44
                                                ;...3b
                                                ;...ef

  0x000001c89e499654: jnl     1c89e49967ah      ;...7d
                                                ;...24

  0x000001c89e499656: nop                       ;...66
                                                ;...90

  0x000001c89e499658: cmp     r13d,edi          ;...44
                                                ;...3b
                                                ;...ef

  0x000001c89e49965b: jnb     1c89e499691h      ;...73
                                                ;...34

  0x000001c89e49965d: mov     r11d,dword ptr [rbp+r13*4+10h]
                                                ;...46
                                                ;...8b
                                                ;...5c
                                                ;...ad
                                                ;...10
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@23 (line 42)

  0x000001c89e499662: mov     r10d,r11d         ;...45
                                                ;...8b
                                                ;...d3

  0x000001c89e499665: shl     r10d,3h           ;...41
                                                ;...c1
                                                ;...e2
                                                ;...03

  0x000001c89e499669: shl     r11d,1h           ;...41
                                                ;...d1
                                                ;...e3

  0x000001c89e49966c: add     r10d,r11d         ;...45
                                                ;...03
                                                ;...d3

  0x000001c89e49966f: add     ebx,r10d          ;...41
                                                ;...03
                                                ;...da
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@25 (line 42)

  0x000001c89e499672: inc     r13d              ;...41
                                                ;...ff
                                                ;...c5
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@27 (line 41)

  0x000001c89e499675: cmp     r13d,edi          ;...44
                                                ;...3b
                                                ;...ef

  0x000001c89e499678: jl      1c89e499658h      ;...7c
                                                ;...de
                                                ;*iload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int@10 (line 41)

  0x000001c89e49967a: mov     eax,ebx           ;...8b
                                                ;...c3

  0x000001c89e49967c: vzeroupper                ;...c5
                                                ;...f8
                                                ;...77

  0x000001c89e49967f: add     rsp,40h           ;...48
                                                ;...83
                                                ;...c4
                                                ;...40

  0x000001c89e499683: pop     rbp               ;...5d

  0x000001c89e499684: test    dword ptr [1c88a9e0000h],eax
                                                ;...85
                                                ;...05
                                                ;...76
                                                ;...69
                                                ;...54
                                                ;...ec
                                                ;   {poll_return}
  0x000001c89e49968a: ret 

The loop is aggressively unrolled, pipelined, and vectorised. Moreover, the multiplication by ten is compiled not to a multiplication but to two left shifts (see VPSLLD) and an addition. Note that (x << 3) + (x << 1) = 8x + 2x = x * 10, and C2 seems to know it; this rewrite can be applied because it can be proven statically that the factor is always 10. The “optimised” loop, FactoredScale_Int, doesn’t vectorise at all (and I have no idea why not; isn’t this a bug?)
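To make the shift identity concrete, here is a minimal sketch in plain Java. The class and method names (ShiftScale, timesTenByShifts, unparenthesised) are hypothetical and not part of the benchmark project. It checks that the shift-and-add form agrees with x * 10 for a few sample values, including ones that overflow, and shows why the parentheses matter when the identity is written as Java source: + binds more tightly than <<.

```java
// Hypothetical helper for illustration; not part of the benchmark project.
final class ShiftScale {

    // x * 10 written as 8x + 2x, mirroring the two shifts and one add in the listing above.
    static int timesTenByShifts(int x) {
        return (x << 3) + (x << 1);
    }

    // Without parentheses, + binds tighter than <<, so this parses as (x << (1 + x)) << 3.
    static int unparenthesised(int x) {
        return x << 1 + x << 3;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 123_456, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // int arithmetic wraps modulo 2^32, so the identity also holds on overflow.
            if (timesTenByShifts(x) != x * 10) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println(timesTenByShifts(7)); // prints 70
        System.out.println(unparenthesised(1));  // prints 32, not 10
    }
}
```

Because both forms wrap in the same way, the strength reduction is safe for every int, which is presumably why C2 can apply it whenever the factor is a compile-time constant. The factored loop's disassembly follows.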

com/openkappa/simd/scale/Scale.FactoredScale_Int(Lcom/openkappa/simd/state/IntData;)I  [0x000002bbebeda320, 0x000002bbebeda4b8]  408 bytes
Argument 0 is unknown.RIP: 0x2bbebeda320 Code size: 0x00000198
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000002bb81241148} 'FactoredScale_Int' '(Lcom/openkappa/simd/state/IntData;)I' in 'com/openkappa/simd/scale/Scale'
  0x000002bbebeda320: int3                      ;...cc

  0x000002bbebeda321: nop     word ptr [rax+rax+0h]  ;...66
                                                ;...66
                                                ;...66
                                                ;...0f
                                                ;...1f
                                                ;...84
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00

  0x000002bbebeda32c: nop                       ;...66
                                                ;...66
                                                ;...66
                                                ;...90

  0x000002bbebeda330: mov     dword ptr [rsp+0ffffffffffff9000h],eax
                                                ;...89
                                                ;...84
                                                ;...24
                                                ;...00
                                                ;...90
                                                ;...ff
                                                ;...ff

  0x000002bbebeda337: push    rbp               ;...55

  0x000002bbebeda338: sub     rsp,40h           ;...48
                                                ;...83
                                                ;...ec
                                                ;...40

  0x000002bbebeda33c: mov     rbp,qword ptr [rdx+8h]  ;...48
                                                ;...8b
                                                ;...6a
                                                ;...08

  0x000002bbebeda340: mov     ebx,dword ptr [rdx+10h]  ;...8b
                                                ;...5a
                                                ;...10

  0x000002bbebeda343: mov     r13d,dword ptr [rdx]  ;...44
                                                ;...8b
                                                ;...2a

  0x000002bbebeda346: mov     rcx,rdx           ;...48
                                                ;...8b
                                                ;...ca

  0x000002bbebeda349: mov     r10,51da8d20h     ;...49
                                                ;...ba
                                                ;...20
                                                ;...8d
                                                ;...da
                                                ;...51
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00

  0x000002bbebeda353: call indirect r10         ;...41
                                                ;...ff
                                                ;...d2

  0x000002bbebeda356: mov     r10d,dword ptr [rbp+8h]  ;...44
                                                ;...8b
                                                ;...55
                                                ;...08
                                                ; implicit exception: dispatches to 0x000002bbebeda46d
  0x000002bbebeda35a: cmp     r10d,0f800016dh   ;...41
                                                ;...81
                                                ;...fa
                                                ;...6d
                                                ;...01
                                                ;...00
                                                ;...f8
                                                ;   {metadata({type array int})}
  0x000002bbebeda361: jne     2bbebeda459h      ;...0f
                                                ;...85
                                                ;...f2
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;*iload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@10 (line 52)

  0x000002bbebeda367: mov     r10d,dword ptr [rbp+0ch]
                                                ;...44
                                                ;...8b
                                                ;...55
                                                ;...0c
                                                ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@13 (line 52)

  0x000002bbebeda36b: cmp     r13d,r10d         ;...45
                                                ;...3b
                                                ;...ea

  0x000002bbebeda36e: jnl     2bbebeda422h      ;...0f
                                                ;...8d
                                                ;...ae
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@14 (line 52)

  0x000002bbebeda374: mov     r11d,r13d         ;...45
                                                ;...8b
                                                ;...dd

  0x000002bbebeda377: inc     r11d              ;...41
                                                ;...ff
                                                ;...c3

  0x000002bbebeda37a: xor     r8d,r8d           ;...45
                                                ;...33
                                                ;...c0

  0x000002bbebeda37d: cmp     r11d,r8d          ;...45
                                                ;...3b
                                                ;...d8

  0x000002bbebeda380: cmovl   r11d,r8d          ;...45
                                                ;...0f
                                                ;...4c
                                                ;...d8

  0x000002bbebeda384: cmp     r11d,r10d         ;...45
                                                ;...3b
                                                ;...da

  0x000002bbebeda387: cmovnle r11d,r10d         ;...45
                                                ;...0f
                                                ;...4f
                                                ;...da

  0x000002bbebeda38b: nop                       ;...90
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@21 (line 53)

  0x000002bbebeda38c: cmp     r13d,r10d         ;...45
                                                ;...3b
                                                ;...ea

  0x000002bbebeda38f: jnb     2bbebeda43ch      ;...0f
                                                ;...83
                                                ;...a7
                                                ;...00
                                                ;...00
                                                ;...00

  0x000002bbebeda395: add     ebx,dword ptr [rbp+r13*4+10h]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...ad
                                                ;...10
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@22 (line 53)

  0x000002bbebeda39a: inc     r13d              ;...41
                                                ;...ff
                                                ;...c5
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@24 (line 52)

  0x000002bbebeda39d: cmp     r13d,r11d         ;...45
                                                ;...3b
                                                ;...eb

  0x000002bbebeda3a0: jl      2bbebeda38ch      ;...7c
                                                ;...ea
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@14 (line 52)

  0x000002bbebeda3a2: mov     r11d,r10d         ;...45
                                                ;...8b
                                                ;...da

  0x000002bbebeda3a5: add     r11d,0fffffff9h   ;...41
                                                ;...83
                                                ;...c3
                                                ;...f9

  0x000002bbebeda3a9: mov     r8d,80000000h     ;...41
                                                ;...b8
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...80

  0x000002bbebeda3af: cmp     r10d,r11d         ;...45
                                                ;...3b
                                                ;...d3

  0x000002bbebeda3b2: cmovl   r11d,r8d          ;...45
                                                ;...0f
                                                ;...4c
                                                ;...d8

  0x000002bbebeda3b6: cmp     r13d,r11d         ;...45
                                                ;...3b
                                                ;...eb

  0x000002bbebeda3b9: jnl     2bbebeda409h      ;...7d
                                                ;...4e

  0x000002bbebeda3bb: nop     dword ptr [rax+rax+0h]  ;...0f
                                                ;...1f
                                                ;...44
                                                ;...00
                                                ;...00
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@21 (line 53)

  0x000002bbebeda3c0: add     ebx,dword ptr [rbp+r13*4+10h]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...ad
                                                ;...10

  0x000002bbebeda3c5: movsxd  r8,r13d           ;...4d
                                                ;...63
                                                ;...c5

  0x000002bbebeda3c8: add     ebx,dword ptr [rbp+r8*4+14h]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...85
                                                ;...14

  0x000002bbebeda3cd: add     ebx,dword ptr [rbp+r8*4+18h]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...85
                                                ;...18

  0x000002bbebeda3d2: add     ebx,dword ptr [rbp+r8*4+1ch]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...85
                                                ;...1c

  0x000002bbebeda3d7: add     ebx,dword ptr [rbp+r8*4+20h]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...85
                                                ;...20

  0x000002bbebeda3dc: add     ebx,dword ptr [rbp+r8*4+24h]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...85
                                                ;...24

  0x000002bbebeda3e1: add     ebx,dword ptr [rbp+r8*4+28h]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...85
                                                ;...28

  0x000002bbebeda3e6: add     ebx,dword ptr [rbp+r8*4+2ch]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...85
                                                ;...2c
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@22 (line 53)

  0x000002bbebeda3eb: add     r13d,8h           ;...41
                                                ;...83
                                                ;...c5
                                                ;...08
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@24 (line 52)

  0x000002bbebeda3ef: cmp     r13d,r11d         ;...45
                                                ;...3b
                                                ;...eb

  0x000002bbebeda3f2: jl      2bbebeda3c0h      ;...7c
                                                ;...cc
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@14 (line 52)

  0x000002bbebeda3f4: cmp     r13d,r10d         ;...45
                                                ;...3b
                                                ;...ea

  0x000002bbebeda3f7: jnl     2bbebeda409h      ;...7d
                                                ;...10

  0x000002bbebeda3f9: nop                       ;...66
                                                ;...66
                                                ;...90
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@21 (line 53)

  0x000002bbebeda3fc: add     ebx,dword ptr [rbp+r13*4+10h]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...ad
                                                ;...10
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@22 (line 53)

  0x000002bbebeda401: inc     r13d              ;...41
                                                ;...ff
                                                ;...c5
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@24 (line 52)

  0x000002bbebeda404: cmp     r13d,r10d         ;...45
                                                ;...3b
                                                ;...ea

  0x000002bbebeda407: jl      2bbebeda3fch      ;...7c
                                                ;...f3
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@14 (line 52)

  0x000002bbebeda409: cmp     r13d,r10d         ;...45
                                                ;...3b
                                                ;...ea

  0x000002bbebeda40c: jnl     2bbebeda422h      ;...7d
                                                ;...14

  0x000002bbebeda40e: nop                       ;...66
                                                ;...90
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@21 (line 53)

  0x000002bbebeda410: cmp     r13d,r10d         ;...45
                                                ;...3b
                                                ;...ea

  0x000002bbebeda413: jnb     2bbebeda442h      ;...73
                                                ;...2d

  0x000002bbebeda415: add     ebx,dword ptr [rbp+r13*4+10h]
                                                ;...42
                                                ;...03
                                                ;...5c
                                                ;...ad
                                                ;...10
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@22 (line 53)

  0x000002bbebeda41a: inc     r13d              ;...41
                                                ;...ff
                                                ;...c5
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@24 (line 52)

  0x000002bbebeda41d: cmp     r13d,r10d         ;...45
                                                ;...3b
                                                ;...ea

  0x000002bbebeda420: jl      2bbebeda410h      ;...7c
                                                ;...ee
                                                ;*iload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@10 (line 52)

  0x000002bbebeda422: mov     r11d,ebx          ;...44
                                                ;...8b
                                                ;...db

  0x000002bbebeda425: shl     r11d,3h           ;...41
                                                ;...c1
                                                ;...e3
                                                ;...03

  0x000002bbebeda429: shl     ebx,1h            ;...d1
                                                ;...e3

  0x000002bbebeda42b: add     ebx,r11d          ;...41
                                                ;...03
                                                ;...db
                                                ;*imul {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::FactoredScale_Int@33 (line 55)

  0x000002bbebeda42e: mov     eax,ebx           ;...8b
                                                ;...c3

  0x000002bbebeda430: add     rsp,40h           ;...48
                                                ;...83
                                                ;...c4
                                                ;...40

  0x000002bbebeda434: pop     rbp               ;...5d

  0x000002bbebeda435: test    dword ptr [2bbd8590000h],eax
                                                ;...85
                                                ;...05
                                                ;...c5
                                                ;...5b
                                                ;...6b
                                                ;...ec
                                                ;   {poll_return}
  0x000002bbebeda43b: ret                       ;...c3
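
The shl/shl/add sequence just before the return computes 8x + 2x, which is a multiplication by 10 without an imul. The following is a minimal sketch of why that substitution is safe for int, even on overflow. The class and method names are hypothetical and are not part of the benchmark project.

```java
public class ShiftMultiply {
    // Multiply by 10 using only shifts and an add: 10x = 8x + 2x.
    static int timesTenByShifts(int x) {
        return (x << 3) + (x << 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 123_456_789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // int arithmetic wraps modulo 2^32, so both sides overflow identically
            if (timesTenByShifts(x) != 10 * x) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("shifts agree with 10 * x for all samples");
    }
}
```

The rewrite is only available when the factor is a compile-time constant, which is why it appears in this listing.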

This is a special case: real data, including the scale factor, is usually dynamic, so the multiplication cannot always be proven equivalent to a linear combination of bit shifts. The routine has to be compiled for all possible parameters, not just statically contrived cases like the one above, so you may never see this assembly in the wild. However, even with random factors, the slow-looking loop is aggressively optimised in a way that the hand “optimised” code is not:

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    @Benchmark
    public int Scale_Int_Dynamic(ScaleState state) {
        int value = 0;
        int[] data = state.data;
        int factor = state.randomFactor();
        for (int i = 0; i < data.length; ++i) {
            value += factor * data[i];
        }
        return value;
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    @Benchmark
    public int FactoredScale_Int_Dynamic(ScaleState state) {
        int value = 0;
        int[] data = state.data;
        int factor = state.randomFactor();
        for (int i = 0; i < data.length; ++i) {
            value += data[i];
        }
        return factor * value;
    }

Benchmark Mode Threads Samples Score Score Error (99.9%) Unit Param: size
FactoredScale_Int_Dynamic thrpt 1 10 26.100439 0.340069 ops/ms 100000
FactoredScale_Int_Dynamic thrpt 1 10 1.918011 0.297925 ops/ms 1000000
Scale_Int_Dynamic thrpt 1 10 30.219809 2.977389 ops/ms 100000
Scale_Int_Dynamic thrpt 1 10 2.314159 0.378442 ops/ms 1000000
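
Unlike the floating point case, the two int variants always return the same value: Java's int arithmetic wraps modulo 2^32, and addition and multiplication modulo 2^32 still distribute. The following is a minimal sketch of that equivalence. The class name is hypothetical, and plain static methods stand in for the benchmark methods because ScaleState is not reproduced here.

```java
import java.util.Random;

public class IntDistributivity {
    // Multiply inside the loop, as in Scale_Int_Dynamic.
    static int scale(int[] data, int factor) {
        int value = 0;
        for (int i = 0; i < data.length; ++i) {
            value += factor * data[i];
        }
        return value;
    }

    // Multiply once after the loop, as in FactoredScale_Int_Dynamic.
    static int factoredScale(int[] data, int factor) {
        int value = 0;
        for (int i = 0; i < data.length; ++i) {
            value += data[i];
        }
        return factor * value;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed, an arbitrary choice
        int[] data = random.ints(100_000).toArray(); // full int range, so the sums overflow
        int factor = random.nextInt();
        // Wraparound arithmetic keeps distributivity intact, so this prints true.
        System.out.println(scale(data, factor) == factoredScale(data, factor));
    }
}
```

So the rewrite is valid for int, and the question becomes purely one of which form C2 optimises better.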

Far from seeking to exploit distributivity to reduce the number of multiplication instructions, C2 seems almost to embrace the extraneous operations as metadata to drive optimisations. The assembly for Scale_Int_Dynamic confirms this (it shows vectorised multiplication, not shifts, within the loop):

com/openkappa/simd/scale/Scale.Scale_Int_Dynamic(Lcom/openkappa/simd/scale/ScaleState;)I  [0x000001f5ca2fa120, 0x000001f5ca2fa498]  888 bytes
Argument 0 is unknown.RIP: 0x1f5ca2fa120 Code size: 0x00000378
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000001f5df561a20} 'Scale_Int_Dynamic' '(Lcom/openkappa/simd/scale/ScaleState;)I' in 'com/openkappa/simd/scale/Scale'
  0x000001f5ca2fa120: int3                      ;...cc

  0x000001f5ca2fa121: nop     word ptr [rax+rax+0h]  ;...66
                                                ;...66
                                                ;...66
                                                ;...0f
                                                ;...1f
                                                ;...84
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001f5ca2fa12c: nop                       ;...66
                                                ;...66
                                                ;...66
                                                ;...90

  0x000001f5ca2fa130: mov     dword ptr [rsp+0ffffffffffff9000h],eax
                                                ;...89
                                                ;...84
                                                ;...24
                                                ;...00
                                                ;...90
                                                ;...ff
                                                ;...ff

  0x000001f5ca2fa137: push    rbp               ;...55

  0x000001f5ca2fa138: sub     rsp,50h           ;...48
                                                ;...83
                                                ;...ec
                                                ;...50

  0x000001f5ca2fa13c: mov     r13,qword ptr [rdx+10h]  ;...4c
                                                ;...8b
                                                ;...6a
                                                ;...10

  0x000001f5ca2fa140: mov     ebx,dword ptr [rdx+18h]  ;...8b
                                                ;...5a
                                                ;...18

  0x000001f5ca2fa143: mov     ebp,dword ptr [rdx+8h]  ;...8b
                                                ;...6a
                                                ;...08

  0x000001f5ca2fa146: mov     r14d,dword ptr [rdx]  ;...44
                                                ;...8b
                                                ;...32

  0x000001f5ca2fa149: mov     rcx,rdx           ;...48
                                                ;...8b
                                                ;...ca

  0x000001f5ca2fa14c: vzeroupper                ;...c5
                                                ;...f8
                                                ;...77

  0x000001f5ca2fa14f: mov     r10,51da8d20h     ;...49
                                                ;...ba
                                                ;...20
                                                ;...8d
                                                ;...da
                                                ;...51
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001f5ca2fa159: call indirect r10         ;...41
                                                ;...ff
                                                ;...d2

  0x000001f5ca2fa15c: mov     r10d,dword ptr [r13+8h]  ;...45
                                                ;...8b
                                                ;...55
                                                ;...08
                                                ; implicit exception: dispatches to 0x000001f5ca2fa461
  0x000001f5ca2fa160: cmp     r10d,0f800016dh   ;...41
                                                ;...81
                                                ;...fa
                                                ;...6d
                                                ;...01
                                                ;...00
                                                ;...f8
                                                ;   {metadata({type array int})}
  0x000001f5ca2fa167: jne     1f5ca2fa445h      ;...0f
                                                ;...85
                                                ;...d8
                                                ;...02
                                                ;...00
                                                ;...00
                                                ;*iload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@16 (line 64)

  0x000001f5ca2fa16d: mov     edi,dword ptr [r13+0ch]  ;...41
                                                ;...8b
                                                ;...7d
                                                ;...0c
                                                ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@19 (line 64)

  0x000001f5ca2fa171: cmp     r14d,edi          ;...44
                                                ;...3b
                                                ;...f7

  0x000001f5ca2fa174: jnl     1f5ca2fa411h      ;...0f
                                                ;...8d
                                                ;...97
                                                ;...02
                                                ;...00
                                                ;...00
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@20 (line 64)

  0x000001f5ca2fa17a: mov     r11d,r13d         ;...45
                                                ;...8b
                                                ;...dd

  0x000001f5ca2fa17d: mov     r10d,r14d         ;...45
                                                ;...8b
                                                ;...d6

  0x000001f5ca2fa180: inc     r10d              ;...41
                                                ;...ff
                                                ;...c2

  0x000001f5ca2fa183: shr     r11d,2h           ;...41
                                                ;...c1
                                                ;...eb
                                                ;...02

  0x000001f5ca2fa187: and     r11d,7h           ;...41
                                                ;...83
                                                ;...e3
                                                ;...07

  0x000001f5ca2fa18b: xor     r8d,r8d           ;...45
                                                ;...33
                                                ;...c0

  0x000001f5ca2fa18e: cmp     r10d,r8d          ;...45
                                                ;...3b
                                                ;...d0

  0x000001f5ca2fa191: cmovl   r10d,r8d          ;...45
                                                ;...0f
                                                ;...4c
                                                ;...d0

  0x000001f5ca2fa195: cmp     r10d,edi          ;...44
                                                ;...3b
                                                ;...d7

  0x000001f5ca2fa198: cmovnle r10d,edi          ;...44
                                                ;...0f
                                                ;...4f
                                                ;...d7

  0x000001f5ca2fa19c: add     r11d,r10d         ;...45
                                                ;...03
                                                ;...da

  0x000001f5ca2fa19f: mov     r9d,4h            ;...41
                                                ;...b9
                                                ;...04
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001f5ca2fa1a5: sub     r9d,r11d          ;...45
                                                ;...2b
                                                ;...cb

  0x000001f5ca2fa1a8: and     r9d,7h            ;...41
                                                ;...83
                                                ;...e1
                                                ;...07

  0x000001f5ca2fa1ac: add     r9d,r10d          ;...45
                                                ;...03
                                                ;...ca

  0x000001f5ca2fa1af: cmp     r9d,edi           ;...44
                                                ;...3b
                                                ;...cf

  0x000001f5ca2fa1b2: cmovnle r9d,edi           ;...44
                                                ;...0f
                                                ;...4f
                                                ;...cf
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@29 (line 65)

  0x000001f5ca2fa1b6: cmp     r14d,edi          ;...44
                                                ;...3b
                                                ;...f7

  0x000001f5ca2fa1b9: jnb     1f5ca2fa422h      ;...0f
                                                ;...83
                                                ;...63
                                                ;...02
                                                ;...00
                                                ;...00

  0x000001f5ca2fa1bf: mov     r11d,ebp          ;...44
                                                ;...8b
                                                ;...dd

  0x000001f5ca2fa1c2: imul    r11d,dword ptr [r13+r14*4+10h]
                                                ;...47
                                                ;...0f
                                                ;...af
                                                ;...5c
                                                ;...b5
                                                ;...10

  0x000001f5ca2fa1c8: add     ebx,r11d          ;...41
                                                ;...03
                                                ;...db
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@31 (line 65)

  0x000001f5ca2fa1cb: inc     r14d              ;...41
                                                ;...ff
                                                ;...c6
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@33 (line 64)

  0x000001f5ca2fa1ce: cmp     r14d,r9d          ;...45
                                                ;...3b
                                                ;...f1

  0x000001f5ca2fa1d1: jl      1f5ca2fa1b6h      ;...7c
                                                ;...e3
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@20 (line 64)

  0x000001f5ca2fa1d3: mov     r9d,edi           ;...44
                                                ;...8b
                                                ;...cf

  0x000001f5ca2fa1d6: add     r9d,0ffffffc1h    ;...41
                                                ;...83
                                                ;...c1
                                                ;...c1

  0x000001f5ca2fa1da: mov     r8d,80000000h     ;...41
                                                ;...b8
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...80

  0x000001f5ca2fa1e0: cmp     edi,r9d           ;...41
                                                ;...3b
                                                ;...f9

  0x000001f5ca2fa1e3: cmovl   r9d,r8d           ;...45
                                                ;...0f
                                                ;...4c
                                                ;...c8

  0x000001f5ca2fa1e7: cmp     r14d,r9d          ;...45
                                                ;...3b
                                                ;...f1

  0x000001f5ca2fa1ea: jnl     1f5ca2fa3f0h      ;...0f
                                                ;...8d
                                                ;...00
                                                ;...02
                                                ;...00
                                                ;...00

  0x000001f5ca2fa1f0: vmovd   xmm2,ebp          ;...c5
                                                ;...f9
                                                ;...6e
                                                ;...d5

  0x000001f5ca2fa1f4: vpshufd xmm2,xmm2,0h      ;...c5
                                                ;...f9
                                                ;...70
                                                ;...d2
                                                ;...00

  0x000001f5ca2fa1f9: vinserti128 ymm2,ymm2,xmm2,1h  ;...c4
                                                ;...e3
                                                ;...6d
                                                ;...38
                                                ;...d2
                                                ;...01

  0x000001f5ca2fa1ff: nop                       ;...90
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@29 (line 65)

  0x000001f5ca2fa200: vmovdqu ymm0,ymmword ptr [r13+r14*4+10h]
                                                ;...c4
                                                ;...81
                                                ;...7e
                                                ;...6f
                                                ;...44
                                                ;...b5
                                                ;...10

  0x000001f5ca2fa207: vpmulld ymm11,ymm0,ymm2   ;...c4
                                                ;...62
                                                ;...7d
                                                ;...40
                                                ;...da

  0x000001f5ca2fa20c: movsxd  r10,r14d          ;...4d
                                                ;...63
                                                ;...d6

  0x000001f5ca2fa20f: vmovdqu ymm0,ymmword ptr [r13+r10*4+30h]
                                                ;...c4
                                                ;...81
                                                ;...7e
                                                ;...6f
                                                ;...44
                                                ;...95
                                                ;...30

  0x000001f5ca2fa216: vmovdqu ymm1,ymmword ptr [r13+r10*4+0f0h]
                                                ;...c4
                                                ;...81
                                                ;...7e
                                                ;...6f
                                                ;...8c
                                                ;...95
                                                ;...f0
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001f5ca2fa220: vmovdqu ymm3,ymmword ptr [r13+r10*4+50h]
                                                ;...c4
                                                ;...81
                                                ;...7e
                                                ;...6f
                                                ;...5c
                                                ;...95
                                                ;...50

  0x000001f5ca2fa227: vmovdqu ymm7,ymmword ptr [r13+r10*4+70h]
                                                ;...c4
                                                ;...81
                                                ;...7e
                                                ;...6f
                                                ;...7c
                                                ;...95
                                                ;...70

  0x000001f5ca2fa22e: vmovdqu ymm6,ymmword ptr [r13+r10*4+90h]
                                                ;...c4
                                                ;...81
                                                ;...7e
                                                ;...6f
                                                ;...b4
                                                ;...95
                                                ;...90
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001f5ca2fa238: vmovdqu ymm5,ymmword ptr [r13+r10*4+0b0h]
                                                ;...c4
                                                ;...81
                                                ;...7e
                                                ;...6f
                                                ;...ac
                                                ;...95
                                                ;...b0
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001f5ca2fa242: vmovdqu ymm4,ymmword ptr [r13+r10*4+0d0h]
                                                ;...c4
                                                ;...81
                                                ;...7e
                                                ;...6f
                                                ;...a4
                                                ;...95
                                                ;...d0
                                                ;...00
                                                ;...00
                                                ;...00

  0x000001f5ca2fa24c: vpmulld ymm9,ymm0,ymm2    ;...c4
                                                ;...62
                                                ;...7d
                                                ;...40
                                                ;...ca

  0x000001f5ca2fa251: vpmulld ymm4,ymm4,ymm2    ;...c4
                                                ;...e2
                                                ;...5d
                                                ;...40
                                                ;...e2

  0x000001f5ca2fa256: vpmulld ymm5,ymm5,ymm2    ;...c4
                                                ;...e2
                                                ;...55
                                                ;...40
                                                ;...ea

  0x000001f5ca2fa25b: vpmulld ymm6,ymm6,ymm2    ;...c4
                                                ;...e2
                                                ;...4d
                                                ;...40
                                                ;...f2

  0x000001f5ca2fa260: vpmulld ymm8,ymm7,ymm2    ;...c4
                                                ;...62
                                                ;...45
                                                ;...40
                                                ;...c2

  0x000001f5ca2fa265: vpmulld ymm10,ymm3,ymm2   ;...c4
                                                ;...62
                                                ;...65
                                                ;...40
                                                ;...d2

  0x000001f5ca2fa26a: vpmulld ymm3,ymm1,ymm2    ;...c4
                                                ;...e2
                                                ;...75
                                                ;...40
                                                ;...da

  0x000001f5ca2fa26f: vphaddd ymm1,ymm11,ymm11  ;...c4
                                                ;...c2
                                                ;...25
                                                ;...02
                                                ;...cb

  0x000001f5ca2fa274: vphaddd ymm1,ymm1,ymm0    ;...c4
                                                ;...e2
                                                ;...75
                                                ;...02
                                                ;...c8

  0x000001f5ca2fa279: vextracti128 xmm0,ymm1,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c8
                                                ;...01

  0x000001f5ca2fa27f: vpaddd  xmm1,xmm1,xmm0    ;...c5
                                                ;...f1
                                                ;...fe
                                                ;...c8

  0x000001f5ca2fa283: vmovd   xmm0,ebx          ;...c5
                                                ;...f9
                                                ;...6e
                                                ;...c3

  0x000001f5ca2fa287: vpaddd  xmm0,xmm0,xmm1    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c1

  0x000001f5ca2fa28b: vmovd   r10d,xmm0         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...c2

  0x000001f5ca2fa290: vphaddd ymm1,ymm9,ymm9    ;...c4
                                                ;...c2
                                                ;...35
                                                ;...02
                                                ;...c9

  0x000001f5ca2fa295: vphaddd ymm1,ymm1,ymm0    ;...c4
                                                ;...e2
                                                ;...75
                                                ;...02
                                                ;...c8

  0x000001f5ca2fa29a: vextracti128 xmm0,ymm1,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c8
                                                ;...01

  0x000001f5ca2fa2a0: vpaddd  xmm1,xmm1,xmm0    ;...c5
                                                ;...f1
                                                ;...fe
                                                ;...c8

  0x000001f5ca2fa2a4: vmovd   xmm0,r10d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...c2

  0x000001f5ca2fa2a9: vpaddd  xmm0,xmm0,xmm1    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c1

  0x000001f5ca2fa2ad: vmovd   r11d,xmm0         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...c3

  0x000001f5ca2fa2b2: vphaddd ymm0,ymm10,ymm10  ;...c4
                                                ;...c2
                                                ;...2d
                                                ;...02
                                                ;...c2

  0x000001f5ca2fa2b7: vphaddd ymm0,ymm0,ymm1    ;...c4
                                                ;...e2
                                                ;...7d
                                                ;...02
                                                ;...c1

  0x000001f5ca2fa2bc: vextracti128 xmm1,ymm0,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c1
                                                ;...01

  0x000001f5ca2fa2c2: vpaddd  xmm0,xmm0,xmm1    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c1

  0x000001f5ca2fa2c6: vmovd   xmm1,r11d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...cb

  0x000001f5ca2fa2cb: vpaddd  xmm1,xmm1,xmm0    ;...c5
                                                ;...f1
                                                ;...fe
                                                ;...c8

  0x000001f5ca2fa2cf: vmovd   r10d,xmm1         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...ca

  0x000001f5ca2fa2d4: vphaddd ymm1,ymm8,ymm8    ;...c4
                                                ;...c2
                                                ;...3d
                                                ;...02
                                                ;...c8

  0x000001f5ca2fa2d9: vphaddd ymm1,ymm1,ymm0    ;...c4
                                                ;...e2
                                                ;...75
                                                ;...02
                                                ;...c8

  0x000001f5ca2fa2de: vextracti128 xmm0,ymm1,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c8
                                                ;...01

  0x000001f5ca2fa2e4: vpaddd  xmm1,xmm1,xmm0    ;...c5
                                                ;...f1
                                                ;...fe
                                                ;...c8

  0x000001f5ca2fa2e8: vmovd   xmm0,r10d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...c2

  0x000001f5ca2fa2ed: vpaddd  xmm0,xmm0,xmm1    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c1

  0x000001f5ca2fa2f1: vmovd   r11d,xmm0         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...c3

  0x000001f5ca2fa2f6: vphaddd ymm0,ymm6,ymm6    ;...c4
                                                ;...e2
                                                ;...4d
                                                ;...02
                                                ;...c6

  0x000001f5ca2fa2fb: vphaddd ymm0,ymm0,ymm1    ;...c4
                                                ;...e2
                                                ;...7d
                                                ;...02
                                                ;...c1

  0x000001f5ca2fa300: vextracti128 xmm1,ymm0,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c1
                                                ;...01

  0x000001f5ca2fa306: vpaddd  xmm0,xmm0,xmm1    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c1

  0x000001f5ca2fa30a: vmovd   xmm1,r11d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...cb

  0x000001f5ca2fa30f: vpaddd  xmm1,xmm1,xmm0    ;...c5
                                                ;...f1
                                                ;...fe
                                                ;...c8

  0x000001f5ca2fa313: vmovd   r10d,xmm1         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...ca

  0x000001f5ca2fa318: vphaddd ymm1,ymm5,ymm5    ;...c4
                                                ;...e2
                                                ;...55
                                                ;...02
                                                ;...cd

  0x000001f5ca2fa31d: vphaddd ymm1,ymm1,ymm0    ;...c4
                                                ;...e2
                                                ;...75
                                                ;...02
                                                ;...c8

  0x000001f5ca2fa322: vextracti128 xmm0,ymm1,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c8
                                                ;...01

  0x000001f5ca2fa328: vpaddd  xmm1,xmm1,xmm0    ;...c5
                                                ;...f1
                                                ;...fe
                                                ;...c8

  0x000001f5ca2fa32c: vmovd   xmm0,r10d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...c2

  0x000001f5ca2fa331: vpaddd  xmm0,xmm0,xmm1    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c1

  0x000001f5ca2fa335: vmovd   r11d,xmm0         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...c3

  0x000001f5ca2fa33a: vphaddd ymm0,ymm4,ymm4    ;...c4
                                                ;...e2
                                                ;...5d
                                                ;...02
                                                ;...c4

  0x000001f5ca2fa33f: vphaddd ymm0,ymm0,ymm1    ;...c4
                                                ;...e2
                                                ;...7d
                                                ;...02
                                                ;...c1

  0x000001f5ca2fa344: vextracti128 xmm1,ymm0,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...c1
                                                ;...01

  0x000001f5ca2fa34a: vpaddd  xmm0,xmm0,xmm1    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c1

  0x000001f5ca2fa34e: vmovd   xmm1,r11d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...cb

  0x000001f5ca2fa353: vpaddd  xmm1,xmm1,xmm0    ;...c5
                                                ;...f1
                                                ;...fe
                                                ;...c8

  0x000001f5ca2fa357: vmovd   r10d,xmm1         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...7e
                                                ;...ca

  0x000001f5ca2fa35c: vphaddd ymm1,ymm3,ymm3    ;...c4
                                                ;...e2
                                                ;...65
                                                ;...02
                                                ;...cb

  0x000001f5ca2fa361: vphaddd ymm1,ymm1,ymm7    ;...c4
                                                ;...e2
                                                ;...75
                                                ;...02
                                                ;...cf

  0x000001f5ca2fa366: vextracti128 xmm7,ymm1,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...cf
                                                ;...01

  0x000001f5ca2fa36c: vpaddd  xmm1,xmm1,xmm7    ;...c5
                                                ;...f1
                                                ;...fe
                                                ;...cf

  0x000001f5ca2fa370: vmovd   xmm7,r10d         ;...c4
                                                ;...c1
                                                ;...79
                                                ;...6e
                                                ;...fa

  0x000001f5ca2fa375: vpaddd  xmm7,xmm7,xmm1    ;...c5
                                                ;...c1
                                                ;...fe
                                                ;...f9

  0x000001f5ca2fa379: vmovd   ebx,xmm7          ;...c5
                                                ;...f9
                                                ;...7e
                                                ;...fb
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@31 (line 65)

  0x000001f5ca2fa37d: add     r14d,40h          ;...41
                                                ;...83
                                                ;...c6
                                                ;...40
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@33 (line 64)

  0x000001f5ca2fa381: cmp     r14d,r9d          ;...45
                                                ;...3b
                                                ;...f1

  0x000001f5ca2fa384: jl      1f5ca2fa200h      ;...0f
                                                ;...8c
                                                ;...76
                                                ;...fe
                                                ;...ff
                                                ;...ff
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@20 (line 64)

  0x000001f5ca2fa38a: mov     r11d,edi          ;...44
                                                ;...8b
                                                ;...df

  0x000001f5ca2fa38d: add     r11d,0fffffff9h   ;...41
                                                ;...83
                                                ;...c3
                                                ;...f9

  0x000001f5ca2fa391: cmp     edi,r11d          ;...41
                                                ;...3b
                                                ;...fb

  0x000001f5ca2fa394: cmovl   r11d,r8d          ;...45
                                                ;...0f
                                                ;...4c
                                                ;...d8

  0x000001f5ca2fa398: cmp     r14d,r11d         ;...45
                                                ;...3b
                                                ;...f3

  0x000001f5ca2fa39b: jnl     1f5ca2fa3d5h      ;...7d
                                                ;...38

  0x000001f5ca2fa39d: nop                       ;...66
                                                ;...66
                                                ;...90
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@29 (line 65)

  0x000001f5ca2fa3a0: vmovdqu ymm0,ymmword ptr [r13+r14*4+10h]
                                                ;...c4
                                                ;...81
                                                ;...7e
                                                ;...6f
                                                ;...44
                                                ;...b5
                                                ;...10

  0x000001f5ca2fa3a7: vpmulld ymm1,ymm0,ymm2    ;...c4
                                                ;...e2
                                                ;...7d
                                                ;...40
                                                ;...ca

  0x000001f5ca2fa3ac: vphaddd ymm3,ymm1,ymm1    ;...c4
                                                ;...e2
                                                ;...75
                                                ;...02
                                                ;...d9

  0x000001f5ca2fa3b1: vphaddd ymm3,ymm3,ymm0    ;...c4
                                                ;...e2
                                                ;...65
                                                ;...02
                                                ;...d8

  0x000001f5ca2fa3b6: vextracti128 xmm0,ymm3,1h  ;...c4
                                                ;...e3
                                                ;...7d
                                                ;...39
                                                ;...d8
                                                ;...01

  0x000001f5ca2fa3bc: vpaddd  xmm3,xmm3,xmm0    ;...c5
                                                ;...e1
                                                ;...fe
                                                ;...d8

  0x000001f5ca2fa3c0: vmovd   xmm0,ebx          ;...c5
                                                ;...f9
                                                ;...6e
                                                ;...c3

  0x000001f5ca2fa3c4: vpaddd  xmm0,xmm0,xmm3    ;...c5
                                                ;...f9
                                                ;...fe
                                                ;...c3

  0x000001f5ca2fa3c8: vmovd   ebx,xmm0          ;...c5
                                                ;...f9
                                                ;...7e
                                                ;...c3
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@31 (line 65)

  0x000001f5ca2fa3cc: add     r14d,8h           ;...41
                                                ;...83
                                                ;...c6
                                                ;...08
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@33 (line 64)

  0x000001f5ca2fa3d0: cmp     r14d,r11d         ;...45
                                                ;...3b
                                                ;...f3

  0x000001f5ca2fa3d3: jl      1f5ca2fa3a0h      ;...7c
                                                ;...cb
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@20 (line 64)

  0x000001f5ca2fa3d5: cmp     r14d,edi          ;...44
                                                ;...3b
                                                ;...f7

  0x000001f5ca2fa3d8: jnl     1f5ca2fa3f0h      ;...7d
                                                ;...16

  0x000001f5ca2fa3da: nop                       ;...66
                                                ;...90
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@29 (line 65)

  0x000001f5ca2fa3dc: mov     r10d,ebp          ;...44
                                                ;...8b
                                                ;...d5

  0x000001f5ca2fa3df: imul    r10d,dword ptr [r13+r14*4+10h]
                                                ;...47
                                                ;...0f
                                                ;...af
                                                ;...54
                                                ;...b5
                                                ;...10

  0x000001f5ca2fa3e5: add     ebx,r10d          ;...41
                                                ;...03
                                                ;...da
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@31 (line 65)

  0x000001f5ca2fa3e8: inc     r14d              ;...41
                                                ;...ff
                                                ;...c6
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@33 (line 64)

  0x000001f5ca2fa3eb: cmp     r14d,edi          ;...44
                                                ;...3b
                                                ;...f7

  0x000001f5ca2fa3ee: jl      1f5ca2fa3dch      ;...7c
                                                ;...ec
                                                ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@20 (line 64)

  0x000001f5ca2fa3f0: cmp     r14d,edi          ;...44
                                                ;...3b
                                                ;...f7

  0x000001f5ca2fa3f3: jnl     1f5ca2fa411h      ;...7d
                                                ;...1c

  0x000001f5ca2fa3f5: nop                       ;...66
                                                ;...66
                                                ;...90
                                                ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@29 (line 65)

  0x000001f5ca2fa3f8: cmp     r14d,edi          ;...44
                                                ;...3b
                                                ;...f7

  0x000001f5ca2fa3fb: jnb     1f5ca2fa428h      ;...73
                                                ;...2b

  0x000001f5ca2fa3fd: mov     r11d,ebp          ;...44
                                                ;...8b
                                                ;...dd

  0x000001f5ca2fa400: imul    r11d,dword ptr [r13+r14*4+10h]
                                                ;...47
                                                ;...0f
                                                ;...af
                                                ;...5c
                                                ;...b5
                                                ;...10

  0x000001f5ca2fa406: add     ebx,r11d          ;...41
                                                ;...03
                                                ;...db
                                                ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@31 (line 65)

  0x000001f5ca2fa409: inc     r14d              ;...41
                                                ;...ff
                                                ;...c6
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@33 (line 64)

  0x000001f5ca2fa40c: cmp     r14d,edi          ;...44
                                                ;...3b
                                                ;...f7

  0x000001f5ca2fa40f: jl      1f5ca2fa3f8h      ;...7c
                                                ;...e7
                                                ;*iload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@16 (line 64)

  0x000001f5ca2fa411: mov     eax,ebx           ;...8b
                                                ;...c3

  0x000001f5ca2fa413: vzeroupper                ;...c5
                                                ;...f8
                                                ;...77

  0x000001f5ca2fa416: add     rsp,50h           ;...48
                                                ;...83
                                                ;...c4
                                                ;...50

  0x000001f5ca2fa41a: pop     rbp               ;...5d

  0x000001f5ca2fa41b: test    dword ptr [1f5b68c0000h],eax
                                                ;...85
                                                ;...05
                                                ;...df
                                                ;...5b
                                                ;...5c
                                                ;...ec
                                                ;   {poll_return}
  0x000001f5ca2fa421: ret                       ;...c3

There are two lessons to be learnt here. The first is that what you see is not what you get. The second is about the relevance of asymptotic analysis. If hierarchical caches render asymptotic analysis bullshit (linear-time but cache-friendly algorithms can, and do, outperform logarithmic algorithms with cache misses), optimising compilers render the field practically irrelevant.
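For integers, unlike floating point, the factoring itself is always safe: Java `int` arithmetic wraps modulo 2^32, and distributivity survives the wraparound. The following is a minimal sketch rather than the benchmark code. The class name `IntDistributivity` and both method names are hypothetical, and the loop shape is an assumption based on the `imul`/`add` pair in the listing above.

```java
import java.util.Random;

public class IntDistributivity {

    // Assumed shape of the benchmarked loop: one multiplication per element.
    static int scale(int multiplier, int[] data) {
        int value = 0;
        for (int i = 0; i < data.length; ++i) {
            value += multiplier * data[i];
        }
        return value;
    }

    // Factored form: sum first, then a single multiplication after the loop.
    static int factoredScale(int multiplier, int[] data) {
        int value = 0;
        for (int i = 0; i < data.length; ++i) {
            value += data[i];
        }
        return multiplier * value;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        // Values span the full int range, so both sums overflow many times.
        int[] data = random.ints(100_000).toArray();
        int multiplier = random.nextInt();
        // Wrapping int arithmetic forms a ring, so the two results agree exactly.
        System.out.println(scale(multiplier, data) == factoredScale(multiplier, data)); // prints true
    }
}
```

The equality holds for any input, overflow included, so whether a compiler performs this rewrite for `int` is a question of profitability rather than correctness.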