Wrong offsets in array of strings of size 5 or 11 when peephole optimizer enabled for 80386 CPU

Summary

When accessing elements of an array of string[5] or string[11], the byte offsets from the base address of the array are calculated incorrectly.

System Information

Operating system: Go32v2 and 32-bit Windows.
Processor architecture: x86
Compiler version: 3.x
Device: Computer

Steps to reproduce

Compile the example project. For Go32v2:

fpc -Tgo32v2 -O1 -Op80386 -Oopeephole test.pas

For 32-bit Windows:

fpc -Twin32 -O1 -Op80386 -Oopeephole test.pas

Both settings are needed to reproduce the problem: peephole optimization must be enabled and the CPU must be set to 80386.

Example project

const
  StrLen = 11 {5 or 11};
  Str: array [0..15] of string[StrLen] = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15');
var
  I: Integer;
  S: ^string;
begin
  WriteLn('@Str=', HexStr(PtrUInt(@Str), 8));
  for I := Low(Str) to High(Str) div 4 do
  begin
    {$IFDEF CPU16}
      asm jmp @1; nop; int3; @1: end;
    {$ELSE}
      asm jmp .L1; nop; int3; .L1: end;
    {$ENDIF}
    S := @Str[I];
    WriteLn('Str[', I, ']=@', HexStr(PtrUInt(S), 8), '=@Str+', PtrUInt(S) - PtrUInt(@Str), '=''', S^, '''');
  end;
end.

What is the current bug behavior?

For string[5] (array element size of 6 bytes), the index is multiplied by 10 instead of 6 to get the offset. For string[11] (12 bytes), the index is multiplied by 36 instead of 12.

What is the expected (correct) behavior?

The index should be multiplied by the element size to get the byte offset. The behavior is correct in FPC 2.6.4, and also in FPC 3.x for i8086.
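
For reference, the following minimal sketch prints the offsets that correct code should produce, computed directly as I * SizeOf(element). The program and type names (ExpectedOffsets, TStr5, TStr11) are made up for illustration and are not part of the test case. It assumes string[N] is a short string occupying N + 1 bytes, as stated above.

```pascal
program ExpectedOffsets;

type
  TStr5 = string[5];    { hypothetical names, for illustration only }
  TStr11 = string[11];

var
  I: Integer;
begin
  { A short string stores one length byte followed by the characters. }
  WriteLn('SizeOf(string[5])=', SizeOf(TStr5));
  WriteLn('SizeOf(string[11])=', SizeOf(TStr11));
  { Same index range as the loop in the example project. }
  for I := 0 to 3 do
    WriteLn('I=', I,
      ' string[5] offset=', I * SizeOf(TStr5),
      ' string[11] offset=', I * SizeOf(TStr11));
end.
```

For indexes 0 to 3, the correct offsets are therefore 0, 6, 12, 18 for string[5] and 0, 12, 24, 36 for string[11].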

Relevant logs and/or screenshots

For string[5]:

@Str=004090D0
Str[0]=@004090D0=@Str+0='0'
Str[1]=@004090DA=@Str+10=' 2    3    4    5    6    '
Str[2]=@004090E4=@Str+20=''
Str[3]=@004090EE=@Str+30='5'

For string[11]:

@Str=004090D0
Str[0]=@004090D0=@Str+0='0'
Str[1]=@004090F4=@Str+36='3'
Str[2]=@00409118=@Str+72='6'
Str[3]=@0040913C=@Str+108='9'

Look for the bytes 0x90 0xCC (the nop; int3 marker emitted by the inline asm in the loop) in the compiled code to see what happens. The multiplication of eax by 6 is converted to:

8D 04 00               lea   eax,[eax][eax]
8D 04 80               lea   eax,[eax][eax]*4

which is actually multiplication by 10. Multiplication of eax by 12 becomes:

8D 04 85 00 00 00 00   lea   eax,[eax]*4[0]
8D 04 C0               lea   eax,[eax][eax]*8

that is, multiplication by 36. In FPC 2.6.4, these compiled correctly to:

8D 04 40               lea   eax,[eax][eax]*2
01 C0                  add   eax,eax

and

8D 04 40               lea   eax,[eax][eax]*2
8D 04 85 00 00 00 00   lea   eax,[eax]*4[0]
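
The factors claimed above can be checked with a small model of the instruction sequences. This is only a sketch: the Lea function below is a hypothetical helper that imitates lea eax,[base + index*scale] and is not compiler code. Starting from an index of 1, the final value equals the overall multiplication factor.

```pascal
program LeaFactors;

{ Hypothetical helper: models "lea eax,[base + index*scale]". }
{ Pass Base = 0 for the form without a base register.         }
function Lea(Base, Index, Scale: LongInt): LongInt;
begin
  Lea := Base + Index * Scale;
end;

var
  Eax: LongInt;
begin
  { FPC 3.x, element size 6: lea eax,[eax][eax]; lea eax,[eax][eax]*4 }
  Eax := 1;
  Eax := Lea(Eax, Eax, 1);
  Eax := Lea(Eax, Eax, 4);
  WriteLn('FPC 3.x,   size 6:  factor ', Eax);   { 10 }

  { FPC 3.x, element size 12: lea eax,[eax]*4[0]; lea eax,[eax][eax]*8 }
  Eax := 1;
  Eax := Lea(0, Eax, 4);
  Eax := Lea(Eax, Eax, 8);
  WriteLn('FPC 3.x,   size 12: factor ', Eax);   { 36 }

  { FPC 2.6.4, element size 6: lea eax,[eax][eax]*2; add eax,eax }
  Eax := 1;
  Eax := Lea(Eax, Eax, 2);
  Eax := Eax + Eax;
  WriteLn('FPC 2.6.4, size 6:  factor ', Eax);   { 6 }

  { FPC 2.6.4, element size 12: lea eax,[eax][eax]*2; lea eax,[eax]*4[0] }
  Eax := 1;
  Eax := Lea(Eax, Eax, 2);
  Eax := Lea(0, Eax, 4);
  WriteLn('FPC 2.6.4, size 12: factor ', Eax);   { 12 }
end.
```

In both FPC 3.x sequences, the second lea reuses the already scaled eax as its base and its index, so the two steps yield 2 * 5 and 4 * 9 instead of the required 6 and 12.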

Possible fixes

The peephole optimizer probably optimizes multiplications by 6 and 12 incorrectly. However, in the file \fpc-3.2.2\compiler\i386\aoptcpu.pas of the source, the replacement seems to be correct, and multiplying an integer variable by a constant 6 or 12 also produces correct code. Perhaps the array offset calculation uses a different code path?

Edited Oct 14, 2022 by JoeForsterSTA