Stack Trace Considerations (M0)

Post by gray » Tue Nov 21, 2023 11:18 am

With Astrobe version 9.0, the useful stack trace functionality can show false positives, i.e. list procedure calls that don't belong in the call chain leading up to the run-time error.

Consider this test code:

Code:

MODULE TestError1;

  IMPORT Main;

  VAR x, y: INTEGER;

  PROCEDURE p1;
    VAR z: INTEGER;
  BEGIN
    y := 0; x := x DIV y
  END p1;

  PROCEDURE p0;
  BEGIN
    p1
  END p0;

BEGIN
  p0
END TestError1.
It shows this error message and stack trace on my machine (register output omitted):

Code:

integer divided by zero or negative divisor
TestError1.p1    @08002738H, Line: 10
Main.Init        @080026F0H, Line: 125
TestError1.p0    @08002766H, Line: 15
TestError1..init @08002776H, Line: 19
This is the result of the heuristic used to detect the procedure calls in the chain, which searches the stack for "feasible" link register (LR) values. In the above test case, a stale LR value pointing into 'Main.Init', still on the stack from the initialisation call chain, happened to coincide with the stack slot of variable 'z' in procedure 'p1'. It's actually pointing out an unused local variable (the compiler warning for which I had ignored for the sake of the test). If I assign a value to 'z', the false positive 'Main.Init' disappears, as one would expect. In this sense, it's a feature. :)
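
A minimal sketch of what such a plausibility test could look like, under stated assumptions: this is not the actual code in Traps, and the module name, the procedure name, the code range parameters and the sample values are all made up for illustration. It only relies on the fact that return addresses on the Cortex-M0 have bit 0 set (Thumb state) and point into code memory. Any stale stack word that passes a test of this kind will show up in the trace.

```oberon
MODULE FeasibleLRSketch;
  (* Illustration only, not the heuristic used by Astrobe's Traps module *)

  (* A stack word is a candidate return address if it has the Thumb bit
     set and lies inside the code range [codeStart, codeEnd) *)
  PROCEDURE IsFeasibleLR*(word, codeStart, codeEnd: INTEGER): BOOLEAN;
  BEGIN
    RETURN ODD(word) & (word >= codeStart) & (word < codeEnd)
  END IsFeasibleLR;

BEGIN
  (* hypothetical code range 08000000H .. 0800FFFFH *)
  ASSERT(IsFeasibleLR(08002001H, 08000000H, 08010000H));   (* odd, in range *)
  ASSERT(~IsFeasibleLR(0, 08000000H, 08010000H));          (* cleared variable *)
  ASSERT(~IsFeasibleLR(20007CD1H, 08000000H, 08010000H))   (* odd, but a RAM address *)
END FeasibleLRSketch.
```

The second assertion mirrors the observation above: once 'z' holds an ordinary value such as 0, its stack slot no longer looks like a return address.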

Another case with false positives could arise if you write code that calculates return addresses, e.g. for a debugger, and keeps them in local variables, i.e. on the stack. I guess in such a case we'll know how to interpret the stack trace. So all good.

However, I have run into a case where a procedure call in the chain was missing. Consider this test code:

Code:

MODULE TestError2;

  IMPORT Main;

  VAR x, y: INTEGER;

  PROCEDURE p1;
  BEGIN
    y := 0; x := x DIV y
  END p1;

  PROCEDURE p0;
  BEGIN
    p1
  END p0;

BEGIN
  p0
END TestError2.
It shows this error message and stack trace on my machine:

Code:

integer divided by zero or negative divisor
TestError2.p1    @08002738H, Line: 9
TestError2..init @0800277AH, Line: 18
Clearly, the call of 'p1' in 'p0' is missing (line 14).

Module Traps catches run-time errors via the handler 'SVCTrap', which is activated via checks and SVC instructions inserted by the compiler into the program code. 'SVCTrap' initiates the creation of the stack trace thusly:

Code:

PROCEDURE SVCTrap;
BEGIN
  (* ... *)
  addr := SYSTEM.SP + 40;
  StackTrace(addr);
  (* ... *)
END SVCTrap;
In my understanding, the stack has this layout after the prologue of 'SVCTrap':

Code:

stacked by the MCU's exception handling hardware:
+32   PSR
+28   PC
+24   LR
+20   R12
+16   R3
+12   R2
+8    R1
+4    R0
pushed by SVCTrap's prologue:
SP => LR  (= EXC_RETURN)
Hence, the part of the stack that needs to be scanned for potential LR entries starts at 'SYSTEM.SP + 36'. The above test code actually places the first relevant LR value at +36, hence it gets missed by 'SVCTrap'. If I change this value from +40 to +36, I get the correct output:

Code:

integer divided by zero or negative divisor
TestError2.p1    @08002738H, Line: 9
TestError2.p0    @0800276AH, Line: 14
TestError2..init @0800277AH, Line: 18
So, problem solved, right? No. It took me a while to figure this one out. The stack frame for the exception handler gets aligned to a double word, i.e. an 8-byte boundary. The programmer's manual does say: "The stack frame is aligned to a double-word address", but I had missed this statement, even when re-reading. Ayo. Unlike on the M3, this cannot be configured: see 'Configuration and control register (SCB_CCR)', 'STKALIGN' bit: "Always reads as one, indicates 8-byte stack alignment on exception entry." This register is read-only.

So, depending on the layout of the stack in the procedure call chain, there can be a one-word "gap" between the double-word aligned exception stack frame and the lower end of the stack before the exception. In that case, 'SYSTEM.SP + 36' will read the "gap" value, and 'SYSTEM.SP + 40' is the correct start address for the stack trace.
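
The alignment arithmetic can be sketched as follows. This is an illustration, not code from Astrobe: the module and procedure names are made up, and it assumes the stack pointer is always word-aligned, so that the gap is either 0 or 4 bytes. The first sample address is derived from the TestError3 register dump at the end of this post, where the gap word sits at 20007CF0H; the second one is a made-up, already aligned value.

```oberon
MODULE AlignGapSketch;
  (* Illustration only: where the hardware places the exception stack frame *)

  CONST FrameSize = 32;  (* R0-R3, R12, LR, PC, PSR: 8 words *)

  (* Size in bytes of the padding inserted below spBefore, assuming a
     word-aligned stack pointer: 4 if spBefore is not 8-byte aligned, else 0 *)
  PROCEDURE Gap*(spBefore: INTEGER): INTEGER;
  BEGIN
    RETURN spBefore MOD 8
  END Gap;

  (* Address of the stacked R0, i.e. the base of the 8-byte aligned frame *)
  PROCEDURE FrameBase*(spBefore: INTEGER): INTEGER;
  BEGIN
    RETURN spBefore - Gap(spBefore) - FrameSize
  END FrameBase;

BEGIN
  (* TestError3: SP before the exception was 20007CF4H, one gap word *)
  ASSERT(Gap(20007CF4H) = 4);
  ASSERT(FrameBase(20007CF4H) = 20007CD0H);
  (* hypothetical aligned case: no gap *)
  ASSERT(Gap(20007CF8H) = 0);
  ASSERT(FrameBase(20007CF8H) = 20007CD8H)
END AlignGapSketch.
```

Seen from the handler, the distance from the frame base to the first word of the interrupted code's stack is therefore 32 or 36 bytes, which becomes +36 or +40 once the LR pushed by the handler's prologue is counted.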

Alas, simply always starting from 'SYSTEM.SP + 36' can result in false positives too, as this test code shows:

Code:

MODULE TestError3;

  IMPORT Main;

  VAR x, y: INTEGER;

  PROCEDURE p2;
  END p2;

  PROCEDURE p1;
  BEGIN
    p2;
    y := 0; x := x DIV y
  END p1;

  PROCEDURE p0;
    VAR z: INTEGER;
  BEGIN
    z := 0;
    p1
  END p0;

BEGIN
  p0
END TestError3.
Which produces...

Code:

integer divided by zero or negative divisor
TestError3.p1    @08002748H, Line: 13
TestError3.p1    @0800272EH, Line: 11
TestError3.p0    @08002780H, Line: 20
TestError3..init @08002792H, Line: 24
... because the value in the "gap" was the LR value corresponding to the call of 'p2' from 'p1' (line 11).

Unfortunately, from "inside" 'SVCTrap', I don't see any possibility to check and decide for either the +36 or the +40 offset, depending on the specific situation, since we don't know the value of SYSTEM.SP right before the exception. Maybe someone smarter than me can figure this one out. I prefer wrong call entries over missing ones, so I set the offset to +36. It's usually possible to quickly make sense of the false positives, which, as outlined at the top, can happen even if the offset is correct. In any case, the stack trace is enormously useful; we just need to be aware of the few caveats (and maybe change the stack offset from +40 to +36).

And yes, the test code examples are somewhat contrived, but that's the nature of test programs. :)

Astrobe's Disassemble Application functionality has proven to be of great value for diagnosing the above issues! For this, I had adapted Traps.ShowStack and Traps.OutStackItem to print "upwards" beyond the exception stack frame to see what Traps.StackTrace scans, and without the app disassembly listing, mapping LR values on the stack to their corresponding procedure calls would have been, hm, "difficult".

Code:

integer divided by zero or negative divisor
TestError3.p1    @08002748H, Line: 13
TestError3.p1    @0800272EH, Line: 11
TestError3.p0    @08002780H, Line: 20
TestError3..init @08002792H, Line: 24
 20007CD0H  r0 = 00000001H,          1
 20007CD4H  r1 = 00000000H,          0
 20007CD8H  r2 = 00000341H,        833
 20007CDCH  r3 = 00003200H,      12800
 20007CE0H r12 = FFFFFFFFH,         -1
 20007CE4H  lr = 08002733H,  134227763
 20007CE8H  pc = 0800274AH,  134227786
 20007CECH psr = 61000200H, 1627390464
 20007CF0H s36 = 08002733H,  134227763
 20007CF4H s40 = 08002785H,  134227845
 20007CF8H s44 = 00000000H,          0
 20007CFCH s48 = 08002797H,  134227863
 20007D00H s52 = 080027D5H,  134227925
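
In the dump above, each trace address is exactly 5 less than a stack word: s36 = 08002733H gives 0800272EH, s40 = 08002785H gives 08002780H, and s48 = 08002797H gives 08002792H. A plausible reading, stated here as an assumption rather than as the way Traps computes it, is that the stacked value is the return address with the Thumb bit set, and the call itself is a 4-byte BL instruction just before it. The module and procedure names are made up for illustration.

```oberon
MODULE CallSiteSketch;
  (* Illustration only: map a stacked LR value to the address of the call *)

  (* Assumes lr has the Thumb bit (bit 0) set and that the call was made
     with a 32-bit BL instruction directly preceding the return address *)
  PROCEDURE CallSite*(lr: INTEGER): INTEGER;
  BEGIN
    RETURN lr - 1 - 4
  END CallSite;

BEGIN
  (* values taken from the TestError3 dump and stack trace *)
  ASSERT(CallSite(08002733H) = 0800272EH);  (* s36: call of p2 in p1 *)
  ASSERT(CallSite(08002785H) = 08002780H);  (* s40: call of p1 in p0 *)
  ASSERT(CallSite(08002797H) = 08002792H)   (* s48: call of p0 in ..init *)
END CallSiteSketch.
```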
 
Last edited by gray on Wed Nov 22, 2023 2:50 am, edited 1 time in total.

Re: Stack Trace Considerations (M0)

Post by cfbsoftware » Tue Nov 21, 2023 8:31 pm

Thank you for your detailed analysis. We are currently working on an improved version of Traps and your insight and test cases are very useful.
> Unfortunately, from "inside" 'SVCTrap', I don't see any possibility to check and decide for either the +36 or the +40 offset, depending on the specific situation, since we don't know the value of SYSTEM.SP right before the exception. Maybe someone smarter than me can figure this one out.
If you are interested in working on this sort of code, I recommend that you get a copy of Joseph Yiu's book The Definitive Guide to Arm® Cortex®-M0 and Cortex-M0+ Processors, if you do not already have it.

In the chapter on Exceptions and Interrupts it says:
> If the position of the last pushed data could be in an address that is not double word aligned, the stacking mechanism automatically adjusts the stacking position to the next double word aligned location and sets a flag (bit 9) in the stacked xPSR to indicate that the double word stack adjustment has been made.
We'll be testing a solution based on this information ourselves.

Re: Stack Trace Considerations (M0)

Post by gray » Wed Nov 22, 2023 1:59 am

Thanks! That's exactly the flag to test to decide between the +36 and the +40 offset, i.e. to solve the "gap" problem in 'SVCTrap'. With the corresponding changes in 'SVCTrap', my two "gap"-related test cases now show correct results.
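
A minimal sketch of the decision, assuming the stack layout shown in the first post (one word pushed by the handler's prologue, then the 8-word hardware frame with the PSR at +32). This is not the actual change made to 'SVCTrap'; the module, constant and procedure names are hypothetical. In the real handler the stacked PSR would have to be read from the stack first, e.g. as a SET, and any local variables added to the handler would shift the offsets.

```oberon
MODULE ScanOffsetSketch;
  (* Illustration only: choose the scan start offset from the stacked PSR *)

  CONST
    PrologueSize = 4;  (* LR (EXC_RETURN) pushed by the handler's prologue *)
    FrameSize = 32;    (* R0-R3, R12, LR, PC, PSR stacked by the hardware *)
    AlignBit = 9;      (* set in the stacked PSR if a gap word was inserted *)

  (* Offset from the handler's SP to the first word of the interrupted
     code's stack *)
  PROCEDURE ScanOffset*(stackedPSR: SET): INTEGER;
    VAR offset: INTEGER;
  BEGIN
    offset := PrologueSize + FrameSize;
    IF AlignBit IN stackedPSR THEN INC(offset, 4) END;
    RETURN offset
  END ScanOffset;

BEGIN
  (* psr = 61000200H from the TestError3 dump: bits 9, 24, 29 and 30 *)
  ASSERT(ScanOffset({9, 24, 29, 30}) = 40);
  (* hypothetical psr = 61000000H: bit 9 clear, no gap *)
  ASSERT(ScanOffset({24, 29, 30}) = 36)
END ScanOffsetSketch.
```

With bit 9 set, the scan skips the gap word and its stale contents; with bit 9 clear, the word at +36 is a genuine part of the interrupted call chain and is included.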

To be fair, the flag is described in the M0 programming manual (PM0215) -- in the section for the SCB_CCR (4.3.5, page 60):
> On exception entry, the processor uses bit[9] of the stacked PSR to indicate the stack alignment. On return from the exception, it uses this stacked bit to restore the correct stack alignment.
Seems I need to improve my manual reading skills.
