I've spent several hours in the last couple days trying to debug a weird crash in the Witch Lights when writing to global variables.
The TL;DR is that, depending on various factors, when I define some (but not all) global variables and then try to read or write those globals, the Arduino freezes up after the test pattern reaches the end of the LED strip.
If you're interested in the summary of what I'm doing next to try and fix it, scroll to the bottom.
I can work around the issue by either commenting out the specific globals in question (and any code that references them), or by commenting out other globals that get loaded into RAM, such as pre-rendered raster animations. The first thing I thought (in fact, the first thing anyone I talk to about this thinks) is that I've run out of memory somehow. This kind of thing is exactly what happens when you're up at the limit of your available SRAM. So that is where I began looking.
The following is a log of Arduino Memory Kremlinology, where I attempt to interpret the behavior of RAM on an Arduino Due by checking who stands next to Stalin in the May Day Parade.
Here is a memory map of the Arduino Due:
0x0008 0000 - 0x000B FFFF 256 KiB flash bank 0
0x000C 0000 - 0x000F FFFF 256 KiB flash bank 1
Both banks above provide 512 KiB of contiguous flash memory
0x2000 0000 - 0x2000 FFFF 64 KiB SRAM0
0x2007 0000 - 0x2007 FFFF 64 KiB mirrored SRAM0, so that it's consecutive with SRAM1
0x2008 0000 - 0x2008 7FFF 32 KiB SRAM1
0x2010 0000 - 0x2010 107F 4224 bytes of NAND flash controller buffer
One key takeaway is that the Due's SRAM has a contiguous address space, despite being split into separate 64K and 32K banks. That address space ranges from 0x2007 0000 to 0x2008 7FFF. I was under the impression that this was not the case, so that's good to know.
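As a quick sanity check on that claim, the arithmetic can be written out. This is a minimal sketch that uses only the addresses from the memory map above; the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Addresses copied from the memory map above; the names are illustrative.
constexpr std::uint32_t kSram0MirrorStart = 0x20070000u;
constexpr std::uint32_t kSram0MirrorEnd   = 0x2007FFFFu; // inclusive
constexpr std::uint32_t kSram1Start       = 0x20080000u;
constexpr std::uint32_t kSram1End         = 0x20087FFFu; // inclusive

// Size in bytes of an inclusive address range.
constexpr std::uint32_t rangeSize(std::uint32_t first, std::uint32_t last) {
    return last - first + 1u;
}

// True when the second range starts on the byte right after the first ends.
constexpr bool isContiguous(std::uint32_t firstEnd, std::uint32_t secondStart) {
    return firstEnd + 1u == secondStart;
}

// The mirror of SRAM0 runs straight into SRAM1, for 96 KiB in total.
static_assert(isContiguous(kSram0MirrorEnd, kSram1Start), "banks are adjacent");
static_assert(rangeSize(kSram0MirrorStart, kSram0MirrorEnd) == 64u * 1024u, "64 KiB");
static_assert(rangeSize(kSram1Start, kSram1End) == 32u * 1024u, "32 KiB");
static_assert(rangeSize(kSram0MirrorStart, kSram1End) == 96u * 1024u, "96 KiB");
```

The checks run at compile time, so the file building at all is the result: the two banks sit back to back and cover 98304 bytes.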
Because the Due is basically a weird experiment that escaped into the wild, the usual Arduino instructions for viewing available RAM don't work. Fortunately, I found instructions here. The memory report code is looking at the contiguous RAM address space I just mentioned, like so:
char *ramstart=(char *)0x20070000;
char *ramend=(char *)0x20088000;
The code calculates 4 things:
- Dynamic RAM used (the "heap", which grows from the "top" of the static area, "up")
- Static RAM used (globals and static variables, in a reserved space "under" the heap)
- Stack RAM used (local variables, interrupts, function calls are stored here, starting at the "top" of the SRAM address space and growing "down" towards the heap; when functions complete, their local variables and pointers are cleaned up, and the stack shrinks)
- "Guess at free mem" (which is complicated)
The "free mem" calculation is stack_ptr - heapend + mi.fordblks
Which, in theory, is the gap between the end of the heap and the current stack pointer, plus the total amount of unallocated memory in free blocks inside the heap itself (that's the fordblks part)? I think? I'm not sure. I'm reading the internet and interpreting.
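To make the four numbers concrete, here is a rough model of the report math that compiles off-board. It is a sketch under assumptions, not the actual report code: only the free-memory formula and the two RAM bounds come from the text above. The struct layout, the field names, and the idea that "dynamic RAM used" is whatever the allocator reports as in use are guesses made for illustration.

```cpp
#include <cstdint>

// Bounds of the contiguous SRAM window, as in the snippet above.
constexpr std::uint32_t kRamStart = 0x20070000u;
constexpr std::uint32_t kRamEnd   = 0x20088000u; // one past the last byte

// Inputs the report needs. On the Due these would be read from the running
// system; here they are plain numbers so the math can be checked by hand.
struct MemorySnapshot {
    std::uint32_t staticEnd; // first address after globals/statics (assumed)
    std::uint32_t heapEnd;   // current end of the heap ("heapend")
    std::uint32_t stackPtr;  // current stack pointer ("stack_ptr")
    std::uint32_t heapInUse; // bytes the allocator says are allocated (assumed)
    std::uint32_t heapFree;  // bytes in free blocks inside the heap ("mi.fordblks")
};

struct MemoryReport {
    std::uint32_t dynamicUsed;
    std::uint32_t staticUsed;
    std::uint32_t stackUsed;
    std::uint32_t freeGuess;
};

constexpr MemoryReport makeReport(const MemorySnapshot& s) {
    return MemoryReport{
        s.heapInUse,                         // heap
        s.staticEnd - kRamStart,             // statics sit at the bottom
        kRamEnd - s.stackPtr,                // stack grows down from the top
        s.stackPtr - s.heapEnd + s.heapFree  // formula quoted in the text
    };
}
```

Feeding it 7404 bytes of statics, an empty heap that ends right where the statics end, and 80 bytes of stack gives a free-memory guess of 90820 bytes.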
Here's the memory report during the setup() function:
Dynamic ram used: 0
Program static ram used 7404
Stack ram used 80
My guess at free mem: 90820
OK, so total free RAM is reporting at roughly 88K out of 96K, not bad.
Dynamic ram used: 1188
Program static ram used 7404
Stack ram used 104
My guess at free mem: 94492
Loop Count: 0
Now, at this point we're executing the main loop() function for the first time, and here we see something interesting. The total amount of RAM used and the guess at free RAM no longer add up. What gives?
Well, what happens here is that memory in the heap, the dynamic RAM reported up top, has been freed up, but because more stuff is sitting on "top" of it in memory address space, the memory isn't really actually free to be used. The C function that reads a memory space and reports back on free blocks doesn't know that, however. So that's why we can't really rely that much on the "free mem" guess.
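To put a number on "no longer add up", the four lines of each report can simply be summed and compared against the 96 KiB the chip actually has. The sketch below does only that; the figures are copied from the two reports above, and the type and function names are invented for the example.

```cpp
#include <cstdint>

constexpr std::uint32_t kTotalSram = 96u * 1024u; // 98304 bytes

// One memory report, in the order the lines are printed.
struct Report {
    std::uint32_t dynamicUsed;
    std::uint32_t staticUsed;
    std::uint32_t stackUsed;
    std::uint32_t freeGuess;
};

// Sum of everything the report claims is either used or free.
constexpr std::uint32_t reportTotal(const Report& r) {
    return r.dynamicUsed + r.staticUsed + r.stackUsed + r.freeGuess;
}

// How many bytes the report accounts for beyond the real SRAM size.
constexpr std::int32_t overcount(const Report& r) {
    return static_cast<std::int32_t>(reportTotal(r)) -
           static_cast<std::int32_t>(kTotalSram);
}

constexpr Report kSetupReport{0u, 7404u, 80u, 90820u};      // during setup()
constexpr Report kFirstLoopReport{1188u, 7404u, 104u, 94492u}; // Loop Count: 0

static_assert(overcount(kSetupReport) == 0, "setup() report is consistent");
static_assert(overcount(kFirstLoopReport) == 4884, "loop() report overcounts");
```

The setup() report totals exactly 98304 bytes. The first loop() report totals 103188, which is 4884 bytes more than the chip has, so at least that much of the "free" figure cannot be taken at face value.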
The numbers to watch are Dynamic RAM (the "heap") and the Stack, which are memory addresses on opposite sides of the big contiguous memory space. Generally, when you run into a memory issue on an Arduino, it's because you've been writing new stuff onto the end of the heap, and it collides with the stack. That's not happening here.
Dynamic ram used: 1380
Program static ram used 7404
Stack ram used 104
My guess at free mem: 94300
Loop Count: 1
During loop(), running the test pattern, memory usage stays totally static:
Dynamic ram used: 1380
Program static ram used 7404
Stack ram used 104
My guess at free mem: 94300
Loop Count: 738
Dynamic ram used: 1380
Program static ram used 7404
Stack ram used 104
My guess at free mem: 94300
Loop Count: 739
Serial stops responding on loop cycle 741, each time (I've run several tests):
Dynamic ram used: 1380
Program static ram used 7404
Stack ram used 104
My guess at free mem: 94300
Loop Count: 740
So that's loop number 740. On 741, the contents of memory change:
Dynamic ram used: 1348
Program static ram used 7404
Stack ram used 104
My guess at free mem: 94332
Three things jump out at me.
- Dynamic RAM (the "heap") dropped from 1380 to 1348, a difference of 32 bytes
- The free memory estimate has changed from 94300 to 94332, which basically mirrors the change in the heap
- It might be crashing when it tries to access counter, which is a global int?
I have previously had a crash just like this, when I was trying to access a global int array for the noIdle feature. I found that moving the value I was trying to access into the FaerieSprite class definition fixed the crash, and wrote it off as a scope issue to be debugged later. But what if this was the cause instead?
While I was trying to debug noIdle, I used the debug(n) function to light up LEDs during various stages of various method calls, to see where the crash occurred. What I found at the time was that the foremost test pattern sprite failed when currentPixel == NUM_LEDS - 1, after its Update() method called MarkDone() successfully. When that happens, the isDone bool is set to TRUE, and SpriteManager deletes that sprite's Sprite pointer from the spriteVector pointer array.
Which seems to have happened, given the 32 bytes freed up from the heap. All righty, then.
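For readers who haven't seen the sprite code, the lifecycle described above looks roughly like the sketch below. This is not the Witch Lights source: the names Sprite, SpriteManager, spriteVector, isDone, MarkDone(), Update(), currentPixel and NUM_LEDS come from the text, but the class bodies, TestPatternSprite, Add(), Count(), Clean(), the use of std::vector as a stand-in for the pointer array, and the strip length are all assumptions.

```cpp
#include <cstddef>
#include <vector>

constexpr int NUM_LEDS = 8; // hypothetical strip length

class Sprite {
public:
    virtual ~Sprite() = default;
    virtual void Update() = 0;
    bool IsDone() const { return isDone; }
protected:
    void MarkDone() { isDone = true; }
private:
    bool isDone = false;
};

// Walks one pixel per update and marks itself done at the end of the strip.
class TestPatternSprite : public Sprite {
public:
    void Update() override {
        if (currentPixel >= NUM_LEDS - 1) { MarkDone(); return; }
        ++currentPixel;
    }
private:
    int currentPixel = 0;
};

class SpriteManager {
public:
    SpriteManager() = default;
    SpriteManager(const SpriteManager&) = delete;
    SpriteManager& operator=(const SpriteManager&) = delete;
    ~SpriteManager() { for (Sprite* s : spriteVector) delete s; }

    void Add(Sprite* sprite) { spriteVector.push_back(sprite); }
    std::size_t Count() const { return spriteVector.size(); }

    // Update every sprite, then free and remove the ones that finished.
    void Update() {
        for (Sprite* s : spriteVector) s->Update();
        Clean();
    }
private:
    void Clean() {
        for (std::size_t i = 0; i < spriteVector.size();) {
            if (spriteVector[i]->IsDone()) {
                delete spriteVector[i]; // heap memory goes back to the allocator
                spriteVector.erase(spriteVector.begin() +
                                   static_cast<std::ptrdiff_t>(i));
            } else {
                ++i;
            }
        }
    }
    std::vector<Sprite*> spriteVector;
};
```

In this model the sprite survives NUM_LEDS - 1 updates and is deleted on the next one, which is the moment the heap figure would be expected to drop.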
So, here's a hypothesis: SpriteManager deletes a sprite, and then the exact moment we next try to access the counter global int, boom. We crash.
The problem is, if so, finding the root cause of this is hard.
So I'm not going to.
Instead, I'm going to take the booleans out of globals entirely and change them into functions that query the mode pin when they are called. A function's local variables are stored on the stack, completely on the other side of the memory address space from where we're allocating and deleting sprites.
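The shape of that change might look something like the sketch below. Everything specific in it is hypothetical: the pin number, the function name modeEnabled(), and the assumption that the mode is a simple HIGH/LOW read. The digitalRead() and HIGH/LOW definitions here are off-board stand-ins so the example compiles on a desktop; on the Arduino they come from the core and these stubs would be dropped.

```cpp
// Off-board stand-ins for the Arduino core (remove these on real hardware).
constexpr int LOW = 0;
constexpr int HIGH = 1;
int fakePinLevel = LOW; // hypothetical test hook, only exists for the stub
int digitalRead(int /*pin*/) { return fakePinLevel; }

constexpr int MODE_PIN = 7; // hypothetical pin number

// Before (the pattern being removed): a global set once at startup, e.g.
//   bool modeEnabledFlag = false;
//
// After: no stored state at all. The pin is read every time the answer is
// needed, and the result only ever lives in a return value on the stack.
bool modeEnabled() {
    return digitalRead(MODE_PIN) == HIGH;
}
```

Call sites change from reading the flag to calling the function, for example `if (modeEnabled()) { ... }`. The trade-off is one pin read per call instead of one read at startup.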
That's the plan, anyway. I'll report back when I run the first experiments.