UX and Product Design

Blog

Thinking out loud

Posts tagged debugging
Trouble: shot, or how I learned to stop worrying and love AC

The short version: everyone was right. Induction caused massive AC on the sensor data wire.

I added a makeshift low-pass power filter with a resistor and capacitor I had handy in my tool bin, and it solved the sensor issue.

Turns out, 25 meters of narrow gauge wire is a terrific antenna.

I need to do a proper measurement of the AC frequency on the power rail and design an RC filter to match it for the production design, but this is good enough for now.
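For sizing that production filter, the usual first-order RC low-pass relation is f_c = 1 / (2 * pi * R * C), and the cutoff should sit well below whatever noise frequency gets measured on the rail. A minimal sketch of the arithmetic follows; the function names are made up for illustration and no component values from the actual build are assumed.

```cpp
#include <cassert>

// First-order RC low-pass filter math. R is in ohms, C in farads, f in hertz.
const double kPi = 3.14159265358979323846;

// Cutoff (-3 dB) frequency: f_c = 1 / (2 * pi * R * C).
double rcCutoffHz(double resistanceOhms, double capacitanceFarads) {
    assert(resistanceOhms > 0.0 && capacitanceFarads > 0.0);
    return 1.0 / (2.0 * kPi * resistanceOhms * capacitanceFarads);
}

// Capacitance that gives a chosen cutoff for a resistor already on hand.
double capacitanceForCutoff(double resistanceOhms, double cutoffHz) {
    assert(resistanceOhms > 0.0 && cutoffHz > 0.0);
    return 1.0 / (2.0 * kPi * resistanceOhms * cutoffHz);
}
```

As a made-up example, a 1 kilohm resistor with a 1 microfarad capacitor puts the cutoff at roughly 159 Hz.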

Magical Mystery Crash

I've spent several hours in the last couple days trying to debug a weird crash in the Witch Lights when writing to global variables.

The TL;DR of it is, depending on various factors, when I define some, but not all, global variables, and then try to read or write to those globals, the Arduino freezes up after the test pattern reaches the end of the LED strip.

If you're interested in the summary of what I'm doing next to try and fix it, scroll to the bottom.

I can work around the issue by either commenting out the specific globals in question (and any code that references them), or by commenting out other globals that get loaded into RAM, such as pre-rendered raster animations. The first thing I thought (in fact, the first thing anyone I talk to about this thinks) is that I've run out of memory somehow. This kind of thing is exactly what happens when you're up at the limit of your available SRAM. So that is where I began looking.

The following is a log of Arduino Memory Kremlinology, where I attempt to interpret the behavior of RAM on an Arduino Due by checking who stands next to Stalin in the May Day Parade.

Here is a memory map of the Arduino Due:

0x0008 0000 - 0x000B FFFF   256 KiB flash bank 0
0x000C 0000 - 0x000F FFFF   256 KiB flash bank 1
                            Both banks above provide 512 KiB of contiguous flash memory
0x2000 0000 - 0x2000 FFFF   64 KiB SRAM0
0x2007 0000 - 0x2007 FFFF   64 KiB mirrored SRAM0, so that it's consecutive with SRAM1
0x2008 0000 - 0x2008 7FFF   32 KiB SRAM1
0x2010 0000 - 0x2010 107F   4224 bytes of NAND flash controller buffer

One key takeaway is that the Due has a contiguous address space, despite having separate 64K and 32K banks. That address space ranges from 0x2007 0000 to 0x2008 7FFF. I was under the impression that this was not the case, so that's good to know.
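The contiguity claim can be checked straight from the addresses in the map. This is a small sketch that only does the arithmetic; the constant and function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Address ranges copied from the memory map (inclusive first and last addresses).
constexpr std::uint32_t kSram0MirrorStart = 0x20070000u;
constexpr std::uint32_t kSram0MirrorEnd   = 0x2007FFFFu;
constexpr std::uint32_t kSram1Start       = 0x20080000u;
constexpr std::uint32_t kSram1End         = 0x20087FFFu;

// The mirrored SRAM0 region ends exactly one byte before SRAM1 begins.
static_assert(kSram0MirrorEnd + 1u == kSram1Start,
              "mirrored SRAM0 and SRAM1 are back to back");

// Together they form one 96 KiB block.
static_assert(kSram1End - kSram0MirrorStart + 1u == 96u * 1024u,
              "64 KiB + 32 KiB = 96 KiB of contiguous SRAM");

// Size in bytes of an inclusive address range.
std::uint32_t regionSizeBytes(std::uint32_t first, std::uint32_t last) {
    assert(last >= first);
    return last - first + 1u;
}
```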

Because the Due is basically a weird experiment that escaped into the wild, the usual Arduino instructions for viewing available RAM don't work. Fortunately, I found instructions here. The memory report code is looking at the contiguous RAM address space I just mentioned, like so:

char *ramstart = (char *)0x20070000;  // start of the mirrored SRAM0 region
char *ramend = (char *)0x20088000;    // one byte past the end of SRAM1

The code calculates 4 things:

  • Dynamic RAM used (the "heap", which grows from the "top" of the static area, "up")
  • Static RAM used (globals and static variables, in a reserved space "under" the heap)
  • Stack RAM used (local variables, interrupts, function calls are stored here, starting at the "top" of the SRAM address space and growing "down" towards the heap; when functions complete, their local variables and pointers are cleaned up, and the stack shrinks)
  • "Guess at free mem" (which is complicated)

The "free mem" calculation is stack_ptr - heapend + mi.fordblks

Which, in theory, is the gap between the top of the heap and the stack pointer, plus the total amount of unallocated memory blocks inside the heap? I think? I'm not sure. I'm reading the internet and interpreting.
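To make the four numbers concrete, here is a host-side model of the report arithmetic. It is a sketch under assumptions, not the memory report code itself: MemorySnapshot, MemoryReport, and makeReport are hypothetical names, and the snapshot fields stand in for values that on the device would presumably come from sbrk(0), the stack pointer register, and mallinfo().

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kRamStart = 0x20070000u;  // ramstart from the post
const std::uint32_t kRamEnd   = 0x20088000u;  // ramend from the post

// Hypothetical snapshot of the values the report is built from.
struct MemorySnapshot {
    std::uint32_t staticEnd;  // first address past globals/statics (heap begins here)
    std::uint32_t heapEnd;    // current top of the heap ("heapend")
    std::uint32_t stackPtr;   // current stack pointer ("stack_ptr")
    std::uint32_t heapInUse;  // bytes allocated in the heap (assumed: mallinfo uordblks)
    std::uint32_t heapFree;   // free bytes inside the heap (mi.fordblks)
};

struct MemoryReport {
    std::uint32_t dynamicUsed;
    std::uint32_t staticUsed;
    std::uint32_t stackUsed;
    std::uint32_t freeGuess;
};

MemoryReport makeReport(const MemorySnapshot& s) {
    assert(kRamStart <= s.staticEnd && s.staticEnd <= s.heapEnd);
    assert(s.heapEnd <= s.stackPtr && s.stackPtr <= kRamEnd);
    MemoryReport r;
    r.dynamicUsed = s.heapInUse;
    r.staticUsed  = s.staticEnd - kRamStart;   // statics grow up from ramstart
    r.stackUsed   = kRamEnd - s.stackPtr;      // stack grows down from ramend
    // "Guess at free mem": stack_ptr - heapend + mi.fordblks
    r.freeGuess   = (s.stackPtr - s.heapEnd) + s.heapFree;
    return r;
}
```

Plugging in the setup() figures below (7404 bytes static, 80 bytes of stack, an empty heap) gives 98304 - 7404 - 80 = 90820, which matches that first report. The later reports do not reduce to this simple model.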

Here's the memory report during the setup() function:

Dynamic ram used: 0
    Program static ram used 7404
    Stack ram used 80

    My guess at free mem: 90820

OK, so total free RAM is reporting at roughly 88K out of 96K, not bad.

Dynamic ram used: 1188
    Program static ram used 7404
    Stack ram used 104

    My guess at free mem: 94492

    Loop Count: 0

Now, at this point we're executing the main loop() function for the first time, and here we see something interesting. The total amount of RAM used and the guess at free RAM no longer add up. What gives?

Well, what happens here is that memory in the heap, the dynamic RAM reported up top, has been freed up, but because more stuff is sitting on "top" of it in memory address space, the memory isn't really actually free to be used. The C function that reads a memory space and reports back on free blocks doesn't know that, however. So that's why we can't really rely that much on the "free mem" guess.

The numbers to watch are Dynamic RAM (the "heap") and the Stack, which are memory addresses on opposite sides of the big contiguous memory space. Generally, when you run into a memory issue on an Arduino, it's because you've been writing new stuff onto the end of the heap, and it collides with the stack. That's not happening here.

Dynamic ram used: 1380
    Program static ram used 7404
    Stack ram used 104

    My guess at free mem: 94300

    Loop Count: 1

During loop() running the test pattern, memory usage stays totally static:

Dynamic ram used: 1380
    Program static ram used 7404
    Stack ram used 104

    My guess at free mem: 94300

    Loop Count: 738

    Dynamic ram used: 1380
    Program static ram used 7404
    Stack ram used 104

    My guess at free mem: 94300

    Loop Count: 739

Serial stops responding on loop cycle 741, each time (I've run several tests):

Dynamic ram used: 1380
    Program static ram used 7404
    Stack ram used 104

    My guess at free mem: 94300

    Loop Count: 740

So that's loop number 740. On 741, the contents of memory change:

Dynamic ram used: 1348
    Program static ram used 7404
    Stack ram used 104

    My guess at free mem: 94332

Three things jump out at me.

  • Dynamic RAM (the "heap") dropped from 1380 to 1348, a difference of 32 bytes
  • The free memory estimate has changed from 94300 to 94332, which basically mirrors the change in the heap
  • It might be crashing when it tries to access counter, which is a global int?

I have previously had a crash just like this, when I was trying to access a global int array for the noIdle feature. I found that moving the value I was trying to access into the FaerieSprite class definition fixed the crash, and wrote it off as a scope issue to be debugged later. But what if this was the cause instead?

While I was trying to debug noIdle, I used the debug(n) function to light up LEDs during various stages of various method calls, to see where the crash occurred. What I found at the time was, the foremost test pattern sprite failed when currentPixel == NUM_LEDS - 1, after its Update() method called MarkDone() successfully. When that happens, the isDone bool is set to TRUE, and SpriteManager deletes that sprite's Sprite pointer from the spriteVector pointer array.

Which seems to have happened, given the 32 bytes freed up from the heap. All righty, then.
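For reference, the delete-when-done step described above can be modeled off-device like this. It is a hypothetical sketch, not the Witch Lights code: the class names echo the post, but CountdownSprite, kMaxSprites, and the compaction loop are assumptions. The point it illustrates is that after a delete, the manager must not keep any pointer to the freed sprite.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kMaxSprites = 4;  // hypothetical capacity

class Sprite {
public:
    virtual ~Sprite() {}
    virtual void Update() = 0;
    bool IsDone() const { return isDone_; }
protected:
    void MarkDone() { isDone_ = true; }
private:
    bool isDone_ = false;
};

// Hypothetical test sprite: marks itself done after a fixed number of updates.
class CountdownSprite : public Sprite {
public:
    explicit CountdownSprite(int updates) : remaining_(updates) {}
    void Update() override { if (--remaining_ <= 0) MarkDone(); }
private:
    int remaining_;
};

class SpriteManager {
public:
    SpriteManager() : count_(0) {
        for (std::size_t i = 0; i < kMaxSprites; ++i) sprites_[i] = nullptr;
    }
    ~SpriteManager() {
        for (std::size_t i = 0; i < count_; ++i) delete sprites_[i];
    }
    SpriteManager(const SpriteManager&) = delete;
    SpriteManager& operator=(const SpriteManager&) = delete;

    // Takes ownership on success; the caller deletes the sprite on failure.
    bool Add(Sprite* s) {
        if (s == nullptr || count_ >= kMaxSprites) return false;
        sprites_[count_++] = s;
        return true;
    }

    void Update() {
        for (std::size_t i = 0; i < count_; ++i) sprites_[i]->Update();
        // Delete finished sprites and close the gaps so no stale pointer survives.
        std::size_t kept = 0;
        for (std::size_t i = 0; i < count_; ++i) {
            if (sprites_[i]->IsDone()) {
                delete sprites_[i];
            } else {
                sprites_[kept++] = sprites_[i];
            }
        }
        for (std::size_t i = kept; i < count_; ++i) sprites_[i] = nullptr;
        count_ = kept;
    }

    std::size_t Count() const { return count_; }

private:
    Sprite* sprites_[kMaxSprites];
    std::size_t count_;
};
```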

So, here's a hypothesis: SpriteManager deletes a sprite, and then the exact moment we next try to access the counter global int, boom. We crash.

The problem is, if so, finding the root cause of this is hard.

So I'm not going to.

Instead, I'm going to take the booleans out of globals entirely, and change them into functions, which will query the mode pin when they are called. Local variables used by functions are stored on the stack, completely on the other side of the memory address space from where we're allocating and deleting sprites.
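A rough sketch of what "a function instead of a global bool" could look like. Everything named here is hypothetical: kModePin is a made-up pin number, and PinReader is a stand-in so the example compiles off-device. On the Arduino, the reader would be a thin wrapper around digitalRead().

```cpp
#include <cassert>

const int kPinLow  = 0;
const int kPinHigh = 1;
const int kModePin = 7;  // hypothetical pin number, not from the sketch

// Stand-in for a pin-reading function such as a wrapper around digitalRead().
typedef int (*PinReader)(int pin);

// No global state: the answer is computed from the pin every time it is asked for.
bool isModeEnabled(PinReader readPin) {
    assert(readPin != nullptr);
    return readPin(kModePin) == kPinHigh;
}

// Fake readers for trying the logic on a desktop machine.
int fakeSwitchOn(int)  { return kPinHigh; }
int fakeSwitchOff(int) { return kPinLow; }
```

The trade-off is a pin read on every call instead of one read at boot, which is cheap next to redrawing an LED strip.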

That's the plan, anyway. I'll report back when I run the first experiments.

Minimum Viable Magic

Things accomplished:

Parabolic acceleration & braking curves for faerie sprites, implemented as a method

Recursive forced fade-from-dark-to-light algorithm when faeries stop and idle

New version of the idle method, which replaces CRGB pre-rendered pixel arrays with an algorithm, one of a few experiments. This saves hugely on SRAM usage, allowing me to save that SRAM for pixel animation for special accents and looping travel animations.
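As an illustration of what a parabolic acceleration and braking curve can look like, here is one common shape: quadratic ease-in for the first half of the trip, mirrored for the second half. This is an assumption about the general technique, not the method in the sketch; parabolicEase and pixelAt are made-up names.

```cpp
#include <cassert>
#include <cmath>

// Maps progress t in [0, 1] to the fraction of distance covered.
// First half accelerates along 2t^2; second half brakes along the mirrored parabola.
double parabolicEase(double t) {
    assert(t >= 0.0 && t <= 1.0);
    if (t < 0.5) {
        return 2.0 * t * t;
    }
    const double u = 1.0 - t;
    return 1.0 - 2.0 * u * u;
}

// Pixel index for a sprite travelling from startPixel to endPixel at progress t.
int pixelAt(int startPixel, int endPixel, double t) {
    const double travelled = (endPixel - startPixel) * parabolicEase(t);
    return startPixel + static_cast<int>(std::lround(travelled));
}
```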

There was much debugging.

I am pretty good at debugging. It’s basically how I approach the world.

Next up, what I want is for the sprites to travel long distances, idle, then do a random shorter move (like shown in the video below), idle, and repeat a few times.

It’s just adding another state bool to the class, and a bit of logic.

More concerning to me: when transitioning from idle to travel, there is a minor visual glitch, more obvious in person than in video.

It’s caused by the 3 pixels I’m manipulating suddenly changing state when UpdateIdle() calls StartTravel(). At that moment, 3 new entries from the same fixed colorArray are written to the CRGBs that represent my pixel pattern.

I need a colorFade function to smoothly transition between two states. I’ve found example code. Fortunately, reading code and understanding what it does is also something I do decently.

Anyone reading this who might already HAVE one for FastLED, though? Please let me know if so. 🙃
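For what it's worth, a fade between two colors is a per-channel linear interpolation. Below is a library-free sketch of that idea. Rgb, lerp8, and fadeStep are hypothetical names, with Rgb standing in for CRGB. FastLED's own blend() helper may already cover this for real CRGB values, so check the installed version before hand-rolling one.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for CRGB: one byte each for red, green, blue.
struct Rgb {
    std::uint8_t r;
    std::uint8_t g;
    std::uint8_t b;
};

// Interpolates one channel: amount 0 returns from, amount 255 returns to.
std::uint8_t lerp8(std::uint8_t from, std::uint8_t to, std::uint8_t amount) {
    const int delta = static_cast<int>(to) - static_cast<int>(from);
    return static_cast<std::uint8_t>(from + (delta * amount) / 255);
}

// Color at a given step of a fade that takes totalSteps steps in all.
Rgb fadeStep(Rgb from, Rgb to, unsigned step, unsigned totalSteps) {
    assert(totalSteps > 0 && step <= totalSteps);
    const std::uint8_t amount =
        static_cast<std::uint8_t>((step * 255u) / totalSteps);
    Rgb out = { lerp8(from.r, to.r, amount),
                lerp8(from.g, to.g, amount),
                lerp8(from.b, to.b, amount) };
    return out;
}
```

Calling fadeStep once per frame, with step counting up from 0 to totalSteps, would spread the idle-to-travel color change across several frames instead of one.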

I have some secret ideas sketched in my lost notebook (🤬) that I was hoping to get to, but if I can just get this last faerie sprite checked in to GitHub I can turn my attention to other things that sorely need my attention.

Extra animations and things are luxuries. I’m going for minimum viable magic, so that my brain will relax and stop hyperfocusing on not having a viable product for the release date.

Considering making faeries more likely to stop and idle/flit around when they encounter other faeries. This requires collision detection. Have some sketches for that as well.

Loopy Debug is my new Frank Zappa cover band

Currently working on looping animations for when the sprites pause.

Here's one in Excel

l_pulsar_a.png

And animated

I would show you what it looks like on the NeoPixels, but the Arduino crashes when it tries to spawn the sprite object, so I've gear shifted from animation design mode into debug mode.

How I understand what the code does

So what's happening is, the Arduino boots and the sketch starts from the definitions at the top of the file, stuff like:

#define afc_l_pulsar_a_ANIMATION_FRAME_WIDTH    24
#define afc_l_pulsar_a_ANIMATION_FRAMES         22

Which is where I set the parameters for each animation. In this case, you've got 22 frames of animation, each one 24 pixels wide. It also sets all the other parameters to tune the global travel acceleration rate, sprite pixel velocity, and how far each sprite travels before playing the idle animation.

We then go on to use the FastLED library to create an array of CRGB structs to contain the color values we will write to the NeoPixel strip:

CRGB leds[NUM_LEDS];

Each CRGB struct holds 3 bytes for one LED: the red, green, and blue color values. So the amount of SRAM that the array takes up is basically NUM_LEDS * 3 bytes.

To animate the pixels that I've been designing in Excel, what happens is, we define a char array and a CRGB array for each animation, and each is FRAME_WIDTH * FRAMES entries long. In the case of the pulsar loop above, that's 24 by 22, which is 528.

A char is one byte per entry, and we already know a CRGB is three, so that means we're taking up 528 times 4 bytes, or 2.06K. And we have a total of 96K of SRAM to execute the entire system, including all pre-rendered animations.
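The same arithmetic as a compile-time check. This is a sketch with a stand-in pixel type (Rgb3 is hypothetical, assumed to be 3 bytes like CRGB), and the helper names are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for CRGB: three one-byte channels.
struct Rgb3 {
    std::uint8_t r;
    std::uint8_t g;
    std::uint8_t b;
};
static_assert(sizeof(Rgb3) == 3, "three one-byte channels per pixel");

// SRAM taken by the strip buffer: one 3-byte pixel per LED.
constexpr std::size_t stripBytes(std::size_t numLeds) {
    return numLeds * sizeof(Rgb3);
}

// SRAM taken by one animation: a char entry plus a pixel entry per cell.
constexpr std::size_t animationBytes(std::size_t frameWidth, std::size_t frames) {
    return frameWidth * frames * (sizeof(char) + sizeof(Rgb3));
}

// Pulsar loop: 24 wide * 22 frames = 528 entries, 4 bytes each.
static_assert(animationBytes(24, 22) == 2112, "2112 bytes, about 2.06 KiB");
```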

Actually, we have 32K. See, the Arduino Due has a 32K bank of SRAM, and a 64K bank of SRAM, and I'm not sure how they interact, but I can say for certain that if you define too many CRGB structs and chars, the Arduino Due crashes on boot, and our best guess for memory usage is that we're around the 30-33K mark. I'll get back to that in a bit.

So we define all the animations that we're going to run in this build, and then we define our function prototypes and our object classes. I won't get into the structure of the classes, except to say that there are SpriteVector and SpriteManager classes, which combine to handle the creation, lifecycle, and destruction of each animation sprite.

There are also the LoopTestSprite and FragmentTestSprite classes, which are subclasses of the Sprite class. Each different sprite class currently has code to move from point to point, and to play a pre-rendered animation for a certain number of repetitions when it reaches its destination pixel. It then moves to the next destination pixel, loops, and repeats until the whole thing goes off the pixel grid, at which point the object is marked is_done, and SpriteManager deletes the object from memory on its next run-through.

So that's where the classes are defined. If the Arduino passes safely through those, we reach the point where we create our permanent objects:

InfraredSensor *sensor1;
InfraredSensor *sensor2;

SpriteManager *spriteManager;

bool isBooted;
bool testSpritesCreated;

int starttime = millis();

So we've got 2 infrared sensors per strip of the Witch Lights, and they get pointers named sensor1 and sensor2. The objects themselves are created in setup().

We declare the SpriteManager pointer, declare some bools, and set the starttime value, and then it's time to run setup().

void setup() {
    createColorsets();
    createAnimationFrames();

    isBooted = false;
    testSpritesCreated = false;


    spriteManager = new SpriteManager();

    sensor1 = new InfraredSensor(PIR_SENSOR_1_PIN);
    sensor2 = new InfraredSensor(PIR_SENSOR_2_PIN);

    resetStrip();
}

setup() reads our arrays of predefined color sets into RAM, and then runs the createAnimationFrames() function, which reads all of the animations defined within the function into the char arrays we created earlier. So now, in memory, we have a full set of pixel animations, in the form of char arrays named after each animation.

Setup also sets isBooted and testSpritesCreated to false, which are bits that trigger a test pattern at the beginning of loop() later. And it links the infrared sensors to their appropriate input pins, resets the NeoPixel strip, and we're ready for the main loop().

loop() is the main engine of an Arduino project. It cycles forever, and each time, you have a chance to do some logic. In this case, we run a rainbow flag test pattern down the length of the NeoPixel strips, which gives us a nice way to test the strips as the installation is assembled.

After that (which only runs once), we check sensor1, check sensor2, and create the appropriate sprite object if either is triggered. So when someone walks by the Witch Lights, an animation sprite is called into memory. But the sprite object only knows how to respond to Update() calls from SpriteManager: it won't do anything on its own.

So loop() calls spriteManager->Update(); last of all, and then the loop repeats.

Lots of stuff happens behind the scenes when you call Update() of course. But we'll get into that as we need it. For now, I'm finishing the talk-through of the whole boot cycle of the witchlights-fastled.ino sketch, because right now, the Arduino is crashing before it draws the new animation sprite I just defined.

When your shit crashes

A few days ago, when I was just learning to create custom animation sprites, I tried to define all the animations I had converted at once. I knew animations took up SRAM, but thinking that I had 96K to play with, I wasn't concerned about SRAM usage yet.

The Arduino crashed so hard I had to manually erase its flash memory before I could reprogram it again. For reference: that's pretty bad.

In that case, it crashed while creating the char and CRGB structs, before it reached setup(). So the Arduino would turn on, but nothing else would happen.

Today's problem is different.

Today, when I turn on the Arduino, it plays the test pattern, which means it's made it all the way through everything we just defined, reached loop(), and at least started the main loop.

But when I activate sensor1 with a pushbutton, nothing happens.

So something ain't right.

For comparison, here is what I have defined in memory for animation:

char afc_w8v1r[ANIMATION_FRAME_WIDTH * ANIMATION_FRAMES];
CRGB af_w8v1r[ANIMATION_FRAME_WIDTH * ANIMATION_FRAMES];

char afc_f_slow_stop[afc_f_slow_stop_ANIMATION_FRAME_WIDTH * afc_f_slow_stop_ANIMATION_FRAMES];
CRGB af_f_slow_stop[afc_f_slow_stop_ANIMATION_FRAME_WIDTH * afc_f_slow_stop_ANIMATION_FRAMES];

char afc_f_slow_stop_c[afc_f_slow_stop_c_ANIMATION_FRAME_WIDTH * afc_f_slow_stop_c_ANIMATION_FRAMES];
CRGB af_f_slow_stop_c[afc_f_slow_stop_c_ANIMATION_FRAME_WIDTH * afc_f_slow_stop_c_ANIMATION_FRAMES];

char afc_l_pulsar_a[afc_l_pulsar_a_ANIMATION_FRAME_WIDTH * afc_l_pulsar_a_ANIMATION_FRAMES];
CRGB af_l_pulsar_a[afc_l_pulsar_a_ANIMATION_FRAME_WIDTH * afc_l_pulsar_a_ANIMATION_FRAMES];

Looking at this, we have:

  • afc_w8v1r is the original animation sprite from last year, defined backwards for sprites heading in the "reverse" direction from sensor2.

  • afc_f_slow_stop is the Better slow stop go animation that I was testing a couple days ago.

  • afc_f_slow_stop_c is an experimental variation of that animation, using manual anti-aliasing for slow pixel-to-pixel movements.

  • afc_l_pulsar_a is the first loop test, featured at the top of this post.

When I first uploaded this sketch, in the sensor1 check in loop(), we had:

if (sensor1->IsActuated()) {
    Sprite *s1 = new FragmentTestSprite();
    // Sprite *s1 = new LoopTestSprite();

    if (! spriteManager->Add(s1)) {
        delete s1;
    }
}

And pressing the sensor button would spawn a "slow stop go" sprite. Great.

So I changed it to:

if (sensor1->IsActuated()) {
    // Sprite *s1 = new FragmentTestSprite();
    Sprite *s1 = new LoopTestSprite();

    if (! spriteManager->Add(s1)) {
        delete s1;
    }
}

And now, when you hit the sensor button, nothing happens.

Huh.

OK, so.

Change it back? Sprite spawns. Change it again? Nothing happens.

So it's consistent, whatever it is. That's a bonus. Consistent means repeatable.

My first thought was, what did I do wrong in the LoopTestSprite() class definition that I didn't do in FragmentTestSprite()? So I did a diff between the two.

diff.png

And for the life of me, I can't find anything in the LoopTestSprite() that's different and is Arduino-crashingly bad. That's the differences above. On the left, the sprite that runs, and on the right, the one that don't.

Is it SRAM usage?

afc_f_slow_stop, which does run, is 4.32K in SRAM.

afc_l_pulsar_a, which does not run, is 2.06K in SRAM.

We can't rule that out, but it seems unlikely, given that the animation that fails is the smaller of the two.

So what now, smart guy?

Welp.

Fortunately, there's a debug() function, which lights up a specific LED when you call it.

In the past, this was used to show the size of the SRAM heap while the Arduino ran. That helped greatly to identify memory leaks.

What I can do with it now is, I can place debug(20) at the start of loop(), debug(21) at the next step, 22 at the next, and so on, and put debug statements into the sprite construction logic of the sprite that works, and do a control experiment to make sure I can identify all the working pieces of that sprite doing their thing. As the sprite is constructed and runs through the logic to process the animation char into CRGB frames, the most easily-seen LEDs under my desk will turn on, one by one.

(And I'll be checking that into git for sanity's sake.)
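The checkpoint idea can be tried on a desktop machine before touching the hardware. This is a sketch with a fake strip: DebugStrip, kNumLeds, and lastCheckpoint are hypothetical, and this debug() takes the strip as a parameter, where the real one presumably writes to the LED array and pushes it out to the NeoPixels.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kNumLeds = 60;  // hypothetical strip length

// Fake strip: true means the LED at that index has been lit.
struct DebugStrip {
    bool lit[kNumLeds];
};

void clearStrip(DebugStrip& strip) {
    for (std::size_t i = 0; i < kNumLeds; ++i) strip.lit[i] = false;
}

// Marks a checkpoint by lighting LED n.
void debug(DebugStrip& strip, std::size_t n) {
    assert(n < kNumLeds);
    strip.lit[n] = true;
}

// Highest lit LED in [first, last], or -1 if none was reached.
// The crash happened somewhere after this checkpoint.
int lastCheckpoint(const DebugStrip& strip, std::size_t first, std::size_t last) {
    assert(first <= last && last < kNumLeds);
    int reached = -1;
    for (std::size_t i = first; i <= last; ++i) {
        if (strip.lit[i]) reached = static_cast<int>(i);
    }
    return reached;
}
```

If LEDs 20 through 22 light up and 23 never does, the code between checkpoints 22 and 23 is where to look.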

Once that's done, I will change the animation logic values in that sprite to point at the animation for LoopTestSprite().

Why? Because I just tested the debug logic, and if I used copy and paste or hand-typed all the debug into LoopTestSprite(), I'm running the risk of a typo or mistake causing it to light up the wrong LEDs or malfunction in some other unknown way.

So I will modify the sprite to run the new animation, compile, and run it on the Arduino. At which point... I will probably learn something unexpected. That's how it usually happens.

But for now, rain clouds are moving in, and I'm going to sit on my screened-in back porch and watch the rain. Animation can wait.