Blaze (Re-redux) Part 2

In my last Blaze-related post, I focused on improving performance without disturbing the structure too much. Asynchronous scrolling, dispatch tables, and a handful of other optimisations gave us a smoother, more responsive game.

That worked – perhaps a little too well.

Once the performance issues were out of the way, the real problem came into focus: the design itself. Blaze is a system built on accumulated decisions, many of them reasonable at the time, that have since calcified into constraints. This post is about what happens when you try to improve the design without being able to break those constraints, and what you learn when you fail.

The Existing Map Format

MAGE (map generator) is an acronym coined by my friend Colin a million years ago. A slew of tile map editors in AMOS followed, all handling files bearing the MAGE magic bytes as a header signature. The demo version of Blaze uses this format, where each level map consists of two byte arrays: one for visual tiles, one for special tiles.

“MAGE” magic header (4 bytes)
$width$ (2 bytes)
$height$ (2 bytes)
$width\times height$ bytes of visual tile index data
$width\times height$ bytes of special tile index data

The current level map is $600 \times 50$ tiles, giving a total size of:

$8 + (600 \times 50 \times 2) = 60008$ bytes
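As a rough illustration of the layout above, here is a small C sketch that checks the header and works out where each layer starts. The struct and function names are hypothetical, and the big-endian byte order of the width and height fields is an assumption (natural on a 68000, but not stated above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAGE_HEADER_SIZE 8u  /* 4 magic bytes + 2 width + 2 height */

/* Hypothetical view of a loaded map file; field names are illustrative. */
typedef struct {
    uint16_t width;          /* map width in tiles */
    uint16_t height;         /* map height in tiles */
    const uint8_t *visual;   /* width*height visual tile indices */
    const uint8_t *special;  /* width*height special tile indices */
} MageMap;

/* Read a 16-bit big-endian value (the byte order is an assumption). */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Total file size for a two-layer map: header plus two byte arrays. */
static size_t mage_file_size(uint16_t width, uint16_t height)
{
    return MAGE_HEADER_SIZE + (size_t)width * height * 2u;
}

/* Returns 1 and fills *out if buf holds a complete two-layer MAGE map. */
static int mage_parse(const uint8_t *buf, size_t len, MageMap *out)
{
    assert(buf != NULL && out != NULL);
    if (len < MAGE_HEADER_SIZE || memcmp(buf, "MAGE", 4) != 0)
        return 0;
    out->width  = read_be16(buf + 4);
    out->height = read_be16(buf + 6);
    if (len < mage_file_size(out->width, out->height))
        return 0;
    out->visual  = buf + MAGE_HEADER_SIZE;
    out->special = out->visual + (size_t)out->width * out->height;
    return 1;
}
```

For the current level, `mage_file_size(600, 50)` gives the same 60008 bytes as the sum above.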

The format reflects how Blaze was built: incrementally, experimentally, and without a spec. Features were bolted on as needed, and the special tile map ended up doing far more than it should: collision data, animation flags, masking information, and various other behaviours that had no business living there. One particular abuse still haunts me: the special map doubling as a bank selector to access tiles 256-259.

The root cause goes back to the original tile layout in Deluxe Paint. With 10 tiles per row in a 320-pixel image, there was no clean way to represent the full range of indices. I could have 250 tiles or 260. Nothing in between, which in turn meant either losing a handful of indices or wasting raster memory.

I went with the latter. As the tile set grew, that decision came back to bite, and I ended up introducing a special case: if a tile in the special map had a certain value, it would enable a flag (ExtraBlocks) that added 256 to the tile index. This made tiles 256-259 accessible, provided the base tile index was 0-3. Otherwise, the tile renderer would happily draw tiles from some god-forsaken area of memory, outside the tile atlas.
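The special case can be sketched in C like this. The trigger value `SPECIAL_EXTRA_BLOCKS` is a made-up placeholder (the real special tile value isn't given here), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_COUNT 260u             /* tiles in the atlas: 26 rows of 10 */
#define SPECIAL_EXTRA_BLOCKS 0xF0u  /* placeholder value, not the real one */

/* Old scheme: a magic special tile value adds 256 to the visual index. */
static uint16_t resolve_tile_old(uint8_t visual, uint8_t special)
{
    uint16_t index = visual;
    if (special == SPECIAL_EXTRA_BLOCKS)
        index += 256u;  /* the ExtraBlocks flag */
    return index;
}

/* Indices at or beyond the atlas size point at unrelated memory. */
static int tile_in_atlas(uint16_t index)
{
    return index < TILE_COUNT;
}

/* Checked variant: traps the out-of-atlas case the old renderer let through. */
static uint16_t resolve_tile_checked(uint8_t visual, uint8_t special)
{
    uint16_t index = resolve_tile_old(visual, special);
    assert(tile_in_atlas(index));
    return index;
}
```

With the flag set, base index 3 resolves to tile 259, which is still inside the atlas; base index 4 resolves to 260, which is not.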

*Last four tiles (bottom-right) exceeded the 256-tile index.*

The format wasn’t just carrying data – it was encoding behaviour. And untangling them without breaking everything was going to be anything but straightforward.

New Map Format

The constraint I was working under was simple: no big rewrites. With assembly code, even a small change can clobber a register relied upon somewhere else in a macro or subroutine. Preserving values on the stack is the textbook answer, but that adds overhead in hot code paths. WinUAE debugging is also considerably less pleasant than working with a modern toolchain. Incremental changes only.

My first idea was to move some of the special tile map’s functionality into the visual map by widening the visual tile from one byte to two. This would pack the tile index and its metadata into a single 16-bit value:

8 bit – lower byte for tile index (tiles 0-255)
1 bit – tile index most significant bit (allows us to use tiles 256-511)
1 bit – foreground flag (this tile covers the player)
1 bit – animation flag (this tile is animated)
1 bit – rotate 180 degrees
4 bits – reserved
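Here is one way the 16-bit entry could be packed and unpacked, sketched in C. The bit positions simply follow the order of the list above and are an assumption, as are all the names.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions follow the order of the list; exact positions are assumed. */
#define TILE_INDEX_MASK  0x01FFu  /* bits 0-8: tile index 0-511 */
#define TILE_FOREGROUND  0x0200u  /* bit 9: tile covers the player */
#define TILE_ANIMATED    0x0400u  /* bit 10: tile is animated */
#define TILE_ROTATE_180  0x0800u  /* bit 11: draw rotated 180 degrees */
#define TILE_RESERVED    0xF000u  /* bits 12-15: reserved, kept zero */

/* Pack a tile index and flag bits into one 16-bit map entry. */
static uint16_t tile_pack(uint16_t index, uint16_t flags)
{
    assert(index <= TILE_INDEX_MASK);
    assert((flags & (TILE_INDEX_MASK | TILE_RESERVED)) == 0);
    return (uint16_t)(index | flags);
}

/* Extract the 9-bit tile index. */
static uint16_t tile_index(uint16_t entry)
{
    return entry & TILE_INDEX_MASK;
}

/* Test a single flag bit. */
static int tile_has(uint16_t entry, uint16_t flag)
{
    return (entry & flag) != 0;
}
```

Under this layout, tile 259 drawn in the foreground and rotated packs to `0x0B03`.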

The problem was that the existing code depended heavily on a particular access pattern:

			
; map base address is copied into a0
Move.l map, a0
...
; special tile map stride is copied into d0
Move.l s_map, d0
...
; some offset gets computed and stored in d1
; visual tile is read in d6
Move.b 8(a0, d1.w), d6
...
; the special tile related to that offset is read
; offset is adjusted for special map
Add.w  d0, d1
; special tile is read in d7
Move.b 8(a0, d1.w), d7

		

Visual and special tile accesses are temporally coherent; when you read a visual tile, you almost always need the corresponding special tile shortly after. The pattern above exploits this neatly: add the stride to the offset, and you’re there. It’s simple. And it’s everywhere.
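For readers less at home in 68k assembly, roughly the same pattern in C looks like this. It is a model of the listing above rather than code from the game, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_HEADER_SIZE 8u  /* the 8 in 8(a0,d1.w) skips the header */

/* Read the visual and special tiles that share one tile offset.
   map    : start of the loaded map file (a0)
   stride : size of one layer in bytes, width*height (d0)
   offset : tile offset within a layer (d1) */
static void read_tile_pair(const uint8_t *map, uint16_t stride,
                           uint16_t offset,
                           uint8_t *visual, uint8_t *special)
{
    assert(offset < stride);
    const uint8_t *layers = map + MAP_HEADER_SIZE;
    *visual  = layers[offset];                     /* Move.b 8(a0,d1.w),d6 */
    *special = layers[(uint32_t)offset + stride];  /* Add.w d0,d1, then read */
}
```

The whole trick is that one offset serves both layers, which only holds while both layers use one byte per tile.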

Doubling the visual tile size broke it in two ways. First, the tile offset could no longer be used interchangeably across both layers, since the visual layer’s stride had changed. Second, the offsets themselves, which previously fit in unsigned 16-bit integers, no longer had sufficient range, as the maximum visual layer index grew from 30k to 60k bytes. Row offset lookup tables, tile interaction code, and a large number of other subsystems all relied on variants of this pattern. The change cascaded further than I’d anticipated and I gave up on it.
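The range problem is easy to check with a little arithmetic. The C sketch below (hypothetical names, using the current $600 \times 50$ map) computes the largest offset needed to reach the last special tile for each visual tile size.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_TILES (600u * 50u)  /* tiles per layer in the current map */

/* Largest byte offset (measured from the end of the header) needed to
   reach the special layer, for a given visual tile size in bytes. */
static uint32_t max_special_offset(uint32_t visual_tile_size)
{
    assert(visual_tile_size == 1 || visual_tile_size == 2);
    uint32_t visual_layer_bytes = MAP_TILES * visual_tile_size;
    return visual_layer_bytes + (MAP_TILES - 1u);
}

/* Does the offset survive being held in an unsigned 16-bit value? */
static int fits_in_u16(uint32_t offset)
{
    return offset <= UINT16_MAX;
}
```

With one-byte visual tiles the worst case is 59999, which fits in 16 bits. With two-byte tiles it becomes 89999, which does not, and truncating it to 16 bits would silently yield 24463.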

This was a useful failure. The idea wasn’t unreasonable, but it violated the central constraint: don’t break the assumptions the rest of the system quietly depends on. Thanks, past Keith.

Back to the drawing board. The goal was still to separate visual metadata from the special tile map, so I took a simpler route: a third map layer, called metadata. The bitfield layout is the same as above, with one rename: the tile index MSB becomes the tile bank bit, specifying whether to draw from bank 0 (tiles 0-255) or bank 1 (tiles 256-511). The visual and special tile layers stay exactly as they were. Only the new layer is added.
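A minimal sketch of the three-layer lookup in C, assuming the metadata entry is a single byte and that its bits sit in the order listed earlier. The bit positions, struct and function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Metadata bits, in the order listed earlier; positions are assumed. */
#define META_BANK        0x01u  /* 0: tiles 0-255, 1: tiles 256-511 */
#define META_FOREGROUND  0x02u
#define META_ANIMATED    0x04u
#define META_ROTATE_180  0x08u

/* Three equally sized byte layers sharing one tile offset. */
typedef struct {
    const uint8_t *visual;
    const uint8_t *special;
    const uint8_t *metadata;
    uint16_t tiles;  /* width * height */
} MapLayers;

/* Full 9-bit tile index: the bank bit supplies bit 8. */
static uint16_t map_tile_index(const MapLayers *m, uint16_t offset)
{
    assert(offset < m->tiles);
    uint16_t bank = (m->metadata[offset] & META_BANK) ? 256u : 0u;
    return (uint16_t)(bank + m->visual[offset]);
}
```

Because every layer still uses one byte per tile, the same offset indexes all three, so the existing access pattern keeps working unchanged.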

This let me revert to a known-good version of the game and implement the new loader with minimal disruption. I also repurposed the in-game timer as a crude performance counter, reading the VHPOSR beam register at the start and end of each game loop iteration, with the difference giving the elapsed time in scanlines (roughly 64 µs each). It’s not precision instrumentation, but it’s enough to confirm the new format wasn’t costing us anything.
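The scanline arithmetic can be sketched in C as below. The upper byte of VHPOSR holds the low eight bits of the vertical beam position, so the function takes two sampled register values rather than touching hardware. The names are hypothetical, and the sketch assumes both samples come from the same video frame.

```c
#include <assert.h>
#include <stdint.h>

/* Scanlines elapsed between two VHPOSR samples.
   The upper byte of VHPOSR is the low 8 bits of the beam's vertical
   position. Subtracting modulo 256 copes with the 255 -> 256 rollover,
   but NOT with the beam wrapping back to line 0 at the end of a frame,
   so both samples are assumed to belong to the same frame. */
static uint8_t elapsed_scanlines(uint16_t vhposr_start, uint16_t vhposr_end)
{
    uint8_t start_line = (uint8_t)(vhposr_start >> 8);
    uint8_t end_line   = (uint8_t)(vhposr_end >> 8);
    return (uint8_t)(end_line - start_line);
}

/* Convert scanlines to microseconds, given the per-line duration. */
static uint32_t scanlines_to_us(uint8_t lines, uint32_t us_per_line)
{
    assert(us_per_line > 0);
    return lines * us_per_line;
}
```

At roughly 64 µs per line, a loop iteration that spans 46 scanlines comes out at about 2944 µs.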

More importantly, the three-layer structure was the first real step towards separating concerns that had previously been entangled.

New Animation System

The old tile animation system had the same problem as the old map format. It worked, but only because the engine already knew exactly what to expect. Animated tiles were identified by hardcoded special tile values, looked up in a hardcoded animation table, and replaced with the appropriate visual tile. Every animated tile, across every level, had to sit in the same position on the atlas and follow the same animation rules. It was brittle by design.

When I started untangling this, I found it useful to name the two distinct kinds of animation the game needs.

The first is what I call ambient animation: waterfalls, lava, gems, anything that plays continuously regardless of what the player does. The new system handles these in two passes. Up to 64 animation sequences are supported, each with up to 16 frames (always a power of two, which lets the frame counter wrap without a conditional). Each sequence has an associated time shifter that divides the animation counter, controlling playback speed. The counters themselves are incremented during the ISR, keeping animation stable and decoupled from frame time.
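A minimal sketch of the frame selection in C, assuming the time shifter is applied as a right shift on the counter. The struct layout and names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FRAMES 16u  /* per sequence; up to 64 sequences are supported */

/* One ambient animation sequence; field names are illustrative. */
typedef struct {
    uint16_t frames[MAX_FRAMES];  /* tile index for each frame */
    uint8_t  frame_count;         /* 1, 2, 4, 8 or 16 */
    uint8_t  time_shift;          /* counter is divided by 2^time_shift */
} AmbientSequence;

/* Current frame number for a free-running animation counter.
   Because frame_count is a power of two, masking with frame_count - 1
   wraps the frame number without a compare-and-branch. */
static uint8_t ambient_frame(const AmbientSequence *seq, uint32_t counter)
{
    assert(seq->frame_count != 0 && seq->frame_count <= MAX_FRAMES);
    assert((seq->frame_count & (seq->frame_count - 1)) == 0);
    return (uint8_t)((counter >> seq->time_shift) & (seq->frame_count - 1u));
}
```

With four frames and a shift of 2, the frame advances once every four counter ticks and wraps back to frame 0 at tick 16.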

In a first pass, each sequence’s current frame is resolved to a tile index. In a second pass, when tiles are drawn, any tile with the animation bit set in its metadata byte uses that pre-resolved index instead of its stored visual tile. The key property is that each sequence is evaluated once per frame, regardless of how many instances of that tile appear on screen.
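The two passes might look something like this in C. How a tile selects its sequence isn't described above, so the sketch assumes the stored visual byte doubles as the sequence number for animated tiles. That mapping, the bit position and the names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SEQUENCES  64u
#define META_ANIMATED  0x04u  /* assumed position of the animation bit */

/* Pass 1 output: the tile each sequence shows this frame. */
static uint16_t resolved_tile[MAX_SEQUENCES];

/* Pass 1: run once per frame for each sequence, however many tiles on
   screen use it. counter is assumed to have the time shift applied. */
static void resolve_sequence(uint8_t seq, const uint16_t *frames,
                             uint8_t frame_count, uint32_t counter)
{
    assert(seq < MAX_SEQUENCES);
    assert(frame_count != 0 && (frame_count & (frame_count - 1)) == 0);
    resolved_tile[seq] = frames[counter & (frame_count - 1u)];
}

/* Pass 2: run per drawn tile. ASSUMPTION: for animated tiles the stored
   visual byte selects the sequence. The bank bit is ignored for brevity. */
static uint16_t tile_to_draw(uint8_t visual, uint8_t metadata)
{
    if (metadata & META_ANIMATED)
        return resolved_tile[visual % MAX_SEQUENCES];
    return visual;
}
```

The per-tile cost in the second pass is a bit test and a table read, which is why the number of animated tiles on screen doesn't affect how often a sequence is evaluated.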

The second kind is event-based animation, triggered by player actions such as collecting a gem or stepping on a collapsing platform. Unlike ambient animations, these are instance-specific: two gems collected a few frames apart should animate independently. Each active event animation tracks its tile location, original tile index, current sequence, current frame and a completion behaviour – what to do when the sequence ends (restore the original tile, replace it or clear it). Event animations also introduce one-shot sequences, which run once and terminate rather than looping.

The system exposes two entry points. The first registers a new animation: when an event fires, the current tile state is captured and a new entry is inserted into a fixed-size circular buffer of active animations. Slot search time is bounded, and allocation is predictable. The second is the per-frame update pass, which advances each active entry, resolves the correct tile for the current frame, and writes it directly to the visual layer. Ambient animations, by contrast, are resolved indirectly at render time. When a sequence completes, its completion behaviour is applied and the slot is freed.
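Here is a compact C sketch of the two entry points. The buffer size, field names and the use of tile 0 as the "cleared" tile are assumptions, and for brevity each animation advances one frame per update with no per-sequence speed control.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EVENT_ANIMS 16u  /* buffer size is an assumption */

typedef enum { ON_END_RESTORE, ON_END_REPLACE, ON_END_CLEAR } OnEnd;

typedef struct {
    uint8_t  active;
    uint16_t location;      /* tile offset in the visual layer */
    uint8_t  original;      /* tile that was there when the event fired */
    const uint8_t *frames;  /* one-shot sequence of tile indices */
    uint8_t  frame_count;
    uint8_t  frame;         /* next frame to show */
    OnEnd    on_end;
    uint8_t  replacement;   /* used by ON_END_REPLACE */
} EventAnim;

static EventAnim anims[MAX_EVENT_ANIMS];
static uint8_t next_slot;   /* where the next slot search starts */

/* Entry point 1: capture the current tile and claim a free slot.
   Scans at most MAX_EVENT_ANIMS slots; returns 0 if all are busy. */
static int event_anim_start(const uint8_t *visual, uint16_t location,
                            const uint8_t *frames, uint8_t frame_count,
                            OnEnd on_end, uint8_t replacement)
{
    assert(frame_count > 0);
    for (uint8_t i = 0; i < MAX_EVENT_ANIMS; i++) {
        EventAnim *a = &anims[(next_slot + i) % MAX_EVENT_ANIMS];
        if (a->active)
            continue;
        *a = (EventAnim){1, location, visual[location], frames,
                         frame_count, 0, on_end, replacement};
        next_slot = (uint8_t)((next_slot + i + 1u) % MAX_EVENT_ANIMS);
        return 1;
    }
    return 0;
}

/* Entry point 2: per-frame update, writing straight to the visual layer. */
static void event_anim_update(uint8_t *visual)
{
    for (uint8_t i = 0; i < MAX_EVENT_ANIMS; i++) {
        EventAnim *a = &anims[i];
        if (!a->active)
            continue;
        if (a->frame < a->frame_count) {
            visual[a->location] = a->frames[a->frame++];
            continue;
        }
        /* Sequence finished: apply completion behaviour, free the slot. */
        switch (a->on_end) {
        case ON_END_RESTORE: visual[a->location] = a->original;    break;
        case ON_END_REPLACE: visual[a->location] = a->replacement; break;
        case ON_END_CLEAR:   visual[a->location] = 0;              break;
        }
        a->active = 0;
    }
}
```

Because each entry carries its own frame number and original tile, two gems collected a few frames apart animate and finish independently.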

This is the kind of change that doesn’t show up dramatically on screen, but it alters how the system can grow in a fundamental way. Animations are no longer behaviour baked into the engine, but data, authored in the map editor, independent of the code that plays them back.

*The new ambient animation system allows me to create and visualise animation sequences directly in the map editor.*

Next Steps

The tile interaction logic is next, and it’s the same problem in a different coat. This is behaviour that’s been baked into the engine when it should be expressed as data.

In the meantime, here are some screenshots with redesigned graphics:
