One of the challenging parts of building a simulation game from scratch is always performance when simulating people / NPCs / entities. In fact, this is how I started development: I was benchmarking various techniques for rendering many entities, which culminated in a move to OpenGL. Instanced rendering worked really well, but naturally I hit a bottleneck elsewhere in my approach.
Dynamic Chunking and OOP
In the animation framework post, I briefly mentioned dynamic chunking. The idea is quite simple: entities that are off-screen are updated only every n frames, which saves FPS.
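To make the idea concrete, here is a minimal sketch of dynamic chunking in the OOP style this section describes. The `NPC` class, the `tick` function, and the default `n = 5` are hypothetical names and values chosen for illustration, not the game's actual code. Scaling the time step by `n` for off-screen entities is also an assumption: it is one way to keep off-screen movement in step with on-screen movement.

```python
from dataclasses import dataclass


@dataclass
class NPC:
    on_screen: bool
    x: float = 0.0
    speed: float = 1.0  # units per second (hypothetical)
    updates: int = 0    # counts how often this NPC was actually updated

    def update(self, dt: float) -> None:
        self.x += self.speed * dt
        self.updates += 1


def tick(npcs, frame: int, dt: float, n: int = 5) -> None:
    """Update on-screen NPCs every frame, off-screen NPCs every n-th frame."""
    for npc in npcs:
        if npc.on_screen:
            npc.update(dt)
        elif frame % n == 0:
            # Assumption: pass the time covered by n frames so that the
            # skipped frames are accounted for in the movement.
            npc.update(dt * n)
```

With this scheme an off-screen NPC costs roughly 1/n of the update work, at the price of coarser movement while nobody is watching.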

Everything was running smoothly… until I noticed some odd discrepancies in how entities moved across the map. Calibrating movement speed (especially with changing game speed) was tricky. If you told a pawn to carry out an order and watched it on-screen, the timing would be noticeably different compared to when the same order was executed off-screen. Pretty sure there’s a quantum theory joke here about observed vs. unobserved gangsters committing crimes.
Finally, NPC handling was still OOP: you create NPC objects, store them in a list, and then iterate through them. With all the other optimizations in place, this was obviously the bottleneck.
Below you can see the video of maxed out 25k NPCs (on a 1k x 1k grid) at ~60 FPS.
This system:
- uses ModernGL instance rendering
- uses an OOP approach for NPCs
- utilizes dynamic chunking for on-screen and off-screen entities
- implements certain expensive calculations in C (Cython).
25k NPCs at 60 FPS is good enough, but I think we could do better.
For those familiar with game dev concepts like ECS (Entity Component System), you can probably already guess where this is headed. And for the Python crowd who’ve spent time with NumPy, you’ll know it too: vectorization, i.e. performing operations on whole arrays at once instead of my old habit of looping through objects in a more OOP style. For regular CS folk, I’m replacing an AoS (array of structures) approach with SoA (structure of arrays).
Vectorization
There’s no shortage of articles on ECS approaches, so I’m not going to reinvent the wheel. Instead, I’m building something custom to fit this game’s needs – a dedicated entity manager.
Rather than using the traditional GameObject architecture, where each object stores its own render position, width, height, and so on…

… we create one array per field: a massive render_x array, for example, holds the render x positions of all entities, with the array index serving as the entity identifier.
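The contrast between the two layouts can be sketched as follows. Only `render_x` is named in the text; `GameObject`'s exact fields, the `EntityManager` class, its `spawn` method, and the fixed capacity are assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GameObject:
    """Array-of-structures style: every object carries its own fields."""
    render_x: float
    render_y: float
    width: float
    height: float


class EntityManager:
    """Structure-of-arrays style: one array per field, indexed by entity id."""

    def __init__(self, capacity: int) -> None:
        self.count = 0
        self.render_x = np.zeros(capacity, dtype=np.float32)
        self.render_y = np.zeros(capacity, dtype=np.float32)
        self.width = np.zeros(capacity, dtype=np.float32)
        self.height = np.zeros(capacity, dtype=np.float32)

    def spawn(self, x: float, y: float, w: float, h: float) -> int:
        """Store a new entity's fields and return its id (its array index)."""
        if self.count >= self.render_x.shape[0]:
            raise RuntimeError("entity capacity exceeded")
        entity_id = self.count
        self.render_x[entity_id] = x
        self.render_y[entity_id] = y
        self.width[entity_id] = w
        self.height[entity_id] = h
        self.count += 1
        return entity_id
```

An entity is then nothing more than an integer: `manager.render_x[entity_id]` replaces `game_object.render_x`.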

This means we can perform vector math for all entities in one go. Below is an example of how the movement (render position) is calculated.
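A minimal sketch of what such a vectorized movement update might look like in NumPy. The `target_x`, `target_y`, and `speed` arrays and the function name are hypothetical, and the game's real calculation may differ. Each entity moves toward its target by at most `speed * dt` without overshooting.

```python
import numpy as np


def move_towards_targets(render_x, render_y, target_x, target_y, speed, dt):
    """Advance every entity toward its target in place, with no Python loop."""
    dx = target_x - render_x
    dy = target_y - render_y
    dist = np.hypot(dx, dy)
    # Never step further than the remaining distance.
    step = np.minimum(speed * dt, dist)
    # Entities already at their target have dist == 0; leave their scale at 0
    # instead of dividing by zero.
    scale = np.divide(step, dist, out=np.zeros_like(dist), where=dist > 0)
    render_x += dx * scale
    render_y += dy * scale
```

The same handful of array operations runs whether there are ten entities or a hundred thousand; the per-entity loop happens inside NumPy's compiled code.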

Next, for the point in the frame tick where all entity data is ready, I implemented an Entity Batch Render in Cython (which compiles to C) for fast array updates:

Here we copy the entity data into an instances array that later gets sent to the GPU (the ModernGL part).
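The copy step can be sketched in plain NumPy as a stand-in for the Cython routine. The five-float instance layout (x, y, width, height, frame index) and the `pack_instances` name are assumptions for illustration; the actual layout depends on what the shader expects.

```python
import numpy as np

FLOATS_PER_INSTANCE = 5  # hypothetical layout: x, y, width, height, frame


def pack_instances(render_x, render_y, width, height, frame_index, count, out):
    """Copy the first `count` entities into an interleaved float32 buffer.

    `out` is a preallocated (capacity, FLOATS_PER_INSTANCE) float32 array that
    is reused every frame, so no new memory is allocated per tick.
    """
    view = out[:count]
    view[:, 0] = render_x[:count]
    view[:, 1] = render_y[:count]
    view[:, 2] = width[:count]
    view[:, 3] = height[:count]
    view[:, 4] = frame_index[:count]
    return view
```

The returned view could then be uploaded to an instance buffer, for example by passing `view.tobytes()` to ModernGL's `Buffer.write`.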
Just to test things out, I first set thousands of NPCs moving in unison.
FPS was virtually unaffected, but let's hold off on the enthusiasm: there is no animation and no calculations yet, just sliding.
The next steps took a few days to implement, but essentially we mimicked the GameObject OOP animation steps in arrays: grid and render position movement (tracking path movement), changing frames every tick (animation) and maintaining current animation indices, calculating y-sorting values, calculating facing directions, and more. Finally, after a few more optimizations, I was able to get to ~121,500 NPCs before FPS started to dip a little below 60.
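A few of these per-entity steps translate to arrays quite directly. The sketch below is illustrative only: the function names, the facing codes (0 = right, 1 = left, 2 = down, 3 = up), and the assumption that y grows downward on screen are all hypothetical, not taken from the game.

```python
import numpy as np


def advance_animation(frame, frame_count):
    """Step every entity's animation frame in place, wrapping at the end."""
    frame += 1
    frame %= frame_count


def update_facing(dx, dy, facing):
    """Set facing from the movement delta; stationary entities keep theirs."""
    horizontal = np.abs(dx) >= np.abs(dy)
    moving = (dx != 0) | (dy != 0)
    new_facing = np.where(
        horizontal,
        np.where(dx >= 0, 0, 1),  # right / left
        np.where(dy >= 0, 2, 3),  # down / up (assumes y grows downward)
    )
    facing[moving] = new_facing[moving]


def draw_order(render_y, height):
    """Y-sort: entities whose feet are lower on screen are drawn later."""
    return np.argsort(render_y + height, kind="stable")
```

Each function touches all entities at once, so adding another behaviour means adding another array and another vectorized pass rather than another method call per object.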
This feels like a good milestone to stop on. We’ve gone from 25k NPCs to 121k NPCs – nearly a 5x improvement. I stopped testing at that point, but I’m confident we could push even more walking pawns with a few additional tweaks.