After trying blitting and SDL2 rendering in Pygame, it was clear that neither would be enough for large-scale NPC rendering. Even with optimizations like culling non-visible NPCs, FPS would drop too fast as soon as I started adding animations or game logic.
So, I finally switched to instance rendering in OpenGL using ModernGL, and the performance boost was huge.
What Is Instance Rendering?
In a straightforward OpenGL renderer, you issue a separate draw call for each object. This works fine for small numbers, but with thousands of NPCs the per-call CPU overhead quickly becomes the bottleneck. Instance rendering (usually called instanced rendering in OpenGL documentation) lets you draw many copies of the same mesh in a single draw call, drastically reducing that overhead.
Instead of sending position, rotation, scale, and other per-object data for each NPC separately, I batch all of it at once. The GPU then processes everything in parallel, handling thousands of NPCs way more efficiently.
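As a rough illustration of that batching, here is a minimal sketch that packs per-NPC data into one contiguous block of bytes, ready to upload as a single buffer. The layout (x, y, rotation, scale as four 32-bit floats) and the names `INSTANCE_FORMAT` and `pack_instances` are assumptions made for this example, not the exact layout used in the project.

```python
import math
import random
import struct

# Assumed per-instance layout: x, y, rotation (radians), scale.
# Four 32-bit floats, so 16 bytes per NPC.
INSTANCE_FORMAT = "4f"
INSTANCE_STRIDE = struct.calcsize(INSTANCE_FORMAT)


def pack_instances(npcs):
    """Pack (x, y, rotation, scale) tuples into one contiguous bytes object."""
    data = bytearray()
    for x, y, rotation, scale in npcs:
        data += struct.pack(INSTANCE_FORMAT, x, y, rotation, scale)
    return bytes(data)


# Generate 10,000 placeholder NPCs in normalized device coordinates.
rng = random.Random(0)
npcs = [
    (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(0.0, math.tau), 0.01)
    for _ in range(10_000)
]

# One blob for all NPCs, instead of one upload per NPC.
instance_bytes = pack_instances(npcs)
```

For 10,000 NPCs this comes to 160,000 bytes, which is small enough to rebuild and re-upload every frame if positions change.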
Setting Up Shaders
I used ModernGL to manage OpenGL in Python and set up vertex and fragment shaders for instance rendering. The vertex shader applies transformations using per-instance data, while the fragment shader handles coloring and lighting.
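To make that concrete, here is a minimal sketch of an instanced setup in ModernGL. It is not the project's actual code: the attribute names (`in_vert`, `in_instance`), the four-float instance layout, the flat-colored fragment shader (no lighting), and the helpers `build_instanced_vao` and `draw_npcs` are all assumptions for illustration. The key part is the `/i` suffix in the buffer format, which tells ModernGL to advance that attribute once per instance rather than once per vertex.

```python
import struct

# Vertex shader: rotates and scales a unit quad, then moves it to the NPC position.
VERTEX_SHADER = """
#version 330
in vec2 in_vert;      // per-vertex: quad corner
in vec4 in_instance;  // per-instance: x, y, rotation, scale (assumed layout)
void main() {
    float c = cos(in_instance.z);
    float s = sin(in_instance.z);
    vec2 rotated = vec2(c * in_vert.x - s * in_vert.y,
                        s * in_vert.x + c * in_vert.y);
    gl_Position = vec4(rotated * in_instance.w + in_instance.xy, 0.0, 1.0);
}
"""

# Fragment shader: a flat placeholder color, with no lighting.
FRAGMENT_SHADER = """
#version 330
out vec4 frag_color;
void main() {
    frag_color = vec4(1.0, 0.6, 0.2, 1.0);
}
"""

# Unit quad as a triangle strip: 4 corners, 2 floats each.
QUAD = struct.pack("8f", -1.0, -1.0, 1.0, -1.0, -1.0, 1.0, 1.0, 1.0)


def build_instanced_vao(ctx, instance_bytes):
    """Build a vertex array from an existing moderngl.Context.

    instance_bytes holds four 32-bit floats per NPC.
    """
    program = ctx.program(vertex_shader=VERTEX_SHADER, fragment_shader=FRAGMENT_SHADER)
    quad_buffer = ctx.buffer(QUAD)
    instance_buffer = ctx.buffer(instance_bytes)
    vao = ctx.vertex_array(
        program,
        [
            (quad_buffer, "2f", "in_vert"),            # advances per vertex
            (instance_buffer, "4f/i", "in_instance"),  # "/i": advances per instance
        ],
    )
    return vao, instance_buffer


def draw_npcs(vao, instance_count):
    """Draw every NPC with a single draw call."""
    import moderngl  # imported here so the constants above load without a GL context

    vao.render(moderngl.TRIANGLE_STRIP, instances=instance_count)
```

When NPCs move, calling `instance_buffer.write(new_bytes)` with freshly packed data each frame updates all of them without rebuilding the vertex array.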
With this setup, I can now render over 10,000 NPCs while maintaining 200 FPS, even before further optimizations. That’s roughly four times the 50 FPS I got from SDL2 at the same NPC count, and far beyond what blitting could manage.
Before (SDL2: 10k NPCs @ 50 FPS)
After (OpenGL: 10k NPCs @ 200 FPS)
Next Steps
Huge thanks to einarf from the ModernGL Discord server for the help in getting everything up and running.
For now, I’m using ModernGL for rendering and Pygame for window management. Next, I’ll be focusing on animations and pathfinding, which will introduce new performance challenges.