Back in May 2011, I was leading the design of an RPG… as its only programmer. It was a failure. However! Programming for games is a lot of fun. I love systems, and game systems offer a lot of really neat problems. So I'm going to do it again — without an RPG involved.
I've been meaning to revisit the engine I wrote for that RPG, and now I have the time to do it. The code is horrific, though, so I'm practically starting over from scratch. It's better that way (no, really). I've also been planning to remake & expand Prisma, so this fits in nicely. I can fuel design patterns by what is needed in Prisma.
Onsang is still going (and hey, it kinda looks like it's actually evolving), but it makes more sense to focus on game system knowledge right now as I'm planning on poking people through July-August in California for employment. I might end up pushing the QA side of things, so who knows what'll happen. Regardless, this is double-fun and good mental exercise.
Ever since that scaffold of an RPG, I've been interested in building a rendering system that uses command buffers, though until recently I hadn't thought much about the things that implicitly come along with it. To achieve maximum speed in such a system, you want to minimize the amount of data you're throwing around, and you definitely don't want rampant allocations during a frame. This amounts to using compact data structures and contiguous arrays to minimize cache misses. This approach to systems is often called “data-oriented design” (DOD), and sometimes loosely “data-driven design” (a term that more commonly means behavior configured by data rather than code). The benefits are not just efficiency, but also easier task distribution and modularity.
I'm going to take that to heart with the new engine. I've been reading articles by DOD proponents since last weekend, and the patterns make a lot of sense. bitsquid, for example, decouples systems to encapsulate them and to allow them to manage their storage as optimally as possible. Their “low-level” systems expose IDs instead of pointers so the system is free to store & re-organize its storage as it sees fit. By allowing the system to re-organize its memory, it can (e.g.) ensure its objects are sorted and adjacent, without affecting any IDs that refer to those objects.
This means they can iterate through the objects without wasting CPU time or cache misses on empty slots or memory hopping through distant dynamically-allocated objects (the latter of which you are usually stuck with in something like std::map). For some systems, that might not be necessary, but you still get the benefit of not having dangling pointers, and you have a better guarantee that external systems aren't going to unduly modify the object (because they can't access it without asking for it). In most cases, you don't mutate resource data once it's loaded, so you really don't need mutable access, but if you do, it's O(1) away!
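To make the ID indirection concrete, here is a minimal sketch of a system that keeps its objects packed while handing out stable IDs. Everything in it (ParticleSystem, the tiny MAX_PARTICLES bound, the function names) is made up for illustration and is not bitsquid's code. It fills holes by moving the last object into them, so it keeps objects adjacent but not sorted, and it does nothing about stale IDs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example system; names and the tiny bound are illustrative only.
constexpr std::size_t MAX_PARTICLES = 8;

struct Particle { float x, y; };

using ParticleID = std::size_t;

struct ParticleSystem {
    Particle    dense[MAX_PARTICLES];       // live objects, always packed in [0, count)
    ParticleID  dense_to_id[MAX_PARTICLES]; // which ID owns each dense slot
    std::size_t id_to_dense[MAX_PARTICLES]; // where each ID's object currently lives
    ParticleID  free_ids[MAX_PARTICLES];    // stack of unused IDs
    std::size_t free_count;
    std::size_t count;
};

void init(ParticleSystem& s) {
    s.count = 0;
    s.free_count = MAX_PARTICLES;
    for (std::size_t i = 0; i < MAX_PARTICLES; ++i) {
        s.free_ids[i] = MAX_PARTICLES - 1 - i; // so ID 0 is handed out first
    }
}

ParticleID create(ParticleSystem& s, Particle p) {
    assert(s.free_count > 0 && "particle limit exceeded");
    ParticleID const id = s.free_ids[--s.free_count];
    s.dense[s.count] = p;
    s.dense_to_id[s.count] = id;
    s.id_to_dense[id] = s.count;
    ++s.count;
    return id;
}

void destroy(ParticleSystem& s, ParticleID id) {
    assert(s.count > 0 && "nothing to destroy");
    std::size_t const hole = s.id_to_dense[id];
    std::size_t const last = s.count - 1;
    // Move the last live object into the hole and patch its lookup entry.
    // IDs held by other systems are unaffected by the move.
    ParticleID const moved = s.dense_to_id[last];
    s.dense[hole] = s.dense[last];
    s.dense_to_id[hole] = moved;
    s.id_to_dense[moved] = hole;
    --s.count;
    s.free_ids[s.free_count++] = id;
}

// O(1) access through the ID.
Particle& lookup(ParticleSystem& s, ParticleID id) {
    return s.dense[s.id_to_dense[id]];
}
```

Iterating over dense[0] through dense[count - 1] touches only live objects in one contiguous run, which is the whole point.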
There are downsides, though. It takes some careful thinking to ensure the IDs are unique and cannot be used once the object dies (e.g., when an object is removed, and another added, the new one could have an internal index equal to the index of the removed object). It's not very difficult, and you can usually define a sane upper bound on the number of objects the system will ever possibly use (e.g., do you really need 4 billion sounds?). That lets you use a bunch of bits of the ID just for the unique part (e.g., an incrementing counter that wraps around), and leave the rest to store the internal index. An upside of having an upper bound is that you can avoid a ton of allocations, and you can form a better understanding of the resources required by a system through its use. I think such deep understanding is going to be extremely valuable.
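The bit-splitting idea can be sketched roughly like this. The 16/16 split, SoundSystem, and all function names are assumptions for the sake of the example; a real system would pick the split from its own upper bound.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical ID layout: low 16 bits = slot index, high 16 bits = generation.
using SoundID = std::uint32_t;

constexpr unsigned      INDEX_BITS = 16;
constexpr std::uint32_t INDEX_MASK = (1u << INDEX_BITS) - 1;
constexpr std::uint32_t MAX_SOUNDS = 1u << INDEX_BITS; // hard upper bound

struct SoundSlot {
    std::uint16_t generation = 0; // bumped every time the slot is freed
    bool alive = false;
    float volume = 0.0f;
};

struct SoundSystem {
    std::vector<SoundSlot> slots;         // grows up to MAX_SOUNDS
    std::vector<std::uint16_t> free_list; // indices of dead slots
};

inline std::uint32_t id_index(SoundID id) { return id & INDEX_MASK; }
inline std::uint16_t id_generation(SoundID id) {
    return static_cast<std::uint16_t>(id >> INDEX_BITS);
}

SoundID create(SoundSystem& s, float volume) {
    std::uint32_t index;
    if (!s.free_list.empty()) {
        index = s.free_list.back(); // reuse a dead slot
        s.free_list.pop_back();
    } else {
        assert(s.slots.size() < MAX_SOUNDS && "sound limit exceeded");
        index = static_cast<std::uint32_t>(s.slots.size());
        s.slots.push_back(SoundSlot{});
    }
    SoundSlot& slot = s.slots[index];
    slot.alive = true;
    slot.volume = volume;
    return (static_cast<SoundID>(slot.generation) << INDEX_BITS) | index;
}

bool is_valid(SoundSystem const& s, SoundID id) {
    std::uint32_t const index = id_index(id);
    return index < s.slots.size()
        && s.slots[index].alive
        && s.slots[index].generation == id_generation(id);
}

void destroy(SoundSystem& s, SoundID id) {
    assert(is_valid(s, id) && "destroying a stale or invalid ID");
    SoundSlot& slot = s.slots[id_index(id)];
    slot.alive = false;
    ++slot.generation; // wraps around; old IDs for this slot no longer match
    s.free_list.push_back(static_cast<std::uint16_t>(id_index(id)));
}
```

Because the counter wraps, a very old ID could in principle match again after 65536 reuses of the same slot. Whether that matters depends on the system; more generation bits make it less likely at the cost of a lower object limit.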
I'm going to harp on a lot about bitsquid throughout this article, so be prepared.
Another aspect that emerges from DOD is simplicity. Because structures are open and don't prohibit or protect use, whatever wants to use them can do so in the most direct manner. In contrast to OOP, DOD necessitates that anything can do anything to everything that's open. In OOP, you're often hiding data and placing an interface on top to mediate access. Why? Because everything is afraid of being malformed or misused. This might make some sense for software that really, really shouldn't crash and should catch misuse as soon as possible through its design, but in games we just want to run fast and get the code over with. Always having to plaster on a layer of interface & dumb-dumb security slows us down by making things rigid, so why do it? We're not interested in the preemptive assurances that get in our way and slow the game down.
Not having a protective interface over everything doesn't prevent us from crashing hard. We should crash hard when we encounter an error, and we should be vigilant to test for errors, because otherwise they're going to go unnoticed or ignored. It's better to fix the system early before everything depends on its broken behavior. This is, naturally, also something bitsquid does. Instead of removing that “pesky” triggering assertion that's due to malformed input, just fix the malformed input. Is the error too strict? Evaluate it and decide whether it should be removed or turned into a warning instead. For example, we may want to error when we encounter an animation that has a bajillion bones, but it could be totally legitimate, so a warning would be more sane until we have a real reason to prohibit it — e.g., an upper bound on the animation system's support for bones.
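As a rough illustration of the error-versus-warning split, assuming a made-up Animation type, an arbitrary threshold, and a zero-bone check that I invented purely to have something that is unambiguously malformed:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical type and threshold, for illustration only.
struct Animation { unsigned bone_count; };

constexpr unsigned BONE_WARN_THRESHOLD = 256;

static unsigned g_warning_count = 0;

void warn(char const* message) {
    ++g_warning_count;
    std::fprintf(stderr, "warning: %s\n", message);
}

void validate(Animation const& anim) {
    // Unambiguously malformed input: crash hard so it gets fixed at the source.
    assert(anim.bone_count > 0 && "animation has no bones");

    // Suspicious but possibly legitimate: warn until there is a real limit to enforce.
    if (anim.bone_count > BONE_WARN_THRESHOLD) {
        warn("animation has an unusually large number of bones");
    }
}
```

If the animation system later gains a real upper bound on bones, the warning can be promoted to an assertion against that bound.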
Back to decoupling. How do you share data if everything is disparate and in its own little universe? More data! Low-level systems can have an output stream of events or a way to poll per-frame (whatever makes more sense for the system). To connect low-level systems, you can use a higher-level system that knows about all of the systems involved. For example, the Game might want to connect HIDs to the Player, so it could poll the input system and send off to the player, the binding manager, whatever. By patching things together, we can design the interfaces to only deal with what they know, which reduces their complexity.
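Something like the following is what I have in mind, as a sketch only. InputSystem, Button, and the fields on Player are hypothetical; the point is that neither low-level piece mentions the other, and only Game knows both.

```cpp
#include <cassert>
#include <cstddef>

// All names here are hypothetical, for illustration only.
enum class Button : unsigned char { Left, Right, Jump };

struct InputEvent { Button button; bool pressed; };

constexpr std::size_t MAX_INPUT_EVENTS = 64;

// Low-level system: buffers events, knows nothing about players.
struct InputSystem {
    InputEvent events[MAX_INPUT_EVENTS];
    std::size_t count = 0;
};

void push_event(InputSystem& in, InputEvent e) {
    assert(in.count < MAX_INPUT_EVENTS && "input event buffer full");
    in.events[in.count++] = e;
}

void clear_events(InputSystem& in) { in.count = 0; }

// Low-level data: knows nothing about input devices.
struct Player {
    int move_x = 0;
    bool jumping = false;
};

// High-level system: the only place that knows about both.
struct Game {
    InputSystem input;
    Player player;
};

void update(Game& g) {
    // Drain this frame's events and translate them into player state.
    for (std::size_t i = 0; i < g.input.count; ++i) {
        InputEvent const& e = g.input.events[i];
        switch (e.button) {
        case Button::Left:  g.player.move_x = e.pressed ? -1 : 0; break;
        case Button::Right: g.player.move_x = e.pressed ?  1 : 0; break;
        case Button::Jump:  g.player.jumping = e.pressed; break;
        }
    }
    clear_events(g.input);
}
```

Swapping the input system for another one would only touch update(), which is the kind of locality the decoupling is supposed to buy.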
You might say that puts a lot of strain on the high-level system to connect things. True, but it's much better than distributing the interface to so many systems that you end up with an unmaintainable spaghetti of references, global objects, and code fluff. It moves the complexity away from the low-level systems and into the high-level systems where we can be more intelligent about the way they are connected. By connecting in a higher-level system, we reduce the complexity of the lower-level systems and increase their modularity. You can take any system out of the equation and replace it with another as long as it's not intrinsically attached to any other system.
What does this mean? We can replace systems at compile time (or even at runtime) to handle specific hardware for optimality, without resorting to expensive virtual calls on interfaces (which is too often only used to facilitate system swapping or so-called extensibility — booo). The internals could be implemented to avoid such nonsense (i.e., since we expose an event stream or polling, that's most of the interface, and it can differ internally), but still give it the modularity to support different hardware/capabilities efficiently. Most of the time, what you end up removing due to the removal of a system should only be in the high-level system. That will be a good indication that your systems are decoupled.
DOD obviously has a lot of implications for the code. Data is nearly always public, interfaces stay out of structures for POD traits (and really for uniformity when you want to extend the interface for dealing with a type), and abstract classes mostly disappear, but they are still valuable. For example, data streaming is already a costly operation, and we want to be able to use it on-demand from different kinds of storage media. We could hide this behind a little system that takes requests and gives back blobs, but it'd end up being global, since a lot of stuff is going to need data. We could keep it encapsulated and pass along streams from the high-level systems, but it's very roundabout, and adds more interface to our systems. In reality, you're going to have some resource system to fill up those systems, so the low-level storage media access should be disparate even from that. It doesn't fit into our system model. It's much easier to give it an abstract interface and allow different implementations of it for memory/file/whatever access and use it appropriately within the resource system.
Of course, you may only ever need standard filesystem access, so maybe you don't need the abstract interface. That's fine, but its consequences should be considered early on. What if you move to a server-client model (for tooling, the actual game, etc.)? You're probably going to want to stream over the network, or at least into memory for the network system, so your streams are going to then need direct memory support or support for a network buffer. What about platform support? You could handle that by using PIMPL, including different code, or by using a bunch of macros to determine the platform & change the code accordingly, but then you end up with some manner of code spaghetti and have to either switch the type data or duplicate the whole interface for each platform.
If you foresee anything of that nature with something used in many places or heavily dependent on the platform (in this case, storage media access), you probably want a virtual interface. It separates the different implementations and makes the code simpler. You still have to duplicate the interface, but with an abstract class you have a constraint on what the interface is supposed to provide. This helps in maintainability when you change an interface. You don't have to guess whether a change has made it to all of the platform-specific implementations because the compiler tells you.
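A minimal sketch of such an interface, assuming a read-only Stream with a single read() function (the class names and the load_blob helper are invented for the example):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical abstract interface for storage media access.
class Stream {
public:
    virtual ~Stream() = default;
    // Reads up to `size` bytes into `dest`; returns the number of bytes read.
    virtual std::size_t read(void* dest, std::size_t size) = 0;
};

// Reads from a block of memory (e.g., a network buffer).
class MemoryStream final : public Stream {
public:
    MemoryStream(void const* data, std::size_t size)
        : _data(static_cast<unsigned char const*>(data)), _size(size), _pos(0) {}

    std::size_t read(void* dest, std::size_t size) override {
        std::size_t const remaining = _size - _pos;
        std::size_t const n = size < remaining ? size : remaining;
        std::memcpy(dest, _data + _pos, n);
        _pos += n;
        return n;
    }

private:
    unsigned char const* _data;
    std::size_t _size;
    std::size_t _pos;
};

// Reads from a file through the C stdio API.
class FileStream final : public Stream {
public:
    explicit FileStream(char const* path) : _file(std::fopen(path, "rb")) {}
    ~FileStream() override { if (_file) { std::fclose(_file); } }
    FileStream(FileStream const&) = delete;
    FileStream& operator=(FileStream const&) = delete;

    bool is_open() const { return _file != nullptr; }

    std::size_t read(void* dest, std::size_t size) override {
        return _file ? std::fread(dest, 1, size, _file) : 0;
    }

private:
    std::FILE* _file;
};

// Resource-system code only ever sees the abstract Stream.
std::size_t load_blob(Stream& stream, void* dest, std::size_t capacity) {
    return stream.read(dest, capacity);
}
```

If read() later changes its signature in Stream, every implementation marked override stops compiling until it is updated, which is the compiler-enforced constraint described above.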
The core of my engine is going to be modeled around bitsquid's foundation library. Having the interface separate from most types isn't very critical for something I'm authoring, since any extension to the interface I would make would be core and could thus be in-class if the interfaces were designed that way, but I'm going to do it anyways. Style shifts are fun!
It has other benefits, such as moving the recompilation factor to the interface and the code that includes it — away from the type, which rarely changes. These types have public data to facilitate the built-in functionality (since the interface consists of non-friend free functions), which happens to enable anyone else to add functionality. In addition to reducing recompilation from monolithic headers, code only includes what it uses and doesn't incur extra cost from the stuff that it doesn't need. This means types that only need a type definition (i.e., none of its interface) don't incur the cost of the interface.
Granted, you can design code to work like this without separating the interface from the class, but you still end up with inclusion spaghetti due to dependencies. That's what causes infectious recompilation, especially with class templates, which still costs more time due to the interface being needlessly included. You could move the interface definitions out of the class and still have the problem simply because the declarations are still a part of the class, and you then additionally have the maintenance cost of syncing the declaration and implementation files.
Like the foundation library, my types are grouped together by … type. Core types, collection types, math types, etc. Interface headers (obviously) include the required type headers, so you only ever need to include the interface header for a type if you need to use functionality for it. If you need types, you have fewer headers to include since they're grouped. Because types rarely change, the recompilation factor of doing this is minimal. Interfaces change far more often, and it's less work to maintain & include few headers (which, again, are small because there's no interface in the type headers).
A downside I've already noticed is the boilerplate caused with free functions for class templates. The template specification has to be repeated for each function. If I were doing this in my own style, that would be unnecessary because everything for a class template would be declared and defined in-class (which is undesired here for aforementioned reasons). It's not that bad, though. At least I don't have to repeat the entire declaration like you have to do with non-header implementations (or with separate declarations and definitions). It does have a nice side-effect for the objects the functions operate on: data member prefixes/suffixes are not needed because you always have to go through the object to get at its members, and thus there are no name confusions.
I'm again taking bitsquid's cue here by using a single prefix underscore for internal data members, just to indicate that something should only be mutated if you know what you're doing. My motto here is “the user should protect the user from the user”, but I'm still going to furiously flip tables via assertions if they misuse the existing interface. I'm still going to have (free function) accessors for these internal data members, because the data in a type might change (and thus have a significant effect on anything that directly accesses its members), because it's more consistent with accessor-like functions that make small calculations, and finally because writing array._size is kinda bleh. This is in contrast to my usual internal data members with m_ as a prefix. Arguments of readability against something like that don't work on me (because I can read it just fine and “m_this m_is_hard m_to m_read” is never a real occurrence in code), but here I'm making an exception just to see how it goes. The rest of the design isn't really in my style either, so why not.
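Here is roughly what that layout looks like, squeezed into one listing. FixedArray and the functions in namespace array are my own stand-ins written in this style, not the foundation library's actual types. In practice the struct would live in a types header and the functions in a separate interface header.

```cpp
#include <cassert>
#include <cstddef>

// --- would live in a "types" header: data only, no interface ---
// Hypothetical fixed-capacity array; underscore-prefixed members are public
// but should only be touched directly if you know what you're doing.
template<class T, std::size_t N>
struct FixedArray {
    T _data[N];
    std::size_t _size;
};

// --- would live in the interface header: non-friend free functions ---
namespace array {

// Note the template header repeated for every function.
template<class T, std::size_t N>
inline std::size_t size(FixedArray<T, N> const& a) {
    return a._size;
}

// Accessor-like function that makes a small calculation.
template<class T, std::size_t N>
inline std::size_t space(FixedArray<T, N> const& a) {
    return N - a._size;
}

template<class T, std::size_t N>
inline void push_back(FixedArray<T, N>& a, T const& value) {
    assert(a._size < N && "array is full");
    a._data[a._size++] = value;
}

template<class T, std::size_t N>
inline T& back(FixedArray<T, N>& a) {
    assert(a._size > 0 && "array is empty");
    return a._data[a._size - 1];
}

} // namespace array
```

Code that only stores a FixedArray needs just the type definition, while code that calls array::size(a) pulls in the interface. Nothing stops anyone from adding their own functions next to these.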
One thing I'm definitely not doing is repeating declarations just so the documentation is in a smaller area. Documentation is best represented outside of code, and editors these days can jump to functions if you really need to look at the code instead of the documentation (and why do you need to see just the documentation in-code if you already have documentation elsewhere?).
I'm very tempted to use the facilities from duct++ for the engine, but I really want to avoid using the stdlib if I can help it (which duct++ uses somewhat significantly). This is pedantic for someone who isn't doing “AAA” games where performance is critical because of all the stupid pixels, but I see it as a way to learn how all of this works. Making a game is the motivation to write the engine, but the engine has its own values and intrigue that will keep me plenty happy even if I don't get to actually making the game. I'm a systems designer, and games need systems first and foremost before they can be games, so I have plenty of fun in the backend.
I might end up using a few specific things from duct++, like compiler detection and endian handling, which don't include any (or little) of the stdlib.
Pedantic motivations justified, there are still good reasons to avoid the C++ stdlib. I can have a more intimate understanding of the algorithms and structures I'm using and can tailor them specifically to the needs of the engine. Everything in the engine can be consistent in style and (hopefully) behavior. I'm not forced to deal with the mental and (supposedly minor, now) performance cost of exceptions (which are unnecessary since I want to crash hard) and I have less platform-/implementation-specific junk to deal with (case in point: Microsoft broke
cos() is not something I'm going to write).
I have yet to seriously dig into the rendering system design, but it's also going to be bitsquid-inspired. Ideally, rendering will be distributed into tasks, merged for sorting, then distributed again to the hardware. Sort keys are really neat, surprisingly simple, and I can't wait to play with them.
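Since I haven't designed any of this yet, the following is only a sketch of the general sort-key idea: pack everything that should influence draw order into one integer, most important field in the highest bits, then sort on that integer. The 8/24/32 bit layout, the field choices, and the DrawCommand struct are assumptions, not a worked-out design.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 64-bit layout, most significant first:
//   layer (8 bits) | material (24 bits) | depth (32 bits)
using SortKey = std::uint64_t;

constexpr SortKey make_key(std::uint8_t layer, std::uint32_t material, std::uint32_t depth) {
    return (static_cast<SortKey>(layer) << 56)
         | (static_cast<SortKey>(material & 0xFFFFFFu) << 32)
         | static_cast<SortKey>(depth);
}

struct DrawCommand {
    SortKey key;
    std::uint32_t mesh; // stand-in for whatever the command actually carries
};

// Commands built by separate tasks can be appended into one vector and sorted here.
void sort_commands(std::vector<DrawCommand>& commands) {
    std::stable_sort(
        commands.begin(), commands.end(),
        [](DrawCommand const& a, DrawCommand const& b) { return a.key < b.key; }
    );
}
```

Changing the sorting policy (say, depth before material for transparent layers) is then just a matter of changing how the key is packed, not how the commands are sorted.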
I need to stop expounding on this article and get back to the code. Hopefully I'll have something interesting to write about on this topic in the near future. Here's a list of what I've been reading:
- bitsquid blog
Very valuable information on how DOD is used in their engine, and a lot of real-world examples in design and methodology.
- Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)
A very good article on the essentials, benefits, and drawbacks of DOD. “Where there’s one, there are many.”
- How to make a rendering engine
A collection of links for existing command-buffer rendering engines and an FAQ on the design of such rendering.
- Order your graphics draw calls around!
Specifically about command buffers and sort keys.
- Growing Ginkgo Pt. 1: The Reading List
A series on the design of the engine for Secrets of Rætikon (and Chasing Aurora?). This contains some useful links, but they use a typical OOP design, and a lot of virtual inheritance as far as I can see. See Ginkgo's Game Loop for time management in the main loop.