Important C++ Anti-Patterns #
Here we list three important C++-specific anti-patterns or design flaws that we constantly keep an eye out for. Naturally, there are more, but we only list the anti-patterns that we avoid at all costs. Allowing these patterns into our code has proven a reliable way of introducing a ceaseless stream of bugs into our dev life, which is detrimental to our happiness and productivity. Perhaps unsurprisingly, they’re all related to memory management. Naive memory management in C/C++ is a relentless source of bugs. Fortunately, it’s not particularly hard to spot code that might be affected by these issues, and, when writing new code, to simply avoid the possibility of these bugs occurring.
Global Variables #
This one’s too fundamental to count and not C++-specific: avoid global variables. Global variables completely destroy any hope of understanding the code without knowing the entire codebase in extreme detail. Additionally, they prevent us from running two instances of the algorithm in the same process, because both instances share state through the global variables and typically can’t agree on what values those need to have. For the same reason, code using global variables is likely not thread-safe, and there’s a good chance it can’t be made thread-safe without removing the global variables. We avoid global variables under pretty much all circumstances.
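As a minimal sketch of the alternative, assume a hypothetical solver whose tolerance would otherwise have lived in a global variable. The names SolverConfig and count_above_tolerance are purely illustrative. The state is moved into a struct that is passed explicitly, so two differently configured instances can coexist in one process:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical configuration that might otherwise have been a global variable.
struct SolverConfig {
    double tolerance;
};

// Everything the function depends on is visible in its signature.
std::size_t count_above_tolerance(const std::vector<double>& residuals,
                                  const SolverConfig& config) {
    std::size_t count = 0;
    for (double r : residuals) {
        if (r > config.tolerance) {
            ++count;
        }
    }
    return count;
}

// Two configurations are used side by side without interfering.
void config_usage_example() {
    const std::vector<double> residuals{0.5, 1e-3, 1e-9};
    const SolverConfig strict{1e-6};
    const SolverConfig loose{1e-2};
    assert(count_above_tolerance(residuals, strict) == 2);
    assert(count_above_tolerance(residuals, loose) == 1);
}
```

Because nothing is shared implicitly, the two calls could also run on different threads without additional synchronization.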
Raw Pointers #
Raw pointers are semantically horribly overloaded. To start with, a raw pointer could be the address of
- one object,
- or of the first element of an array.
In both of these cases it could either be:
- a reference with pointer syntax,
- or an optional, where nullptr signals not valid.
Lastly, in all of the previous four cases it’s unclear who owns the memory and who’s responsible for freeing it. The choices are that the memory
- is automatically managed by something else,
- must be freed using free, delete or delete[],
- is freed using some custom API.
Moreover, when pointing to multiple elements, it’s impossible to know how many exactly. As if that’s not enough, it’s impossible to know, until after the program has crashed, whether a pointer was a dangling pointer. (Obligatory reminder that dereferencing a dangling pointer will not always result in a crash, which is why we must use tools like ASAN.)
Note that when using a pointer in a struct and allowing users to set the pointer, there’s no guarantee that the pointer always has the same semantics.
Therefore, we use pointers extremely sparingly. Naturally, we’d use them inside custom containers where the pointer points to the allocated memory. Alternatively, we use them as views when interfacing with C APIs such as MPI. Pointers are not particularly complex if it’s implicitly clear which type of pointer a particular pointer is. This can be quite hard to figure out with confidence if their use is scattered across the entire codebase, and not particularly hard if they’re used locally.
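The following sketch illustrates the "used locally" case, under the assumption of a hypothetical C-style function c_style_scale that stands in for a real C API such as an MPI call. The raw pointer only exists for the duration of the call, while ownership and size stay with the std::vector:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for a C API (hypothetical): it takes a pointer and a count.
void c_style_scale(double* data, int count, double factor) {
    for (int i = 0; i < count; ++i) {
        data[i] *= factor;
    }
}

// The C++ wrapper keeps the pointer local: it is created from the vector
// right at the call site and never stored anywhere.
void scale(std::vector<double>& values, double factor) {
    // The C API counts elements with an int, so the size must fit into one.
    assert(values.size() <=
           static_cast<std::size_t>(std::numeric_limits<int>::max()));
    c_style_scale(values.data(), static_cast<int>(values.size()), factor);
}
```

Here it is clear from context that the pointer is a non-owning view of exactly values.size() elements, which is the situation in which raw pointers are unproblematic.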
Alternatives #
Consider using one of the following alternatives:
- Use an std::optional when the pointer could be a nullptr. In tight inner loops we don’t want to be checking whether something is available or not on every iteration. For HPC applications, you’d probably consider rewriting the code so that the inner loops stay simple.

- For views into arrays of elements consider using a view instead, e.g. std::span or std::mdspan. Many array libraries also have their own view types, and views are easy to implement.

- Write functions that accept their arguments as references, unless ownership of the object is required.

- Use containers to manage the lifetime of the memory, e.g. std::vector, std::shared_ptr or custom containers when appropriate.

- Use smart pointers for instances of virtual classes and pass by reference whenever possible.
Dangling Pointers and References #
We avoid situations in which it’s not guaranteed that all of the objects used are still alive. This is mostly a concern when writing classes or structs:
class Foo {
public:
    Foo(double* ptr, const double& reference, std::span<double> span);

private:
    double* ptr;
    const double& reference;
    std::span<double> view;
};
The problem is that Foo doesn’t guarantee that the objects behind ptr, reference
and span remain valid throughout the lifetime of its instances. As a result,
this class makes it easy to write code that has dangling pointers or references:
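Foo make_foo() {
    double* ptr = new double[3];    // heap allocation that nothing frees
    double x = 42.0;                // local, destroyed when make_foo returns
    std::span<double> span(&x, 1);  // view of the local x
    return Foo(ptr, x, span);
}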
The problems are:
- The memory allocated for ptr leaks, unless it’s later retrieved magically from Foo and freed.
- Foo::reference is a reference to a local, and will result in a use-after-stack error if ever accessed outside of make_foo.
- Simply by reading the code for Foo we can’t know that the std::span is valid throughout the lifetime of the object. Here it isn’t: it views the local x, just like the reference.
While these may look like horrible ideas here, the temptation to write code that
“just” keeps ptr alive while foo exists isn’t small at all. It may seem like
a trivial matter to “just” make sure that things are set up in the right order
and torn down in the right order. In a sense it absolutely is, and yet it’s a
merciless, repeated source of errors. Fortunately, it’s entirely avoidable
using RAII.
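One possible RAII-style rewrite is sketched below, assuming that Foo can simply own its data. The class OwningFoo and its members are hypothetical and only illustrate the idea: the array becomes a std::vector member and the scalar is stored by value, so every object the class uses lives exactly as long as the instance itself.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical owning variant: no pointer, reference or view members.
class OwningFoo {
public:
    OwningFoo(std::vector<double> values, double scale)
        : values_(std::move(values)), scale_(scale) {}

    std::size_t size() const { return values_.size(); }

    double scaled(std::size_t i) const {
        assert(i < values_.size());
        return scale_ * values_[i];
    }

private:
    std::vector<double> values_;  // owned, freed by the vector's destructor
    double scale_;                // copied, so no reference to a local
};

OwningFoo make_owning_foo() {
    std::vector<double> values{1.0, 2.0, 3.0};
    double x = 42.0;
    // Ownership of the vector moves into the object, and x is copied.
    return OwningFoo(std::move(values), x);
}
```

With this layout there is nothing to leak and nothing that can dangle after make_owning_foo returns, at the cost of one copy of the scalar and a move of the vector.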
Exceptions #
The above has exceptions like views, e.g. std::span, or iterators. However,
it’s part of those objects’ semantics to not own the resource. Therefore, they
can’t be used if it can’t be guaranteed that the resource they point to stays
alive.
Note that passing these to functions as arguments is the important case where we can guarantee that the resource stays alive. For example,
double sum(std::span<double> x);
is safe because when we call sum we know that the memory that x points to
is valid. We also know that it will not become invalid during the course of
running sum. (Note that this last point is almost impossible to guarantee if
there are global variables in the program.)
Allowing the Possibility of Leaking #
C/C++ is notorious for leaking memory. Fortunately, this too is a largely solved problem. Let’s consider,
std::vector<double> v(100);
We see that the only way this will leak is if there’s a bug in std::vector.
Otherwise, when the std::vector goes out of scope it deallocates the memory
it refers to, reliably every single time.
Consequently it’s safe to use it in another struct:
struct Foo {
    std::vector<double> x;
};
Again, there can be no leak, because the destructor of Foo will run the
destructor of std::vector and the memory will be deallocated.
See how it happens almost by default? It’s a consequence of using RAII and managing ownership in a sane manner.
The reliable way of leaving the safe world is by using malloc or its
slightly better sibling new. As always, neither is banned: we want to
allocate memory, and in HPC we sometimes want precise control over how we
allocate memory. However, they must be tamed by limiting how frequently
they’re used and how far apart the new and delete can appear. One can do so
by creating the appropriate RAII-style abstraction, e.g. a container, which
means the new and delete are isolated to the same class. We can write this
container with full attention to memory-related issues, test it rigorously and
fix any bugs that inevitably do creep in by editing a single class.
When one does this, the use of new and delete almost vanishes from C++
codebases. As a result we remove a major source of leaking memory.
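As an illustration, here is a minimal sketch of such an abstraction. The class name Buffer and its interface are assumptions made for this example. The only new[] and delete[] appear inside this one class. Copying is disabled to keep the sketch short, and moving transfers ownership.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning array: allocation and deallocation live in one class.
class Buffer {
public:
    // Allocates size elements, value-initialized to zero.
    explicit Buffer(std::size_t size)
        : data_(new double[size]()), size_(size) {}

    // delete[] on a nullptr is a no-op, so moved-from objects are fine.
    ~Buffer() { delete[] data_; }

    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Moving hands over the allocation and leaves the source empty.
    Buffer(Buffer&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            delete[] data_;
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    double& operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }

    const double& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    std::size_t size() const { return size_; }

private:
    double* data_;
    std::size_t size_;
};
```

Code that uses Buffer never calls new or delete itself, so any memory bug has to be in this one class, where it can be tested in isolation, for example under ASAN.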