Murphy's Law Of Software Abstractions

2020-02-25

All software abstractions, sooner or later, leak.

When this happens, it hurts.

To drive a car, you need not know how it works internally. The mechanics of an automobile are well hidden from the driver. Similarly, software libraries, tools, and frameworks promise abstraction to the engineers using them: that one can use them effectively without delving into the internals.


Let me tell you a story of uWSGI, processes, and threads.

uWSGI is an application server for running web applications. It is popular in the Python world. Python's global interpreter lock (GIL) makes thread-based concurrency muddy, and a common way around it is to spawn multiple processes. When starting uWSGI, you can configure the number of processes that service concurrent web requests. By default, the master uWSGI process starts, loads and initializes the Python application code, and then forks the configured number of child processes. One caveat with forking is that a child process inherits only the thread that called fork, not the other threads running in the parent. So any thread created while the application initializes in the master process does not exist in the child processes.
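The fork caveat can be reproduced without uWSGI. The snippet below is a minimal sketch, assuming a POSIX system where os.fork is available; the function name is made up for illustration. It starts a background thread, forks, and asks each side whether that thread is still alive.

```python
import os
import threading


def worker_alive_in_parent_and_child():
    """Start a thread, fork, and report whether the thread is alive on each side."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() was copied over.
        try:
            os.close(read_fd)
            os.write(write_fd, b"1" if worker.is_alive() else b"0")
        finally:
            os._exit(0)

    # Parent: the worker thread is still running here.
    os.close(write_fd)
    alive_in_child = os.read(read_fd, 1) == b"1"
    os.close(read_fd)
    os.waitpid(pid, 0)
    alive_in_parent = worker.is_alive()

    # Let the worker finish so the parent exits cleanly.
    stop.set()
    worker.join()
    return alive_in_parent, alive_in_child
```

The Thread object still exists in the child, which is what makes the situation easy to miss: nothing raises, the thread is simply no longer running. Recent Python versions also emit a DeprecationWarning when a multi-threaded process forks.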

This bit us hard.

We were initializing a Kafka library on application load. This library internally created background threads that aggregate messages and push them to a Kafka broker. The child processes that uWSGI forked did not have these threads, so our payloads never reached Kafka.

In 99% of cases, an application developer need not bother about how uWSGI forks and creates child processes. uWSGI abstracts this well, and the application developer can go on with her regular day-to-day work, blissfully unaware of these internals. The same goes for the Kafka library. One need not pry open the library to figure out how it aggregates the messages and sends them to Kafka. In this particular case, the abstractions leaked and bit us. We spent a couple of days debugging.

One of the principles of good software design is that it should be easy to reason about.

You can interpret the above in different ways. A short interpretation is that when you look at a piece of code, it should be easy to work out in your head what it does. Whatever you need to reason about the code should be in the code and apparent, with no hidden surprises. Leaky abstractions break this principle.

The uWSGI incident is fresh in my mind, but I have seen this to be a recurring pattern.

https://twitter.com/abhyrama/status/1207334951605063681

This is the problem with abstractions and magic technologies. When the abstractions leak, they bite us in ways we do not expect, and at times we do not anticipate.

Be skeptical of technologies that claim to be magic. Do not just pay lip service to KISS; follow it in spirit. Think twice before incorporating such technologies into your stack.
Figuring out the anti-patterns beforehand is as crucial as figuring out the best practices. All technologies come with a list of don'ts. Grok them.
Even though it may sound like an oxymoron, invest time and effort in going beyond the abstraction and peeling back the layers. At the least, build a minimal mental model.
If you are a library developer, documenting how things can go wrong is as important as highlighting the rosy use cases; this is the least you can do to help your fellow craftsmen. Map the minefields so that no one accidentally steps on a mine.

https://twitter.com/abhyrama/status/1186322830410956801

1. uWSGI is a fantastic piece of software. I am not trying to diss uWSGI at all.
2. For our particular problem, we changed uWSGI's default behavior of loading the application before forking. We enabled lazy loading (the lazy-apps option in uWSGI), which loads the Python code in each child process after the fork rather than once before it.
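Why loading after the fork helps can be seen with a toy model. This is a rough sketch, not the real Kafka client or uWSGI: ToyProducer and the helper functions are hypothetical names, and the example assumes a POSIX system with os.fork. The producer hands messages to a background thread, and the comparison counts how many messages stay unsent in a child when the producer is built before the fork versus inside the child.

```python
import collections
import os
import threading
import time


class ToyProducer:
    """Hypothetical stand-in for a client that sends from a background thread."""

    def __init__(self):
        self.pending = collections.deque()
        flusher = threading.Thread(target=self._flush_forever, daemon=True)
        flusher.start()

    def send(self, message):
        # Returns immediately; delivery is left to the background thread.
        self.pending.append(message)

    def _flush_forever(self):
        while True:
            while self.pending:
                self.pending.popleft()  # pretend this ships the message
            time.sleep(0.01)


def stuck_after_send(producer, wait_seconds=0.5):
    """Send one message and return how many are still unsent after waiting."""
    producer.send("payload")
    deadline = time.monotonic() + wait_seconds
    while producer.pending and time.monotonic() < deadline:
        time.sleep(0.01)
    return len(producer.pending)


def run_in_child(func):
    """Fork, call func() in the child, and return its integer result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.write(write_fd, str(func()).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    result = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return result


def compare_load_orders():
    # Built before the fork, like an application loaded in the master process.
    preloaded = ToyProducer()
    stuck_preloaded = run_in_child(lambda: stuck_after_send(preloaded))
    # Built inside the child, like an application loaded after the fork.
    stuck_lazy = run_in_child(lambda: stuck_after_send(ToyProducer()))
    return stuck_preloaded, stuck_lazy
```

In the first child, send succeeds but nothing ever drains the queue, which mirrors payloads silently not flowing. In the second child, the background thread is created in the same process that uses it. The trade-off of loading after the fork is that every child pays the initialization cost itself.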

Image by Sang Hyun Cho from Pixabay
