Vroom Vroom: Performance Engineering from First Principles

Speed is a feature. We want our apps and websites to be blazing fast. Multiple studies show that for every additional second a user waits, the likelihood that they drop off increases. In this post, I’ll explore performance engineering through first-principles thinking, agnostic of programming languages, frameworks, and architectures. This post is not an introduction to performance engineering techniques; it is meant to equip you with a framework to reason with when you’re faced with a slow application.

Performance engineering techniques broadly fall into the following categories, and you can mix and match them wherever appropriate.

Can I be lazy?
Re-use what’s already done—this is what caching is all about. Compute something once and reuse it until it goes stale. 
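A minimal sketch of this idea in Python, assuming a hypothetical `expensive_report` function standing in for some costly computation:

```python
import functools

# Hypothetical expensive computation; memoise it so repeat calls are free.
@functools.lru_cache(maxsize=128)
def expensive_report(user_id: int) -> str:
    # Imagine heavy database queries here; the decorator caches the result.
    return f"report-for-{user_id}"

expensive_report(42)  # computed once
expensive_report(42)  # served straight from the cache
```

In a real system you would also need an invalidation strategy for when the cached value goes stale; `lru_cache` only evicts by recency.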

Can I bring it closer to the user?
When you cache something in the browser, you’re bringing it closer to the user. The same principle applies to CDNs: they sit farther from the user than the browser cache but much closer than the origin server. It’s all about reducing the distance between the payload and the end user; the shorter the distance, the faster the response.
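As a sketch, these are the kind of HTTP headers a server might attach so that both the browser and any CDN edge in between keep a copy close to the user. The values here are illustrative assumptions, not recommendations:

```python
# Illustrative caching headers; the TTL and ETag value are made up.
def cache_headers(max_age_seconds: int = 3600) -> dict:
    return {
        # "public" lets shared caches (CDNs) store it; max-age sets the TTL.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # An ETag lets the client revalidate cheaply once the copy goes stale.
        "ETag": '"v1-abc123"',
    }

headers = cache_headers(86400)  # cache for a day
```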

Can I parallelise the work?
Execute units of work simultaneously—that’s what threading and concurrency are about.
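A small sketch using a thread pool, with `fetch` as an assumed stand-in for any I/O-bound unit of work:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Stand-in for an I/O-bound call (network, disk); assumed for this sketch.
    return f"body of {url}"

urls = ["https://a.example", "https://b.example", "https://c.example"]

# Run the independent fetches simultaneously instead of one after another.
with ThreadPoolExecutor(max_workers=3) as pool:
    bodies = list(pool.map(fetch, urls))
```

For CPU-bound work in Python you would typically reach for processes rather than threads, but the shape of the idea is the same.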

Can I reorder the steps?
Break a unit of work into multiple steps and see whether some steps can run in parallel or out of sequence. Combine this with re-use and bring-it-closer techniques. Can you cache some of those steps?
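For instance, in a hypothetical checkout flow, fetching the price and checking inventory don’t depend on each other, so they can run out of their original sequence, in parallel; only the final step needs both:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in steps for the sketch; only build_offer depends on the other two.
def fetch_price(item):      return 100
def check_inventory(item):  return True
def build_offer(price, in_stock):
    return {"price": price, "available": in_stock}

with ThreadPoolExecutor() as pool:
    price_f = pool.submit(fetch_price, "sku-1")
    stock_f = pool.submit(check_inventory, "sku-1")
    offer = build_offer(price_f.result(), stock_f.result())
```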

Can I use idle time to get more done?
If a unit of computation spends time waiting for something, use that idle time to run another task. Event-based concurrency is built on this idea.
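A minimal sketch with Python’s event loop, where `asyncio.sleep` stands in for waiting on I/O:

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # While this coroutine waits, the event loop runs other tasks.
    await asyncio.sleep(delay)
    return name

async def main() -> list:
    # The two waits overlap, so this takes ~0.2s rather than ~0.3s.
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.2))

results = asyncio.run(main())
```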

Can I defer non-critical work?
Defer execution to a later time and respond to the user immediately. This gives the impression of snappiness even if the actual work happens in the background.
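One common shape for this is a background queue: the request handler enqueues the non-critical work and returns right away. A sketch, assuming a hypothetical analytics event as the deferred work:

```python
import queue
import threading

log_queue: "queue.Queue" = queue.Queue()

def worker() -> None:
    # Drains deferred work in the background, off the request path.
    while True:
        event = log_queue.get()
        # ... write the event to an analytics store here ...
        log_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request() -> str:
    log_queue.put("page_view")  # defer; don't block the response on this
    return "OK"                 # respond to the user immediately
```

In production this role is usually played by a task queue or message broker rather than an in-process thread.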

Can I pre-compute?
If you know something will be needed later, calculate or prefetch it in advance so that when the user asks for it, you can serve it instantly.
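A sketch with made-up numbers: if a tax table changes rarely, compute it once at startup instead of on every request:

```python
# Hypothetical example: precompute a tax table (18% is an assumed rate).
def compute_tax_table() -> dict:
    return {amount: round(amount * 0.18, 2) for amount in range(0, 1000)}

TAX_TABLE = compute_tax_table()  # computed ahead of demand, at startup

def tax_for(amount: int) -> float:
    return TAX_TABLE[amount]     # instant lookup at request time
```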

Can I reduce the total amount of work?
Sometimes speed comes not from doing things faster but from doing less. Simplify algorithms, use efficient data structures, and avoid unnecessary computation to reduce total workload.
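A classic example of doing less with a better data structure: membership tests against a list scan every element (O(n)), while a set answers in constant time on average:

```python
# Same question, far less work per lookup.
banned_list = ["spam", "bot", "crawler"] * 1000
banned_set = set(banned_list)

def is_banned(agent: str) -> bool:
    return agent in banned_set  # hash lookup instead of a linear scan
```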

Can I shrink the payload?
Shrink what’s sent over the wire—compress assets, optimise images, eliminate dead code, and lazy-load what’s not immediately needed. Smaller payloads mean faster load times.
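Compression is the most direct version of this. A sketch with a synthetic, repetitive JSON payload (real-world ratios will vary):

```python
import gzip
import json

# Synthetic payload; repetitive JSON compresses very well.
payload = json.dumps({"items": [{"id": i, "name": "widget"} for i in range(500)]})
raw = payload.encode("utf-8")

# Compress before sending over the wire.
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)  # fraction of the original size
```

In practice the web server or reverse proxy usually handles this for you (gzip or Brotli), negotiated via the `Accept-Encoding` header.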

Can I optimise the critical path?
Identify which operations directly impact what the user sees first (like rendering above-the-fold content) and optimise those paths for speed.
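The critical path can be found mechanically: it is the longest dependency chain through the work. A sketch over a hypothetical page-load task graph (durations and task names are made up):

```python
from functools import lru_cache

# Hypothetical task graph: name -> (duration_ms, dependencies).
tasks = {
    "html":   (50,  []),
    "css":    (80,  ["html"]),
    "js":     (120, ["html"]),
    "hero":   (60,  ["css"]),   # above-the-fold image
    "widget": (30,  ["js"]),
}

@lru_cache(maxsize=None)
def finish_time(name: str) -> int:
    duration, deps = tasks[name]
    return duration + max((finish_time(d) for d in deps), default=0)

# The task that finishes last sits on the critical path; optimise it first.
critical_end = max(tasks, key=finish_time)
```

Here shaving time off `css` or `hero` barely matters: the path through `js` dominates what the user sees first.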

Can I bundle multiple operations together?
Combine multiple small requests or operations into a single one to reduce round trips, latency, and overhead.
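A sketch of batching N lookups into a few bulk calls, with the inner update standing in for a single `WHERE id IN (...)` query or bulk API request:

```python
# Instead of one query per id (N round trips), group ids into bulk calls.
def fetch_users_batched(ids, batch_size=100):
    results = {}
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # Stand-in for one bulk query returning all users in the batch.
        results.update({i: f"user-{i}" for i in batch})
    return results

users = fetch_users_batched(list(range(250)))  # 3 round trips, not 250
```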

Can I make it feel faster?
Show a loading screen or skeleton UI to give the illusion that progress is happening, rather than displaying a blank page.

Can I stream results as soon as they’re ready?
Instead of waiting for the entire process to finish, send results as soon as partial units are ready—this is what streaming is all about.
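In Python the natural sketch of this is a generator: each row is handed to the consumer the moment it exists, rather than after the whole result set is built:

```python
def stream_rows(total: int):
    # Yield each row as soon as it is ready instead of buffering them all.
    for i in range(total):
        yield f"row-{i}\n"

# A consumer (e.g. an HTTP chunked response) can start work immediately:
first_chunk = next(stream_rows(1_000_000))
```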

Things to keep in mind while working on performance:

  1. Do some quick napkin math to estimate the headroom for optimisation.
  2. Account for real-world constraints. The real world is messy—slow networks, dropped packets, and noisy neighbours can all disrupt your ideal optimisation plan.
  3. Once you form a performance engineering hypothesis, profile your application to validate it. Many optimisations sound good in theory but fail in practice.
  4. Measure the outcome using both profiling tools and real-world monitoring to confirm that your changes actually improved performance.
  5. Lastly, balance performance with maintainability. Often, optimisations and maintainability pull in opposite directions, so aim for a pragmatic middle ground.
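The napkin math in point 1 can be as simple as the following, with made-up numbers: if most of a request’s time is spent in one place, that place bounds how much any optimisation can help:

```python
# Made-up numbers: a request spends 300 ms in the database, 20 ms elsewhere.
db_ms, other_ms = 300, 20
total_ms = db_ms + other_ms

# Even a perfect cache for the database work can save at most this fraction.
headroom = db_ms / total_ms
```

If the headroom is small, stop before you start; if it’s large, you now have a hypothesis worth profiling.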
