Month: January, 2019

The Source

I think the minute you have a backup plan, you’ve admitted you’re not going to succeed.

Elon Musk said so. Chew on this for some time before you read the rest.


I was not honest with the above. It was not Musk who made the statement but Elizabeth Holmes, at the peak of her popularity. It has been a hard fall for Elizabeth from then to now; she is now accused of fraud.

Did your opinion of the quote change with the source? Did you go from awe to retching?

I believe we give as much importance to the source of a quote as to the quote itself. We should internalize quotes and aphorisms by divorcing them from their source. A quote should be evaluated solely on its content, not on who it came from. When we do not do this, the aura of the person overshadows the import of the saying, reducing it to personality worship. The significance of the quote tends to get lost.

Viewing a quote objectively also acts like a shit umbrella against famous people getting away with baloney like the above from Holmes. If a successful person makes a statement, we take it at face value, thinking that if she says it, it must be true. We should guard ourselves against this attitude.

Internalizing a quote purely on its content is not easy, as we are all prone to the narrative fallacy, but it is worth trying. As with everything, we get better at it with practice.

Image credit: Ilyass SEDDOUG

Déjà Vu

You have been trying to solve a problem for quite some time; the solution appears to be elusive. As you grapple more with the problem, a seed of a solution germinates and sprouts into an answer. In hindsight, the resolution appears obvious. You kick yourself, asking: why did I not think of this sooner?

How many times has the above happened to you?

I believe almost everyone goes through this in life.


One simple hack to get better and faster at problem solving is to backtrace through your thinking. Once you successfully arrive at a solution, replay your unsuccessful attempts and figure out what you could have tweaked in your thinking process to arrive at the answer faster.

High-performance teams do a post-mortem analysis after a critical issue. Members create an RCA (root cause analysis) document which covers what went wrong, what could have been done to prevent the incident, and what steps should be taken to avoid a recurrence. We should apply the same steps to our thought process when we do not arrive at solutions on time; think of this as an RCA of your thinking process.

This simple trick, I believe, helps us get better and faster at problem-solving.

Image credit: Diego PH

Now You See Me

In the modern software world, where micro-services are de rigueur, observability of systems is paramount. If you do not have a way to observe your application, you are as good as dead.



The first step towards embracing observability is figuring out what to track. Broadly, we can categorize software observability into:
1. Infrastructure metrics.
2. Application metrics.
3. Business metrics.
4. Distributed tracing.
5. Logging.
6. Alerting.

Infrastructure metrics:
Infrastructure metrics boil down to capturing the pulse of the underlying infrastructure where the application is running. Some examples are CPU utilization, memory usage, disk space usage, and network ingress and egress. Infrastructure metrics should give a clear picture of how well the application is utilizing the hardware it is running on. Infrastructure metrics also aid in capacity planning and scaling.
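As a rough illustration of what capturing this pulse can look like, here is a minimal sketch that reads a few host-level numbers using only the Python standard library. The function name and the dictionary keys are made up for this example; a real setup would rely on an agent or exporter rather than hand-rolled code.

```python
import os
import shutil


def infrastructure_snapshot(path="/"):
    """Collect a few host-level numbers (hypothetical helper for illustration)."""
    disk = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    used_percent = round(100 * disk.used / disk.total, 1) if disk.total else 0.0
    snapshot = {
        "cpu_count": os.cpu_count(),
        "disk_used_percent": used_percent,
    }
    # os.getloadavg() exists only on Unix-like systems, so guard the call.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass  # load average could not be obtained on this host
    return snapshot


snapshot = infrastructure_snapshot()
```

Sampling a snapshot like this at a fixed interval and plotting it over time is what makes the numbers useful for capacity planning.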

Application metrics:
Application metrics help in gauging the efficiency of the application: how fast or slow the application is responding and where the bottlenecks are. Some examples of application metrics are the API response time, the number of times a particular API is called, the processing time of a specific segment of code, and calls to external services along with their latency. Application metrics help in weeding out potential bottlenecks as well as in optimizing the application.
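To make the response time and call count examples concrete, here is a small sketch of a timing decorator. Everything in it (the `timed` decorator, the in-memory dictionaries, the `get_user` handler) is hypothetical; in practice the numbers would be sent to a metrics backend instead of being kept in process memory.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory stores for the sketch; a real setup would ship these elsewhere.
CALL_COUNTS = defaultdict(int)
DURATIONS_MS = defaultdict(list)


def timed(name):
    """Record how often the wrapped function runs and how long each call takes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even when func raises, so failed calls are counted too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                CALL_COUNTS[name] += 1
                DURATIONS_MS[name].append(elapsed_ms)
        return wrapper
    return decorator


@timed("get_user")
def get_user(user_id):
    # Hypothetical handler standing in for a real API endpoint.
    return {"id": user_id}


get_user(1)
get_user(2)
```

After the two calls, `CALL_COUNTS["get_user"]` holds the call count and `DURATIONS_MS["get_user"]` holds one duration per call.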

Infrastructure metrics give an overall picture whereas application metrics help in drilling down to the specifics. For example, if the infrastructure metric indicates more than 100% CPU utilization, application metrics help in zeroing in on the cause of this.

Business metrics:
Business metrics are the numbers which are crucial from a functionality point of view. For example, if the piece of code deals with user login and sign-up, some business metrics of interest would be the number of people who sign up, the number of people who log in, the number of people who log out, and the modes of login, such as social login versus direct. Business metrics help in keeping a pulse on the functionality and diagnosing feature-specific breakdowns.

Business metrics should not be confused with business reports. Business metrics serve a very different purpose; they are not meant to give exact counts but to gauge the trend and detect anomalous behavior.
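One simple way to detect anomalous behavior in such a trend is to compare the latest value against recent history. The sketch below uses a standard deviation check; the function name, the threshold of 3, and the sign-up numbers are all assumptions chosen for illustration, not a recommendation.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is a deviation
    return abs(latest - mu) / sigma > threshold


# Hypothetical sign-ups per hour.
signups_per_hour = [48, 52, 50, 47, 53, 49]
```

With this history, an hour with 51 sign-ups is not flagged, while an hour with 5 sign-ups is, which is the kind of signal that points to a broken sign-up flow rather than a reporting question.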

It helps to think of infrastructure, application and business metrics as a hierarchy where you zoom in from one to the other when keeping a tab on the health of the system as well as diagnosing problems. Keeping a check on all three ensures you have a hale and hearty application.

Logging:
Logging makes it possible to pinpoint specific errors. The big challenge with logs is making them easily accessible to everyone in the organization. Business metrics help in tracking the overall trend, and logging helps to zero in on the specific details.
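Logs are easier to make accessible when they are machine-readable, since a log management tool can then index and search them. Here is a minimal sketch that emits one JSON object per log line with Python's standard `logging` module. The `JsonFormatter` class, the `build_logger` helper and the `user_id` field are hypothetical names used only for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `user_id` is a hypothetical field passed through `extra=`.
        if hasattr(record, "user_id"):
            payload["user_id"] = record.user_id
        return json.dumps(payload)


def build_logger(name, stream):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # keep the sketch from printing via the root logger
    return logger


buffer = io.StringIO()  # stands in for a file or a log shipper
log = build_logger("auth", buffer)
log.error("login failed", extra={"user_id": 42})
```

The line written to `buffer` carries the level, the logger name, the message and the user ID as separate fields, so a specific failure can be found by filtering instead of grepping free text.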

Distributed tracing:
Distributed tracing ties together all the microservices in the ecosystem and helps trace a flow end to end as it moves from one microservice to another. Microservices fail all the time; if distributed tracing is not in place, diagnosing issues which span microservices feels like searching for a needle in a haystack.
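The core idea can be sketched in a few lines: every service reuses the trace ID it receives and forwards it on outgoing calls, so all the work done for one request can be stitched together later. The header name `X-Trace-Id`, the two services and the in-memory `SPANS` list below are assumptions for illustration; they do not mirror the API of any particular tracing system.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name for this sketch
SPANS = []  # stand-in for a tracing backend


def record_span(service, headers):
    """Reuse the incoming trace ID, or start a new trace if none is present."""
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    SPANS.append({"trace_id": trace_id, "service": service})
    # Return headers for the outgoing call so the next service joins the same trace.
    return {**headers, TRACE_HEADER: trace_id}


def payment_service(headers):
    # Hypothetical downstream service.
    return record_span("payment", headers)


def order_service(headers):
    # Hypothetical entry-point service that calls the payment service.
    outgoing = record_span("order", headers)
    return payment_service(outgoing)


final_headers = order_service({})
```

Both entries in `SPANS` end up with the same trace ID, which is what lets a failure in the payment service be tied back to the order request that triggered it.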

Alerting:
If you have infrastructure, application and business metrics in place, you can create alerts which are triggered when these metrics show abnormal behavior; this pre-empts potential downtime and business loss. One golden rule for alerts is that if it is an alert, it should be actionable. If not, alerts lose their significance and meaning.
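One way to hold yourself to the actionable rule is to make the action a mandatory part of the alert definition. The sketch below assumes a simple threshold rule; the `AlertRule` class, its fields and the example metric name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    metric: str
    threshold: float
    action: str  # what the person on call should do (hypothetical field)

    def __post_init__(self):
        # Enforce the "actionable" rule: refuse alerts with no next step.
        if not self.action.strip():
            raise ValueError(f"alert on {self.metric!r} has no action")

    def check(self, value):
        """Return an alert message when the value exceeds the threshold, else None."""
        if value > self.threshold:
            return f"{self.metric}={value} exceeds {self.threshold}: {self.action}"
        return None


rule = AlertRule("api_p95_latency_ms", 500, "roll back the latest deploy")
```

Here `rule.check(350)` returns `None`, while `rule.check(800)` returns a message that includes the action to take. Defining a rule with an empty action raises a `ValueError`.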

Both commercial and open source software are available to build observability. New Relic is one of the primary contenders on the commercial side. StatsD, Prometheus and their ilk dominate the open source spectrum. For log management, Splunk is the clear leader in the commercial space. The ELK stack takes the crown on the open source front. Zipkin is an open source reference implementation of distributed tracing. Most metrics tracking software has alerting capabilities these days.

If you already have microservices or are moving towards that paradigm, you should be investing heavily in observability. Microservices without observability is a fool’s errand.