Wild Wild World of External Calls

Today, external calls are a given when developing software: your code talks to external HTTP services, databases, and caches. These communications happen over networks that are fast and work well most of the time. Once in a while, networks show their true colors: they become slow, congested, and unreliable. The external services themselves can get overloaded, slow down, and start throwing errors. The code one writes to interface with external services should stand steady under these circumstances.


In this post, I will go through some of the basics one should keep in mind while calling external services. I will use the Python Requests library to demonstrate this with external HTTP calls. The concepts remain almost the same irrespective of the programming language, library, or the kind of external service. This post is not a Python Requests tutorial.


I have created a Jupyter Notebook so that you can read and run the code interactively. Click here, then click on the file WildWildWorldOfExternalCalls.ipynb to launch the Jupyter Notebook. If you are not familiar with executing code in a Jupyter Notebook, read about it here. You can find the Notebook source here.


Let us call api.github.com using Requests.

External calls happen in two stages. First, the library asks for a socket connection from the server and waits for the server to respond. Then, it asks for the payload and waits for the server to respond. In both of these interactions, the server might choose not to respond. If you do not handle this scenario, you will be stuck indefinitely, waiting on the external service.

Timeouts to the rescue. Some libraries ship with a default timeout, but it may not be what you want; Requests, for one, waits indefinitely unless you set a timeout explicitly.
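A minimal sketch of a call with explicit timeouts, assuming Requests is installed. The helper name fetch_status and the timeout values are illustrative assumptions, not the code from the original notebook.

```python
import requests

# Illustrative values (assumptions): seconds to wait for the socket
# connection, and seconds to wait for the server to send data.
CONNECT_TIMEOUT = 3
READ_TIMEOUT = 6


def fetch_status(url, timeout=(CONNECT_TIMEOUT, READ_TIMEOUT)):
    """Return the HTTP status code, or None if the call timed out."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Timeout is the parent of both ConnectTimeout and ReadTimeout.
        return None
    return response.status_code
```

Calling fetch_status("https://api.github.com") would return the status code of the response, or None when either timeout expires.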

The first element in the timeout tuple is the time we are willing to wait to establish a socket connection with the server. The second is the time we are willing to wait for the server to respond once we make a request.

Let us see the socket timeout in action by connecting to github.com on a random port. Since the port is (hopefully) not open, github.com will not accept the connection, resulting in a socket timeout.
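One way to measure this could look like the sketch below. The function name is hypothetical, and the port in the usage note is an arbitrary choice of mine, not necessarily the one used in the original notebook.

```python
import time

import requests


def time_connect_attempt(url, connect_timeout, read_timeout=6):
    """Return the seconds spent before the connection attempt gave up."""
    start = time.perf_counter()
    try:
        requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectionError:
        # ConnectTimeout is a subclass of ConnectionError, so a timed-out
        # connection attempt lands here too.
        pass
    return time.perf_counter() - start
```

For example, time_connect_attempt("https://github.com:81", 3) and time_connect_attempt("https://github.com:81", 6) would each report how long Requests waited before giving up.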

The output.

Time spent waiting for socket connection – 3.42826354 Seconds
Time spent waiting for socket connection – 6.4075264999999995 Seconds

As you can see from the output, Requests waited till the configured socket timeout to establish a connection and then errored out.

Let us move on to the read timeout.

We will use the httpbin service, which lets us ask the server to delay its response.
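A sketch of such a call, assuming httpbin's /delay/<seconds> endpoint is what produces the delay. The helper name and the connect timeout value are illustrative.

```python
import time

import requests


def get_with_read_timeout(url, read_timeout, connect_timeout=3):
    """Return (timed_out, elapsed_seconds) for a single GET request."""
    start = time.perf_counter()
    try:
        requests.get(url, timeout=(connect_timeout, read_timeout))
        timed_out = False
    except requests.exceptions.ReadTimeout:
        # The server accepted the connection but did not respond in time.
        timed_out = True
    return timed_out, time.perf_counter() - start
```

A call such as get_with_read_timeout("https://httpbin.org/delay/9", read_timeout=6) asks the server to wait 9 seconds while the client is willing to wait only 6.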

The output.

Timed out after 6.941002429 Seconds

In the above, we are asking httpbin to delay the response by 9 seconds. Our read timeout is 6 seconds. As you can see from the output, Requests timed out after 6 seconds, the configured read timeout.

Let us change the read timeout to 11 seconds. We no longer get a ReadTimeout exception.

A common misconception about the read timeout is that it is the maximum time the code spends receiving and processing the response. That is not the case. The read timeout is the longest the client will wait for the server to send data: in practice, the wait for the first byte of the response, and after that each gap between chunks. As long as the server keeps sending something within that window, even for hours, our code will be stuck reading the response.

Let me illustrate this.
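The following sketch times a complete download. It assumes httpbin's /drip endpoint with its duration and numbytes parameters; the helper name is hypothetical.

```python
import time

import requests


def time_full_download(url, read_timeout, connect_timeout=3):
    """Return (bytes_received, elapsed_seconds) for reading the whole body."""
    start = time.perf_counter()
    response = requests.get(url, timeout=(connect_timeout, read_timeout))
    # Accessing .content forces the entire body to be read.
    size = len(response.content)
    return size, time.perf_counter() - start
```

With a URL like "https://httpbin.org/drip?duration=30&numbytes=30" and read_timeout=15, the elapsed time can exceed 15 seconds because the server never stays silent for 15 seconds at a stretch.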

The output.

Time spent waiting for the response – 28.210101459 Seconds

We are asking httpbin to send data for 30 seconds by passing the duration parameter. Requests read timeout is 15 seconds. As evident from the output, the code spends much more than 15 seconds on the response.

If you want to bound the processing time to 15 seconds, you will have to use a thread/process and stop the execution after 15 seconds.
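A minimal sketch of the thread variant, using only the standard library. The helper name run_with_deadline is hypothetical. One caveat: Python cannot forcibly stop a thread, so the caller gets control back at the deadline while the abandoned work keeps running in the background; truly stopping it would need a separate process.

```python
import concurrent.futures


def run_with_deadline(func, deadline_seconds, *args, **kwargs):
    """Run func in a worker thread and wait at most deadline_seconds for it.

    Raises concurrent.futures.TimeoutError if the deadline passes first.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=deadline_seconds)
    finally:
        # Return to the caller without waiting for the worker to finish.
        executor.shutdown(wait=False)
```

Wrapping the slow download in run_with_deadline(download, 15) would hand control back after 15 seconds, however long the server keeps sending data.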

The output.

Time spent waiting for the response – 20.012269603 Seconds

Even though we receive the HTTP response for 30 seconds, our code terminates after 20 seconds.

In many real-world scenarios, we might be calling an external service multiple times in a short duration. In such a situation, it does not make sense for us to open the socket connection each time. We should be opening the socket connection once and then re-using it subsequently.
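To see the cost of not re-using connections, one could issue the same request several times with logging turned up. This is a sketch; the function name and timeout values are illustrative.

```python
import logging

import requests

# DEBUG logging makes urllib3 report every new connection it opens.
logging.basicConfig(level=logging.DEBUG)


def call_without_session(url, times=5):
    """Issue independent GET requests; each requests.get call opens its own connection."""
    return [requests.get(url, timeout=(3, 6)).status_code for _ in range(times)]
```

Running call_without_session("https://api.github.com") prints one "Starting new HTTPS connection" log line per request.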

The output.

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com:443
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com:443
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com:443
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com:443
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com:443
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496

As you can see from the output, Requests started a new connection each time; this is inefficient and non-performant.

We can prevent this by using HTTP Keep-Alive as below. Using Requests Session enables this.
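A sketch of the Session-based variant; the function name and timeout values are illustrative assumptions.

```python
import logging

import requests

# DEBUG logging makes urllib3 report every new connection it opens.
logging.basicConfig(level=logging.DEBUG)


def call_with_session(url, times=5):
    """Issue GET requests through one Session so the connection is re-used."""
    with requests.Session() as session:
        return [session.get(url, timeout=(3, 6)).status_code for _ in range(times)]
```

The with block closes the Session, and with it the pooled connection, once the calls are done.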

The output.

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com:443
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496

Now, Requests established the socket connection only once and re-used it subsequently.

In a real-world scenario, where multiple threads call external services simultaneously, one should use a pool.
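One possible shape for this is a Session backed by a bounded connection pool and shared by a small thread pool. The function names are hypothetical, and the sketch assumes the Session is safe to share across threads for simple GET calls.

```python
from concurrent.futures import ThreadPoolExecutor

import requests
from requests.adapters import HTTPAdapter

POOL_SIZE = 2  # illustrative pool size


def build_pooled_session(pool_size=POOL_SIZE):
    """Return a Session that keeps at most pool_size connections per host."""
    session = requests.Session()
    # pool_block=True makes extra callers wait for a free connection
    # instead of opening additional ones.
    adapter = HTTPAdapter(pool_connections=1, pool_maxsize=pool_size, pool_block=True)
    session.mount("https://", adapter)
    return session


def fetch_all(session, urls, workers=POOL_SIZE):
    """Fetch every URL on a small thread pool and return the status codes."""
    def status(url):
        return session.get(url, timeout=(3, 6)).status_code

    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(status, urls))
```

Calling fetch_all(build_pooled_session(), ["https://api.github.com"] * 4) makes four calls over at most two connections.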

The output.

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (2): api.github.com:443
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET / HTTP/1.1" 200 496

As we have created a pool of size two, Requests created only two connections and re-used them, even though we made four external calls.

Pools also help you to play nice with external services as external services have an upper limit to the number of connections a client can open. If you breach this threshold, external services start refusing connections.

When calling an external service, you might get an error. Sometimes, these errors might be transient. Hence, it makes sense to re-try. The re-tries should happen with an exponential back-off.

Exponential back-off is a technique in which clients re-try failed requests with increasing delays between the re-tries. Exponential back-off ensures that the external services do not get overwhelmed, another instance of playing nice with external services.
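With Requests, retries with back-off can be configured through urllib3's Retry class mounted on an HTTPAdapter. The sketch below is one such configuration; the function name, the back-off factor, and the list of status codes are my assumptions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retrying_session(retries=3, backoff_factor=1):
    """Return a Session that re-tries failed calls with increasing delays."""
    retry = Retry(
        total=retries,                  # give up after this many re-tries
        backoff_factor=backoff_factor,  # scales the growing delay between re-tries
        status_forcelist=[500, 502, 503, 504],  # re-try on these HTTP statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

A call like build_retrying_session().get("https://httpbin.org/status/500") would be re-tried three times and then raise requests.exceptions.RetryError.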

The output.

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): httpbin.org:443
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /status/500 HTTP/1.1" 500 0
DEBUG:urllib3.util.retry:Incremented Retry for (url='/status/500'): Retry(total=2, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /status/500
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /status/500 HTTP/1.1" 500 0
DEBUG:urllib3.util.retry:Incremented Retry for (url='/status/500'): Retry(total=1, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /status/500
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /status/500 HTTP/1.1" 500 0
DEBUG:urllib3.util.retry:Incremented Retry for (url='/status/500'): Retry(total=0, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /status/500
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /status/500 HTTP/1.1" 500 0

In the above, we are asking httpbin to respond with an HTTP 500 status code. We configured Requests to re-try thrice, and from the output, we can see that Requests did just that.

Client libraries do a fantastic job of abstracting all the flakiness from external calls and lull us into a false sense of security. But, all abstractions leak at one time or the other. These defenses will help you to tide over these leaks.

No post on external services can be complete without talking about the Circuit Breaker design pattern. The Circuit Breaker pattern helps one build a mental model of many of the things we talked about and gives a common vocabulary to discuss them. Most mainstream programming languages have libraries that implement Circuit Breakers. I believe Netflix popularised the term Circuit Breaker with its library Hystrix.

Get articles on coding, software and product development, managing software teams, scaling organisations and enhancing productivity by subscribing to my blog

Image by RENE RAUSCHENBERGER from Pixabay

Centralization and Decentralization

Top management loves centralization. Rank and file prefer decentralization.

Why?

Imagine you are the CEO of a company with multiple teams. 

Teams need software to do their work. When the need for software arises, someone from each team talks to the software company, negotiates a price, and procures the software.

As a CEO, you observe this and see it as a duplication of effort – wastage of time, energy, and resources. You think you can improve efficiency by centralizing the software procurement process. Only one team will be doing the work – the software procurement team. Also, this team will be able to negotiate a better price due to multiple orders, remove redundancy, manage licenses better, and block unnecessary software spends.

Since software cost is a real expense, you can quantify the gain from this exercise.


What about the downside?

Earlier, each team could independently procure the software they saw fit. Now, the individual teams have to go through the centralized procurement team and justify the need; this leads to back and forth and delays. The delay affects the cadence of work leading to employee dissatisfaction. Employee dissatisfaction leads to reduced quality of work, which in turn negatively affects the bottom line.

It is not easy to quantify the second-order effects of centralization, sometimes impossible.

The CEO, due to the broad nature of her work, sees the duplication everywhere. She also witnesses the expenses as a result of this; it is in her face. She wants to eliminate this and bring efficiency and cost-saving to the organization. Hence, she champions centralization. 

The rank and file are hands-on; they have to deal with the management policies to do their work. They experience the second-order effects of centralization day in and day out. They instinctively develop an anti-centralization spidey sense.

Unlike the rank and file, the CEO does not have a ringside view of the second-order effects of centralization. The rank and file do not see the duplication the CEO sees because they do not have the CEO’s expansive view.

Centralization efforts have a quantifiable impact. If not entirely measurable, you can do some mental gymnastics to get an idea.

The downsides of centralization are unquantifiable. The unquantifiable plays a crucial role in success, sometimes much more than the quantifiable.

Morgan Housel calls this the McNamara Fallacy.

McNamara Fallacy: A belief that rational decisions can be made with quantitative measures alone, when in fact the things you can’t measure are often the most consequential. Named after Defense Secretary McNamara, who tried to quantify every aspect of the Vietnam War.

Let us flip the earlier scenario. Imagine that the centralized procurement team does bring in efficiency and reduce cost, albeit at a minor loss of productivity. The software procurement expense as a whole is never on the mind of the rank and file; they do not look at it as closely as the CEO does, and it is not always in their face. Hence, the rank and file still view centralization as a bane, even when it brings in advantages.

A widely held view is that a decentralized way of working trumps a centralized approach; this applies to the military too. Jocko Willink, a retired US Navy SEAL officer, champions decentralized command.

There are valid cases for centralization, especially when the talent required to do something is in short supply, and there are legitimate gains to be had from economies of scale. But, when you centralize, think hard about the unquantifiable second-order effects of the decision.


 

Working Hard To Be Lazy

The programming world heralds laziness as one of the virtues of a programmer.

Larry Wall, the creator of Perl, says – Most of you are familiar with the virtues of a programmer. There are three, of course: laziness, impatience, and hubris.

What no one tells you is that this laziness does not come for free; one has to work hard to imbibe this trait.

 


 

In practical terms, what does being lazy translate to?

  1. Doing as little as possible, never more than needed.
  2. Instead of doing things yourself, delegating to well-established tools, libraries, and frameworks.

Let us work with some concrete examples.

You want to parse a CSV file.

You think: let me load the file, parse it line by line, and split each line on a comma. You roll up your sleeves and code this. You feel smug having solved the problem yourself without anyone’s help.

Trouble starts when the CSV you parse has a header. Now you add an if condition to detect the first line. Later, someone uploads a CSV separated by tabs instead of commas. You add another if condition to accommodate this. Another person uploads a CSV which has quoted fields. You start doubting yourself and ask: how many such “unknown unknowns” are there when it comes to parsing a CSV?

Unknown unknowns are risks that come from situations that are so unexpected that they would not be considered.

CSV parsing might have a lot of “unknown unknowns” for you – a person who is not well versed with the intricacies of CSV format. But there are experts out there who know the CSV format and have written libraries to handle all the edge cases and surprises that it might throw. You hedge your “unknown unknown” risk by delegating the CSV parsing to one of these libraries.
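In Python, for instance, the standard library's csv module already handles headers, alternative delimiters, and quoted fields. A minimal sketch, with a hypothetical helper name:

```python
import csv
import io


def parse_csv(text, delimiter=","):
    """Parse CSV text with a header row into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# A quoted field containing a comma is handled by the library.
rows = parse_csv('name,quote\nAda,"Hello, world"\n')

# A tab-separated file needs only a different delimiter.
tab_rows = parse_csv("name\tcity\nAda\tLondon\n", delimiter="\t")
```

The header, the tab delimiter, and the quoted field are each covered without a single hand-written if condition.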

In short, be lazy, do as little as possible, and delegate to well-established libraries.

“Fools say that they learn by experience. I prefer to profit by others’ experience.”

― Otto von Bismarck

Let us consider another scenario.

You want to store a counter in a database. One approach is: when you want to increment the count, you get the current count from the database, add one to it and store the new count back in the database.

Do you see the problem with this approach?

What if many threads are doing this in parallel? You will end up with a wrong count. A better approach is to delegate the task of incrementing the count to the database by leveraging SQL’s arithmetic operators. This approach makes the counter increment atomic. Many threads trying to increment the count is no longer a concern.
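A small sketch of the second approach, using SQLite from Python's standard library. The table, column, and function names are hypothetical.

```python
import sqlite3


def create_counter(conn, name):
    """Create the counters table if needed and add a counter starting at zero."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO counters (name, hits) VALUES (?, 0)", (name,))
    conn.commit()


def increment(conn, name):
    """Let the database do the arithmetic in a single UPDATE statement."""
    conn.execute("UPDATE counters SET hits = hits + 1 WHERE name = ?", (name,))
    conn.commit()


def current(conn, name):
    """Return the current value of the counter."""
    return conn.execute("SELECT hits FROM counters WHERE name = ?", (name,)).fetchone()[0]
```

The application never reads the old value to compute the new one; the read, the addition, and the write all happen inside the database.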

By doing less yourself and delegating the task of incrementing the counter to the database, you have saved yourself from bugs.

Why is this hard work?

This sort of thinking does not come easy; you have to work hard to identify what you can delegate, where, and to whom.

The Dunning-Kruger effect might have a role to play in this. We believe we are the experts and best suited to do things.

In the field of psychology, the Dunning–Kruger effect is a cognitive bias in which people assess their cognitive ability as greater than it is. It is related to the cognitive bias of illusory superiority and comes from the inability of people to recognize their lack of ability.

While coding, most of the time, you are solving a problem that someone else has already solved, probably in a different context. Be aware of your biases and always question: Is this something I have to code myself, or can I offload this to an already written, well-established, and well-tested library or framework?

“Learn from the mistakes of others. You can’t live long enough to make them all yourself.”

― Eleanor Roosevelt


Image by Clker-Free-Vector-Images from Pixabay

The Million Dollar Question

What is the point of life?

All of us have pondered this question. Luminaries have devoted their lives to the pursuit of an answer. Philosophers have written voluminous texts trying to answer it.

I am no Yogi, but that does not disqualify me from trying to answer this profound question. Beware, my answer might leave you with a feeling of meh.

During a holiday, a group of us friends played a weird game of football. We were randomly dribbling the ball, passing, and tackling each other – no teams, rules, goals, or referees. This pointless pursuit of the ball was fun.

What is the difference between kids and adults?


Kids involve themselves in pointless pursuits. They are always engaged in one activity or the other. These consume them. We, the self-critical adults, try to see a point in everything. Few things consume us.

Give a cardboard box to a kid. She can keep herself occupied with the box for hours. An adult dreads the thought of this.

When a child is young, she loves to draw irrespective of whether she is good at drawing or not. As she grows older, she pursues drawing only if she finds herself good at it. Enter adulthood, she becomes self-critical and continues her hobby only if she sees a point in it.

As an adult, try to remember the last time you were engaged in and consumed by a pointless activity.

A child actively indulges in role-play, creating stories in her head and acting them out. An adult passively watches role-play in TV series and movies. A child plays a variety of games. An adult passively enjoys sports by watching others play.

As we age, we move from an active to a passive life. We try to seek a point in everything.

A child has no time to search for meaning. She is busy indulging herself in everything. The activity is the end; it is not a means to an end. I believe the same goes for life.

The point of life is not to search for meaning but to indulge in it. It is a pointless existence, and there is a joy to be had in understanding this. It is liberating.


Photo by Emily Morter on Unsplash

Murphy’s Law Of Software Abstractions

All software abstractions, sooner or later, leak.

When this happens, it hurts.

To drive a car, you need not know how it works internally. The mechanics of an automobile are well abstracted from the driver. Similarly, software libraries, tools, and frameworks promise abstraction to the engineers using them. They promise that one can use them effectively without delving into the internals.


Let me tell you a story of uWSGI, processes, and threads.

uWSGI is a container for running web applications. It is popular in the Python world. The global interpreter lock (GIL) makes concurrency muddy in Python applications. A way around this is to spawn multiple processes. While starting uWSGI, one can configure the number of processes to spawn to service concurrent web requests. The master uWSGI process starts, loads and initializes the Python application code, and then forks the configured number of child processes. One caveat while forking is that the child process gets only a copy of the thread that called fork; it does not inherit any of the other threads created in the parent process. Since uWSGI loads and initializes the Python application and then forks, the child processes will not have any threads created during initialization in the parent process.
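The forking caveat can be reproduced outside uWSGI with a few lines of Python. This is a POSIX-only sketch with a hypothetical function name; the background thread stands in for the kind of worker thread a library might start during initialization.

```python
import os
import threading


def thread_counts_across_fork():
    """Return (threads_in_parent, threads_in_child) around an os.fork() call."""
    stop = threading.Event()
    # Background thread standing in for a library's worker thread.
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    pid = os.fork()  # available on POSIX systems only
    if pid == 0:
        # Child: report how many threads exist here, then exit immediately.
        try:
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)

    # Parent: read the child's report and clean up.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count
```

The parent sees its main thread plus the worker, while the child reports only the single thread that called fork.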

This bit us hard.

We were initializing a Kafka library on application load. This library internally created background threads that aggregate and push messages to a Kafka broker. The child processes uWSGI forked did not have these threads. Hence, our payloads were not flowing to Kafka.

In 99% of the cases, an application developer need not bother about how uWSGI forks and creates child processes. uWSGI abstracts this well, and the application developer can go on with her regular day to day work willfully ignorant of these abstractions. The same goes for the Kafka library. One need not pry open the library to figure out how it aggregates the messages and sends them to Kafka. In this particular case, the abstractions leaked and bit us. We spent a couple of days debugging.

One of the principles of good software design is that it should be easy to reason about.

You can interpret the above in different ways. A short interpretation is that when you look at a piece of code, it should be easy to figure it out in one’s head. Whatever you need to reason about the code should be in it and apparent – no hidden surprises. Leaky abstractions break this principle.

The uWSGI incident is fresh in my mind, but I have seen this to be a recurring pattern.

This is the problem with abstractions and magic technologies. When the abstractions leak, they bite us in ways we do not expect, and at times we do not anticipate.

  1. Be skeptical of technologies that claim to be magic. Do not pay only lip service to KISS; follow it in spirit. Think twice before incorporating such technologies into your stack.
  2. Figuring out the anti-patterns beforehand is as crucial as figuring out the best practices. All technologies come with a list of don’ts – grok them.
  3. Even though it may sound like an oxymoron, invest time and effort in going beyond the abstraction and peeling back the layers. At the least, build a minimal mental model.
  4. If you are a library developer, documenting how things can go wrong is as important as highlighting the rosy use cases; this is the least you can do to help your fellow craftsmen. Map the minefields so that no one accidentally trips on them.

1. uWSGI is a fantastic piece of software. I am not trying to diss on uWSGI at all.

2. For our particular problem, we disabled the default forking behavior of uWSGI. We enabled lazy loading, which loads the Python code after the fork rather than before.


Image by Sang Hyun Cho from Pixabay

Charlatans and Us

Charlatan – a person falsely claiming to have special knowledge or skill.

“How do we hire amazing engineers fast?” is a question people ask me often.

When someone asks me the question, they usually expect a profound answer, which will cure all their hiring pains. Hiring, especially good people, is a long, involved, and arduous process. There are no deep secrets to this. But, this is not what people want to hear because they already know this. Instead, they expect a magic potion, a hack, which will wipe out all the hiring woes. My standard answer to the hiring question is along the lines of – “I do not have any tricks up my sleeve to help you with that.”


When someone is expecting a profound answer, and you do not have any, it is very tempting to come up with one. When the other person is seeking enlightenment, and all you have is mundaneness, you feel like an amateur and a buzz kill.

Charlatans start like this. People expect magical answers from them; they do not have any. Still, the expectation from others is so high that they start coming up with one, and then it becomes a self-fulfilling prophecy. Also, once you do this multiple times, you start drinking your own kool-aid. You do not even recognize that you are a charlatan. You genuinely start believing that you are a messiah.

Growth is a catch-22 problem. You need to endure pain to grow. You are not ready to experience pain unless you see the growth. But you do not see growth unless you suffer pain. Charlatans, with their quick and simple hacks, give us hope of disproportionate returns by investing little effort, hence the demand for charlatans.

The rich and the famous are often called charlatans. We hold the rich and the famous to lofty moral standards and weave stories of their impeccable character. We forget that they are just like us: winging through life, taking shortcuts, not knowing what is happening or where they are heading. When the rich and the famous learn of these high expectations, which a lot of them cannot meet (there is nothing wrong with this), they artificially try to mold themselves along these lines. In today’s age of social media, where virtue signaling is just a click away, it is getting easier to do this. When you fake it, it can only go so far. One day, the cloak falls, and the grandness built on flimsy appearances comes tumbling down. We then start calling these celebrities charlatans, but we seldom introspect on our role in turning them into charlatans.

As much as we like to blame charlatans for their deception, a significant part of the blame rests on us for creating them. It is our unholy expectations and quick reward-seeking nature that give rise to charlatans.


Image by Pexels from Pixabay

The Games We Play

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge, but they have enough to convict both on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain – betray the other by testifying that the other committed the crime, or cooperate with the other by remaining silent. The possible outcomes are:

  • If A and B betray each other, both of them serve two years in prison.
  • If A betrays B, but B remains silent, A will be set free, and B will serve three years in prison (and vice versa).
  • If A and B both remain silent, both will serve only one year in prison (on the lesser charge).

The prisoners cannot communicate and come up with an optimal strategy.

If A betrays, then it is in B’s best interest to betray too. B will end up serving two years instead of the three B would serve by remaining silent. If A does not betray, it is still in B’s best interest to betray. B will walk out scot-free by ratting A out. The same line of thinking applies to A too.

If A and B think only of their self-interest, they end up betraying each other — A and B will spend two years in prison. Instead, if they cooperate by remaining silent, they get out in a year — cooperating results in a better outcome. Optimizing for their self-interest ends up harming them instead of helping.
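The reasoning above can be checked mechanically. The sketch below encodes the sentences from the list of outcomes and asks which choice minimizes A's own prison time for each choice B might make; the names are illustrative.

```python
# Years in prison as (A's sentence, B's sentence), keyed by (A's choice, B's choice).
YEARS = {
    ("betray", "betray"): (2, 2),
    ("betray", "silent"): (0, 3),
    ("silent", "betray"): (3, 0),
    ("silent", "silent"): (1, 1),
}


def best_response_for_a(b_choice):
    """Return the choice that minimizes A's own sentence, given B's choice."""
    return min(("betray", "silent"), key=lambda a_choice: YEARS[(a_choice, b_choice)][0])


# Whatever B does, betraying is individually better for A ...
responses = {b: best_response_for_a(b) for b in ("betray", "silent")}

# ... yet mutual betrayal costs the pair more total years than mutual silence.
total_if_both_betray = sum(YEARS[("betray", "betray")])
total_if_both_silent = sum(YEARS[("silent", "silent")])
```

Because the payoffs are symmetric, the same result holds for B.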

Game theory refers to the above as the prisoner’s dilemma. Game theory is the study of strategic decision-making: how people choose when the outcome depends on what others choose, and what incentives drive them to cooperate.


The prisoner’s dilemma, in a nutshell, models situations in which individuals act selfishly in their self-interest, thinking it will benefit them, when in reality it ends up harming everyone, including themselves.

If you keep your eyes open, you can see prisoner’s dilemma everywhere.

You see it on the roads every day. All are selfishly optimizing for themselves by not following traffic rules, thus leading to detrimental traffic conditions for all.

You see it in companies where individuals and teams selfishly optimize for their narrow goals, which ends up harming the company.

You see it in the treatment of public resources like buses, toilets, and parks. No one seems to care about the upkeep of shared public resources, whereas caring for these resources would lead to a better quality of life for all.

Use prisoner’s dilemma as a lens to understand why people do not collaborate even when collaboration would have resulted in a better outcome.

Prisoner’s dilemma gives us a model to think about:

  1. Cooperation between people who do not know each other.
  2. The incentives to cooperate when the benefit of collaboration is not apparent.

Show me the incentives, and I will show you the outcome.

– Charlie Munger

In the face of non-communication and unclear incentives to collaborate, what would have led to A and B cooperating?

Imagine that the criminal community had a strict rule of never confessing to the police. Breaking this code meant certain death. In the presence of such a system, perhaps A and B would have remained silent, leading to implicit cooperation. The Italian mafia has such a code called Omertà.

Imagine that in the criminal community, confessing to the police meant that your reputation is tainted forever. You will never find work again. In the presence of such a convention, perhaps A and B would have remained silent, leading to implicit cooperation.

If A and B had to work together in the future on other projects, perhaps A and B would have remained silent, leading to implicit cooperation.

If A and B were the members of a cult that says betraying a fellow member leads to eternal damnation in the afterlife, perhaps A and B would have remained silent, leading to implicit cooperation.

Many social constructs like strong laws, fervent nationalism, religion, trust, reputation, and community are society’s answer to the prisoner’s dilemma. We humans have collectively evolved these practices as a way to facilitate implicit cooperation, thus leading to a better quality of life.

Thinking through the lens of prisoner’s dilemma explains why:

Small teams are more successful than big ones.

Scrawny resource-starved startups trump multinational corporations with deep pockets.

Tightly knit small communities have a lower crime rate than big cities.

Small homogeneous nations are more successful than diverse big ones.

How do companies solve the problem of prisoner’s dilemma?
Organization values, emphasis on team building and bonding, rewards and recognition, processes, and rules are some of the obvious ones. Some companies create a religious cult-like atmosphere as an answer to the problem of prisoner’s dilemma.

Implicit collaboration between people is critical to the success of everything – teams, projects, companies, societies, and countries; use prisoner’s dilemma as a way to think and model this. Thinking in terms of prisoner’s dilemma helps us to devise constructs that incentivize collaboration.

This post is not a rigorous explanation of the prisoner’s dilemma; I have taken poetic liberties with it. The Wikipedia entry on the prisoner’s dilemma has a thorough explanation; it is an engaging read too.


Photo by Perry Grone on Unsplash.

Becoming a Guru Programmer

Are you in awe of the Jedi programmers who seem to produce bugless code? Are you bewildered by the Guru programmers who fight inefficient code with their hands tied and eyes closed? They are not superhumans; these are mere mortals who have a repertoire of bug patterns in their heads owing to their experience. They have also mastered behavioral traits that aid in detecting bugs and flushing out inefficient code.

One can avoid the majority of bugs by adopting two behavioral traits.

  1. Taking a step back and asking – What can go wrong?
  2. Asserting your assumptions.

Let us work with an example.

The below code accepts a list and returns the first element.


def get_first_elem(lst):
    return lst[0]

What are the implicit assumptions that you see?

  1. No one will pass a null list.
  2. No one will pass an empty list.

What can go wrong?

  1. If someone passes a null list, the code errors out.
  2. If someone passes an empty list, the code errors out.

The idea is not to code defensively but to be aware of the assumptions and error conditions. It may well be that the function should error out when someone passes a null or empty list; when that happens, it should not come as a surprise.
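One way to make the two assumptions above explicit is sketched below. The choice of exceptions and messages is mine, not a prescription; the point is only that the failure modes are now deliberate and documented rather than accidental.

```python
def get_first_elem(lst):
    """Return the first element of a non-empty list."""
    # Assumption 1: the caller passes an actual list, not None.
    if lst is None:
        raise TypeError("lst must not be None")
    # Assumption 2: the list has at least one element.
    if not lst:
        raise ValueError("lst must not be empty")
    return lst[0]
```

The function still errors out on bad input, but now it does so on purpose and with a message that tells the caller what went wrong.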

Adopt these behavioral traits whenever you read or write code; you will be miles ahead of the rest.

Some of the other common bug patterns follow.

Never let it leak


A common bug is not closing opened resources – be it file descriptors, database connections, HTTP connections, or socket connections. Programming languages have constructs to do this – the finally block in Java and Python, defer in Go. Whenever you open a resource, close it; never let it leak.
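A minimal sketch of this in Python, using a file as the resource. The function names are hypothetical; the first version uses a try/finally block, and the second uses the with statement, which closes the file on exit for you.

```python
def read_text_finally(path):
    """Read a file, closing it in a finally block."""
    fh = open(path, encoding="utf-8")
    try:
        return fh.read()
    finally:
        # Runs whether read() succeeds or raises.
        fh.close()


def read_text(path):
    """Read a file, letting the with statement close it."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```

Both versions close the file even when reading fails; the with form is shorter and harder to get wrong.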

Fence it


When establishing a connection to an external resource, be it a database or a remote server, configure appropriate timeouts. Being stuck for an indefinite period while establishing a connection is not a happy place to be; fence the connection establishment time to a reasonable value. Also, timeouts come in various flavors – connection establishment timeout, socket timeout, HTTP server timeouts. Familiarise yourself with all that apply to your scenario.
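As a rough illustration, here is a sketch using only Python's standard socket module. The function name and the default timeout values are assumptions of mine; pick values that suit your service.

```python
import socket


def read_with_timeouts(host, port, connect_timeout=3.0, read_timeout=5.0):
    """Connect and read one chunk, bounding both waits."""
    # Fence the connection establishment time.
    with socket.create_connection((host, port), timeout=connect_timeout) as conn:
        # Fence the wait for data once the connection is up.
        conn.settimeout(read_timeout)
        try:
            return conn.recv(1024)
        except socket.timeout:
            # The server stayed silent for longer than read_timeout.
            return None
```

The connection timeout and the read timeout are set separately because they guard against different failures: a server that never accepts, and a server that accepts but never answers.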

Bound it


Keep an eye on runaway resource creation. There is a limit to the number of connections that an external system can handle; there is a limit to the number of files one can create on a file system. Be aware of these limits and put in checks and balances to bound the creation to acceptable values.
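One simple check is a semaphore that caps how many callers may hold a resource at the same time. The sketch below assumes a limit of three; both the limit and the names are hypothetical.

```python
import threading

MAX_CONNECTIONS = 3  # hypothetical limit; use what the external system can handle
_slots = threading.BoundedSemaphore(MAX_CONNECTIONS)


def run_bounded(work):
    """Run work() while holding one of the limited slots."""
    # Blocks when MAX_CONNECTIONS callers are already inside.
    with _slots:
        return work()
```

No matter how many threads call run_bounded, at most MAX_CONNECTIONS of them execute work() at once; the rest wait their turn.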

Reuse over recreate


If you are opening a connection or creating an object repeatedly, check whether it is possible to pool the resources instead of repeatedly re-creating them. Create a pool of resources once and then reuse them when needed. This principle applies to all sorts of connections – HTTP, socket, database.

Using a resource pool alleviates the boundless resource creation problem too.
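A bare-bones pool can be sketched with a queue from the standard library. ResourcePool and its borrow method are hypothetical names, and a production pool would also handle broken resources; established libraries usually ship one.

```python
import queue
from contextlib import contextmanager


class ResourcePool:
    """Create a fixed number of resources once and hand them out on demand."""

    def __init__(self, factory, size):
        self._items = queue.Queue(maxsize=size)
        for _ in range(size):
            self._items.put(factory())

    @contextmanager
    def borrow(self, timeout=None):
        # Blocks (up to timeout seconds) when every resource is in use.
        item = self._items.get(timeout=timeout)
        try:
            yield item
        finally:
            # Always return the resource so others can reuse it.
            self._items.put(item)
```

Because the pool never creates more than size resources, it also puts an upper bound on resource creation.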

Encoding it right


While working with text, take care of character encoding. Things work great until one beautiful day someone passes text in a foreign language to your code, and everything collapses like a house of cards. Familiarise yourself with character encoding and take care of it while coding.
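A small sketch of what this looks like in Python. The sample text and the helper name decode_or_none are mine; the idea is to name the encoding explicitly at every boundary instead of relying on a default.

```python
text = "naïve café, 日本語"

# Encode explicitly instead of relying on a platform default.
data = text.encode("utf-8")

# Decoding with the same encoding round-trips the text.
round_tripped = data.decode("utf-8")


def decode_or_none(raw, encoding):
    """Return the decoded text, or None when the bytes are invalid for the encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None
```

Decoding the same bytes as ASCII fails, which is exactly the surprise that shows up the day non-English text reaches code that assumed otherwise.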

Stand on the shoulders of giants


If I have seen further, it is by standing on the shoulders of Giants.

– Isaac Newton.

The three chief virtues of a programmer are: Laziness, Impatience and Hubris.

– Larry Wall.

Doing everything yourself and not delegating to established libraries, frameworks, and tools is the root cause of a large number of bugs. In all probability, someone would have faced the problem that you are facing and crafted a well-tested solution to it – shamelessly use it. There is a reason why great programmers claim laziness is a virtue – follow this in spirit and practice.
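A tiny illustration, with a made-up sample line: splitting CSV by hand looks harmless until a quoted field contains a comma, whereas the csv module in the Python standard library already handles that case.

```python
import csv

line = 'Ada,"Lovelace, Countess",1815'

# Hand-rolled parsing breaks the quoted field in two.
naive = line.split(",")

# The well-tested library keeps the quoted field intact.
parsed = next(csv.reader([line]))
```

Here naive ends up with four fields while parsed has the intended three.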

Grokking these bug patterns and adopting the behavioral traits will make you a coding Yoda. Who does not want to be one?

An earlier post on similar lines on Software Security.


Yoda image by Mario Eppinger from Pixabay

Leaky tap photo by Luis Quintero from Pexels

Fence photo by Min An from Pexels

Recycle image by 95C from Pixabay

Encode image by Gerd Altmann from Pixabay

Elephant image by Sasin Tipchai from Pixabay

Enablers, Not Doers

How do you run effective Platform Engineering teams?

All organizations have Platform Engineering teams in one form or the other; these are centralized engineering teams providing building blocks for other engineering groups within the company. The customers for these teams are the internal engineers, not the end-users of the product.

For Platform Engineering teams to be effective, do not strongly couple them with the other engineering groups. Even though these are centralized engineering teams, they should operate in a decentralized manner. Platform Engineering teams should follow the mantra of Loosely coupled but strongly aligned with the other engineering teams. To achieve this, view Platform Engineering teams as enablers, not as doers.


The build engineering team creates tools and resources so that any team in the organization can deploy their builds to production, i.e., the build engineering team enables you to deploy builds using the tools they create; they do not deploy the build for you.

The performance engineering team creates tools and frameworks for you to identify performance bottlenecks, i.e., the performance engineering team enables you to identify performance bottlenecks using the tools they build; they do not identify performance bottlenecks for you.

The security engineering team creates tools and libraries for you to identify security loopholes, i.e., the security engineering team enables you to identify security loopholes using the tools they build; they do not identify security loopholes for you.

This is how organizations should mold and communicate the role of Platform Engineering teams. Positioning the Platform Engineering teams as doers instead of enablers makes them the bottleneck for other teams to get their work done.

Other teams rely on Platform Engineering teams for their success. You do not want them second-guessing whether the Platform Engineering teams are doing their job or not. They need to trust the Platform Engineering teams with their critical workloads. Without trust, everything breaks, leading to unnecessary back and forth, sapping the energy of all parties involved.

To build this culture of trust, Platform Engineering teams should focus on observability, performance, and stability.

Lack of observability creates anxiety. If the build engineering team does not provide a dashboard where one can see the progress of builds, the history of builds, and create alerts on build failure, teams will be anxious about their builds. They will bug the build engineering team about this, causing stress all around. Observability is a must for creating a culture of zero stress and anxiety.

Documentation is a subset of observability. Once the Platform Engineering team releases a tool or an API, others should be able to use it without intervention from the Platform Engineering team. To achieve this, Platform Engineering teams should focus on clear and concise documentation.

Since all teams use the tools and APIs of Platform Engineering teams, performance improvements have a multiplicative effect.

A build tool that does not work brings the engineering org to a standstill. A login that does not work negatively affects all parts of the application. Hence, a keen focus on stability is a must. These are the foundations on which other engineering teams build their features.

Due to the nature of the work of Platform Engineering teams, consulting, guiding, and spreading best practices become part and parcel of their day-to-day responsibilities. This is understated, but Platform Engineering teams spend a significant chunk of their time on it. Look for ways to institutionalize these activities so that the Platform Engineering teams stay engaged in their core work. As discussed earlier, focussing on observability, performance, and stability goes a long way towards this.

Feature prioritization can become a challenge for Platform Engineering teams as everyone comes to them with feature requests. A simple yardstick to use is – If we do not release this functionality, is there a way for the requesting team to go about their work, albeit in a roundabout manner? If the answer is yes, then it is not a burning problem. If not, you need to figure out a way to get the feature out as soon as possible.

Platform Engineering teams should adopt a broad perspective when developing features. Do not develop features for a particular team. Think of how the feature relates to other teams and design it so that everyone in the organization can leverage the feature.

If you want to build a culture of speed and rapid iteration, viewing Platform Engineering teams as enablers and not as doers is critical.


Image by Susbany from Pixabay


Optimists, Pessimists, and Better Coders

Whether one is an optimist or pessimist is dictated by genetics. Wise people say that you can give your genetics a run for the money by making happiness a conscious choice and learning to be deliberately happy.

What does this have to do with coding?


How do you decide whether code is good or not?
There are many parameters, but a definitive indicator is the number of bugs – the fewer the bugs, the better the code.

What is the definition of a bug?
A bug is a problem with the code, which makes it not work correctly.

Why do bugs occur?
The person who authored the code did not anticipate that particular condition, and the code does not know how to handle that situation.

How does one prevent bugs?
Anticipate all that can go wrong and take care of them while coding.

To do this, one needs to take a bleak look at things – exhaustively think of all that can go wrong and take care of it. In short, you need to wear a pessimist’s cap.

Then, do pessimists make better coders?

There is another way to look at this.

To create something, you need to be an optimist; coding is about creating something new. To write code devoid of bugs, you need to anticipate edge cases and boundary conditions and account for them.

Is a better coder someone who can balance optimism and pessimism?


Photo by Christophe Hautier on Unsplash