The Games We Play

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge, but they have enough to convict both on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain – betray the other by testifying that the other committed the crime, or cooperate with the other by remaining silent. The possible outcomes are:

  • If A and B betray each other, both of them serve two years in prison.
  • If A betrays B, but B remains silent, A will be set free, and B will serve three years in prison (and vice versa).
  • If A and B both remain silent, both will serve only one year in prison (on the lesser charge).

The prisoners cannot communicate and come up with an optimal strategy.

If A betrays, it is in B’s best interest to betray too: B serves two years instead of the three B would serve by remaining silent. If A remains silent, it is still in B’s best interest to betray: B walks free by ratting out A instead of serving a year. The same line of thinking applies to A.

If A and B think only of their self-interest, they end up betraying each other — A and B will spend two years in prison. Instead, if they cooperate by remaining silent, they get out in a year — cooperating results in a better outcome. Optimizing for their self-interest ends up harming them instead of helping.
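The reasoning above can be checked mechanically. The sketch below is only an illustration: the sentences come from the outcomes listed earlier, while the names YEARS, CHOICES, and best_response_for_b are made up for this example.

```python
# Years in prison for each pair of choices, taken from the outcomes above.
# Key: (A's choice, B's choice). Value: (years for A, years for B).
YEARS = {
    ("betray", "betray"): (2, 2),
    ("betray", "silent"): (0, 3),
    ("silent", "betray"): (3, 0),
    ("silent", "silent"): (1, 1),
}
CHOICES = ("betray", "silent")


def best_response_for_b(a_choice):
    """Return the choice that minimises B's sentence, given A's choice."""
    return min(CHOICES, key=lambda b_choice: YEARS[(a_choice, b_choice)][1])


for a_choice in CHOICES:
    print(f"If A chooses {a_choice}, B does best to {best_response_for_b(a_choice)}")
```

Whatever A does, betraying is B's best reply, even though both staying silent costs each of them fewer years than both betraying.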

Game theory refers to the above as the prisoner’s dilemma. Game theory is the study of how people make decisions when the outcome of each person’s choice depends on the choices of others, including the incentives that drive us to cooperate.

Prisoner’s dilemma in a nutshell models situations in which individuals selfishly act in their self-interest, thinking it will benefit them when, in reality, it ends up harming all including themselves.

If you keep your eyes open, you can see prisoner’s dilemma everywhere.

You see it on the roads every day. All are selfishly optimizing for themselves by not following traffic rules, thus leading to detrimental traffic conditions for all.

You see it in companies where individuals and teams selfishly optimize for their narrow goals, which ends up harming the company.

You see it in the treatment of public resources like buses, toilets, and parks. No one seems to care about the upkeep of shared public resources, whereas caring for these resources would lead to a better quality of life for all.

Use prisoner’s dilemma as a lens to understand why people do not collaborate even when collaboration would have resulted in a better outcome.

Prisoner’s dilemma gives us a model to think about:

  1. Cooperation between people who do not know each other.
  2. The incentives to cooperate when the benefit of collaboration is not apparent.

Show me the incentives, and I will show you the outcome.

– Charlie Munger

In the face of non-communication and unclear incentives to collaborate, what would have led to A and B cooperating?

Imagine that the criminal community had a strict rule of never confessing to the police. Breaking this code meant certain death. In the presence of such a system, perhaps A and B would have remained silent, leading to implicit cooperation. The Italian mafia has such a code called Omertà.

Imagine that in the criminal community, confessing to the police meant that your reputation is tainted forever. You will never find work again. In the presence of such a convention, perhaps A and B would have remained silent, leading to implicit cooperation.

If A and B had to work together in the future on other projects, perhaps A and B would have remained silent, leading to implicit cooperation.

If A and B were the members of a cult that says betraying a fellow member leads to eternal damnation in the afterlife, perhaps A and B would have remained silent, leading to implicit cooperation.

Many social constructs, like strong laws, fervent nationalism, religion, trust, reputation, and community, are society’s answer to prisoner’s dilemma. We humans have collectively evolved these practices as a way to facilitate implicit cooperation, thus leading to a better quality of life.

Thinking through the lens of prisoner’s dilemma explains why:

  • Small teams are more successful than big ones.
  • Scrawny resource-starved startups trump multinational corporations with deep pockets.
  • Tightly knit small communities have a lower crime rate than big cities.
  • Small homogeneous nations are more successful than diverse big ones.

How do companies solve the problem of prisoner’s dilemma?
Organization values, emphasis on team building and bonding, rewards and recognition, processes, and rules are some of the obvious ones. Some companies create a religious cult-like atmosphere as an answer to the problem of prisoner’s dilemma.

Implicit collaboration between people is critical to the success of everything – teams, projects, companies, societies, and countries; use prisoner’s dilemma as a way to think and model this. Thinking in terms of prisoner’s dilemma helps us to devise constructs that incentivize collaboration.

This post is not a rigorous explanation of prisoner’s dilemma; I have taken poetic liberties with it. The Wikipedia entry on prisoner’s dilemma has a thorough explanation; it is an engaging read too.

Get articles on coding, software and product development, managing software teams, scaling organisations and enhancing productivity by subscribing to my blog

Photo by Perry Grone on Unsplash.

Becoming a Guru Programmer

Are you in awe of the Jedi programmers who seem to produce bugless code? Are you bewildered by the Guru programmers who fight inefficient code with their hands tied and eyes closed? They are not superhumans; these are mere mortals who have a repertoire of bug patterns in their heads owing to their experience. They have also mastered behavioral traits that aid in detecting bugs and flushing out inefficient code.

One can avoid the majority of bugs by adopting two behavioral traits.

  1. Taking a step back and asking – What can go wrong?
  2. Asserting your assumptions.

Let us work with an example.

The code below accepts a list and returns the first element.


def get_first_elem(lst):
    return lst[0]

What are the implicit assumptions that you see?

  1. No one will pass a null list.
  2. No one will pass an empty list.

What can go wrong?

  1. If someone passes a null list, the code errors out.
  2. If someone passes an empty list, the code errors out.

The idea is not to code defensively but to be aware of the assumptions and error conditions. It might as well be that the function should error out when someone passes a null or empty list; when it happens, it should not be a surprise.
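One way to make the two assumptions visible is to assert them. This is a sketch of the "asserting your assumptions" trait rather than a prescription; the assertion messages are illustrative.

```python
def get_first_elem(lst):
    # Assumption 1: the caller never passes None.
    assert lst is not None, "lst must not be None"
    # Assumption 2: the caller never passes an empty list.
    assert len(lst) > 0, "lst must not be empty"
    return lst[0]
```

The function still errors out on bad input, but the failure now names the broken assumption. Note that Python skips assert statements when run with the -O flag, so they document and check assumptions during development rather than validate untrusted input.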

Adopt these behavioral traits whenever you read or write code; you will be miles ahead of the rest.

Some of the other common bug patterns follow.

Never let it leak

A common bug pattern is not closing opened resources – be it file descriptors, database connections, HTTP connections, or socket connections. Programming languages have constructs to do this – the finally block in Java and Python, defer in Go. Whenever you open a resource, close it; never let it leak.
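As a small illustration in Python, here are two ways to guarantee that a file is closed even when reading fails. The function names are made up for this example; the with statement is Python's shorthand for the same try/finally pattern.

```python
def read_with_finally(path):
    f = open(path, encoding="utf-8")
    try:
        return f.read()
    finally:
        # Runs whether read() succeeded or raised, so the file never leaks.
        f.close()


def read_with_context_manager(path):
    # The with statement closes the file when the block exits, even on error.
    with open(path, encoding="utf-8") as f:
        return f.read()
```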

Fence it

When establishing a connection to an external resource, be it a database or a remote server, configure appropriate timeouts. Being stuck for an indefinite period while establishing a connection is not a happy place to be in; fence the connection establishment time to a reasonable value. Also, timeouts come in various flavors – connection establishment timeout, socket timeout, HTTP server timeouts. Familiarise yourself with all that apply to your scenario.
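A minimal sketch of two of these flavors using Python's standard socket module. The function name and the default values are assumptions chosen for illustration; pick values that suit your system.

```python
import socket


def open_connection(host, port, connect_timeout=3.0, read_timeout=5.0):
    # Connection establishment timeout: give up if the connection is not
    # established within connect_timeout seconds.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # Socket timeout: bound how long each later send or recv may block.
    sock.settimeout(read_timeout)
    return sock
```

If either limit is exceeded, the call raises an exception instead of hanging, which gives the caller a chance to retry or fail fast.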

Bound it

Keep an eye on runaway resource creation. There is a limit to the number of connections that an external system can handle; there is a limit to the number of files one can create on a file system. Be aware of these limits and put in checks and balances to bound the creation to acceptable values.
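One simple check is a counter that refuses to hand out more than a fixed number of slots. The sketch below uses a semaphore from Python's standard library; the class name BoundedOpener and its methods are hypothetical.

```python
import threading


class BoundedOpener:
    """Allow at most max_open resources to be open at the same time."""

    def __init__(self, max_open):
        self._slots = threading.BoundedSemaphore(max_open)

    def acquire(self, timeout=None):
        # Returns True if a slot was obtained, False if the cap is still
        # reached when the timeout expires.
        return self._slots.acquire(timeout=timeout)

    def release(self):
        # Call this when the resource is closed so the slot can be reused.
        self._slots.release()
```

A caller asks for a slot before opening a connection or file and releases it after closing, so the total can never exceed max_open.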

Reuse over recreate

If you are opening a connection or creating an object repeatedly, check whether it is possible to pool the resources instead of repeatedly re-creating. Create a pool of resources once and then reuse when needed. This principle applies to all sorts of connections – HTTP, socket, database.

Using a resource pool alleviates the boundless resource creation problem too.
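A bare-bones pool can be sketched with a queue. This is an illustration under simplifying assumptions (fixed size, no health checks); ResourcePool and the create callback are made-up names, and real projects would usually rely on the pooling built into their database or HTTP library.

```python
import queue


class ResourcePool:
    """Create a fixed number of resources once and hand them out for reuse."""

    def __init__(self, create, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(create())

    def acquire(self, timeout=None):
        # Blocks until a resource is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, resource):
        # Return the resource to the pool instead of destroying it.
        self._idle.put(resource)
```

Because the pool never creates more than size resources, it also bounds resource creation.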

Encoding it right

While working with text, take care of character encoding. Things work great until one beautiful day someone passes a foreign text to your code, and everything collapses like a house of cards. Familiarise yourself with character encoding and take care of it while coding.
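A short Python sketch of what "taking care of it" can look like: name the encoding explicitly whenever text crosses a byte boundary. UTF-8 is chosen here as an assumption; the helper names are illustrative.

```python
def to_bytes(text):
    # Be explicit about the encoding instead of relying on a platform default.
    return text.encode("utf-8")


def from_bytes(data):
    # Decode with the same encoding that was used to encode.
    return data.decode("utf-8")


original = "naïve café"
round_tripped = from_bytes(to_bytes(original))

# Decoding the same bytes with a different codec does not fail here;
# it silently produces different, garbled text.
garbled = to_bytes(original).decode("latin-1")
```

The same applies to files: passing encoding="utf-8" to open() avoids depending on whatever default the machine happens to have.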

Stand on the shoulders of giants

If I have seen further, it is by standing on the shoulders of Giants.

– Isaac Newton.

The three chief virtues of a programmer are: Laziness, Impatience and Hubris.

– Larry Wall.

Doing everything yourself and not delegating to established libraries, frameworks, and tools is the root cause of a large number of bugs. In all probability, someone would have faced the problem that you are facing and crafted a well-tested solution to it – shamelessly use it. There is a reason why great programmers claim laziness is a virtue – follow this in spirit and practice.

Grokking these bug patterns and adopting the behavioral traits will make you a coding Yoda. Who does not want to be one?

An earlier post on similar lines on Software Security.

Yoda image by Mario Eppinger from Pixabay

Leaky tap photo by Luis Quintero from Pexels

Fence photo by Min An from Pexels

Recycle image by 95C from Pixabay

Encode image by Gerd Altmann from Pixabay

Elephant image by Sasin Tipchai from Pixabay

Enablers, Not Doers

How do you run effective Platform Engineering teams?

All organizations have Platform Engineering teams in one form or the other; these are centralized engineering teams providing building blocks for other engineering groups within the company. The customers for these teams are the internal engineers, not the end-users of the product.

For Platform Engineering teams to be effective, do not strongly couple them with the other engineering groups. Even though these are centralized engineering teams, they should operate in a decentralized manner. Platform Engineering teams should follow the mantra of being “loosely coupled but strongly aligned” with the other engineering teams. To achieve this, view Platform Engineering teams as enablers, not as doers.

The build engineering team creates tools and resources so that any team in the organization can deploy their builds to production, i.e., the build engineering team enables you to deploy builds using the tools they create; they do not deploy the build for you.

The performance engineering team creates tools and frameworks for you to identify performance bottlenecks, i.e., the performance engineering team enables you to identify performance bottlenecks using the tools they build; they do not identify performance bottlenecks for you.

The security engineering team creates tools and libraries for you to identify security loopholes, i.e., the security engineering team enables you to identify security loopholes using the tools they build; they do not identify security loopholes for you.

This is how organizations should mold and communicate the role of Platform Engineering teams. Positioning the Platform Engineering teams as doers instead of enablers makes them the bottleneck for other teams to get their work done.

Other teams rely on Platform Engineering teams for their success. You do not want them second-guessing whether the Platform Engineering teams are doing their job or not. They need to trust the Platform Engineering teams with their critical workloads. Without trust, everything breaks, leading to unnecessary back and forth, sapping the energy of all parties involved.

To build this culture of trust, Platform Engineering teams should focus on observability, performance, and stability.

Lack of observability creates anxiety. If the build engineering team does not provide a dashboard where one can see the progress of builds, see the history of builds, and create alerts on build failure, teams would be anxious about their builds. They would bug the build engineering team on this, causing stress all around. Observability is a must for creating a culture of zero stress and anxiety.

Documentation is a subset of observability. Once the Platform Engineering team releases a tool or an API, others should be able to use them without intervention from the Platform Engineering team. To achieve this, Platform Engineering teams should focus on clear and concise documentation.

Since all teams use the tools and APIs of Platform Engineering teams, performance improvements have a multiplicative effect.

The build tool not working brings the engineering org to a standstill. Login not working negatively affects all parts of the application. Hence, a keen focus on stability is a must. These are the foundations on which other engineering teams build their features.

Due to the nature of the work of Platform Engineering teams, consulting, guiding, and proliferation of best practices become part and parcel of the day-to-day responsibilities; this is understated, but Platform Engineering teams spend a significant chunk of their time on this. Look for ways to institutionalize these so that the Platform Engineering teams are engaged in their core work and not spending time on this. As discussed earlier, focussing on observability, performance, and stability goes a long way towards this.

Feature prioritization can become a challenge for Platform Engineering teams as everyone comes to them with feature requests. A simple yardstick to use is – If we do not release this functionality, is there a way for the requesting team to go about their work, albeit in a roundabout manner? If the answer is yes, then it is not a burning problem. If not, you need to figure out a way to get the feature out as soon as possible.

Platform Engineering teams should adopt a broad perspective when developing features. Do not develop features for a particular team. Think of how the feature relates to other teams and design it so that everyone in the organization can leverage the feature.

If you want to build a culture of speed and rapid iteration, viewing Platform Engineering teams as enablers and not as doers is critical.

Image by Susbany from Pixabay

Optimists, Pessimists, and Better Coders

Whether one is an optimist or a pessimist is dictated by genetics. Wise people say that you can give your genetics a run for the money by making happiness a conscious choice and learning to be deliberately happy.

What does this have to do with coding?

How do you decide whether code is good or not?
There are many parameters, but a definitive indicator is the number of bugs – the fewer the bugs, the better the code.

What is the definition of a bug?
A bug is a problem with the code, which makes it not work correctly.

Why do bugs occur?
The person who authored the code did not anticipate that particular condition, and the code does not know how to handle that situation.

How does one prevent bugs?
Anticipate all that can go wrong and take care of them while coding.

To do this, one needs to take a bleak look at things – exhaustively think of all that can go wrong and take care of these. In short, you need to wear a pessimist’s cap.

Then, do pessimists make better coders?

There is another way to look at this.

To create something, you need to be an optimist; coding is about creating something new. To write code devoid of bugs, you need to take care of edge cases and boundary conditions and account for them.

Is a better coder someone who can balance optimism and pessimism?

Photo by Christophe Hautier on Unsplash

NOT – Not Only Testing

How do you create a quality assurance strategy? 

Is quality assurance the same as testing?

Many companies nowadays do not have a separate quality assurance (QA) team. People coming from the old world find this difficult to digest. In the last decade, the way of developing, deploying, and monitoring applications has changed.

Gone are the days of the waterfall model of development. You no longer take months to develop a feature, followed by a long testing cycle and finally hand it over to the production team for deployment. Release cycles have shrunk from months to hours. People who build the application are responsible for deploying, running, and monitoring it.

In this brave new world, base your quality assurance strategy on:

  1. What is the cost of a bug in production?
  2. How quickly can one detect a bug in production?
  3. How quickly can one fix a production bug?
  4. If a bug manifests in production, how does one limit the damage?

The cost of bugs is not uniform; bugs in different areas of the application cost differently. The line of business also dictates the price of bugs.

For an aerospace or medical devices company, a bug means the difference between life and death literally, not so much for a social network or an e-commerce website. If an e-commerce website does not load, it significantly hits the company’s bottom line. For a social network, it might make an insignificant dent in advertising revenue. For an e-commerce company, the cost of a bug in the recently purchased items section on the home page is not the same as checkout from the shopping cart not working. While your shopping cart checkout has to be bulletproof, you can be lenient with the recently purchased items section on the home page.

Vary quality control based on the criticality of the functionality. Be tight where it needs to be and loose where you have room to wiggle. You need not have a uniform quality assurance strategy for the entire application.

Testing dents your speed and time to production irrespective of whether you do it manually or automate it; this is the cost you pay for quality – you need to make peace with this. The stricter you get, the costlier it becomes. Automated testing is not free; it too has a cost – more things to maintain and manage. In some cases, you might end up writing twice the code due to automated testing. I am not arguing against automated testing but asking you to – factor in and mentally accept the cost – before you take the plunge.

There is a tendency to equate testing with quality assurance. Testing is a subset of quality assurance; testing is NOT quality assurance. Quality assurance is much more than JUST testing and comprises a variety of things.

Today, experimentation, incremental development, and speed are of the essence. Try a small idea, see whether it sticks, and then rapidly iterate and expand. In such an environment, following a design philosophy that gives you room for error and factors in bugs and things not working is key.

Quick detection of production bugs rests on the observability built into the application. Speedy recovery from production bugs is a function of deployment practices. Reducing the impact of production bugs follows how you roll out features. All these are a result of tooling, development practices, and engineering culture. Today, these are as important as vanilla testing – manual or automated.
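As one illustration of limiting damage through how a feature is rolled out, the sketch below exposes a feature to a fixed percentage of users. It is a hypothetical example, not a description of any particular feature-flag product; is_enabled and its parameters are made-up names.

```python
import hashlib


def is_enabled(feature, user_id, rollout_percent):
    """Deterministically place a user in one of 100 buckets for a feature."""
    key = f"{feature}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    # Users in buckets below the threshold see the feature.
    return bucket < rollout_percent
```

With rollout_percent set to 5, a bug in the new feature reaches roughly 5% of users; because a user's bucket never changes, raising the percentage only adds users, and setting it to 0 switches the feature off for everyone.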

The ease with which you can set up a development environment for the application has a direct bearing on the quality of the product. In a world of micro-services and external dependencies, setting up a development environment can get complicated with a lot of moving parts. If you make it difficult for developers to create a robust development environment, the quality of the product takes a hit.

Static code analysis and enforcing best practices through tooling like pre-commit hooks improve the quality of the code, which directly improves the quality of the application. It is paramount when you use languages that are promiscuous with typing; this is one of those things where the cost is low, resistance is nil, and the rewards are high. Be always on the lookout for tools and processes where you get a better quality product with zero resistance and impediment to speed.
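A tiny illustration in Python, which does not enforce types at runtime. The function is hypothetical; the type hints are the part a static checker (mypy is one example) can verify before the code ever runs, for instance from a pre-commit hook.

```python
def total_price(unit_price: float, quantity: int) -> float:
    # The annotations state the intended types; Python itself ignores them.
    return unit_price * quantity


# At runtime nothing stops a caller from passing a string. The call below
# "works" by repeating the string, which is the kind of mistake a static
# type checker reports without running the code.
surprising = total_price("3", 2)
```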

Adopting the asynchronous and event-based architecture and design patterns gives room for error and recovery. In today’s fast-paced environment where you do not have the luxury of time to test every minute aspect of the product, this is a boon.
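A rough sketch of why queued events leave room for recovery: an event whose handler fails can be put back and retried later instead of failing the whole request. Everything here (process_events, max_attempts, the in-memory deque standing in for a real message queue) is an assumption made for illustration.

```python
from collections import deque


def process_events(events, handler, max_attempts=3):
    """Run handler on each event, re-queueing failures up to max_attempts."""
    pending = deque((event, 1) for event in events)
    failed = []
    while pending:
        event, attempt = pending.popleft()
        try:
            handler(event)
        except Exception:
            if attempt < max_attempts:
                # Put the event back at the end of the queue for another try.
                pending.append((event, attempt + 1))
            else:
                # Give up and set it aside for inspection.
                failed.append(event)
    return failed
```

A transient failure is absorbed by a retry, and events that keep failing are collected rather than lost, which leaves time to fix the bug and replay them.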

Do not have tunnel vision when it comes to quality – do not restrict it to only testing; adopt the “NOT – Not Only Testing” strategy.

Comic from XKCD.

Photo by Austin Neill on Unsplash

Reflection on AWS re:Invent

AWS re:Invent is an annual event hosted by Amazon in Las Vegas. It is a celebration of all things AWS, as well as an opportunity to advance one’s AWS skills and meet the teams behind the various AWS services.

Approximately 65,000 people attended the event this year, and I was one of them thanks to my employer Goibibo. It was a 5-day affair. I have been using AWS since 2012, have seen the platform grow from a couple of services to what it is today, but this was my first time at re:Invent.

Reflections on re:Invent follow.

re:Invent has the right mixture of fun and seriousness – multiple parties, tech talks, after hours with food and drinks, fun games, product launches, certification booths, and networking events.

The event takes place in multiple hotels (casinos) on the Las Vegas Strip. AWS ties up with the hotels on the Strip, which results in discounted rates for the stay. This sweet deal closes as re:Invent nears, so plan your attendance early. Staying close to the venues is highly recommended as the day starts early and ends late.

AWS has free shuttles running from these designated hotels to the conference venues. Another reason to stay in the chosen hotels. These shuttles also run between the various conference sites, but it does take time to move from one to the other. Casinos are like a maze; it is easy to get lost. Getting in and out of the casinos to the shuttle parking lot itself takes some practice. By the time you have perfected this art, the conference ends. AWS stations many people in the casinos to guide and help with directions and information, but still, it eats up quite a bit of time.

A large number of tech talks run in parallel in multiple venues at all the hotels. Talks cover a wide variety of subjects: battle stories (victories, defeats, and lessons from AWS users); the internals of AWS services; best practices; and architecture sessions. As AI/ML are the current darlings of the tech world, there was a separate track dedicated to AI/ML.

AWS was heavily promoting serverless; there was a lot of emphasis on serverless throughout the event. Some of the vintage AWS products like SQS, SNS, and DynamoDB have been re-branded under the serverless moniker.

AWS publishes the schedule for the talks in advance. There is a re:Invent app which has all the details of the event. In the app, you can reserve seats for the talks. If you want assured seating, you need to reserve. Sessions do fill up fast, and the reservation closes as re:Invent nears. All talks have a line for the people who have booked in advance and a walk-in queue. Advance bookers get preferential treatment. The walk-in line is at the mercy of seats being empty. There are repeats of the talks on subsequent days. Sessions are also live-streamed; there is a dedicated area for viewing this on the big screen.

If you are serious about attending as many talks as possible, cluster your talks in one venue and then move to the other. Commuting from one place to the other takes time.

Another highlight of re:Invent is the keynotes from AWS stalwarts like Andy Jassy and Werner Vogels. No walk-ins are allowed for the keynotes; you need to reserve your place in advance. Keynote seating fills up fast and closes as re:Invent gets closer.

There is an expo area where all vendor companies set up stalls and market their products with demos, brochures, and goodies. Expo is a fantastic opportunity to get a glimpse of all the tools available in the market today. The landscape this year was broadly divided into – APMs, logging solutions, and security products; data lake, data visualization, ETL, data warehouses, and data governance tools; SRE and DevOps tools. Even though AI/ML was the buzzword in all of these, surprisingly, there were not too many products catering to AI/ML specifically.

Vendors host multiple after-parties at various bars, restaurants, and nightclubs. One such event was the Sumo Slam – a Sumo wrestling match organized by Sumo Logic in association with other companies. The venue for Sumo Slam was Omnia, a prominent nightclub in Vegas.

There are multiple hands-on sessions where you can hone your AWS skills and also earn AWS certifications. There were AWS DeepRacer League matches too, which I was keen to attend but could not.

The primary venue has a festive atmosphere throughout the conference days – DJs spinning out EDM and live bands performing. There was a mechanical bull ride to test your cowboy skills.

re:Invent started with a kick-off party on Sunday night and ended with a grand closing party on Thursday night. The kick-off party had a product launch – AWS DeepComposer – along with multiple music and dance shows, food and drinks, marching bands, roller skaters, impromptu dance troupes, and a chicken wing eating contest.

The closing party was massive and held in the Las Vegas festival grounds. It had multiple tents for different kinds of music. There was a venue for dodgeball and another site for other fun games. The EDM stage with the pulsating light show was fantastic. Thankfully, there were plenty of vegetarian options and a good selection of drinks. The highlight was a drone show organized by Intel. After witnessing the show, I believe we are not far from a future where an army of synchronized pulsating drones will replace fireworks.

It was an enjoyable week of broadening my horizon while having loads of fun. Amazon has done a fantastic job in organizing the event.

Generalization – The Superpower

I was reading this Twitter thread on Ben Horowitz’s new book on culture. The book’s content is apparent to anyone who has spent time in a corporate setup. I have been listening to the audiobook – “Zen: The Art of Simple Living.” Again, the content is not radically new; it is something you would already know instinctively. Of late, there has been a spurt of Twitter accounts dishing out wisdom. Most of the tweets seem to be a regurgitation of common sense.

Am I trying to say that they do not add any value? No, the opposite. They are doing an excellent service by codifying common sense into pithy one-liners, simple rules, and principles.

All these people are generalizing lessons learned from specific instances into a broader set of rules and guidelines which apply to more expansive areas of life.

What do I mean by the above? Let me explain with an example.

Let us say that a company advertises – Deposit your money with us for two years for a guaranteed return of 15%. The bond yield rate is 8%. You find this scheme attractive and invest. After a year, the company owner absconds, taking your money with her.

How do you generalize the lesson from this misadventure?

Be skeptical of any scheme (gold, real estate, etc.) that promises GUARANTEED returns over and above the current bond yield rate.

I believe generalizing lessons learned from specific situations to a much broader arena of life is a superpower which everyone should develop. Get into the habit of doing this for everything. If you followed a particular process that led you to success, try to make this process generic so that you can apply it elsewhere too.

Generalizing wisdom helps in pattern matching as well as molding the way you think. You begin to see patterns in your thoughts and actions where you can apply the principles created out of past experiences.

This habit sharpens decision making. You start seeing patterns in decision-making scenarios and can base your decision on a previously created rule.

You may not be able to create rules out of everything – creating a step-by-step guide on how to balance a bicycle is next to impossible. Wherever it is possible, do it; for the rest, train your instincts and let your intuition take care of it.

Photo by Mpho Mojapelo on Unsplash