Void

Déjà Vu

You have been trying to solve a problem for quite some time; the solution remains elusive. As you grapple with the problem, a seed of a solution germinates and sprouts into an answer. In hindsight, the resolution appears obvious. You kick yourself, wondering: why did I not think of this sooner?

How many times has the above happened to you?

I believe almost everyone goes through this in life.


One simple hack to get better and faster at problem-solving is to backtrace through your thinking. Once you successfully arrive at a solution, replay your unsuccessful attempts and figure out what you could have tweaked in your thinking process to arrive at the answer faster.

High performance teams do a post-mortem analysis after a critical issue. Members create an RCA (root cause analysis) document capturing what went wrong, what could have been done to prevent the incident, and what steps to take to avoid a relapse. We should apply the same process to our thinking when we do not arrive at solutions on time; think of this as an RCA of your thinking process.

This simple trick, I believe, helps us get better and faster at problem-solving.

Image credit: Diego PH

Now You See Me

In the modern software world, where microservices are de rigueur, observability of systems is paramount. If you do not have a way to observe your application, you are as good as dead.

Image credit: W A T A R I

The first step towards embracing observability is figuring out what to track. Broadly, we can categorize software observability into:
1. Infrastructure metrics.
2. Application metrics.
3. Business metrics.
4. Logging.
5. Distributed tracing.
6. Alerting.

Infrastructure metrics:
Infrastructure metrics boil down to capturing the pulse of the underlying infrastructure the application runs on. Some examples are CPU utilization, memory usage, disk space usage, and network ingress and egress. Infrastructure metrics should give a clear picture of how well the application is utilizing the hardware it is running on. They also aid in capacity planning and scaling.
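As an illustrative sketch (not a prescription), here is how one might sample these numbers with Python's psutil library; in practice, an agent would ship them to a metrics backend at a fixed interval instead of printing:

import psutil  # third-party library: pip install psutil

# Sample the pulse of the host machine.
print("cpu %:", psutil.cpu_percent(interval=1))
print("memory %:", psutil.virtual_memory().percent)
print("disk %:", psutil.disk_usage("/").percent)

# Cumulative network ingress and egress since boot.
net = psutil.net_io_counters()
print("network bytes received/sent:", net.bytes_recv, net.bytes_sent)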

Application metrics:
Application metrics help in gauging the efficiency of the application: how fast or slow it responds and where the bottlenecks are. Some examples of application metrics are API response time, the number of times a particular API is called, the processing time of a specific segment of code, and calls to external services and their latency. Application metrics help in weeding out potential bottlenecks as well as in optimizing the application.
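As a hedged sketch, here is how an API handler could be instrumented with the Python statsd client; the metric names, the stubbed database call, and a StatsD agent listening on localhost:8125 are assumptions for illustration:

import time
import statsd  # third-party library: pip install statsd

stats = statsd.StatsClient("localhost", 8125)  # assumes a local StatsD agent

def fetch_from_db(user_id):
    # Stand-in for a real data access call.
    time.sleep(0.01)
    return {"id": user_id}

def get_user(user_id):
    stats.incr("api.get_user.calls")  # how often this API is called
    start = time.time()
    user = fetch_from_db(user_id)
    elapsed_ms = (time.time() - start) * 1000
    stats.timing("api.get_user.response_time", elapsed_ms)  # latency metric
    return user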

Infrastructure metrics give an overall picture, whereas application metrics help in drilling down to the specifics. For example, if the infrastructure metrics indicate that CPU utilization is pegged at 100%, application metrics help in zeroing in on the cause.

Business metrics:
Business metrics are the numbers that are crucial from a functionality point of view. For example, if the piece of code deals with user login and sign-up, some business metrics of interest would be the number of people who sign up, the number who log in, the number who log out, and the modes of login, such as social login versus direct. Business metrics help in keeping a pulse on the functionality and diagnosing feature-specific breakdowns.

Business metrics should not be confused with business reports. Business metrics serve a very different purpose: the aim is not to quantify numbers accurately but to gauge the trend and detect anomalous behavior.
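Reusing the hypothetical stats client from the sketch above, business metrics are often just counters bumped at the right points in the flow; the trend matters more than the exact numbers:

def create_account(user, mode):
    pass  # stand-in for the real account creation logic

def on_signup(user, mode):
    create_account(user, mode)
    stats.incr("business.signups")  # overall sign-up trend
    stats.incr(f"business.signups.{mode}")  # e.g., social versus direct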

It helps to think of infrastructure, application, and business metrics as a hierarchy where you zoom in from one to the next, both when keeping a tab on the health of the system and when diagnosing problems. Keeping a check on all three ensures you have a hale and hearty application.

Logging:
Logging enables you to pinpoint specific errors. The big challenge with logs is making them easily accessible to everyone in the organization. Business metrics help in tracking the overall trend; logging helps to zero in on the specific details.

Distributed Tracing:
Distributed tracing ties together all the microservices in the ecosystem and helps trace a flow end to end as it moves from one microservice to another. Microservices fail all the time; if distributed tracing is not in place, diagnosing issues that span microservices feels like searching for a needle in a haystack.

Alerts:
If you have infrastructure, application, and business metrics in place, you can create alerts that trigger when those metrics show abnormal behavior; this pre-empts potential downtime and business loss. One golden rule for alerts: if it is an alert, it should be actionable. If not, alerts lose their significance and meaning.
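As a minimal sketch of what "actionable" means in practice (the threshold and the paging hook are hypothetical stand-ins for your alerting setup), the alert tells the on-call person what to check, not just that something is off:

def page_on_call(message):
    # Stand-in: wire this to email, Slack, PagerDuty, or similar.
    print("ALERT:", message)

def check_signups(current_rate, baseline_rate):
    # Page only on deviations a human can act on; noisy alerts
    # teach people to ignore them.
    if current_rate < 0.5 * baseline_rate:
        page_on_call("Sign-ups below half of baseline; "
                     "check the sign-up service and recent deploys.")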

Both commercial and open source software are available for building observability. New Relic is one of the primary contenders on the commercial side; StatsD, Prometheus, and their ilk dominate the open source spectrum. For log management, Splunk is the clear leader in the commercial space, while the ELK stack takes the crown on the open source front. Zipkin is an open source reference implementation of distributed tracing. Most metrics-tracking software ships with alerting capabilities these days.

If you already have microservices or are moving towards that paradigm, you should be investing heavily in observability. Microservices without observability is a fool’s errand.

Market Size

When I was working on Kwery, a lot of people used to ask me: why are you not trying to build a team and raise money? The niche market that Kwery served was not amenable to VC money or to building a team. Kwery was an experiment in creating a single-person lifestyle business.

Market size is the number of individuals in a certain market who are potential buyers and/or sellers of a product or service.


When building any business, gauging the potential market size is the most critical step. The size of your team and the amount of money you raise should correlate with the market size. If the intention is to create a big business, you should be going after a big market. A lot of would-be startup founders are so fixated on their idea and on raising money that they do not analyze the market size, or they fool themselves into thinking that the market is vast. Peter Thiel, in his book Zero to One, beautifully explains the significance of market size.

The story that you build around your startup should also reflect the market size that you are after. In 2014, professor Aswath Damodaran estimated the value of Uber at ~6 billion dollars, with the story that Uber is an urban car service company. Bill Gurley, the lead investor in Uber, refuted this, saying that Uber is not an urban car service company but a logistics company that is going to be everywhere, not just urban areas, and is going to exploit global networking benefits, not local ones. With this story, Bill Gurley valued Uber at ~53 billion dollars, roughly 9X Aswath Damodaran’s estimate. With a change in the story, the market size that Uber was after went up. This incident illustrates the importance of framing your startup story around a big market.

Some parting words on market size from Kunal Shah, founder of FreeCharge: Market > Product > Team.

Poor Man’s Anomaly Detection

You have a feature where, when someone signs up on your product, you create a wallet for that person and top it up with complimentary money. Your organization swears by microservices; hence, sign-up logic lives in one service, while wallet creation and crediting live in another. Once a user signs up, the sign-up service sends a message to the wallet service so that it can create the wallet and do the credit. To ensure the sanctity of the system, you have to make sure that the numbers of sign-ups, wallets created, and credits done match one another. Also, if these go out of sync, alerts need to be in place so you can take corrective action.

Since the two are disparate distributed systems, one way to achieve the above is to use an anomaly detector. There are off-the-shelf products for this as well as open source projects. If you do not have the time, need, or resources to invest in deploying an anomaly detection system, having a reconciliation system is the way to go.


Reconciliation is deeply entrenched in the financial domain, where it is a way of life. The technology world can borrow this and use it as a poor man’s anomaly detector. For the scenario that we started with, we run queries on the data repositories of the sign-up and wallet systems at regular intervals. These queries fetch the counts of sign-ups, wallet creations, and credits that occurred during the period. Once we have the numbers, all we have to do is ensure that they match. One can do this with a simple bash script; it is extremely simple to develop and deploy.
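Here is a sketch of that reconciliation check in Python rather than bash; the database files, table names, and created_at column are hypothetical, so adapt them to your schema:

import sqlite3
from datetime import datetime, timedelta

# Hypothetical stores; in reality these are the sign-up and wallet
# services' own data repositories.
signup_db = sqlite3.connect("signup.db")
wallet_db = sqlite3.connect("wallet.db")

since = (datetime.utcnow() - timedelta(hours=1)).isoformat()

def count_since(conn, table):
    # Count rows created during the reconciliation window.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM " + table + " WHERE created_at >= ?", (since,)
    ).fetchone()
    return count

signups = count_since(signup_db, "signups")
wallets = count_since(wallet_db, "wallets")
credits = count_since(wallet_db, "wallet_credits")

if not (signups == wallets == credits):
    # Hook this into your alerting channel instead of printing.
    print(f"MISMATCH: signups={signups} wallets={wallets} credits={credits}")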

Reconciliation can play a role wherever two-phase commit flows are involved. For example, most payment flows follow a two-phase commit process: you first deduct money from the user’s payment instrument and then fulfill the commitment. There is a good chance that, post payment debit, your system dies without doing the fulfillment. Having a reconciliation system in place helps you take corrective action in these scenarios.

Reconciliation is a simple way to achieve anomaly detection until you have the need and the resources to invest in a more robust distributed anomaly detector.

Startups and VC La La Land

With daily newspapers dedicating pages and headlines to startups and venture capital (VC) investments, they are no longer the niche they used to be. Whenever a startup investment is in the news, I hear questions along the lines of:

  1. Is the VC crazy to invest so much money in that startup?
  2. Why does that startup need massive amounts of money?

In my opinion, these questions stem from not understanding how the VC business works.

Let us say that there is a contest with prize money of 1 billion dollars. You have to form a team and complete some tasks; whoever wins walks away with the prize money. There are a hundred individuals who are eager to participate but do not have the financial means and resources to recruit people, form teams, and get to work. There is a wealthy benefactor who is ready to fund these hundred individuals with 10,000 dollars each. She strikes a deal with each of the hundred: if you win, you give me 50% of the prize money; if you lose, you do not have to repay the 10,000 dollars I gave you. The benefactor is sure that one of them is going to bag the prize; she just does not know who. The benefactor has invested 1 million dollars in total (100 × 10,000) to ensure a return of half a billion dollars, i.e., 500x returns.

The above is a crude explanation of how the VC industry works. The individuals are the startup founders, and the wealthy benefactor is the VC. The only difference is that, in the real world, there is no guaranteed success, i.e., there is no way for the VC to tell with hundred percent certainty that someone is going to win; it is all about probabilities, data, reading trends, and gut instinct. To bring a semblance of certainty to returns, a VC spreads her bets across a wide array of startups in different domains; at least a handful of them are going to hit the jackpot.


Why does this startup need so much money, and are the VCs crazy to give it to them?
Startups that VCs invest in are not chasing organic growth. Organic growth is steady, sustainable growth which compounds into big returns over decades. VCs are chasing exponential returns in a few years, not steady growth. The money that VCs give to startups fuels this exponential growth.

Why so?
Double-digit returns are achievable by investing in public companies and debt. Why take the risk of investing in startups, with absolutely no guarantee of returns, when there are safer ways to get there? For the risk that VCs are taking, the gains have to be commensurate, i.e., exponential returns.

Why do startups need so much money?
To short-circuit the organic growth curve and achieve exponential growth so that investors get their returns in a relatively short period. The money is spent on scaling teams, building technology, and, more importantly, acquiring users through discounts and advertising. In short, the VC capital infusion empowers startups to compress multiple decades of growth into a decade or less.

Why is this startup spending money on discounts when it is not even profitable?
Big spending circles back to the concept of exponential growth and building market dominance for a winner-take-all effect. The idea is that once the startup dominates the market or has enough users, it will eventually figure out a way to make money and turn profitable. Case in point: Amazon, Google, and Facebook. When investors backed these firms, none of them had figured out profitability, but look at where they are today. Of course, there is a risk of a startup never figuring out how to make money, but VC investment is about calculated risks and spreading the bet.

You might have read of previously successful founders raising astronomical capital without even having a product. VCs are not crazy to do this; they are betting on probabilities and sentiment. The chance of such a founder hitting the jackpot again is high, hence the clamor to invest. VCs know for sure that not all the startups they invest in are going to be home runs; some are going to go bust and die, and they take this into account while investing. Only a handful of the startups they invest in turn out to be successes, and this handful generates the returns. A minority of successes covers for the majority of failures. Returning to our contest example: out of the hundred, the VC needs only one person to win; she knows the rest are going to fail, but she gets 500x returns with just one win and 99 failures.

Treat this as a dummies’ guide to startups and venture capital. I have watered it down a lot, ignoring VCs and funds that invest over longer horizons. Like any business, the VC industry is varied and diverse.

On Writing

I have been writing for quite some time now; I started blogging as soon as I got out of college. I have been writing on and off since then, but in the last year, I have managed to maintain a writing streak. I have always wanted to write regularly; it is only now that I have been successful at it. A significant impetus for the streak was the realization that I was becoming a passive consumer and not a producer. I am a voracious reader, and at one point, I realized that I was consuming stuff without putting out anything of my own. Of late, I have been promoting my posts a bit more aggressively, and I have received bouquets as well as brickbats. A couple of questions have come my way: how do you manage to write so regularly, and where do you get your ideas from? People have also shared the trepidation they go through when they want to express their opinions in public.

I write for myself; this might sound like a cliché, with a lot of writers claiming so, but you need to start writing to get it. Writing is like having a structured conversation with oneself; it is therapeutic. It brings clarity to a lot of the ideas I have. It also helps me develop a rubric for thinking and decision making, which aids me at work as well as in my personal life. Every day I make tons of decisions; creating a framework helps me evaluate the choices objectively and get better at it. Writing helps me reinforce the process behind these decisions.


One of the trepidations most people face when it comes to putting their opinion out in public is whether it is good enough and what others will think. Whenever I look at my past writing, I get a terrible feeling in my stomach. I can spot quite a few grammatical errors, half-baked ideas, and poorly constructed phrases. I am sure I will feel the same about this post in the future. But you get better by doing, not by not doing; hence, if I write more, I should get progressively better at it. At least, that is the hope. Seeing this through the lens of a journey rather than a destination helps. Ask yourself: what is more important to you, your image in others’ eyes or your improvement? What do you prioritize?

What if I write something which I no longer believe to be true? It does not matter. If your views and thoughts are not constantly evolving, something is wrong. If your beliefs are changing, it means that you are getting exposed to diverse ideas and opinions and are continually updating your thoughts. A better way to put it: strong opinions, loosely held. My writings are a reflection of my views at a particular point in time; they may very well change in the future. Wise people understand this.

Another common misconception is that one has to write something long and unique. I held this view earlier too, but I have changed. I no longer believe that writing has to span pages to be effective. Shorter essays are equally palatable and powerful; haikus, proverbs, and parables are examples of short, useful musings. Even if you voice something that someone has written about before, coloring it with your perspective adds a lot of value.

How do I write?
I have a rough idea in mind. I draft it. I sleep on it for a couple of days, editing it now and then. I try to shorten it as much as possible, nuking unnecessary words and sentences. When I feel I have been moderately successful in communicating the concept effectively, I publish it. After posting, I find numerous ways in which I could have done better, but I have to draw the line somewhere.

I am writing this in the hope that I can enthuse others to start writing more often.

Idiomatic Code

What does writing idiomatic code mean?

Let us say you are using Python to populate a list with numbers.

One way to do this is

nos = []
for i in range(5):
    nos.append(i)

Another is

nos = [i for i in range(5)]

The second one is idiomatic; it leverages the Pythonic way of coding. Python is just an example here; every programming language has a philosophy and a way of doing things. Code that adheres to this is said to be idiomatic.


The advantage of idiomatic code is that it takes less mental work and space to understand. The first snippet spans multiple lines, which translates to spending more grey cells on understanding what is going on. The second is concise and to the point; you end up expending less mental bandwidth to understand it.

Also, idiomatic code is insider speak between people in the know; just as a society functions through social norms and conventions, programming languages and communities achieve the same through idioms and conventions. You look at idiomatic code and instantly know what it is trying to do; it is embedded in your subconscious like muscle memory.

Learning a programming language is not just about learning the syntax; it is more about learning the idioms and standard conventions of the language and the community behind it.

PS: You can populate a list in Python in a lot of different ways, including built-in library functions. Debating that is not the point of this post.

Sherlock Versus Calvin Ball

We can classify software development into:
1. Maintaining and enhancing existing software.
2. Software development from scratch.

Given a choice between the two, developers usually gravitate towards from scratch development. Developing something from scratch is intensive creative work where you have the freedom to shape the product the way you see fit; hence, it is pretty obvious why people prefer it. I draw a parallel here with Calvin Ball. For those of you not familiar with it, Calvin Ball is a game that Calvin invented in which he makes up rules on the fly as the game goes on. From scratch development is akin to Calvin Ball: you can create and amend rules on the fly. If you choose a framework and, in the course of development, find that it does not fit the bill, you have the freedom to swap it for something else. You are operating with many degrees of freedom.


Maintaining and enhancing existing software is more like solving a puzzle or playing a game with well laid out rules. Someone has already laid the foundation or, in a lot of cases, built the entire structure. You first have to expend time and effort grokking this and familiarizing yourself with what is already there; only then will you be able to do something. A lot of times, you need to get into the mind of the original developer and decipher things from her perspective. Working on code written by others is more like Sherlock Holmes’s work. When you make changes and enhancements, you have to ensure that what you are doing fits well into the existing framework. You are working in a constrained environment; you have to stick to the rules of the game. All this is as challenging as, and sometimes more challenging than, developing software from scratch.


Debugging is an acquired skill which carries over to all areas of development. When you troubleshoot code written by others, you become more attuned to adding enough debugging information to the code you write yourself. You start empathizing with the person who will maintain your system in the future and ensuring that person has enough data points to debug when things go wrong; that future person might well be you. Injecting debugging information and future-proofing your project is a fundamental behavioral change that maintenance work induces in you.

There is nothing wrong in preferring to create something from scratch, but it is imperative to have the maintenance skill set under your belt too. The real world requires far more maintaining and enhancing of existing software than from scratch development. If from scratch development is all you have done till now, it is high time you challenge yourself with maintenance work. You will feel a bit frustrated and handcuffed in the beginning, but the way to approach it is like solving a mystery. If you see it that way, it becomes a fun and entertaining experience.

PS: Calvin and Hobbes image taken from Wikipedia.

Concurrency Models

We can roughly classify concurrency models into:
1. Thread based concurrency.
2. Event based concurrency.

Imagine that you run a store with only one customer service representative. As soon as a customer walks in, the representative greets the customer with a quick hello, saying: “If you need any help, give me a shout, and I will help you out.” She then waits for the customer to seek help. She aims to complete each interaction as soon as possible and then wait for the next one. When a customer asks for help, she quickly answers the query and goes back to waiting. If a customer asks where the washroom is, she points in the right direction and reverts to waiting. If a customer asks for the price of a product, she quickly conveys it and goes back to waiting. The point to note here is that a single representative services all the customers in the store. This model works exceptionally well when the representative is fast and the answers to the queries are quick. Concurrency based on events works like this.

Now consider the situation where you have five customer service representatives in your store. As soon as a customer walks in, a representative is assigned exclusively to that customer. When another customer walks in, one more representative is picked from the pool and assigned. The critical point to note here is that there is a one-to-one relationship between representative and customer; while servicing a customer, a representative does not bother about the others. Since our pool has five representatives, we can serve at most five customers at a time. What do we do when a sixth customer walks into the store? We can wait until one of the customers walks out, or we can have a rule that a representative services a customer for a fixed period, after which she is assigned to another waiting customer and reassigned to the original one once that time elapses. Concurrency based on threads works like this.

Coming back to the scenario where the sixth customer walks in: we either have to ask the sixth customer to wait until a representative is free, or we have to wean a representative away from one of the existing customers and assign her to the new one. When this happens, the customer who was being serviced by this representative has to wait, and after the allotted time, we have to assign the representative back to the original customer. When a lot of customers walk in and you have a fixed number of representatives, quite a bit of coordination is needed to service all customers satisfactorily. In a computer, the CPU scheduler takes care of this switching between tasks. Switching is a comparatively time-consuming operation and is an overhead of the thread based concurrency model compared to an event based one.

In the single representative scenario, what happens if one of the customers starts a long conversation with the representative? The representative is stuck with that customer, and if other customers have queries, they have to wait for the ongoing conversation to finish. Also, what if one of the customers sends the representative on a long-running errand, like fetching something from a depot a mile away? Until the representative returns, all other customers have to wait to get their queries resolved. One demanding customer can jeopardize all the other customers and hold up the entire store’s operation.

Hence, when working with event based concurrency, it is essential not to:
1. Carry out CPU intensive tasks akin to having a long-running conversation with the representative.
2. Carry out blocking IO tasks similar to sending the representative to the depot.
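To make the two models concrete, here is a hedged sketch in Python of the same echo-style workload served both ways; error handling and shutdown are omitted, and the address is arbitrary:

import asyncio
import socket
import threading

# Thread based: one representative (thread) dedicated to each customer.
def handle_blocking(conn):
    while data := conn.recv(1024):
        conn.sendall(data)  # blocking is fine; only this thread waits
    conn.close()

def serve_with_threads(port=8888):
    server_sock = socket.create_server(("127.0.0.1", port))
    while True:
        conn, _ = server_sock.accept()
        threading.Thread(target=handle_blocking, args=(conn,)).start()

# Event based: one representative (event loop) shared by all customers.
async def handle_async(reader, writer):
    # Must never block or hog the CPU, or every other client stalls.
    while data := await reader.read(1024):
        writer.write(data)
        await writer.drain()
    writer.close()

async def serve_with_event_loop(port=8888):
    server = await asyncio.start_server(handle_async, "127.0.0.1", port)
    async with server:
        await server.serve_forever()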


NGINX and Redis are probably the most widely used pieces of software that leverage event based concurrency. The workloads that these cater to are quick; hence event based concurrency makes perfect sense here.

Taking the case of NGINX used as a reverse proxy, what does it do? It picks a client connection from the listen queue, does some work on it, forwards it to the upstream server, and then waits for the upstream to respond. While waiting for the upstream, NGINX can pick more client connections from the queue and repeat the above. When the upstream sends a response, it relays it back to the client. Since all of these are short-lived operations, the workload fits beautifully into an event based concurrency model. Good old Apache HTTP Server creates a thread/process for each connection to do the same; Apache is constrained by the number of threads in its pool. If the number of incoming requests exceeds the number of threads, it has to deal with switching and coordination. NGINX does not have this overhead, which makes it comparatively faster than Apache on real-world workloads. All of this is a bit simplistic and hand-wavy but should convey the idea.

An event loop cannot by itself leverage the multiple CPU cores that all modern processors have. To get around this, you create one event unit per core, usually called a worker. Also, most software that leverages event based concurrency adopts a hybrid model: event based concurrency for short-lived, quick operations, with long-running tasks off-loaded to a thread/process.

I have glossed over a lot of details and nuances to explain a complex topic like concurrency in simple terms. Treat this as a good starting guide to dig more into this fascinating world.

Startup Hiring

Hiring is an area that is usually given a lot of lip service but neglected in practice. Especially in a startup, where there is a shortage of workforce, hiring at all levels has a profound impact on the future of the organization. Compared to a big organization, the value proposition of a startup to its employees is speed, freedom, and responsibility. But when it comes to hiring, startups adopt the cookie-cutter hiring strategy of big companies.

The regular interview process is biased toward extroverts and people who can speak well. In my opinion and observation, speaking well does not correlate with being good at what one does. I have been observing this since my college days. During campus placements, flamboyant and boisterous students made their way into companies while the quiet ones, who were good at their studies and could get along with others, were often overlooked. The same thing happens, to a lesser extent, with the regular interview process too.

Taleb says the following about doers:

For those who do things, it is harder to talk about what they do. Reality doesn’t care about talk. It is full of pauses and hesitations. Clear, non-hesitant speech is the domain of non-doers.

Interviewing is an intimidating experience, and not many people excel at it, especially people who are good at what they do. Also, the kind of questions asked during an interview bears hardly any resemblance to the day-to-day work; I have written about this before too. A big step in the right direction is to craft an interview process that mimics your organization’s day-to-day tasks. The best way to do this is to have an assignment-based interview process: create assignments that resemble tasks you do on a daily basis and ask candidates to work their way through them. An approach of this sort takes the adversarial, interrogative tone out of interviews and makes them an enjoyable experience.

An important aspect of creating an interview process is taking subjectivity out of it: who conducts the interview should not have a bearing on the outcome. Another thing to keep in mind is to subject all candidates for a particular level to the same questions. Following the above ensures that you are evaluating every candidate against the same yardstick, which is essential for a quantitative interview process. This sort of standardization also helps in creating a checklist of what you expect in answers to questions. If you have these covered, anyone can conduct an interview and evaluate candidates. Is it not ironic that startups claim to be data-driven, yet when it comes to interviews, the data-driven approach takes a back seat?

hiring-3531130_640

An example interview template: start with an assignment that closely resembles a task related to the work. Keep the assignment practical yet straightforward; a good candidate should not take more than 3 to 4 hours to complete it. On submission, evaluate the assignment on code quality, conventions, comments, instructions to run, test cases, boundary condition handling, error handling, and documentation. Post that, have a phone screen with a bunch of questions that mostly have one-word answers; make this as practical and broad as possible, touching all aspects of the work, from data structure choices to computational complexity to databases, web servers, etc. The idea here is that if someone is good at what they do, they should be able to answer these without any preparation. Then invite the candidate for a face-to-face round where you ask them to expand on the take-home assignment; add a bit more complexity and see how the candidate fares. Finish off the entire process with a design round. This is a rough template; tweak it, add more rounds, and improvise to fit your needs. Every organization is different.

Along with all of the above, soft skills are critical. The sad fact is that a lot of people do not know how to conduct interviews. I have heard of instances where interviewers were rude, outright obnoxious, or not punctual, where candidates were not informed of the interview schedule in advance, and where candidates were not adequately attended to during face-to-face rounds. All of these play a significant role in shaping the image of the organization in the candidate’s mind.

In the talent market, big companies and startups are competing for the same piece of the pie. By adopting a hiring process similar to big companies’, startups are not going to gain an ace up their sleeve. What startups should be focusing on is a flexible hiring process that is unique to their organization; this is not easy to execute in a big corporation but very much doable in a startup. From how you engage with a candidate for the first time, to the interview, to the offer rollout process, everything is ripe for disruption. Differentiate yourself from others. An interview is a big opportunity to make a mark on the candidate; seize it.