Void


Sherlock Versus Calvin Ball

We can classify software development into:
1. Maintaining and enhancing existing software.
2. Software development from scratch.

Given a choice between the two, developers usually gravitate towards from scratch development. Developing something from scratch is intensely creative work where you have the freedom to shape the product the way you see fit; it is pretty obvious why people prefer it. I draw a parallel here with Calvin Ball. For those of you not familiar with Calvin Ball, it is a game that Calvin invented where he makes up rules on the fly as he plays. From scratch development is akin to Calvin Ball: you can create and amend rules on the fly. If you choose a framework and in the course of development find it does not fit the bill, you have the freedom to swap it for something else. You are operating with many degrees of freedom.

[Image: Calvin and Hobbes comic strip]

Maintaining and enhancing existing software is more like solving a puzzle or playing a game with well laid out rules. Someone has already laid the foundation or, in many cases, built the entire structure. You first have to expend time and effort grokking and familiarising yourself with what is already there; only then will you be able to do something. A lot of the time you need to get into the mind of the original developer and decipher things from her perspective. Working on code written by others is more like Sherlock Holmes's work. When you make changes and enhancements, you have to ensure that what you are doing fits well into the existing framework. You are working in a constrained environment; you have to stick to the rules of the game. All this is as challenging as, and sometimes more challenging than, developing software from scratch.


Debugging is an acquired skill which carries over to all areas of development. When you troubleshoot code written by others, you become more attuned to adding enough debugging information in the code you write. You start empathizing with the person who will maintain your system in the future and ensure that person has enough data points to debug when things go wrong. That future person might well be you. Injecting debugging information and future-proofing your project is a fundamental behavioral change that maintenance induces in you.

There is nothing wrong with preferring to create something from scratch, but it is imperative to have the first skill set under your belt too. The real world requires more of type one work than type two. If from scratch development is all you have done till now, it is high time you challenge yourself with category one work. You will feel a bit frustrated and handcuffed in the beginning, but the way to approach it is like solving a mystery. If you see it that way, it becomes a fun and entertaining experience.

PS: Calvin and Hobbes image taken from Wikipedia.


Concurrency Models

We can roughly classify concurrency models into:
1. Thread based concurrency.
2. Event based concurrency.

Imagine that you run a store with only one customer service representative. As soon as a customer walks in, the representative greets the customer with a quick hello, saying, “If you need any help, give me a shout, and I will help you out.” She then waits for the customer to seek help, aiming to complete each interaction as quickly as possible before returning to waiting. If a customer asks where the washroom is, she quickly points in the right direction and reverts to waiting. If a customer asks for the price of a product, she quickly conveys the price and goes back to waiting. The point to note here is that there is only one customer service representative for the entire store, servicing all customers. This model works exceptionally well when the representative is fast and the answers to the queries are quick. Concurrency based on events works like this.
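
To make this concrete, here is a minimal sketch of an event loop in Java using NIO's Selector: a single thread (the lone representative) waits for events and services each one quickly before going back to waiting. The echo behaviour and port are made up for illustration.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// A single thread services every connection, handling each ready
// event quickly and then going back to waiting.
public class EventLoopEcho {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select(); // wait for a "customer" to shout
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    // a new customer walks in: greet and go back to waiting
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // a quick query: answer and go back to waiting
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buffer = ByteBuffer.allocate(256);
                    if (client.read(buffer) == -1) {
                        client.close();
                        continue;
                    }
                    buffer.flip();
                    client.write(buffer); // echo the query back
                }
            }
        }
    }
}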

Now consider the situation where you have five customer service representatives in your store. As soon as a customer walks in, a representative is assigned exclusively to that customer. When another customer walks in, one more representative is picked from the pool and assigned to that customer. The critical point to note here is that there is a one-to-one relationship between representative and customer. While a representative is servicing a customer, she does not bother about other customers; she is exclusive to that customer. Since our pool has five representatives, we can serve at most five customers at a time. What do we do when the sixth customer walks into the store? We can wait until one of the customers walks out, or we can have a rule saying that a representative services a customer for a fixed period, after which she is assigned to another waiting customer and reassigned to the original customer once the time elapses. Concurrency based on threads works like this.
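
A sketch of the same store with a fixed pool of five threads, using Java's ExecutorService; the sleep stands in for the exclusive conversation, and the customer count is made up.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPoolStore {
    public static void main(String[] args) {
        // Five representatives: one thread is dedicated to each customer.
        ExecutorService representatives = Executors.newFixedThreadPool(5);

        for (int i = 1; i <= 8; i++) {
            final int customer = i;
            // Customers 6, 7 and 8 wait until a thread frees up; the OS
            // scheduler also time-slices threads, mirroring the
            // "fixed period per customer" rule above.
            representatives.submit(() -> {
                System.out.println("Serving customer " + customer
                        + " on " + Thread.currentThread().getName());
                try {
                    Thread.sleep(1000); // simulate an exclusive conversation
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        representatives.shutdown();
    }
}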

Coming back to the scenario where the sixth customer walks in: we either have to ask the sixth customer to wait until a representative is free, or we have to wean a representative away from one of the existing customers and assign her to the new customer. When this happens, the customer who was initially being serviced by this representative has to wait; after the allotted time, we assign the representative back to the original customer. When a lot of customers walk in and you have a fixed number of representatives, quite a bit of coordination is needed to service all customers satisfactorily. In a computer, the CPU scheduler takes care of switching between tasks. Switching is a comparatively time-consuming operation and an overhead of the thread based concurrency model when compared to an event based one.

In the single representative scenario, what happens if one of the customers starts a long conversation with the representative? The representative will be stuck with that customer, and if other customers have queries, they have to wait for the representative to finish the ongoing conversation. Also, what if one of the customers sends the representative on a long-running errand, like fetching something from a depot a mile away? Until the representative returns, all other customers have to wait to get their queries resolved. One demanding customer can jeopardize all other customers and hold up the entire store operation.

Hence, when working with event based concurrency, it is essential not to:
1. Carry out CPU intensive tasks akin to having a long-running conversation with the representative.
2. Carry out blocking IO tasks similar to sending the representative to the depot.


NGINX and Redis are probably the most commonly used pieces of software that leverage event based concurrency. The workloads they cater to are quick, hence event based concurrency makes perfect sense for them.

Take the case of NGINX used as a reverse proxy: what does it do? It picks a client connection from the listen queue, does some work on it, forwards it to the upstream server, and then waits for the upstream to respond. While waiting for the upstream, NGINX can pick more client connections from the queue and repeat the above. When the upstream sends a response, it relays this back to the client. Since all these are short-lived operations, this fits beautifully into an event based concurrency model. Good old Apache HTTP Server creates a thread/process for each connection to do the same, so Apache is constrained by the number of threads it has. If the number of incoming requests is more than the number of threads in its pool, it has to deal with switching and coordination. NGINX does not have this overhead, which makes it comparatively faster than Apache on real-world workloads. All of this is a bit simplistic and hand-wavy but should convey the idea.

A single event loop cannot leverage the multiple CPU cores that all modern processors have. To work around this, you create one event loop for each core, usually called a worker. Also, most software that leverages event based concurrency adopts a hybrid model: event based concurrency for short-lived, quick operations, with long-running tasks off-loaded to a thread/process.
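
Here is a sketch of the hybrid model in Java: quick work is handled inline on the event loop thread, while anything blocking is handed to a worker pool so the loop stays responsive. The request format, handler names, and timings are invented for illustration.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HybridModel {
    // Worker pool for long-running tasks (the "errand to the depot").
    private static final ExecutorService blockingPool =
            Executors.newFixedThreadPool(4);

    // Hypothetical handler invoked from the event loop thread.
    static void onRequest(String request) {
        if (request.startsWith("QUICK")) {
            // Short-lived work is done right here on the event loop.
            System.out.println("Handled inline: " + request);
        } else {
            // Blocking IO is off-loaded so the event loop never stalls.
            CompletableFuture
                    .supplyAsync(() -> slowDatabaseCall(request), blockingPool)
                    .thenAccept(result ->
                            System.out.println("Off-loaded result: " + result));
        }
    }

    static String slowDatabaseCall(String request) {
        try {
            Thread.sleep(500); // simulate blocking IO
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "done: " + request;
    }

    public static void main(String[] args) throws InterruptedException {
        onRequest("QUICK price lookup");
        onRequest("SLOW report generation");
        Thread.sleep(1000); // let the off-loaded task finish
        blockingPool.shutdown();
    }
}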

I have glossed over a lot of details and nuances to explain a complex topic like concurrency in simple terms. Treat this as a good starting guide to dig more into this fascinating world.

Ode To Queues

If you have a producer with an uneven rate of production and a consumer which cannot keep pace with the producer at its peak, use a queue.

If you have a workload which need not be addressed synchronously, use a queue.

If your customer-facing application is riddled with workloads which can be deferred, move these to a queue thus making the customer-facing application lean and mean.


Think of a queue as a shock absorber.

There are workloads which need to be processed immediately with sub-millisecond latency, and then there are ones where you have the luxury of taking time. It is advisable not to mix the two in one application. The second kind of workload can be addressed by moving it to a queue and having a consumer process it.

For example, consider a scenario where you are consuming messages and persisting them in a data store. The messages come in at a variable rate, and at its peak the data store cannot handle the load. You have two options: scale the data store to meet the peak load, or slap a queue in between to absorb the shock. The queue solves this problem in a KISS manner.
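
Here is a minimal in-process sketch of the shock absorber idea using Java's BlockingQueue. In production the queue would typically be an external durable broker (Kafka, RabbitMQ, SQS, etc.); the rates and sizes below are made up.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadLocalRandom;

public class QueueShockAbsorber {
    public static void main(String[] args) {
        // The queue absorbs bursts the consumer cannot handle at peak.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

        // Producer: messages arrive at an uneven, sometimes bursty rate.
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                try {
                    queue.put("message-" + i); // blocks only if the buffer is full
                    Thread.sleep(ThreadLocalRandom.current().nextInt(20));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        // Consumer: drains at the steady pace the data store can sustain.
        Thread consumer = new Thread(() -> {
            while (true) {
                try {
                    String message = queue.take();
                    Thread.sleep(50); // simulate the data store's fixed write rate
                    System.out.println("Persisted " + message);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        producer.start();
        consumer.start();
    }
}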

Queues enable applications to be highly available while giving enough room to manoeuvre. As long as the queue is highly available, the chance of message loss is almost nil. Since a durable queue holds the messages, you need not perfect your consumer's high availability; you get leeway to manage failures.

With applications embracing the microservices paradigm, there is a lot of API back and forth. Not all API consumption has to be in real-time. Whatever can be deferred should use a queue as the transport mechanism.

A queue introduces a bit more complexity into an application, but the advantages it brings to the table make it a worthwhile investment.

Software Security

Some disparate thoughts on security in no particular order.

Many security bugs can be avoided by making a clear distinction between authentication and authorization. When you log into Facebook, you use a username and password. Facebook lets you log in only once it is sure you are the owner of the account, by verifying your password. This is authentication. Once you log in, you cannot view all your friends’ photos; you can only view those photos which your friends have authorized you to view. This is authorization. A whole class of security bugs arises because developers have not made this distinction.
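
A made-up sketch of the distinction in Java: the first check establishes who the caller is, the second establishes what they may see. The class of bugs mentioned above appears when the second check is skipped. All types and the token scheme here are hypothetical.

public class PhotoService {
    static class User { int id; }
    static class Photo {
        int ownerId;
        java.util.Set<Integer> allowedViewers = new java.util.HashSet<>();
        // Authorization rule: the owner and explicitly allowed friends may view.
        boolean isVisibleTo(User u) {
            return u.id == ownerId || allowedViewers.contains(u.id);
        }
    }

    // Authentication: map a session token to a logged-in user (stubbed here).
    User authenticate(String sessionToken) {
        return "valid-token".equals(sessionToken) ? new User() : null;
    }

    Photo viewPhoto(String sessionToken, Photo photo) {
        User viewer = authenticate(sessionToken);   // who are you?
        if (viewer == null) {
            throw new IllegalStateException("not authenticated: please log in");
        }
        if (!photo.isVisibleTo(viewer)) {           // what may you see?
            throw new SecurityException("not authorized to view this photo");
        }
        return photo;
    }

    public static void main(String[] args) {
        PhotoService service = new PhotoService();
        Photo photo = new Photo();
        photo.ownerId = 0; // the stubbed authenticate() returns a user with id 0
        System.out.println(service.viewPhoto("valid-token", photo) != null); // true
    }
}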


A lot of security is knowing what not to do. Security by obscurity and hand-rolling security algorithms and protocols are the two things that immediately come to mind. For example, while storing passwords, instead of coming up with an elaborate custom secure storage scheme, employ the industry standard bcrypt.
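
For instance, a minimal sketch using the jBCrypt library (org.mindrot.jbcrypt), assuming it is available on the classpath:

import org.mindrot.jbcrypt.BCrypt;

public class PasswordStore {
    // On signup: store only the bcrypt hash, never the raw password.
    static String hashForStorage(String rawPassword) {
        // gensalt's work factor (12 here) controls how expensive hashing is.
        return BCrypt.hashpw(rawPassword, BCrypt.gensalt(12));
    }

    // On login: compare the candidate against the stored hash.
    static boolean matches(String candidate, String storedHash) {
        return BCrypt.checkpw(candidate, storedHash);
    }

    public static void main(String[] args) {
        String stored = hashForStorage("s3cret");
        System.out.println(matches("s3cret", stored)); // true
        System.out.println(matches("guess", stored));  // false
    }
}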

There is a school of thought that you achieve better security by having tons of access control. One of the manifestations of this is restricting SSH access to production boxes. Unless you have invested heavily in tooling, this slows down teams drastically. In today's world, where speed is paramount, it does not work. Under pressure to do things fast, teams find ingenious ways to circumvent these controls. Strict access control only works in organizations that are fine with taking things slowly, and it usually stifles productivity and leaves a bevy of frustrated developers. The way around this problem is to keep only the most necessary access controls and take care of the rest through tooling. An example is how Netflix uses tooling to let developers SSH into production boxes without compromising security.

Security implemented naively goes against the human tendency to accomplish tasks in the least restrictive way. If you do not invest in tooling, security always gets in the way of getting things done.

A less intrusive way of doing security is to configure systems with sane defaults. An example: when you provision a server, ensure that it is fortified by default. If you are using external tools, configure them with sane defaults too. For example, if you are using Slack, configure it so that only people with your organization's email address can sign up. Carry out periodic audits of systems. This could be anything from scanning SSH access logs to auditing repositories to ensure secrets and passwords are not leaked.

No writeup on security can be complete without touching upon compliance. There are tons: PCI, HIPAA, SOX, etc. All of these come with their own baggage. One simple way to tame them is to first understand which parts of your application have to be under the scope of compliance. For example, if you have an e-commerce application taking credit card information, you have to be PCI compliant. But this does not mean your entire application has to be under the scope of a PCI audit. You can smartly bifurcate the application into parts that deal with payment and parts that do not. Once this is done, only the parts that deal with payment have to be under PCI scope.

A final note: security is a never-ending concern; there is no such thing as enough security. Where you draw the line is up to you.

Here is a hilarious comic by XKCD on teaching a lesson to people who do not follow security practices.

[XKCD comic: Exploits of a Mom]

Naming Things

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

Even though the above might have been said in jest, naming variables while writing code is a head-scratching experience. Should I make it short? Should I make it descriptive? If descriptive, how descriptive? These thoughts keep running in one's head.


A simple strategy is to keep the descriptiveness of a variable's name in line with the reach of that variable. If the variable is short-lived, i.e. confined to a small block, stick to a short name, as the cognitive load of the variable is negligible. If the variable's reach is much larger, as in it spans a large number of lines, make it as descriptive as possible.
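
A quick illustration of the strategy in Java; the names and values are made up:

public class NamingByReach {
    // Wide reach: this constant is read far from its declaration,
    // so the name carries its own documentation.
    static final int MAX_RETRIES_BEFORE_GIVING_UP = 5;

    static int total(int[] prices) {
        // Tiny reach: i and sum live for four lines; short names suffice
        // because the whole context fits on one screen.
        int sum = 0;
        for (int i = 0; i < prices.length; i++) {
            sum += prices[i];
        }
        return sum;
    }
}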

It goes without saying that names should adhere to the conventions your team has adopted.

Switching Languages


Many are apprehensive about switching programming languages. It is perfectly fine to have preferences – I am heavily biased towards statically typed languages with great tooling support, but being dogmatic is not something one should aim for.

What could be the downsides of switching programming languages? I am disregarding the psychological aversion to change and sticking to hard facts.

1. One will lose fluency (syntax).
This is a non-issue; syntax is similar to muscle memory, one gets it back in a day or two. It is akin to swimming or driving after an extended break: one naturally gets it back.

2. One will forget the way of doing things.
Every language has a culture and a community-accepted way of getting things done. Regaining this might not be as easy as recovering syntax, but with some effort and thought, one should recoup it.

3. One will not be up to date with the language.
Languages keep evolving, but core ideas and philosophy remain the same. The standard library might become more expansive, the VM might become faster, some earlier prescribed way of doing things might be anathema now, but the foundational principles remain intact.

4. There is no demand for this language.
As long as your fundamentals are good, this should not be a concern. There are roles which require deep language know-how, but these are few and far between. In fact, it is the opposite: the more languages in your kitty, the more opportunities.

The biggest upside to learning a new language is the exposure to new ideas and thought processes. Any new language immensely expands one’s horizon. For example, the way Java approaches concurrency is very different from GoLang’s take on concurrency. Having this sort of diverse exposure helps one build robust systems and mental models.

Programming languages should be viewed as a means to an end, not an end in themselves. There are cases where the programming language makes a difference; otherwise there would not be so many around, with new ones cropping up now and then. But you are doing yourself a disservice by restricting yourself to a few.

Conventions

Most programming languages have conventions. These could be for naming or code patterns.


How does this help?

A simplistic view is that it helps to keep code consistent, especially when multiple people work on it.

A deeper way to look at it, I believe, is in reducing cognitive load.

In cognitive psychology, cognitive load refers to the effort being used in the working memory.

If you have conventions, it is one less thing to think about. You do not have to spend mental capacity on deciding whether to name variables in lower case, upper case, or camel case, with hyphens or underscores, etc. You blindly rely on the convention. The same applies to code patterns: you look at the pattern and automatically grok the idea, without expending grey cells.
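
For instance, widely followed Java conventions let a reader decode a name's role from its shape alone; the class below is a made-up illustration:

public class OrderProcessor {                    // classes: PascalCase
    static final int MAX_BATCH_SIZE = 100;       // constants: UPPER_SNAKE_CASE
    private int pendingOrders;                   // fields and locals: camelCase

    public void processBatch() {                 // methods: camelCase verbs
        // ...
    }
}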

I strongly believe that all tech teams should have conventions wherever possible, outside code too. Freeing up any amount of working memory for things that matter will go a long way towards increasing productivity.

 

Anti-features

When evaluating a new technology, framework, or library, a lot of importance is given to the salient features. While it is very important to know the positives, the negatives usually tend to be glossed over. Being aware of the shortcomings of a framework gives one the ability to anticipate problems down the road.


For example, take NoSQL databases. A lot of time is spent singing paeans to the scalability, malleability, etc. of NoSQL databases while hardly thinking about the negatives that come with them.

Two simple techniques give good visibility into anti-features:
1. The very obvious one: Google for the shortcomings. Someone will have written a blog post on the interwebs highlighting how a framework or technology let them down. For example, take the post by Uber on how Postgres did not work as expected for them.
2. Comb through GitHub and/or JIRA, peeking at the bugs raised and enhancements requested.

Both of the above will provide a good picture of the shortcomings. If you are evaluating a closed source proprietary technology, the above may not be feasible.

Once a mental note is made of the negatives, ponder the scenarios where these might affect your usage. It helps to spend quality time on this, as it will save you from a lot of future trouble.

If you think about it, this might sound very obvious, but it tends to be highly neglected. We get so caught up in the positives of something that the negatives tend to be ignored, and this usually comes back to bite us later.

Testing legacy applications

When contemplating introducing automated testing into legacy applications, it is easy to get bogged down in terminology: unit testing, integration testing, regression testing, black box testing, white box testing, stress testing, etc. Quite a bit of time is spent in debates on unit testing versus integration testing; I have written about this before too.

A practical way to approach testing legacy applications is to first scope out the intention behind the test. Is it to test the behavior of a particular method, an API response, or how the application behaves after an HTTP form submit? The next step is to jot down everything that has to be done to enable this. For example, if a database is involved, it can be mocked, or a test database with bootstrapped data can be used.


The gamut of changes needed to inject testability into an application that has never seen testing should never be underestimated. The way you structure testable code is vividly different from coding with no thought of testing.

Take a look at the code below. How would you unit test the getUser method without creating a database connection?

public class Foo {
    DbConnection connection = null;
    public Foo() {
        connection = <establish db connection>;
    }

    public User getUser(int id) {
        //Query db and get user data
        User user = new User();
        //Fill user with data from db
        return user;
    }
}

To mould this into testable code, DbConnection creation needs to be decoupled from object creation, like below:

public class Foo {
    DbConnection dbConnection = null;
    public Foo(DbConnection dbConnection) {
        this.dbConnection = dbConnection;
    }

    public User getUser(int id) {
        //Query db and get user data
        User user = new User();
        //Fill user with data from db
        return user;
    }
}

Since DbConnection creation is independent of object creation, the DbConnection can be mocked to unit test any method in the class. An application written without testing in mind will be replete with code like the first version. Code patterns like these are one of the biggest hurdles in testing legacy applications.
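
Here is a sketch of such a unit test, assuming JUnit 4 and Mockito are on the classpath; the assertion is illustrative, since getUser's real behavior would depend on queries stubbed with Mockito's when(...).

import static org.junit.Assert.assertNotNull;
import static org.mockito.Mockito.mock;

import org.junit.Test;

public class FooTest {
    @Test
    public void getUserWorksWithoutARealDatabase() {
        // No database is spun up: the dependency is replaced with a mock.
        DbConnection mockConnection = mock(DbConnection.class);
        Foo foo = new Foo(mockConnection);

        User user = foo.getUser(42);

        assertNotNull(user);
    }
}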

The next step is to eliminate the friction around testing. This means all the infrastructure and libraries needed to carry out testing are set up, and a reference is readily available to follow. Bunch test cases into categories: unit tests, tests that need a mocked object, tests that need a mocked database, tests that need a database seeded with data, tests that need a web server, etc. After this, implement one test case for each category. This serves a dual purpose: the setup is ready for each category, and a reference is readily available for others to emulate.

One aspect that is usually neglected is the effect of testing on the product release cycle. As a result of testing, more code, dependencies, and infrastructure are introduced, all of which need to be maintained. Along with working on new features, writing tests for them also has to be accounted for. While refactoring, it is not just the code that has to be refactored; the test cases have to be refactored too. This is a tradeoff between time to market on one side, and maintainability and reliability on the other.

Testing is no longer the chore it used to be; testing tools and frameworks have grown by leaps and bounds. With the advent of Docker, headless browsers, Selenium, etc., testing is very much within reach of most teams, provided the intention is there and the effort is put in.

Build versus buy

Consciously or unconsciously, as software engineers, we perennially make build versus buy decisions. It might be as trivial as copy-pasting code from somewhere versus racking our brains to write our own; using an already available library versus writing one from scratch; using a time-tested framework versus designing one; or building a piece of software internally versus buying one.


The way we account for the build versus buy decision varies. Some of the frivolous reasons for building in-house are NIH (not invented here) syndrome, hubris, and the planning fallacy. We tend to overemphasize our expertise, knowledge, and capability, which naturally leads to building internally. We also underestimate the amount of work involved in creating software; only once we get our feet wet does reality set in. A very valid reason for building internally is cost, but when accounting for cost, we usually overlook the hidden costs of building software. Buying software has an upfront monetary cost, whereas by building internally we pay in the form of opportunity cost, talent cost, feature cost, etc.

Build versus buy arguments are reminiscent of qualitative speak like “This is not our core expertise, we should be concentrating on solving our business problems”, “This is going to cost us a bomb, let us build in-house”, “We should have had this yesterday, building in-house will cost us another six months”, “Will that external product be able to handle our scale?”, “Can we trust them with our data?”, etc. In most cases, build versus buy decisions are qualitative; it is not an easy exercise to quantify them.

When evaluating a product that is already out in the market versus building something similar, a cardinal mistake people commit is mapping features one to one. Even though having 100 different features looks rosy and attractive, we usually end up using only a select few. Instead of trying to match an external product feature for feature, scope out the features that you need or would probably use, and then estimate the effort. Another consideration is refinement. An external product will be refined and polished, but you may not need the same level of polish. For example, you might not need a web interface at all; a terminal interface might work fine for your use case.

When faced with the build versus buy decision, asking the following helps:

  1. Is this my core expertise or is it something I can let others do for me?
  2. What is the cost of getting this done externally versus hiring people to build it?
  3. How much control do I need over this, i.e. can I live with some error, downtime, or opaqueness?
  4. Will I really do a better job building this internally?
  5. Do I have the expertise needed to build this?
  6. Once I build this, will I be able to maintain and enhance it?
  7. What is the opportunity cost of having this sometime in the future versus having it now?

Use the answers to the above as a beacon for the build versus buy decision.