Idiomatic Code

What does writing idiomatic code mean?

Let us say you are using Python to populate a list with numbers.

One way to do this is


nos = []
for i in range(5):
    nos.append(i)

Another is


nos = [i for i in range(5)]

The second is the idiomatic version: it uses a list comprehension, the Pythonic way of building a list. Python is just an example here; every programming language has a philosophy and a preferred way of doing things. Code that adheres to these is said to be idiomatic.
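The same contrast shows up in other everyday tasks. As a small illustrative sketch (the `fruits` list is made up for this example), here is looping over a list when you need both the position and the item:

```python
fruits = ["apple", "banana", "cherry"]  # sample data, made up for illustration

# Works, but the index is managed by hand.
labelled_by_hand = []
for i in range(len(fruits)):
    labelled_by_hand.append(f"{i}: {fruits[i]}")

# Idiomatic: enumerate() hands you the index and the item together.
labelled = [f"{i}: {fruit}" for i, fruit in enumerate(fruits)]

print(labelled)
```

Both versions build the same list; the second simply says what it means with less bookkeeping.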


The advantage of idiomatic code is that it takes less mental work and space to understand. The first version spans several lines, which means spending more grey cells to work out what is going on. The second is concise and to the point, so you expend less mental bandwidth understanding it.

Also, idiomatic code is a kind of insider speak among people in the know. Just as a society functions through social norms and conventions, programming languages and their communities function through idioms and conventions. You look at idiomatic code and know instantly what it is trying to do; it is embedded in your subconscious like muscle memory.

Learning a programming language is not just about learning the syntax; it is more about learning the idioms and standard conventions of the language and the community behind it.

PS: You can populate a list in Python in a lot of different ways including using built-in library functions. Debating that is not the point of this post.

Concurrency Models

We can roughly classify concurrency models into:
1. Thread based concurrency.
2. Event based concurrency.

Imagine that you run a store with only one customer service representative. As soon as a customer walks in, the representative greets the customer with a quick hello: “If you need any help, give me a shout, and I will help you out.” She then waits for the customer to seek help. She aims to complete each interaction as soon as possible and go back to waiting for the next one. When a customer asks for help, she quickly answers the query and goes back to waiting. If a customer asks where the washroom is, she quickly points in the right direction and goes back to waiting. If a customer asks for the price of a product, she quickly conveys the price and goes back to waiting. The point to note here is that there is only one customer service representative for the entire store, servicing all customers. This model works exceptionally well when the representative is fast and the answers to the queries are quick. Concurrency based on events works like this.
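To make the single-representative picture concrete, here is a minimal sketch using Python's `asyncio`. The customer names and waiting times are made up for illustration. One event loop (the lone representative) greets everyone, then answers each customer as soon as that customer is ready, without a dedicated thread per customer.

```python
import asyncio


async def serve_customer(name: str, browse_seconds: float, log: list) -> None:
    log.append(f"hello {name}")
    # The customer browses. Awaiting hands control back to the event loop,
    # so the representative is free to attend to others in the meantime.
    await asyncio.sleep(browse_seconds)
    log.append(f"answered {name}")


async def run_store() -> list:
    log = []
    # One event loop serves all three customers concurrently.
    await asyncio.gather(
        serve_customer("A", 0.15, log),
        serve_customer("B", 0.05, log),
        serve_customer("C", 0.10, log),
    )
    return log


events = asyncio.run(run_store())
print(events)
```

Everyone is greeted in arrival order, and answers go out in the order the customers become ready (B, then C, then A), all on a single thread.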

Now consider the situation where you have five customer service representatives in your store. As soon as a customer walks in, a representative is assigned exclusively to that customer. When another customer walks in, one more representative is picked from the pool and assigned to the customer. The critical point to note here is that there is a one-to-one relationship between the customer service representative and the customer. When one representative is servicing a customer, she does not bother about other customers; she is exclusive to that customer. Since our pool has five representatives, we can serve at most five customers at a time. What do we do when the sixth customer walks into the store? We can ask the new customer to wait until one of the customers walks out, or we can have a rule that a representative services a customer for a fixed period, after which she is assigned to another waiting customer. She is reassigned to the original customer once that period elapses. Concurrency based on threads works like this.
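Here is a rough sketch of the five-representative store using a thread pool from Python's standard library. The customer names and the 0.05 second "conversation" are assumptions for illustration. It shows only the first option described above, where the sixth customer waits for a free representative; time-slicing between threads is done by the operating system and is not modelled here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def serve_customer(name: str) -> tuple:
    time.sleep(0.05)  # stand-in for a conversation with the customer
    # Report which representative (thread) handled this customer.
    return name, threading.current_thread().name


customers = ["A", "B", "C", "D", "E", "F"]  # six customers, five representatives

with ThreadPoolExecutor(max_workers=5, thread_name_prefix="rep") as pool:
    served = list(pool.map(serve_customer, customers))

representatives_used = {rep for _, rep in served}
print(served)
print(len(representatives_used), "representatives were enough for", len(served), "customers")
```

All six customers get served, but never by more than five threads: the sixth is picked up by whichever representative frees up first.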

Let us come back to the scenario where the sixth customer walks in. Either we ask the sixth customer to wait until a representative is free, or we pull a representative away from one of the existing customers and assign her to the new customer. When this happens, the customer who was being serviced by this representative has to wait. After the allotted time elapses, we have to assign the representative back to the original customer. When a lot of customers walk in and you have a fixed number of representatives, quite a bit of coordination is needed to service all customers satisfactorily. In a computer, the operating system’s CPU scheduler takes care of switching between tasks. Switching is a comparatively time-consuming operation and is an overhead of the thread based concurrency model when compared to an event based one.

In the single representative scenario, what happens if one of the customers starts a long conversation with the representative? The representative will be stuck with the customer, and if other customers have queries, they will have to wait for the representative to finish the ongoing conversation. Also, what if one of the customers sends a representative on a long-running errand like fetching something from the depot a mile away? Until the representative returns, all other customers have to wait to get their queries resolved. One egregious customer can jeopardize all other customers and hold up the entire store operation.

Hence, when working with event based concurrency, it is essential not to:
1. Carry out CPU-intensive tasks, akin to having a long-running conversation with the representative.
2. Carry out blocking IO tasks, akin to sending the representative to the depot.
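The depot errand can be reproduced in a few lines. In this sketch (the function names and durations are made up for illustration), the same errand is run twice on an `asyncio` event loop: once with a blocking `time.sleep`, which holds up the whole loop, and once with a non-blocking `asyncio.sleep`, which lets the quick query through.

```python
import asyncio
import time


async def quick_query(log: list) -> None:
    await asyncio.sleep(0.01)
    log.append("quick query answered")


async def errand(log: list, blocking: bool) -> None:
    if blocking:
        time.sleep(0.1)           # blocks the entire event loop: nobody else is served
    else:
        await asyncio.sleep(0.1)  # yields to the event loop while waiting
    log.append("errand finished")


async def run_store(blocking: bool) -> list:
    log = []
    await asyncio.gather(errand(log, blocking), quick_query(log))
    return log


print(asyncio.run(run_store(blocking=True)))
print(asyncio.run(run_store(blocking=False)))
```

With the blocking errand, the quick query cannot even start until the errand is over; with the non-blocking one, the quick query is answered first.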


NGINX and Redis are probably the most commonly used pieces of software that leverage event based concurrency. The workloads they cater to are quick, so event based concurrency makes perfect sense here.

Taking the case of NGINX used as a reverse proxy, what does it do? It picks a client connection from the listen queue, does some operations on it, forwards it to the upstream server, and then waits for the upstream to respond. While waiting for the upstream, NGINX can pick more client connections from the queue and repeat the above. When the upstream sends a response, NGINX relays it back to the client. Since all these are short-lived operations, this fits beautifully into an event based concurrency model. Good old Apache HTTP server creates a thread/process for each connection to do the same. The number of threads it has constrains Apache. If the number of incoming requests is more than the number of threads in its pool, it has to deal with switching and coordination. NGINX does not have this overhead, which makes it comparatively faster than Apache in many real-world workloads. All of this is a bit simplistic and hand-wavy but should convey the idea.

A single event loop runs on one thread, so on its own it cannot leverage the multiple CPU cores that modern processors have. To use them, you create one event unit per core, usually called a worker. Also, most software that leverages event based concurrency adopts a hybrid model: it uses event based concurrency for short-lived quick operations and off-loads long-running tasks to a thread/process.
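A minimal sketch of the hybrid model, assuming a made-up blocking function `fetch_from_depot`: the event loop hands the long-running task to a thread via `run_in_executor` and stays free to answer quick queries. The one-worker-per-core part is not shown here.

```python
import asyncio
import time


def fetch_from_depot() -> str:
    # Hypothetical long-running, blocking task.
    time.sleep(0.3)
    return "item from depot"


async def quick_query(log: list) -> None:
    await asyncio.sleep(0.01)
    log.append("quick query answered")


async def errand(log: list) -> None:
    loop = asyncio.get_running_loop()
    # Passing None uses the loop's default thread pool executor.
    item = await loop.run_in_executor(None, fetch_from_depot)
    log.append(f"errand finished: {item}")


async def run_store() -> list:
    log = []
    await asyncio.gather(errand(log), quick_query(log))
    return log


events = asyncio.run(run_store())
print(events)
```

The blocking call runs on a separate thread, so the quick query is answered while the errand is still in progress.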

I have glossed over a lot of details and nuances to explain a complex topic like concurrency in simple terms. Treat this as a good starting guide to dig more into this fascinating world.

Ode To Queues

If you have a producer with an uneven rate of production and a consumer which cannot keep pace with the producer at its peak, use a queue.

If you have a workload which need not be addressed synchronously, use a queue.

If your customer-facing application is riddled with workloads which can be deferred, move these to a queue thus making the customer-facing application lean and mean.
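As a rough sketch of deferring work, here the customer-facing function only enqueues a job and returns, while a background worker does the slow part. The order and email names are made up for illustration, and Python's in-process `queue.Queue` is only a stand-in: a real deployment would more likely use a durable message broker.

```python
import queue
import threading

email_queue = queue.Queue()
sent = []


def place_order(order_id: int) -> str:
    # Customer-facing path: note down the deferred work and return right away.
    email_queue.put(order_id)
    return f"order {order_id} accepted"


def email_worker() -> None:
    while True:
        order_id = email_queue.get()
        if order_id is None:  # sentinel value: time to shut down
            email_queue.task_done()
            break
        sent.append(f"confirmation email for order {order_id}")
        email_queue.task_done()


worker = threading.Thread(target=email_worker, daemon=True)
worker.start()

responses = [place_order(i) for i in (1, 2, 3)]

email_queue.put(None)  # ask the worker to stop once the backlog is done
email_queue.join()     # wait until every queued item has been processed
worker.join()

print(responses)
print(sent)
```

The customer gets a response without waiting for the email to go out; the worker catches up at its own pace.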

Think of a queue as a shock absorber.

There are workloads which need to be processed immediately with sub-millisecond latency, and then there are ones where you have the luxury of taking time. It is advisable not to mix these in an application. The second kind of workload can be addressed by moving it to a queue and having a consumer process it.

For example, consider a scenario where you are consuming messages and persisting them in a data store. These messages come in at a variable rate, and at peak, the data store cannot handle the load. You have two options: scale the data store to meet the peak load, or slap a queue in between to absorb the shock. A queue solves this problem in a KISS manner.
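The shock-absorber effect can be seen in a tiny simulation. The numbers are assumptions chosen for illustration: messages arrive in bursts of 5 and 4, while the data store can persist only 2 per tick. The function name `simulate` and its parameters are made up for this sketch.

```python
from collections import deque


def simulate(arrivals_per_tick: list, store_capacity_per_tick: int) -> tuple:
    if store_capacity_per_tick <= 0:
        raise ValueError("the data store must be able to persist at least one message per tick")
    buffer = deque()      # the queue sitting between producer and data store
    persisted = []        # message ids written to the data store, in order
    backlog_history = []  # queue length at the end of each tick
    next_id = 0
    tick = 0
    while tick < len(arrivals_per_tick) or buffer:
        arriving = arrivals_per_tick[tick] if tick < len(arrivals_per_tick) else 0
        for _ in range(arriving):
            buffer.append(next_id)
            next_id += 1
        # The data store drains the queue at its own steady pace.
        for _ in range(min(store_capacity_per_tick, len(buffer))):
            persisted.append(buffer.popleft())
        backlog_history.append(len(buffer))
        tick += 1
    return persisted, backlog_history


persisted, backlog = simulate([5, 0, 0, 4, 0, 0], store_capacity_per_tick=2)
print(persisted)
print(backlog)
```

The data store never sees more than 2 messages per tick, no message is dropped, and the backlog grows during a burst and drains during the quiet ticks that follow.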

Queues enable applications to be highly available while giving enough room to maneuver. As long as the queue is durable and highly available, the chance of message loss is almost nil. Because messages wait safely in the queue, you need not perfect your consumer’s high availability; if the consumer goes down for a while, it can catch up when it returns.

With applications embracing the microservices paradigm, there is a lot of API back and forth. Not all API consumption has to be in real-time. Whatever can be deferred should use a queue as the transport mechanism.

A queue introduces a bit more complexity into an application, but the advantages it brings to the table make it a worthwhile investment.

Software Security

Some disparate thoughts on security in no particular order.

Many security bugs can be avoided by making a clear distinction between authentication and authorization. When one logs into Facebook, one uses a username and password. Facebook lets you log in only once it is sure that you are the owner of the account by verifying your password. This is authentication. Once you log in, you cannot view all your friends’ photos. You can only view those photos which your friends have authorized you to view. This is authorization. There is a class of security bugs that arise because developers have not made this distinction.
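One way to keep the two apart in code is to give each its own function and always call them in order. The sketch below is illustrative only: the users, photos and status strings are made up, and passwords are kept in plain text purely to keep the example short, which you should never do in a real system.

```python
import hmac

# Hypothetical in-memory data, for illustration only.
PASSWORDS = {"alice": "correct horse", "bob": "hunter2"}
PHOTO_VIEWERS = {
    "bob_beach.jpg": {"bob", "alice"},  # bob shared this one with alice
    "bob_private.jpg": {"bob"},         # only bob may see this one
}


def authenticate(username: str, password: str) -> bool:
    """Authentication: is the caller really who they claim to be?"""
    expected = PASSWORDS.get(username)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), password.encode())


def authorize(username: str, photo: str) -> bool:
    """Authorization: may this already-authenticated user view the photo?"""
    return username in PHOTO_VIEWERS.get(photo, set())


def view_photo(username: str, password: str, photo: str) -> str:
    if not authenticate(username, password):
        return "401 who are you?"
    if not authorize(username, photo):
        return "403 not allowed"
    return f"200 {photo}"


print(view_photo("alice", "correct horse", "bob_beach.jpg"))
print(view_photo("alice", "correct horse", "bob_private.jpg"))
```

Alice is who she says she is in both calls, yet the second is refused: passing authentication says nothing about what you are allowed to see.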


A lot of security is knowing what not to do. Security by obscurity and hand-rolling security algorithms and protocols are the two things to avoid that immediately come to my mind. For example, while storing passwords, instead of coming up with an elaborate custom secure storage scheme, employ an industry standard such as bcrypt.
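As a sketch of leaning on vetted primitives rather than inventing your own: bcrypt is not part of Python's standard library, so to keep this example self-contained it swaps in PBKDF2 from `hashlib`, another widely used password hashing function. The iteration count and function names here are assumptions for illustration, not recommendations.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor (an assumption); tune for your hardware


def hash_password(password: str) -> tuple:
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("letmein", salt, stored))
```

Only the salt and the digest are stored; the password itself never is.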

There is a school of thought that more access control means better security. One manifestation of this is restricting SSH access to production boxes. Unless you have invested heavily in tooling, this slows teams down drastically. In today’s world, where speed is paramount, this does not work. Under pressure to do things fast, teams find ingenious ways to circumvent these controls. Strict access control only works in organizations that are fine with taking things slowly, and even there it usually stifles productivity and leaves a bevy of frustrated developers. The way around this problem is to keep only the most necessary access controls and take care of the rest through tooling. An example is how Netflix uses tools to enable developers to SSH into production boxes without compromising security.

Security implemented in a naive manner goes against the human nature of seeking to accomplish tasks in the least restrictive manner. If you do not invest in tooling, security always gets in the way of accomplishing things.

A less intrusive way of doing security is to configure systems with sane defaults. For example, when you provision a server, ensure that it is fortified by default. If you are using external tools, configure them with secure defaults too. For example, if you are using Slack, configure it so that only people with your organization’s email address can sign up. Also carry out periodic audits of systems. These could be anything from scanning SSH access logs to auditing repositories to ensure secrets and passwords have not leaked.

No writeup on security can be complete without touching upon compliance. There are tons of standards: PCI, HIPAA, SOX, etc. All of these come with their own baggage. One simple way to manage this is to first understand which parts of your application have to be under the scope of compliance. For example, if you have an e-commerce application taking credit card information, you have to be PCI compliant. But this does not mean your entire application has to be under the scope of a PCI audit. You can smartly split the application into parts that deal with payment and parts that do not. Once this is done, only the parts that deal with payment have to be under PCI scope.

A final note: security is a never-ending concern; there is no such thing as enough security. Where you draw the line is up to you.

Here is a hilarious comic by XKCD on teaching a lesson to people who do not follow security practices.

[Comic: XKCD, “Exploits of a Mom”]

Naming Things

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

Even though the above might have been in jest, naming variables while writing code is a head-scratching experience. Should I make it short? Should I make it descriptive? If descriptive, how descriptive? These thoughts keep running in one’s head.


A simple strategy is to keep the descriptiveness of a variable’s name in line with the reach of that variable. If the variable is short-lived, i.e., confined to a small block, stick to a short name, as the cognitive load of the variable is negligible. If the variable’s reach is much larger, as in it spans a large number of lines, make it as descriptive as possible.
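A small illustration of this rule of thumb, with made-up names: the module-level constant can be read from anywhere in the file, so it gets a long descriptive name, while the loop variables that live inside a single expression get one-letter names.

```python
# Wide reach: visible across the whole module, so the name spells everything out.
MAX_FAILED_LOGIN_ATTEMPTS_BEFORE_LOCKOUT = 3


def locked_out_users(failed_attempts_by_user: dict) -> list:
    # Narrow reach: `u` and `n` exist only inside this one expression,
    # so short names cost the reader almost nothing.
    return sorted(
        u
        for u, n in failed_attempts_by_user.items()
        if n >= MAX_FAILED_LOGIN_ATTEMPTS_BEFORE_LOCKOUT
    )


print(locked_out_users({"alice": 1, "bob": 3, "carol": 5}))
```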

It goes without saying that names should adhere to the conventions that your team has adopted.

Switching Languages


Many are apprehensive about switching programming languages. It is perfectly fine to have preferences – I am heavily biased towards statically typed languages with great tooling support, but being dogmatic is not something one should aim for.

What could be the downsides of switching programming languages? I am disregarding the psychological aversion to change and sticking to hard facts.

1. One will lose the fluency (syntax).
This is a non-issue. Syntax is similar to muscle memory; one will get it back in a day or two. This is akin to swimming or driving after an extended break: it comes back naturally.

2. One will forget the way of doing things.
Every language has a culture and a community-accepted way of getting things done. Regaining this might not be as easy as regaining the syntax, but with some effort and thought, one should recoup it.

3. One will not be up to date with the language.
Languages keep evolving, but the core ideas and philosophy remain the same. The standard library might become more expansive, the VM might become faster, and some earlier prescribed way of doing things might be anathema now, but the foundational principles remain intact.

4. There is no demand for this language.
As long as your fundamentals are good, this should not be a concern. There are roles which require deep language know-how, but these are few and far between. In fact, it is the opposite: the more languages in your kitty, the more opportunities.

The biggest upside to learning a new language is the exposure to new ideas and thought processes. Any new language immensely expands one’s horizon. For example, the way Java approaches concurrency is very different from Go’s take on it. Having this sort of diverse exposure helps one build robust systems and mental models.

Programming languages should be viewed as a means to an end, not an end in themselves. There are cases where the choice of programming language makes a difference; otherwise, there would not be so many around, with new ones cropping up now and then. But you are doing yourself a disservice by restricting yourself to a few.