Void

Tag: software

Open Source and Revenue

This is the second part in a series on open source software. In the first part, we examined why equating open source with “just” free is fool’s errand. In this post, we will explore the different avenues for revenue from open source software.

lucas-favre-489526-unsplash

Get essays about software development, management, productivity, and more by subscribing to the blog

The first one is pretty straight forward – charge for support, maintenance, consulting, and custom development. Software takes an effort to understand and maintain. Either you can do it in-house or outsource it to an external firm. Big enterprises have specific needs which require custom enhancements. They also need consistent support and suggestions. There are a lot of companies which have used this model to generate revenue from open source software like Redhat, Percona, etc.

The SAAS option for open source software has gained immense traction in the last decade or so especially since the advent of cloud. Instead of you taking the pain to host and maintain, the company behind the software deploys and manages it for a recurring fee. Most of the popular open source software is available under this model nowadays. WordPress, MongoDB, ElasticSearch are some prime examples of this strategy.

Another revenue strategy is the open core model. The core is open and free, but features which enterprises need like security, high availability and user management are part of the commercial offering. For example, the core database could be open, but clustering and high availability might be available only in the retail version. InfluxDB uses this model.

Then there is the licensing play. Software licensing is nuanced and comes with a lot of baggage and restrictions. The open source version of the software is released under a restrictive and commercial antagonistic license. If you want to use the software in a business setting, you have the option of buying a more permissive and commercial friendly license; this is very prevalent in software bundled in commercial products.

It is not uncommon for a company to use a mixture of the above strategies.

In the next part of the series, we will go through some recent developments in the open source world in an attempt to ward off the threat from big cloud providers like AWS.

Image Credit: lucas Favre

Open Source != Free

This is the first post in a series on open source software. You can read the second post here.

One of the most common conflations I see people making is mistaking open source software for free software; both are not the same. Being free is just icing on the cake, the more significant advantage is the freedom and flexibility that comes with open source software.

Let us say you are an enterprise with millions of dollars that has built its entire product on a closed source database. Your business is now profoundly entwined with the success of the database company. What happens if the database company goes kaput and shuts down? You now have to replace the database. Depending on the complexity of the product and the business, this might take significant effort and end up derailing your entire business. Open source software severely mitigates this problem.

beans-close-up-colors-1446267.jpg

 

Get essays about software development, management, productivity, and more by subscribing to the blog

There is no concept of shutting down in open source world. Open source software development is inherently decentralized. In a lot of cases, committees govern open source software development. These committees have many stakeholders whose best interest is in keeping the software alive. Apart from this, many boutique firms provide development and maintenance services. All this leads to a robust eco-system that prevents a project from abruptly shutting down and taking you hostage.

Commercial closed source software reminds me of a famous line from the Eagles song Hotel California – You can check out any time you like, but you can never leave. Once you are locked into a piece of software, it is not easy to get out. During pricing discussion, there is an asymmetry. As a locked in customer, it is difficult for you to leave, there is no BATNA. Open source software does not have this problem.

Having access to source code is a huge advantage. When I was building Kwery, I used Apache Derby as the database. I started seeing a weird bug in Kwery which lead me to tinker with the Apache Derby source and finally unearthing a bug in the Derby database. A couple of mail exchanges on the Derby mailing list confirmed this. If I did not have access to the source code, there would be no way for me to figure this out.

I am not saying that open source software is a panacea to all problems and that you should completely shun commercial closed source software. Each has its place but equating opensource software with only free is folly.

You can read the second post here.

Image credit: Photo by Adailton Batista from Pexels

Now You See Me

In the modern software world, where micro-services are de rigueur, observability of systems is paramount. If you do not have a way to observe your application, you are as good as dead.

w-a-t-a-r-i-776085-unsplash

W A T A R I

The first step towards embracing observability is figuring out what to track. Broadly, we can categorize software observability into:
1. Infrastructure metrics.
2. Application metrics.
3. Business metrics.
4. Distributed tracing.
5. Logging.
6. Alerting.

Infrastructure metrics:
Infrastructure metrics boil down to capturing the pulse of the underlying infrastructure where the application is running. Some examples are CPU utilization, memory usage, disc space usage, network ingress, and egress. Infrastructure metrics should give a clear picture as to how well the application is utilizing the hardware it is running on. Infrastructure metrics also aid in capacity planning and scaling.

Application metrics:
Application metrics help in gauging the efficiency of the application; how fast or slow the application is responding and where are the bottlenecks. Some examples of application metrics are the API response time, the number of times a particular API is called, the processing time of a specific segment of code, calls to external services and their latency. Application metrics help in weeding out potential bottlenecks as well as in optimizing the application.

Infrastructure metrics give an overall picture whereas application metrics help in drilling down to the specifics. For example, if the infrastructure metric indicates more than 100% CPU utilization, application metrics help in zeroing in on the cause of this.

Business metrics:
Business metrics are the numbers which are crucial from a functionality point of view. For example, if the piece of code deals with user login and sign-up, some business metrics of interest would be the number of people who sign up, number of people who log in, number of people who log out, the modes of login like social login versus direct. Business metrics help in keeping a pulse on the functionality and diagnosing feature specific breakdowns.

Business metrics should not be confused with business reports. Business metrics serve a very different purpose; they are not to quantify numbers accurately but more to gauge the trend and detect anomalous behavior.

It helps to think of infrastructure, application and business metrics as a hierarchy where you zoom in from one to the other when keeping a tab on the health of the system as well as diagnosing problems. Keeping a check on all three ensures you have hale and hearty application.

Logging:
Logging enables to pinpoint specific errors. The big challenge with logs is making logs easily accessible to all in the organization. Business metrics help in tracking the overall trend and logging helps to zero in on the specific details.

Distributed Tracing:
Distributed tracing ties up all the microservices in the ecosystem and assists to trace a flow end to end, as it moves from one microservice to another. Microservices fail all the time; if distributed tracing is not in place, diagnosing issues which span microservices feels like searching for a needle in a haystack.

Alerts:
If you have infrastructure, application and business metrics in place, you can create alerts which should be triggered when they show abnormal behavior; this pre-empts potential downtimes and business loss. One golden rule for alerts is, if it is an alert, it should be actionable. If not, alerts lose their significance and meaning.

Both commercial, as well as open source software, are available to build observability. NewRelic is one of the primary contenders on the commercial side. StatsD, Prometheus and the ilk dominate the open source spectrum. For log management, Splunk is the clear leader in the commercial space. ELK stack takes the crown on the open source front. Zipkin is an open source reference implementation of distributed tracing. Most of the metrics tracking software have alerting capabilities these days.

If you already have microservices or are moving towards that paradigm, you should be investing heavily on observability. Microservices without observability is a fool’s errand.

Sherlock Versus Calvin Ball

We can classify software development into:
1. Maintaining and enhancing existing software.
2. Software development from scratch.

Given a choice between the two, developers usually gravitate towards from scratch development. Developing something from scratch is an intensive creative work where you have the freedom to shape the product the way you see fit. Hence, it is pretty obvious why people prefer this. I draw a parallel here with Calvin Ball. For those of you not familiar with Calvin ball, it is a game that Calvin invented where he makes rules on the fly during the game. From scratch development is akin to Calvin Ball, you can create and amend rules on the fly. If you chose a framework and in the course of development you see it does not fit the bill, you have the freedom to swap it with something else. You are operating under a lot of degrees of freedom.

calvin_and_hobbes_original

Maintaining and enhancing existing software is more like solving a puzzle or playing a game with well laid out rules. Someone has already laid the foundation or in a lot of cases built the entire structure. You first have to expend time and effort in groking this and familiarising yourself with what is already there, only then you will be able to do something. A lot of times you need to get into the mind of the original developer and decipher things from her perspective. Working on code written by others is more like Sherlock Holme’s work. When you do changes and enhancements, you have to ensure what you are doing fits well into the existing framework. You are working in a constrained environment; you have to stick to the rules of the game. All this is as much or sometimes more challenging than developing software from scratch.

sherlock-3828991_640

Debugging is an acquired skill which carries over to all areas of development. When you troubleshoot code written by others, you become more attuned to add enough debugging information in the code you write. You will start empathizing with the person who will maintain your system in the future and ensure that person has enough data points to debug when things go wrong. It might as well happen that that future person is you only. Injecting debugging information and future proofing your project is a fundamental behavioral change that maintenance induces in you.

There is nothing wrong in preferring to create something from scratch, but it is imperative to have the second skill set under your belt. The real world requires more of type two work than type one. If from scratch development is all you have done till now, it is high time you challenge yourself with category two work. You will feel a bit frustrated and handcuffed in the beginning, but the way to approach it is like solving a mystery. If you see it that way, it becomes a fun and entertaining experience.

PS: Calvin and Hobbes image taken from Wikipedia.

Concurrency Models

We can roughly classify concurrency models into:
1. Thread based concurrency.
2. Event based concurrency.

Imagine that you run a store with only one customer service representative. As soon as a customer walks in, the customer service representative greets the customer with a quick hello saying – “If you need any help, give me a shout, and I will help you out.” She then waits for the customer to seek help. She aims to complete the interaction as soon as possible and wait for the next communication. When a customer asks for help, she quickly answers the query and goes back to waiting. If a customer asks where is the washroom, she points in the right direction quickly and reverts to waiting. If a customer asks her for the price of a product, she quickly conveys the price and goes back to waiting. The point to note here is that there is only one customer service representative for the entire store servicing all customers. This model works exceptionally well when the representative is fast, and the answers to the queries are quick. Concurrency based on events works like this.

Now consider the situation where you have five customer service representatives in your store. As soon as a customer walks in, a representative is assigned exclusively to that customer. When another customer walks in, one more representative is picked from the pool and assigned to the customer. The critical point to note here is that there is a one to one relationship between the customer service representative and the customer. When one representative is servicing a customer, she does not bother about other customers; she is exclusive to that customer. Since our pool has five representatives, at most, we can serve only five customers at a time. What do we do when the sixth customer walks into the store? We can wait until one of the customers walks out or we can have a rule saying that a representative services a customer for a fixed period after which she will be assigned to another waiting customer. She is reassigned to the original customer once the time elapses. Concurrency based on threads works like this.

Coming back to the scenario wherein the sixth customer walks in. Now, we have to ask the sixth customer to wait until a representative is free. On the other hand, we have to wean away a representative from one of the existing customers and assign her to the new customer. When this happens, the customer who was initially being serviced by this representative has to wait. After the elapsed time, we have to assign the representative back to the original customer. When a lot of customers walk in, and you have a fixed no of representatives, quite a bit of coordination is needed to service all customers satisfactorily. In a computer, the CPU scheduler takes care of switching between tasks. Switching is a comparatively time-consuming operation and an overhead of the thread based concurrency model when compared to an event based one.

In the single representative scenario, what happens if one of the customers starts a long conversation with the representative? The representative will be stuck with the customer, and if other customers have queries, they will have to wait for the representative to finish the ongoing conversation. Also, what if one of the customers sends a representative on a long-running errand like fetching something from the depot a mile away? Until the representative returns, all other customers have to wait to get their queries resolved. One egregious customer can jeopardize all other customers and hold up the entire store operation.

Hence, when working with event based concurrency, it is essential not to:
1. Carry out CPU intensive tasks akin to having a long-running conversation with the representative.
2. Carry out blocking IO tasks similar to sending the representative to the depot.

superhero-534120_640

NGINX and Redis are probably the most commonly used software that leverage event based concurrency. The workloads that these cater to are quick. Hence event based concurrency makes perfect sense here.

Taking the case of NGINX used as a reverse proxy, what does it do? Pick a client connection from the listen queue, do some operations on this and then forward it to the upstream server and then wait for the upstream to respond. While waiting for the upstream, NGINX can pick more client connections from the queue and repeat the above. When the upstream sends a response, it relies on this back to the client. Since all these are short-lived operations, this fits beautifully into an event based concurrency model. Good old Apache HTTP server creates a thread/process for each connection to do the same. The no of threads it has constraints apache. If the number of incoming requests is more than the number of threads in its pool, it has to deal with switching and coordination. NGINX does not have this overhead which makes it comparatively faster than Apache in real-world workloads. All of this is a bit simplistic and hand-wavy but should convey the idea.

Event based concurrency cannot leverage multiple CPU cores which all modern processors have. To do this, you create one event unit for each core usually called a worker. Also, most software that leverage event based concurrency adopt a hybrid model where they use event based concurrency for short-lived quick operations and off-load long-running tasks to a thread/process.

I have glossed over a lot of details and nuances to explain a complex topic like concurrency in simple terms. Treat this as a good starting guide to dig more into this fascinating world.

Ode To Queues

If you have a producer with an uneven rate of production and a consumer which cannot keep pace with the producer at its peak, use a queue.

If you have a workload which need not be addressed synchronously, use a queue.

If your customer-facing application is riddled with workloads which can be deferred, move these to a queue thus making the customer-facing application lean and mean.

duck-3217049_640

Think of a queue as a shock absorber.

There are workloads which need to be processed immediately with sub-millisecond latency, and then there are ones where you have the luxury of taking time. It is advisable not to mix these in an application. The second kind of workload can be addressed by moving it to a queue and having a consumer process them.

For example, consider a scenario where you are consuming messages and persisting them in a data store. These messages are coming in at a variable rate, and at its peak, the data store cannot handle the load. You have two options. Scale the data store to meet the peak load or slap a queue in between to absorb the shock. Queue solves this problem in a KISS manner.

Queues enable applications to be highly available while giving enough room to maneuver. As long as the queue is highly available, the chance of message loss is almost nil. Since a queue is durable, you need not perfect your consumer’s high availability; you get leeway to manage.

With applications embracing microservices paradigm, there is a lot of API back and forth. Not all API consumption has to be in real-time. Whatever can be deferred should use a queue as the transport mechanism.

Queue introduces a bit more complexity into an application but the advantage it brings to the table makes it a worthwhile investment.

Software Security

Some disparate thoughts on security in no particular order.

Many security bugs can be avoided by making a clear distinction between authentication and authorization. When one logs into Facebook, one uses a username and password. Facebook lets you log in only once it is sure that you are the owner of the account by verifying your password. This is authentication. Once you log in, you cannot view all your friends’ photos. You can only view those photos which your friends have authorized you to view. This is authorization. There is a class of security bugs that arise because developers have not made this distinction.

security-department-1653345_640

A lot of security is knowing what not do. Security by obscurity and hand rolling security algorithms and protocols are the two things that immediately come to my mind. For example, while storing passwords, instead of coming up with an elaborate custom secure storage scheme, employ the industry standard bcrypt.

There is a thought process that you will do better security by having tons of access control. One of the manifestations of this is restricting SSH access to production boxes. Unless you have invested tons in tooling, this slows down teams drastically. In today’s world, where speed is paramount, this does not work. Under pressure to do things fast, teams find ingenious ways to circumvent these controls. Strict access control only works in organizations which are fine with taking things slowly but this usually stifles productivity and leaves a bevy of frustrated developers. The only way around this problem is to have the most necessary access control and take care of the rest through tooling. An example is how Netflix uses tools to enable developers to SSH into production boxes without compromising security.

Security implemented in a naive manner goes against the human nature of seeking to accomplish tasks in the least restrictive manner. If you do not invest in tooling, security always gets in the way of accomplishing things.

A less intrusive way of doing security is to configure systems with sane defaults. An example – when you provision a server, ensure that it is fortified by default. If you are using external tools, configure them with defaults. For example, if you are using Slack, configure it so that only people with your organization’s email address can sign up. Carry out a periodic audit of systems. This could be anything from periodically scanning SSH access logs to repository audits to ensure secrets and passwords are not leaked.

No writeup on security can be complete without touching upon compliance. There are tons – PCI, HIPAA, SOX etc. All these come with their own baggage. One simple way around this is to first understand what and all parts of your application have to be under the scope of compliance. For example, if you have an e-commerce application taking credit card information, you have to be PCI compliant. But this does not mean your entire application has to be under the scope of PCI audit. You can smartly bifurcate the application into parts that deal with payment and parts that do not. Once this is done, only the parts that deal with payment have to be under PCI scope.

A final note, security is a never-ending concern, there is nothing called enough security. Where you draw the line is up to you.

Here is a hilarious comic by XKCD on teaching a lesson to people who do not follow security practices.

exploits_of_a_mom

Naming Things

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

Even though the above might have been in jest, naming variables while writing code is a head-scratching experience. Should I make it short? Should I make it descriptive? If descriptive, how descriptive? These thoughts keep running in one’s head.

tag-309129_640

A simple strategy is to keep the descriptiveness of a variable’s name in line with the reach of that variable. If the variable is short-lived i.e within a small block, stick to a short name as the cognitive load of the variable is negligible. If the variable’s reach is much larger, as in if it spans a large number of lines, make it as descriptive as possible.

Goes without saying that names should adhere to the conventions that your team has adopted.

Switching Languages

think-2177813_640

Many are apprehensive about switching programming languages. It is perfectly fine to have preferences – I am heavily biased towards statically typed languages with great tooling support, but being dogmatic is not something one should aim for.

What could be the downsides of switching programming languages? I am disregarding the psychological aversion to change and sticking to hard facts.

1. One will lose the fluency(syntax).
This is a non-issue, syntax is similar to muscle memory, one will get it back in a day or two. This is akin to swimming or driving after an extended break, one naturally gets it back.

2. One will forget the way of doing things.
Every language has a culture and a community accepted way of getting things done. Regaining this might not be as easy as syntax retrieval, but with some effort and thought, one should recoup.

3. One will not be up to date with the language.
Languages keep evolving, core ideas and philosophy remain the same. The standard library might become more expansive, VM might become faster, some earlier prescribed way of doing things might be an anathema now but the foundational principles remain intact.

4. There is no demand for this language.
As long as your fundamentals are good, this should not be a concern. There are roles which require deep language know how, but these are far and few. In fact, it is the opposite, more the languages in your kitty, more the opportunities.

The biggest upside to learning a new language is the exposure to new ideas and thought processes. Any new language immensely expands one’s horizon. For example, the way Java approaches concurrency is very different from GoLang’s take on concurrency. Having this sort of diverse exposure helps one build robust systems and mental models.

Programming languages should be viewed as a means to an end, not an end in itself. There are cases where programming languages make a difference, otherwise, there would not be so many around with new languages cropping up now and then, but you are doing a disservice to yourself by restricting to a few.

Conventions

Most programming languages have conventions. These could be for naming or code patterns.

rule-1752415_640

How does this help?

A simplistic view is that it helps to keep code consistent, especially when multiple people work on it.

A deeper way to look at this I believe is in reducing the cognitive load.

In cognitive psychology, cognitive load refers to the effort being used in the working memory.

If you have conventions, it is one less thing to think about. You do not have to spend mental capacity on thinking whether to name variables small case, capital case, camel case, with hyphen, underscore etc. You blindly rely on the convention. Same applies to code patterns. You look at the pattern and automatically grok the idea; without expending grey cells.

I strongly believe that all tech teams should have conventions wherever possible; outside code too. Freeing up any amount of working memory for things that matter will go a long way towards increasing productivity.