Enablers, Not Doers

How do you run effective Platform Engineering teams?

All organizations have Platform Engineering teams in one form or another: centralized engineering teams that provide building blocks for other engineering groups within the company. The customers of these teams are the internal engineers, not the end-users of the product.

For Platform Engineering teams to be effective, do not couple them tightly with the other engineering groups. Even though these are centralized engineering teams, they should operate in a decentralized manner. Platform Engineering teams should follow the mantra of being loosely coupled but strongly aligned with the other engineering teams. To achieve this, view Platform Engineering teams as enablers, not as doers.

The build engineering team creates tools and resources so that any team in the organization can deploy their builds to production, i.e., the build engineering team enables you to deploy builds using the tools they create; they do not deploy the build for you.

The performance engineering team creates tools and frameworks for you to identify performance bottlenecks, i.e., the performance engineering team enables you to identify performance bottlenecks using the tools they build; they do not identify performance bottlenecks for you.

The security engineering team creates tools and libraries for you to identify security loopholes, i.e., the security engineering team enables you to identify security loopholes using the tools they build; they do not identify security loopholes for you.

This is how organizations should mold and communicate the role of Platform Engineering teams. Positioning the Platform Engineering teams as doers instead of enablers makes them the bottleneck for other teams to get their work done.

Other teams rely on Platform Engineering teams for their success. You do not want them second-guessing whether the Platform Engineering teams are doing their job or not. They need to trust the Platform Engineering teams with their critical workloads. Without trust, everything breaks, leading to unnecessary back and forth, sapping the energy of all parties involved.

To build this culture of trust, Platform Engineering teams should focus on observability, performance, and stability.

Lack of observability creates anxiety. If the build engineering team does not provide a dashboard where one can see the progress and history of builds and set alerts on build failures, teams will be anxious about their builds. They will keep chasing the build engineering team for updates, causing stress all around. Observability is a must for creating a culture of zero stress and anxiety.
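As a rough illustration of the build dashboard idea, here is a minimal in-memory sketch. The names `BuildRecord` and `BuildTracker`, the status strings, and the team names are all hypothetical; a real build system would persist this data and push alerts to a chat or paging tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BuildRecord:
    build_id: str
    team: str
    status: str  # assumed values: "running", "passed", or "failed"


@dataclass
class BuildTracker:
    """Hypothetical build history that calls an alert hook on failures."""
    on_failure: Callable[[BuildRecord], None]
    history: List[BuildRecord] = field(default_factory=list)

    def record(self, build: BuildRecord) -> None:
        # Keep every build so teams can look up history themselves.
        self.history.append(build)
        if build.status == "failed":
            self.on_failure(build)

    def for_team(self, team: str) -> List[BuildRecord]:
        # The "dashboard" view: all builds belonging to one team.
        return [b for b in self.history if b.team == team]


alerts: List[str] = []
tracker = BuildTracker(
    on_failure=lambda b: alerts.append(f"{b.team}: build {b.build_id} failed")
)
tracker.record(BuildRecord("101", "payments", "passed"))
tracker.record(BuildRecord("102", "payments", "failed"))
tracker.record(BuildRecord("103", "search", "running"))
```

The point of the sketch is that teams query `for_team` and receive alerts on their own, without asking the build engineering team what happened.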

Documentation is a subset of observability. Once the Platform Engineering team releases a tool or an API, others should be able to use it without intervention from the Platform Engineering team. To achieve this, Platform Engineering teams should focus on clear and concise documentation.

Since all teams use the tools and APIs of Platform Engineering teams, performance improvements have a multiplicative effect: a speed-up in one shared tool benefits every team that uses it.

A broken build tool brings the engineering org to a standstill. A broken login negatively affects all parts of the application. Hence, a keen focus on stability is a must. These are the foundations on which other engineering teams build their features.

Due to the nature of the work of Platform Engineering teams, consulting, guiding, and spreading best practices become part and parcel of the day-to-day responsibilities. This is understated, but Platform Engineering teams spend a significant chunk of their time on these activities. Look for ways to institutionalize them so that the Platform Engineering teams stay engaged in their core work. As discussed earlier, focusing on observability, performance, and stability goes a long way towards this.

Feature prioritization can become a challenge for Platform Engineering teams as everyone comes to them with feature requests. A simple yardstick to use: if we do not release this functionality, is there a way for the requesting team to go about their work, albeit in a roundabout manner? If the answer is yes, then it is not a burning problem. If not, you need to figure out a way to get the feature out as soon as possible.

Platform Engineering teams should adopt a broad perspective when developing features. Do not develop features for a particular team. Think of how the feature relates to other teams and design it so that everyone in the organization can leverage the feature.

If you want to build a culture of speed and rapid iteration, viewing Platform Engineering teams as enablers and not as doers is critical.

Get articles on coding, software and product development, managing software teams, scaling organisations and enhancing productivity by subscribing to my blog

Image by Susbany from Pixabay

Optimists, Pessimists, and Better Coders

Whether one is an optimist or a pessimist is dictated by genetics. Wise people say that you can give your genetics a run for their money by making happiness a conscious choice and learning to be deliberately happy.

What does this have to do with coding?

How do you decide whether code is good or not?
There are many parameters, but a definitive indicator is the number of bugs: the fewer the bugs, the better the code.

What is the definition of a bug?
A bug is a problem in the code that makes it work incorrectly.

Why do bugs occur?
The person who authored the code did not anticipate that particular condition, and the code does not know how to handle that situation.

How does one prevent bugs?
Anticipate everything that can go wrong and handle it while coding.

To do this, one needs to take a bleak look at things and think exhaustively about all that can go wrong. In short, you need to wear a pessimist’s cap.

Then, do pessimists make better coders?

There is another way to look at this.

To create something, you need to be an optimist; coding is about creating something new. To write code devoid of bugs, you need to anticipate edge cases and boundary conditions and account for them.

Is a better coder someone who can balance optimism and pessimism?

Photo by Christophe Hautier on Unsplash

NOT – Not Only Testing

How do you create a quality assurance strategy? 

Is quality assurance the same as testing?

Companies nowadays do not have a separate quality assurance (QA) team. People coming from the old world find this difficult to digest. In the last decade, the way of developing, deploying, and monitoring applications has changed.

Gone are the days of the waterfall model of development. You no longer take months to develop a feature, follow it with a long testing cycle, and finally hand it over to the production team for deployment. Release cycles have shrunk from months to hours. People who build the application are responsible for deploying, running, and monitoring it.

In this brave new world, base your quality assurance strategy on:

  1. What is the cost of a bug in production?
  2. How quickly can one detect a bug in production?
  3. How quickly can one fix a production bug?
  4. If a bug manifests in production, how does one limit the damage?

The cost of bugs is not uniform; bugs in different areas of the application cost differently. The line of business also dictates the price of bugs.

For an aerospace or medical devices company, a bug can literally mean the difference between life and death; for a social network or an e-commerce website, not so much. If an e-commerce website does not load, it significantly hits the company’s bottom line. For a social network, it might make an insignificant dent in advertising revenue. Within an e-commerce company, the cost of a bug in the recently purchased items section on the home page is not the same as the cost of a broken shopping cart checkout. While your shopping cart checkout has to be bulletproof, you can be lenient with the recently purchased items section on the home page.

Vary quality control based on the criticality of the functionality. Be tight where it needs to be and loose where you have room to wiggle. You need not have a uniform quality assurance strategy for the entire application.
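One way to make this concrete is to tie the required checks to a criticality level per feature. The sketch below is only an illustration; the feature names, levels, and check names are hypothetical and would differ for every application.

```python
from enum import Enum
from typing import Dict, List


class Criticality(Enum):
    HIGH = "high"
    LOW = "low"


# Hypothetical classification of features by the cost of a bug.
FEATURE_CRITICALITY: Dict[str, Criticality] = {
    "cart_checkout": Criticality.HIGH,
    "recently_purchased": Criticality.LOW,
}

# Hypothetical checks required before release at each level.
REQUIRED_CHECKS: Dict[Criticality, List[str]] = {
    Criticality.HIGH: ["unit_tests", "integration_tests", "manual_review", "staged_rollout"],
    Criticality.LOW: ["unit_tests"],
}


def required_checks(feature: str) -> List[str]:
    # Unclassified features fall back to the strictest set of checks.
    level = FEATURE_CRITICALITY.get(feature, Criticality.HIGH)
    return REQUIRED_CHECKS[level]
```

Defaulting unknown features to the strictest level is a design choice in this sketch: being loose becomes an explicit decision rather than an accident.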

Testing dents your speed and time to production irrespective of whether you do it manually or automate it. This is the cost you pay for quality, and you need to make peace with it. The stricter you get, the costlier it becomes. Automated testing is not free; it too has a cost in the form of more things to maintain and manage. In some cases, you might end up writing twice the code due to automated testing. I am not arguing against automated testing; I am asking you to factor in and mentally accept the cost before you take the plunge.

There is a tendency to equate testing with quality assurance. Testing is a subset of quality assurance; testing is NOT quality assurance. Quality assurance is much more than JUST testing and comprises a variety of things.

Today, experimentation, incremental development, and speed are the essence. Try a small idea, see whether it sticks, and then rapidly iterate and expand. In such an environment, following a design philosophy that gives you room for error and factors in bugs and things not working is key.

Quick detection of production bugs rests on the observability built into the application. Speedy recovery from production bugs is a function of deployment practices. Reducing the impact of production bugs follows how you roll out features. All these are a result of tooling, development practices, and engineering culture. Today, these are as important as vanilla testing – manual or automated.
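To illustrate how a rollout practice can limit the damage of a production bug, here is a minimal sketch of a percentage-based feature rollout. The function name `in_rollout` and the bucketing scheme are assumptions for illustration, not a reference to any particular feature-flag product.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically decide whether a user sees a feature.

    Each (feature, user) pair hashes to a stable bucket from 0 to 99;
    the feature is on for buckets below `percent`.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


# Start small; if a bug shows up, only a fraction of users are affected.
new_checkout_enabled = in_rollout("user-42", "new_checkout", 5)
```

Because the bucket is stable, raising `percent` only adds users; nobody who already has the feature loses it, and setting `percent` to 0 switches the feature off for everyone.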

The ease with which you can set up a development environment for the application has a direct bearing on the quality of the product. In a world of micro-services and external dependencies, setting up a development environment can get complicated with a lot of moving parts. If you make it difficult for developers to create a robust development environment, the quality of the product takes a hit.

Static code analysis and enforcing best practices through tooling like pre-commit hooks improve the quality of the code, which directly improves the quality of the application. This is paramount when you use languages that are loose with typing. It is one of those things where the cost is low, resistance is nil, and the rewards are high. Always be on the lookout for tools and processes that give you a better quality product with zero resistance and no impediment to speed.
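As a small example of what such a check can look like, the sketch below uses Python's standard `ast` module to flag bare `except:` clauses, a common way of silently swallowing errors. The function name `find_bare_excepts` is hypothetical, and in practice you would more likely rely on an established linter; a script along these lines could be run from a pre-commit hook.

```python
import ast
from typing import List


def find_bare_excepts(source: str) -> List[int]:
    """Return line numbers of `except:` clauses that name no exception type."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


sample = (
    "try:\n"
    "    risky()\n"
    "except:\n"
    "    pass\n"
)
offending_lines = find_bare_excepts(sample)
```

The check only parses the source and never executes it, which is what keeps it cheap enough to run on every commit.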

Adopting asynchronous, event-based architecture and design patterns gives room for error and recovery: a failed event can be retried or set aside without bringing everything else down. In today’s fast-paced environment where you do not have the luxury of time to test every minute aspect of the product, this is a boon.
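The sketch below shows the recovery idea in miniature: events that fail are requeued, and events that keep failing are set aside in a dead-letter list instead of blocking everything else. It is a synchronous, in-memory stand-in for a real message broker; `process_events`, `flaky_handler`, and the event names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


def process_events(
    events: List[str],
    handler: Callable[[str], None],
    max_attempts: int = 3,
) -> List[str]:
    """Run handler on each event, retrying failures; return dead-lettered events."""
    queue: Deque[Tuple[str, int]] = deque((e, 1) for e in events)
    dead_letter: List[str] = []
    while queue:
        event, attempt = queue.popleft()
        try:
            handler(event)
        except Exception:
            if attempt < max_attempts:
                # Requeue at the back so other events are not blocked.
                queue.append((event, attempt + 1))
            else:
                dead_letter.append(event)
    return dead_letter


attempts: Dict[str, int] = {}
processed: List[str] = []


def flaky_handler(event: str) -> None:
    attempts[event] = attempts.get(event, 0) + 1
    if event == "order-3":
        raise RuntimeError("always fails")
    if event == "order-2" and attempts[event] == 1:
        raise RuntimeError("fails on the first attempt only")
    processed.append(event)


dead = process_events(["order-1", "order-2", "order-3"], flaky_handler)
```

Here the transient failure on `order-2` recovers on retry, while `order-3` ends up in the dead-letter list for someone to inspect later, and `order-1` is never affected by either.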

Do not have tunnel vision when it comes to quality; do not restrict it to only testing. Adopt the “NOT – Not Only Testing” strategy.

Comic from XKCD.

Photo by Austin Neill on Unsplash