Unit test – purist versus practical

Whenever you ask a question about unit testing in a forum, there is always that one person whose only job is to point out that what you are doing is not unit testing but integration testing. It is important to know the difference, but it is more important not to lose sight of the goal: ensuring a reliable, bug-free application. Also, you need to adopt a terminology that works for you and your team, rather than what purists think or say.

In absolute terms, if a test depends on anything that is not under your control, it is not a unit test. For example, if the method you are testing uses a public function, a method from an included library, a database or an external API, the test is not a unit test but an integration test. For a test to qualify as a unit test, you need to mock all these dependencies and bring them under your control; only then can you claim your test is a unit test. Now that we have the purists happy, let us move to a more practical worldview.
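
To make the purist definition concrete, here is a minimal sketch in Java using JUnit and Mockito. PaymentService and ExchangeRateApi are hypothetical names invented purely for illustration; the external rate API is mocked, so the test touches nothing outside our control.

    import static org.junit.Assert.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import org.junit.Test;

    public class PaymentServiceTest {

        // Hypothetical collaborator that, in production, calls an external API.
        interface ExchangeRateApi {
            double rateFor(String currency);
        }

        // Hypothetical unit under test.
        static class PaymentService {
            private final ExchangeRateApi api;
            PaymentService(ExchangeRateApi api) { this.api = api; }
            double toUsd(double amount, String currency) {
                return amount * api.rateFor(currency);
            }
        }

        @Test
        public void convertsUsingMockedRate() {
            // The external dependency is mocked, bringing it under our control;
            // only now would a purist call this a unit test.
            ExchangeRateApi api = mock(ExchangeRateApi.class);
            when(api.rateFor("EUR")).thenReturn(1.1);

            assertEquals(11.0, new PaymentService(api).toUsd(10, "EUR"), 0.0001);
        }
    }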

When a regular developer refers to a test as a unit test, what she means is that she is trying to test a piece of functionality, in a massive, gnarly application, that she considers a small, independent unit. This unit might have some components that are not under her control. Instead of debating whether she is unit testing or integration testing, a better discussion is figuring out the intention of the test, and what needs to be controlled or mocked and what does not. Helping her figure this out and achieve it will add more value than debating whether a test is a unit test or an integration test.

Selfie

Let us say that you want to execute a job periodically; what comes to your mind first? If you are familiar with Linux, I can hear you screaming cron. Well, no doubt about that, cron is great, but what if I told you there is another approach you can take to execute periodic jobs? Our good old continuous integration server Jenkins can supplant cron as a tool to execute periodic jobs, and it kicks ass at doing so.

What makes Jenkins such a gem for executing periodic jobs?

1. You get a great web front end which is comfortably accessible from the browser.

2. The front end gives you a complete history of previous runs, with detailed information such as when the last execution occurred, how long it took, what output it produced, the historic trend of execution times and other diagnostic information.

3. You can leverage the Jenkins plugin ecosystem and do some nifty things. For example, you can use the Log Parser plugin to parse the execution output and alert if a specific pattern is found in it. The great part here is that your job need not have alerting logic baked in; your job concentrates on doing what it does best and lets Jenkins take care of the rest.

4. You can configure regular Jenkins build rules like alerting on execution failure, preventing subsequent executions if the current one fails, etc.

5. You can chain multiple jobs and the chain is very obvious thanks to the great Jenkins UI.

All this is great, but one problem I faced with Jenkins is that you cannot have a job call itself recursively with a delay in between; you have to schedule the job execution using a cron expression. The difference is subtle, but this limitation has implications which I will illustrate with an example. Say I have a job which should ideally run every 15 minutes, but sometimes an execution takes more than 15 minutes to complete. What happens then is that job executions queue up and fire successively, one after the other. The way I want this scenario to pan out is: once an execution finishes, Jenkins should wait for 15 minutes before starting the next one. I could not find a way to do this in Jenkins, and hence Selfie was born.
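
For concreteness, the only scheduling knob Jenkins gives you out of the box is a cron expression in the job's "Build periodically" section:

    # Jenkins "Build periodically" schedule: fire every 15 minutes
    # on the clock, regardless of how long the previous run takes.
    H/15 * * * *

This triggers on wall-clock time, not relative to when the previous execution finished, which is exactly the behaviour I wanted to change.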

Selfie is a Jenkins build trigger plugin which lets a project trigger itself after a configured delay. The plugin appears as one of the build triggers while configuring a new project.

This is my first attempt at writing a Jenkins plugin; pull requests and code reviews are more than welcome.

SQS versus Kinesis

A lot of people are confused about SQS versus Kinesis. In some ways both act like a queue, but there is a huge difference between the two.

SQS is a queue; it promises at-least-once delivery, and ordering is best effort rather than strictly FIFO.

Kinesis is a distributed message stream handler. A simplistic and hand-wavy way to think of Kinesis is as one huge log file, with the items that you write to the stream as lines in this log file. When you want to process the stream, you get a pointer into the log file; when you read a line, the pointer moves to the next line. Kinesis is stateless, in the sense that it does not maintain the pointer for you; it is up to your reading process to maintain it. What this means is that if you are reading off a Kinesis stream and your process goes down, when you bring the reader process up again, it will start processing from the start, not from the last line it read before the crash. There is no concept of popping items out of Kinesis; the data is always there (it expires after 7 days), and you merely manipulate your pointer into it. Hence, if you want to reprocess the stream, you can replay it, i.e. start from the beginning and do the whole thing over again. AWS provides a client library for Kinesis which maintains this state for you; it uses DynamoDB to persist the pointer.
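
A rough sketch of the difference using the AWS SDK for Java; the queue URL, stream name and shard id below are placeholders. With SQS you receive and delete a message and it is gone; with Kinesis you walk a shard iterator yourself and nothing is ever popped:

    import com.amazonaws.services.kinesis.AmazonKinesisClient;
    import com.amazonaws.services.kinesis.model.GetRecordsRequest;
    import com.amazonaws.services.kinesis.model.GetRecordsResult;
    import com.amazonaws.services.kinesis.model.Record;
    import com.amazonaws.services.kinesis.model.ShardIteratorType;
    import com.amazonaws.services.sqs.AmazonSQSClient;
    import com.amazonaws.services.sqs.model.Message;

    public class QueueVersusStream {
        public static void main(String[] args) {
            // SQS: receiving and then deleting pops the message off the queue.
            AmazonSQSClient sqs = new AmazonSQSClient();
            String queueUrl = "https://sqs.us-east-1.amazonaws.com/123/my-queue"; // placeholder
            for (Message m : sqs.receiveMessage(queueUrl).getMessages()) {
                System.out.println("SQS message: " + m.getBody());
                sqs.deleteMessage(queueUrl, m.getReceiptHandle()); // gone for good
            }

            // Kinesis: you hold a pointer (shard iterator) into the stream;
            // reading advances your pointer, the data itself stays put.
            AmazonKinesisClient kinesis = new AmazonKinesisClient();
            String iterator = kinesis.getShardIterator("my-stream", "shardId-000000000000",
                    ShardIteratorType.TRIM_HORIZON.toString()) // replay from the oldest record
                .getShardIterator();
            GetRecordsResult result =
                kinesis.getRecords(new GetRecordsRequest().withShardIterator(iterator));
            for (Record r : result.getRecords()) {
                System.out.println("Kinesis record " + r.getSequenceNumber());
                // Checkpointing r.getSequenceNumber() is our job (the AWS client
                // library does it in DynamoDB); Kinesis itself will not remember.
            }
        }
    }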

This should give you a fair idea of when to use Kinesis and when to opt for SQS.

Nothing is sacrosanct

There is an interesting bug opened against Kafka. For those of you too lazy to click on the link and read through the description, I am reproducing it here in full.

It appears that validation of configuration properties is performed in the ConsumerConfig and ProducerConfig constructors. This is generally bad practice as it couples object construction and validation. It also makes it difficult to mock these objects in unit tests.

Ideally validation of the configuration properties should be separated from object construction and initiated by those that rely/use these config objects.

http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/

It links to an article by Misko Hevery on writing testable code. If you have not read posts by Misko Hevery, I urge you to. For an open source project like Kafka, it might make sense to follow all the guidelines of writing testable code (Jay Kreps, the person behind Kafka, does not agree with the bug, as is visible in the comments), but for your project it might not. If you are a small company with a two or three person team, do not blindly follow practices because someone on the internet says so.

Follow rules and guidelines only if they help you make your code more secure, performant, easier to maintain, etc.; do not ape guidelines without understanding why they were laid out in the first place. To go back in history, checked exceptions were all the rage in Java land a couple of years ago, but these days, after frameworks like Spring sprang up, people look down upon checked exceptions. Same with TDD: it was touted as the best thing since sliced bread, but now programmers are raising doubts about it.

A lot of times, mocking objects, interfaces, etc. takes more work than writing the actual functionality. In many projects, there might not be an ROI in writing and maintaining this elaborate test framework and infrastructure. It is true that injecting dependencies into an object makes it easier to test, but it also comes with the downside of having to take care of injecting the dependencies every time you create that object. If you inject dependencies by hand, object creation becomes an elaborate exercise each time; otherwise you have to delegate this to a framework like Spring or Guice, and now your project is bloated. Maybe you should sidestep dependency injection and create the object with the dependencies inside it, as in the sketch below.
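
A minimal sketch of that trade-off; MailClient, SmtpMailClient and the two report generators are hypothetical names invented for illustration:

    // Hypothetical dependency.
    interface MailClient { void send(String to, String body); }

    class SmtpMailClient implements MailClient {
        public void send(String to, String body) { /* talk to an SMTP server */ }
    }

    // Testable via injection, but every caller must now assemble the object
    // (by hand, or via a container like Spring or Guice).
    class InjectedReportGenerator {
        private final MailClient mail;
        InjectedReportGenerator(MailClient mail) { this.mail = mail; }
        void email(String to) { mail.send(to, "report"); }
    }

    // Simpler to construct, harder to mock: the dependency is baked in.
    class SelfContainedReportGenerator {
        private final MailClient mail = new SmtpMailClient();
        void email(String to) { mail.send(to, "report"); }
    }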

The situations and background under which these rules and guidelines crop up might be radically different from those in your organisation or team. Deciding what to follow and what not to is more of an art than a science; your instincts, past experiences, etc. help you in making that call.

Designing for failure

In the world of software, failure is a certainty. Servers go kaput, databases go down, processes run out of memory; things break all the time. You can categorize software as good or bad based on how it behaves in these adverse scenarios. I am not trying to imply that software has to be resilient to all of this; on the contrary, I believe that it is perfectly fine to crap out when shit hits the fan. The question is how you handle the crapping out.

Whenever you are architecting components, devote an ample amount of time to failure scenarios. Let us take the case of a piece of code which interacts with an external, third-party API. What are the questions you should be asking when designing this component? What happens if the API suddenly stops responding one day? Can I hold my main thread hostage to the API response time? What happens if the API takes eons to respond? In case there is an exception, am I logging enough data to debug it? If there are performance issues, do I have enough diagnostic data? Diagnostic data might mean graphing the API response time, the number of times the code path was executed, and so on. Do I need to send out an alert when something goes wrong? All these questions revolve around failure handling, and they should be second nature to you as a software engineer.
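
A rough sketch, in Java, of what acting on those answers might look like; the endpoint URL and the Metrics helper are hypothetical stand-ins for whatever HTTP client and metrics pipeline you actually use:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class ThirdPartyApiClient {
        private static final Logger LOG = Logger.getLogger(ThirdPartyApiClient.class.getName());

        public String fetchQuote() {
            long start = System.currentTimeMillis();
            try {
                URL url = new URL("https://api.example.com/quote"); // hypothetical endpoint
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setConnectTimeout(2000); // do not hold the main thread hostage:
                conn.setReadTimeout(5000);    // bound how long the API can make us wait
                try (InputStream in = conn.getInputStream()) {
                    return new String(in.readAllBytes());
                }
            } catch (Exception e) {
                // Log enough context to debug later, and count the failure so a
                // graph or alert can pick it up.
                LOG.log(Level.SEVERE, "quote API call failed", e);
                Metrics.increment("quote_api.failures");
                return null; // or a sensible fallback
            } finally {
                Metrics.recordTime("quote_api.latency_ms", System.currentTimeMillis() - start);
            }
        }
    }

    // Hypothetical metrics facade; stands in for statsd, Graphite, etc.
    class Metrics {
        static void increment(String name) { /* bump a counter */ }
        static void recordTime(String name, long millis) { /* record a timing */ }
    }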

I have seen a tendency among developers to devote an inordinate amount of time to making their code adhere to the latest programming fad and to using the best possible library, but not to failure scenarios. Logging data might not be as sexy as debating which design pattern to use, but once things break, logs are your only friend. Next time you are furiously pounding on the keyboard, take a step back and ask these questions. In the future, the developer who maintains the code you wrote today will thank you for it.

Go lang

The last round of the recently concluded Stripe CTF was in Go. This gave me a good opportunity to familiarize myself with the language. Even though my native programming language is Java, I have worked professionally in JavaScript, Perl and PHP, dabbled in Python for my personal projects, and can manage to read Ruby, Lisp (and its dialects), Erlang and Scala with some Google help. When I ruminate on programming languages, I do not see any of these replacing Java as the de facto lingo of the enterprise world, but I do see that promise in Go.

1. Go is strongly and statically typed. This means that a lot of mistakes that could potentially cause your code to blow up in production are caught at compile time. Apart from this, if the language crosses a critical mass of adoption, someone will come up with an IDE that can possibly match Eclipse, IntelliJ, et al. Also, one of the principles behind the language's design is to aid tooling, which means a lot of tools could crop up to help make code more secure, performant, etc.

2. The syntax of Go is not revolutionary. I consider this a virtue. I am strongly of the opinion that for a language to gain mass adoption, its syntax should be very close to that of prevalent languages. Go does not deviate much from C-ish syntax, but has subtle improvements which improve programmer productivity.

What makes Java a good programming language for the enterprise? The syntax of Java is very close to C, which means you can train a lot of the computer science graduates out there to code in Java. Try Lisp with an average Joe programmer and you will know what I am talking about. With the people-supply problem solved, let us move on to other factors. Enterprises gravitate towards stability, security and viewing programmers as replaceable components of a machine. Java gives you enough constructs to prevent a reasonably sane-headed person from shooting themselves in the foot; static and strong typing, code analysis tools and IDEs go a long way in helping with this. Try working, without tearing your hair out, with a code base in a dynamic language that was designed by an architect, then handed to a team for coding, then passed on to a testing team and then shifted over to an offshore team for maintenance. And to top it all off, you do not have Eclipse or IntelliJ to help you with this mess.

Go has all the pluses of Java minus the verbosity. Of course, it has a lot of other features, which you can read about on its website.

Mental model of systems

One fine Sunday evening, our Quartz jobs running inside a Tomcat server started to freeze. At the same moment, Tomcat went kaput. I sshed into the server and started poking around the logs. No errors in the logs. Hmm, OK. Checked the system health; again, the stats looked hale and hearty. Now what do I do?

I started to reason about the problem based on my mental model of the inner workings of the various components in our application. Jobs are freezing, and at the same time Tomcat stops serving requests. It cannot be a Tomcat issue, as I am not getting a Tomcat connection error, which means Tomcat has enough threads to spare and its thread pool is not exhausted. The implication is that new threads are being spawned but are not able to proceed. Hmm, OK, so the threads are getting blocked. Now how do I figure out where they are getting blocked? Can I hook something into the JVM and get the state of all the threads running inside the container? Oh yes, JMX gives me that ability. Restarted Tomcat with JMX configured, hooked into it and found out that the threads were getting blocked while trying to fetch database connections from the connection pooler. Our servers were not under any undue traffic spike, hence it could not be a scaling problem; that meant some query was running rogue. How do I find the rogue query? I could do two things: check the process list on the MySQL server, or check the stack traces of the currently running threads and identify the point at which each is stuck. Took the thread approach, zeroed in on the code where the query was being executed, fixed the query, and things were back to normal.
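
For the curious, here is roughly how you can peek at thread states over JMX; the host and port below are placeholders, and the target JVM must be started with the usual com.sun.management.jmxremote flags:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ThreadDumper {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port for the Tomcat JVM exposing JMX.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
                // Dump every thread with lock information; blocked threads show
                // exactly which monitor they are waiting on.
                for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                    System.out.println(info);
                }
            }
        }
    }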

If you are a full stack developer, as things currently stand, it is virtually impossible for you to develop a deep understanding of each and every component in the ecosystem. Let us list the things you need to be on top of if you are running a web application in Java: Java itself, the J2EE servlet spec, the application server you are using, your MVC framework of choice, the components the MVC framework uses (maybe Spring for dependency injection or Hibernate for persistence), SQL and the peculiarities of your database, HTML, CSS, JavaScript, JavaScript frameworks, CSS frameworks, the OS on which you are hosting your app, the administration of that OS, physical hosting; the list goes on and on. To make matters worse, there are so many options available for each of these components, and these components have additional dependencies of their own. Getting intimate with the intricacies and idiosyncrasies of each and every component of the application is a herculean task better suited to immortals. This is where having a generic mental model helps.

Coming back to the problem of the jobs stalling: even though I was not an expert in the internals of Tomcat, nor JMX, nor DB connection pooling, I had a very good understanding of how these systems work in general, so I could fit the problem into that model, reason through it and fix it. Let us say that you are using an evented HTTP server and you see that the throughput is not up to the mark. If you know how an evented server works in general, you can think of at least two likely culprits in your system: either you are making blocking calls or you are running CPU-intensive tasks on the event loop. This knowledge is transferable irrespective of whether you use Netty or Node.js or Tornado or some other new kid on the block.

I am not trying to discourage you from developing a deep understanding of particular frameworks or technologies. On the contrary: in addition to that, try to abstract your understanding out of the specifics and into generalities, so that tomorrow, when a shiny new technology shows up, you can reason about it based on your past experience.