Poor Man’s Anomaly Detection
You have a feature where if someone signs up on your product, you create a wallet for that person and top it up with complimentary money. Your organization swears by micro-services; hence sign-up logic is in one service and wallet creation and crediting is in another service. Once a user signs up, sign up service sends a message to the wallet service so that it can create the wallet and do the credit. To ensure the sanctity of the system, you have to make sure that the number of signups, wallets created and credits done match one another. Also, if these go out of sync, alerts need to be in place to take corrective action.
Since the two are disparate distributed systems, one way to achieve the above is to use an anomaly detector. There are off the shelf products for this as well as open source projects. If you do not have the time, need and resources to invest in deploying an anomaly detection system, having a reconciliation system is the way to go.
Reconciliation is deeply entrenched in the financial domain where it is a way of life. The technology world can borrow this and use it as a poor man’s anomaly detector. For the scenario that we started with, we run queries on the data repository of the sign-up and wallet systems at regular intervals. These queries fetch the count of sign-ups, wallet creations, and credits that occurred during the period. Once we have the numbers, all we have to do is ensure that they match. One can do this with a simple bash script; this is extremely simple to develop and deploy.
Reconciliation can play a role in all places where two-phase commit flows are involved. For example, most payment flows follow a two-phase commit process. You first deduct money from the user’s payment instrument and then fulfill the commitment. There is a good chance that post payment debit, your system dies not doing the fulfillment. Having a reconciliation system in place helps you to take corrective action in these scenarios.
Reconciliation is a simple way to achieve anomaly detection until you have the need and the resources to invest in a more robust distributed anomaly detector.