My former colleague Brian H. Madsen, Esq. posted a lament that he couldn't find an RSS reader that would allow him to comment on the most recent version of a post. I responded that this limitation is fundamental to reality at large and that the alternative to accepting potentially obsolete data is paralysis. Brian disagrees with this assessment. Rather than comment again on his blog I've chosen to elaborate my point in greater detail on my own.
It would seem to be a good thing in general to make decisions based on the latest information. Certainly we should always try to use data that is as accurate as possible. Unfortunately, reality conspires against us here. A few fundamental principles get in the way:
- Decision making takes time. This may range from nanoseconds to years depending on the type of decision being made.
- Information takes time to propagate. Even within a system information transfer is not instantaneous and between systems the delays are significantly longer.
- Information may change independently of our decision process. New information may become available before, during or after our decision process.
The consequence is that if we are to make a decision by a certain deadline, there is a cutoff point before that deadline at which we must stop accepting new information and use what we have. There are very few decisions of any consequence that we can put off indefinitely.
As an example take the process of posting a customer a bill. At some point you have to actually print the letter and post it. If you hang around because the address may be updated then you'll never send the bill, go broke, starve and die. So that's bad. On the other hand if you don't use the latest address the bill might get lost and not paid. So you'd like to use the latest address. These desires are contradictory.
To handle this, at some point x we decide to go with the information we have. It may be wrong. The customer may call us at time y and give a new address. If y is after x then we've sent the bill to the wrong address. Even if y is before x, processing latencies mean we cannot guarantee that the new address will be used. The new address may not have had time to get from the CRM system to the billing system. This is functionally equivalent to the customer changing their address five minutes after we hand the bill over to the postal company. We can minimise the latency between our systems but we can never eliminate it. Indeed, as long as propagation is reasonably fast, reducing the latency further gives us no real benefit. The real issue is the timing of the call from the customer, and that is entirely beyond our control.
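The timing argument can be sketched in a few lines of Python. This is only an illustration: the names `AddressChange` and `address_used_for_bill`, the single fixed `latency` between CRM and billing, and the numeric times are all assumptions made for the example, not a description of any real system.

```python
from dataclasses import dataclass


@dataclass
class AddressChange:
    received_at: float  # time y: when the customer tells the CRM
    address: str


def address_used_for_bill(cutoff, latency, current_address, change):
    """Return the address the billing system sees at the cutoff (time x)."""
    # The change only becomes visible to billing after it has propagated.
    visible_at = change.received_at + latency
    if visible_at <= cutoff:
        return change.address
    return current_address


# Cutoff x = 100, propagation latency = 5 (hypothetical units).
in_time = address_used_for_bill(100, 5, "old", AddressChange(90, "new"))
too_slow = address_used_for_bill(100, 5, "old", AddressChange(98, "new"))
too_late = address_used_for_bill(100, 5, "old", AddressChange(105, "new"))
```

Here `in_time` is the new address, while `too_slow` (y before x, but not yet propagated) and `too_late` (y after x) both fall back to the old one. Shrinking `latency` narrows the window for the second case but does nothing for the third.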
The solution is not some kind of futile attempt to ensure we have the latest data. That's simply not possible. The solution is a compensation mechanism that deals with the cases where our decisions are wrong. In the case of a bill posted to the wrong address real world solutions exist. Mail can be forwarded or returned to sender. We can also initiate new workflows that involve contacting the customer via other channels. These processes may be individually expensive but they have two distinct advantages:
- In most practical applications the likelihood of an individual data item being obsolete and causing a wrong decision is small. As such the compensation mechanism is required only infrequently relative to the total number of decisions. This means the cost averaged across all decisions is small.
- It's actually possible. This means it wins hands down over a low cost but impossible solution.
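The first point is simple arithmetic, sketched below. The function name and the figures (a base cost of 1.0 per bill, a compensation cost of 50.0, and a 1% chance of an obsolete address) are invented purely for illustration.

```python
def average_cost_per_decision(base_cost, compensation_cost, p_stale):
    """Expected cost of one decision when compensation is only needed
    for the fraction p_stale of decisions made on obsolete data."""
    return base_cost + p_stale * compensation_cost


# Hypothetical figures: compensation is 50 times the base cost,
# but is needed for only 1 bill in 100.
average = average_cost_per_decision(base_cost=1.0, compensation_cost=50.0, p_stale=0.01)
```

With these assumed numbers the average comes out at roughly 1.5 per bill, even though each individual compensation costs 50.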
Brian has suggested that in his RSS example some kind of version attribute may address his concerns. I think not. This would inevitably strangle conversations involving many participants. On a new post with many commenters the likelihood that you can submit your comment before a relevant change (i.e. another comment) would be relatively low. This would tend to promote only short comments that lack context and lead to poor discussions. The current compensation mechanism of followup comments is simple and widely effective. As for versioning only the post content, I remain unconvinced that a sufficient number of posts are updated after posting for this to have any real value. Certainly in the variety of blogs I follow this is a rare circumstance; the more usual behaviour is comments or followup posts.
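To make the objection concrete, here is a minimal sketch of one way a version attribute could work, assuming the version covers the whole thread and is bumped by every accepted comment. `VersionedThread` and its `submit` method are hypothetical; nothing here reflects how any actual RSS reader or blog engine behaves.

```python
class VersionedThread:
    """A post whose version changes whenever a comment is accepted."""

    def __init__(self):
        self.version = 0
        self.comments = []

    def submit(self, text, seen_version):
        # Reject the comment if anything changed since the commenter
        # last read the thread.
        if seen_version != self.version:
            return False
        self.comments.append(text)
        self.version += 1
        return True


thread = VersionedThread()
seen = thread.version  # five readers all load the thread at the same version
results = [thread.submit(f"comment {i}", seen) for i in range(5)]
```

Only the first submission is accepted; the other four are rejected and their authors must re-read and resubmit, racing each other again. The longer a comment takes to write, the more likely it is to lose that race.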
In my personal opinion the erroneous view that we can have the latest information available to work with is promoted by monolithic systems that perform multiple business functions and have a single all-encompassing data repository. This leads to the kind of thinking that we have the latest information because we can query it directly from the "source". What this fails to consider is that we can query the latest version we have, but that does not mean this version represents current reality. Even then, transactions in progress mean that the data we query may not be the latest version. Anything that varies over time is subject to having decisions made on other than the latest data.
SOA has an advantage here in that the explicitly decoupled nature of the data and the often asynchronous interactions between services embed the understanding that the data available is not the latest, but rather the latest as at a point in time. Knowing this immediately brings the realisation that the data may have changed subsequent to that point.
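One way to make "the latest as at a point in time" explicit is for data passed between services to carry its own timestamp. The sketch below is an assumption about how that might look; the `Snapshot` type, its fields and the example address are invented for illustration and are not part of any particular SOA framework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    """A value together with the moment at which it was known to be current."""

    value: str
    as_at: datetime

    def age(self, now):
        # How long ago this value was last known to be correct.
        return now - self.as_at


address = Snapshot(
    value="1 Example Street",
    as_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
now = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
staleness = address.age(now)  # a timedelta of five minutes
```

The consumer never sees a bare address, only an address plus the time it was valid, so the possibility that it has since changed is visible in the type itself.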
None of the above implies anything about a desire to gain understanding or increase our knowledge. These are good things to do. What this relates to are the practical limitations in the application of our knowledge. Additionally if you are only using historic data then many of these concerns will not apply as historic data is immutable in almost all cases.
In summary:
- Brian is wrong. This is obviously the most important point.
- Part of making a decision is realising that some of the information you use may be out of date.
- You cannot guarantee that your data is the latest version.
- In order to deal with decisions made with obsolete data a compensation mechanism is required.