Figures and facts
Oct 28, 2013
Anyone working on Data and Big Data is fully convinced that the technologies are mature enough to tackle the main issues behind the 3V principle (Volume, Variety, Velocity). But reality is here to bring us back down to earth! The idea for this post arose while reading a recent article on the unemployment figures announced by the French government.
The facts:
- On September 25, the French government announced that the number of unemployed had fallen by more than 100 000 in one month.
- This announcement was analyzed and strongly challenged because of the unusually high number of ‘unsubscribers’.
- At the end of the day, a new announcement indicated that there had been a problem with the figures.
The aim here is neither to start a political debate nor to assign responsibility. I am much more interested in what this event highlights and what kinds of questions lie behind it.
As usual, this error results from a combination of several disconnected issues. Three of them are worth presenting here:
- A bug affecting a rather small volume of information compared to the total amount processed (for example, around 180 000 messages lost on a global SMS platform processing several million messages).
- A lack of interoperability between the different IT systems, which made it impossible to integrate the different data flows and check data quality.
- No accurate model to evaluate the results of the operation and flag a global issue (we are not talking about a 5 or 10% margin of error but almost 50%).
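To make the third point concrete, here is a minimal sketch of the kind of plausibility check that could flag an abnormal monthly change before publication. It assumes a simple rule (a change more than three standard deviations away from the average past change is suspicious). The function name, the threshold and the sample numbers are all hypothetical and are not taken from the real statistics.

```python
from statistics import mean, stdev


def is_suspicious_change(past_changes, new_change, threshold=3.0):
    """Return True if new_change lies more than `threshold` sample standard
    deviations away from the mean of past_changes (hypothetical rule)."""
    if len(past_changes) < 2:
        raise ValueError("need at least two past changes to estimate a spread")
    mu = mean(past_changes)
    sigma = stdev(past_changes)
    if sigma == 0:
        # No historical variation at all: any different value is unusual.
        return new_change != mu
    return abs(new_change - mu) / sigma > threshold


# Made-up month-over-month changes in a headcount, for illustration only.
history = [12_000, 8_000, 15_000, -3_000, 10_000, 6_000]

print(is_suspicious_change(history, -100_000))  # far outside the usual range
print(is_suspicious_change(history, 9_000))     # in line with past months
```

A check like this does not say what went wrong. It only says that the number deserves a second look before anyone announces it.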
We know that 69% of decision makers are not comfortable with the KPIs and information on which they base their decisions (source: IBM). This event makes that clear. The issue is that people are required to present information and numbers even when they do not know whether they are right.
The second point is: “OK, I am not sure of the absolute numbers, but what about the trends?” The answer is again: I do not know how to qualify the trends or make predictions.
The reality is that answering these questions is not easy at all. So what can be done?
- Define or redefine the business process, taking into account the information generated and calculated by other information systems, so that it is possible to check that this information will not generate errors within the whole process, and keep IT capabilities permanently aligned around critical data.
- Rework the data model and data mining content in order to create a more predictive model rather than one based only on past events.
- Last but not least, put in place a real Data Life Cycle Management or Data governance policy in order to anticipate such issues and become a data-driven organization.
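As an illustration of the first recommendation, the sketch below reconciles the record counts reported by two systems for the same data flows and lists the flows whose loss exceeds a tolerance. It is only a sketch under stated assumptions: the function name, the flow names, the 1% tolerance and the volumes are hypothetical (only the order of magnitude of 180 000 lost messages among several million echoes the example above).

```python
def reconcile_counts(sent, received, tolerance=0.01):
    """Compare per-flow counts from an upstream and a downstream system.

    Returns a dict {flow: loss_ratio} for every flow whose relative
    difference exceeds `tolerance` (hypothetical rule and threshold).
    """
    issues = {}
    for flow, n_sent in sent.items():
        if n_sent == 0:
            continue  # nothing was sent, so there is no ratio to compute
        n_received = received.get(flow, 0)
        loss = (n_sent - n_received) / n_sent
        if abs(loss) > tolerance:
            issues[flow] = loss
    return issues


# Made-up volumes for two hypothetical flows.
sent = {"sms_reminders": 3_000_000, "email_reminders": 500_000}
received = {"sms_reminders": 2_820_000, "email_reminders": 499_000}

for flow, loss in reconcile_counts(sent, received).items():
    print(f"{flow}: {loss:.1%} of records missing downstream")
```

Run on every exchange between systems, a control of this kind makes a small loss visible at the point where it happens instead of in the final figures.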