In my first blog post, I want to share some thoughts that come to mind every now and then in my work as a data scientist and data analyst. Perhaps the phrase "the data is wrong, or there is not enough data", something we often say when our analysis yields weak results, does not actually capture the problem we are trying to solve. Before talking about Machine Learning, statistics or data quality control, we should talk about education and about incorporating data into computational thinking.
First, let us define computational thinking. Computational thinking is a process that allows us to tackle a complex problem, understand it and develop the most suitable solutions using computer science; in other words, its essence is thinking like a computer scientist when facing a problem. An alternative definition is that computational thinking is the process of formulating problems in such a way that their solutions can be represented as sequences of instructions and algorithms. This second definition is the one I want to use as the starting point for this reflection.
As we can see, computational thinking is the mental process we all use when designing and developing projects in our workplace, and this becomes even clearer when the problems at hand require user data to be solved. Data analysts rely on this way of thinking constantly, using the available data to explain what is happening or to develop solutions to the problems they face. But we run into a recurring problem: our raw material, the most basic pieces we need for our work, is not rich enough to support the solutions we want to build. This happens because data were not considered during the computational-thinking process at the start and design phases of the project; in other words, nobody anticipated the problems that could be solved with data in the future.
The current approach to project development, regardless of the working methodology (Agile, Waterfall, etc.), produces projects conceived around the features needed in the short and medium term and the needs those features will have. This way of thinking, however, creates problems for future features designed with data in mind: the data structure, the data extraction and the information they offer are constrained by the first features, and modifying or reusing them requires exponentially more effort.
Consider an example feature: an action A that produces a reaction B in an application. This is one of the most basic functionalities of any app. It can, of course, be as complex as we make it, but we may imagine something as simple as a text message that appears when a button is pressed.
Looking at this feature, we may not see a priori where data fits in, but by adding data to the computational thinking behind this development we start to see future features. For example, we could record who presses A, how many users press A, how many times A is pressed over a period of time, how long it takes for B to appear after pressing A, how long B stays on screen, and so on. This simple feature enables new, more powerful features, provided we have thought about capturing these properties in advance. We would then be able to know the proportion of users taking the action, B's response time and whether it might be affecting usability, how many times the feature is used in a day, month or year, whether it is actually used at all, the periods of maximum and minimum use, etc.
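To make this concrete, here is a minimal sketch of what "thinking about data" for the A-to-B feature could look like: recording a small event for each press and deriving metrics from it afterwards. All names here (`ButtonEvent`, `user_id`, the helper functions) are illustrative assumptions, not part of any real framework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical event record for the "press A -> show B" feature.
@dataclass
class ButtonEvent:
    user_id: str          # who pressed A
    pressed_at: datetime  # when A was pressed
    shown_at: datetime    # when B appeared on screen

def response_times_ms(events):
    """Latency between pressing A and B appearing, in milliseconds."""
    return [(e.shown_at - e.pressed_at).total_seconds() * 1000 for e in events]

def unique_users(events):
    """How many distinct users ever pressed A."""
    return len({e.user_id for e in events})

# Toy data: two users, three presses.
t0 = datetime(2021, 1, 1, 12, 0, 0)
events = [
    ButtonEvent("alice", t0, t0 + timedelta(milliseconds=120)),
    ButtonEvent("bob", t0 + timedelta(minutes=1),
                t0 + timedelta(minutes=1, milliseconds=80)),
    ButtonEvent("alice", t0 + timedelta(minutes=2),
                t0 + timedelta(minutes=2, milliseconds=100)),
]

print(unique_users(events))                          # 2 distinct users
print(sum(response_times_ms(events)) / len(events))  # mean latency: 100.0 ms
```

The point is not this particular schema, but that capturing the event at all is a design decision: once the record exists, the usage and latency questions above become simple aggregations instead of new development work.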
As you can see, incorporating data into computational thinking during design and development vastly expands the possibilities for growing and improving our applications. Moreover, when we later design new features that involve data analysis, we can do so with less effort and greater efficiency, since the application is already prepared for it.
In science, this process always starts from what we want to achieve: we have a hypothesis, and we collect data with it in mind. In software development it is hard to foresee every future development, but if we progressively incorporate this way of thinking, the world of data analysis will take a giant leap forward.
About Jordi Martinez Rodriguez
I always like to start my biography by highlighting my scientific background. During my university years I earned a degree in Biology at the University of Barcelona. There, between animals and plants, I discovered my passion for data and for the scientific method of solving problems through study and analysis. As the next step in my evolution as a scientist, I completed a master's degree in Bioinformatics and Biostatistics, where I discovered the field of programming and, more specifically, Machine Learning, which has since become my passion. Thanks to the opportunity that Sogeti offered me, I was able to apply all this acquired knowledge to projects with innovative goals, among which Cognitive QA was the flagship. In Cognitive QA we developed a product able to apply, for the first time, artificial intelligence to the field of testing to improve and advise QA processes, and the project gained international recognition. This experience in testing has also allowed me to speak at several events about advancing the field through artificial intelligence. I am currently working in the innovation team at Sogeti Spain, developing integrative solutions in analysis systems, data visualization and machine learning for all kinds of clients. Finally, I do not want to leave out one of my core functions within the team: training, advising and helping to grow the new members who join the Sogeti family.
More on Jordi Martinez Rodriguez.