Recently, I had the joy of hosting a session at a university’s Mathematics Meet Industry Day.
Until then, I had not had the opportunity to see real data science work in action, and I think many are in the same situation. For most of us in the traditional IT industry, it is seen as something strange, a black hole.
I provided the case: an analysis of the heating energy consumption of an old wooden apartment house.
Two teams were given the task of analyzing five years of data and finding out which factors influence the energy consumption. The teams consisted of students and researchers who see themselves as mathematicians.
With curiosity, I studied their teamwork, especially that of the younger college team.
I soon realized that mathematicians are not a homogeneous guild. Without any coordination, the following happened simultaneously, in creative chaos:
- One was driving the case and the discussion
- One was starting to define algorithms/formulas to describe the consumption case
- One was starting to analyze in more depth the physics around heat transfer in wall materials, and how the relation between inside and outside temperature works
- Two opened Excel and MATLAB and started to identify what open data is available on factors that can affect heating energy consumption
It was a fun session from which I learned a lot. Most of all, it demystified the work of a data scientist. I was also impressed by these young people’s knowledge and drive to solve problems.
I must also say that the creative chaos became more controlled after a while, and they presented an impressive result in which they had divided the total energy consumption among different sources of heat. They also analyzed how tenants’ characteristics and behavior affect the energy consumption.
The other team, the older one, took a different approach and started by discussing which factors could affect the heating energy consumption. For example: Do sunshine and wind direction make sense as factors? Does snow on the roof stop heat loss? How is the heating water produced?
Another difference was that this team used a programming tool and Python to analyze the data.
The basic result from the two teams was that the temperature is by far the most influential factor in the heating energy consumption. The effect of tenants’ behavior and characteristics was negligible.
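To give a feeling for what such a comparison of factors can look like in practice, here is a minimal Python sketch. It is not the teams' code, and the data is synthetic and made up for illustration, not the measurements from the session. The variable names (`outdoor_temp`, `tenants_home`), the assumed indoor temperature of 21 °C and the simple energy model are all hypothetical assumptions of mine. The sketch only shows the general idea: correlate each candidate factor with the energy consumption and rank the factors by the strength of the correlation.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic data for illustration only (NOT the data from the session).
rng = random.Random(0)  # fixed seed so the sketch is reproducible
days = 365
outdoor_temp = [rng.uniform(-20.0, 20.0) for _ in range(days)]  # degrees Celsius
tenants_home = [rng.randint(0, 30) for _ in range(days)]  # hypothetical behavior proxy

# Assumed toy model: heating demand grows as the outdoor temperature drops
# below an assumed indoor temperature of 21 C, plus a small tenant effect
# and some random noise.
energy = [
    10.0 * max(0.0, 21.0 - temp) + 0.5 * people + rng.gauss(0.0, 15.0)
    for temp, people in zip(outdoor_temp, tenants_home)
]

# Correlate every candidate factor with the energy consumption.
factors = {"outdoor_temp": outdoor_temp, "tenants_home": tenants_home}
correlations = {name: pearson(values, energy) for name, values in factors.items()}

# Rank the factors by the absolute strength of their correlation.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
for name in ranked:
    print(f"{name}: r = {correlations[name]:+.2f}")
```

A plain correlation like this is only a first look. It says nothing about cause and effect, and a real analysis of five years of data would need more care, for example with seasonality and with factors that depend on each other.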
What I realized regarding the team and the roles is that the Data Scientist is not a role; it is a team. All the different capabilities needed to solve problems and create a usable result can’t normally be provided by one single person. We have to think TEAM.
I did a little research on the Internet and found the following picture from the Data Science Institute. It shows what a multifaceted knowledge area Data Science is.
One finding was that I, as an Enterprise Architect / Business Analyst, can set up a team of young math students, and together we can create magic. I would then take the role of “Domains/Business Knowledge” in the model above.