Last week, I was in Barcelona for the second edition of the Novartis Datathon. The event brought together some of the best data scientists to work on a Big Data challenge focused on predictive analytics and AI, applied to a forecasting problem and a clustering analysis. You could participate individually or in a group, and we had a day and a half to do our best. The organizers shared the metrics that would be used to evaluate the models and provided the references and answers we needed. There was also a mentoring session to clarify doubts about the data, the metrics, the business case, etc.
The challenge started on Saturday at 11:00 and ended on Sunday afternoon. We didn’t finish in the Top 5, but we were in the Top 10. After the competition, we identified some tips and lessons that I would like to share in this blog to help you with your next challenge.
- Simple is always better. Begin with a very basic technical solution that you already know, and enhance it as you progress. This means agreeing up front on the framework you will use to solve the challenge, because there are many different techniques, algorithms, libraries, and languages to choose from. In our case, our first approach used a regression algorithm in Python, but our best submission used R and an ARIMA model. We wasted a lot of time changing the environment.
- Plan your strategy. Do your homework before the event: spend some time exploring and learning about the challenge from different sources. In our case, we couldn’t do that because of clashing business trips and other work commitments. Next time I promise to dedicate enough time to prepare in advance and build some useful templates and functions for data transformations, missing values, categorical features, etc. A good practice could be to create a common repository to share these templates among the team members, for example with Visual Studio Code and Azure DevOps.
- Clearly define and discuss the steps with your team, and create a schedule so you can track your progress. Again, time flies, and using it well is a great advantage. I used Trello, but we had no time to follow it and check our progress. It probably makes sense to explore some other options.
- Be complementary. If you are not the best programmer, hold back and support your peers instead. It is always better to add capabilities to the team’s backbone than to start from scratch. There is a lot to do, such as data cleaning and feature engineering.
- Take your time to analyze the evaluation metrics. This is a key point because you need to understand how you can improve your results and build that into your development. In our case, the upper and lower bound deviations used by the metric had a negative effect on our result. That means you have to define a clear way to calculate these values; otherwise, the final score could be worse even if your prediction was closer to the correct value.
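To illustrate the last tip, here is a minimal sketch of how prediction bounds can dominate a score. The Datathon's actual metric is not reproduced here; as a stand-in, the sketch uses the standard interval (Winkler) score, which adds the interval width to a penalty for every actual value that falls outside the bounds. The function names, the `alpha` value, and the numbers are all illustrative assumptions.

```python
def interval_score(actual, lower, upper, alpha=0.2):
    """Interval (Winkler) score for one observation; lower is better.

    Stand-in metric for illustration only, not the competition's formula.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    score = upper - lower  # wide intervals are penalized by their width
    if actual < lower:
        score += (2 / alpha) * (lower - actual)  # miss below the interval
    elif actual > upper:
        score += (2 / alpha) * (actual - upper)  # miss above the interval
    return score


def mean_interval_score(actuals, lowers, uppers, alpha=0.2):
    """Average interval score over a series of observations."""
    scores = [
        interval_score(a, lo, hi, alpha)
        for a, lo, hi in zip(actuals, lowers, uppers)
    ]
    return sum(scores) / len(scores)


# Made-up values: narrow bounds that miss every actual value
# versus wide bounds that cover every actual value.
actuals = [100, 120, 90]
narrow = mean_interval_score(actuals, [105, 110, 95], [110, 115, 100])
wide = mean_interval_score(actuals, [80, 100, 70], [120, 140, 110])
print(f"narrow bounds: {narrow:.1f}, wide bounds: {wide:.1f}")
```

Under this stand-in metric, the narrow bounds score worse than the wide ones even though they sit close to the actual values, which is the kind of effect worth checking on a validation set before submitting.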
And of course, enjoy it!
For me, it was a great experience and I would definitely like to participate again! By following these rules, I hope to be more competitive and have a chance of making it into the Top 5. The real added value is the lessons learned and how they helped us improve. Thanks a lot to Novartis and Eurecat Events for this awesome challenge.
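Going back to the preparation tip, the kind of reusable template we wished we had brought with us could look like the sketch below. It assumes pandas is available; the helper names (`fill_missing`, `encode_categoricals`, `prepare`) and the sample data are hypothetical, and the median/mode imputation and one-hot encoding are just reasonable defaults, not what we used in the competition.

```python
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with the column median, other gaps with the mode."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode every non-numeric column."""
    cat_cols = [
        c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])
    ]
    if not cat_cols:
        return df.copy()
    return pd.get_dummies(df, columns=cat_cols)


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Run the basic cleaning steps in a fixed order."""
    return encode_categoricals(fill_missing(df))


# Made-up sample with one numeric gap and one categorical gap.
sample = pd.DataFrame({
    "units": [10.0, None, 30.0],
    "country": ["ES", "ES", None],
})
prepared = prepare(sample)
print(prepared)
```

Keeping helpers like these in a shared repository means the first hours of the event go into modeling instead of rewriting the same cleaning code.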
About Alberto Alonso Marcos
My name is Alberto Alonso. I currently work at Sogeti Spain in the Business Intelligence department with Microsoft technologies. My profile is very customer-oriented, focused on how data can improve the organization. My first steps in data management were in the pharmaceutical sector (I am a pharmacist too). I worked hard to extract data and build procedures for gathering information across the organization, measuring all kinds of events and aggregating different sources such as ERP, LIMS, HVAC, OEE tools, and machine productivity reports.