This story starts with an issue reported by one of my customers: a car manufacturer.
Several times a day, its IT team runs batch jobs to migrate a large volume of data (several TB) from version n to version n+1 of a 3D CAD (Computer-Aided Design) application. It is a classic data migration scenario that runs throughout the year: the company decided to operate both versions in parallel until it is confident that everything works correctly with the new application version.
The 3D CAD application vendor provides specific scripts that generate a lot of logs (30 GB per week). However, most of these logs are unstructured, which makes it very hard to exploit the log data to understand the frequent data migration issues. The team could neither understand nor fix them.
They asked us to have a look. First conclusions from data exploration and data mining:
- Reading the logs and trying to make sense of them costs a lot of time.
- The logs are unstructured, so retrieving a single piece of information requires opening 10 log files.
- Building datasets manually for data mining took 8 hours for 50 observations.
- Data mining with so little data yields poor findings. The initial idea was to explain the data processing issues using other variables found in the logs.
We needed a way to structure these logs, and we could not wait for the CAD application vendor to provide one.
Use of Logstash/Elasticsearch/Kibana
Elasticsearch proved powerful in this case. Thanks to Logstash patterns and filters, it lets us extract and structure data from these logs. And since the ELK stack is scalable, large volumes are not a problem.
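As a sketch of the approach, a Logstash filter using a grok pattern might look like the following. The pattern itself is hypothetical: the real one depends entirely on the format of the vendor's migration logs.

```conf
filter {
  grok {
    # Hypothetical log line format: ISO timestamp, log level,
    # a bracketed migration step name, then a free-text message.
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:migration_step}\] %{GREEDYDATA:details}" }
  }
  date {
    # Parse the extracted timestamp into the event's @timestamp field
    match => [ "timestamp", "ISO8601" ]
  }
}
```

Each field captured by the pattern (`level`, `migration_step`, and so on) becomes a structured, queryable field in Elasticsearch, which is exactly what turns 30 GB per week of raw text into an exploitable dataset.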
Kibana makes it possible to quickly design dynamic monitoring reports.
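Under the hood, a Kibana visualization is backed by an Elasticsearch aggregation. A query like the one below (index and field names are hypothetical, and the exact `date_histogram` syntax varies between Elasticsearch versions) would feed a chart of migration errors per day, broken down by step:

```
GET cad-migration-logs-*/_search
{
  "size": 0,
  "aggs": {
    "errors_per_day": {
      "date_histogram": { "field": "@timestamp", "interval": "day" },
      "aggs": {
        "by_step": { "terms": { "field": "migration_step.keyword" } }
      }
    }
  }
}
```

Once the logs are structured, this kind of report takes minutes to build instead of the hours previously spent assembling observations by hand.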