Training a machine learning algorithm takes a lot of data, and since Big Data became a buzzword we have been collecting massive amounts of data that could be used for this purpose.
For the last couple of years we have been discussing bias in machine learning. We have heard about Microsoft’s chatbot Tay and about Google’s image recognition that “recognized” non-Caucasian people as gorillas, and most AI professionals nowadays know they must find a way to avoid bias when selecting training data.
But the world is ever changing, and there is one form of bias that will always be present: historical data doesn’t necessarily represent the future very well. I will call this history bias. You could argue that ML will never work well in an ever-changing world, but that would probably be going too far.
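One way to make history bias visible is to compare the data a model was trained on with the data it sees today. The sketch below uses the population stability index (PSI), a common drift measure; it is my illustration, not something prescribed here. The function name, the synthetic samples, and the bin count are all assumptions made for the example.

```python
import math
import random

def population_stability_index(historical, recent, bins=10):
    """PSI between two samples; larger values mean the distributions differ more."""
    lo = min(min(historical), min(recent))
    hi = max(max(historical), max(recent))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are identical

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    h, r = shares(historical), shares(recent)
    return sum((ri - hi_) * math.log(ri / hi_) for hi_, ri in zip(h, r))

# Synthetic data (assumption): the "changed world" has a shifted mean.
rng = random.Random(0)
historical = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_world = [rng.gauss(0.0, 1.0) for _ in range(5000)]
changed_world = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(historical, same_world)
psi_changed = population_stability_index(historical, changed_world)
print(f"PSI, unchanged world: {psi_same:.3f}")
print(f"PSI, changed world:   {psi_changed:.3f}")
```

A widely used rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, but those thresholds are conventions, not guarantees.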
ML can be used for a few different categories of problems:
- Categorization
- Anomaly/feature detection
- Prediction
Categorization is usually not much affected by history bias, because you are trying to assign a situation to one of several previously defined categories. An example is identifying objects on a conveyor belt, where you want to recognize certain types of objects and are perfectly happy to reject everything else as incorrect. If you extend this scenario and let the ML be self-learning and identify new categories, you could get into trouble, though.
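The conveyor belt idea, with fixed categories plus a reject bin, could look roughly like this. It is a minimal sketch assuming a nearest-centroid classifier; the categories, the feature values (think width and height in cm), and the distance threshold are hypothetical and only there to show the reject option.

```python
import math

# Hypothetical feature centroids per known category (assumption for illustration).
CENTROIDS = {
    "bottle": (6.0, 25.0),
    "can": (6.5, 12.0),
    "box": (20.0, 15.0),
}
MAX_DISTANCE = 3.0  # assumed rejection threshold

def classify(features):
    """Return the nearest known category, or "reject" if nothing is close enough."""
    label, dist = min(
        ((name, math.dist(features, centroid)) for name, centroid in CENTROIDS.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= MAX_DISTANCE else "reject"

print(classify((6.2, 24.0)))   # close to the bottle centroid
print(classify((40.0, 40.0)))  # far from every known category
```

Because the categories are fixed up front, a new kind of object simply ends up in the reject bin instead of silently changing what the model believes.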
Anomaly or feature detection, where you have ML trying to detect something in a picture (or stream of pictures), sound, or other data, is usually fine if you are trying to detect predetermined features in varying surroundings or random anomalies in very well-defined objects. An example is the identification and counting of cyclists in Copenhagen, which is done by ML algorithms embedded in the camera. This is, by the way, an excellent way of avoiding GDPR problems: as soon as a cyclist is identified as such and counted, the image itself is discarded, so no one will ever see it. But again, if you try to make the detection too intelligent and self-learning, history bias might kick in.
Prediction: now you are looking for trouble. You can only predict deterministic processes! Human behavior could be argued to be deterministic if you knew all the parameters that influence behavior at the individual level, but here we are talking about thousands of parameters, most of them unconscious to the individual. This makes it impossible to predict human behavior, at least with any accuracy, and you will have even greater difficulty at group level, especially as the very individualistic synthetic generation enters your scope. You could have better success if the system you try to predict doesn’t involve human behavior. There are non-human non-deterministic systems, though; the weather is a well-known example. If you can keep the prediction horizon short relative to the uncertainty of the system, you could be doing fine. You see this in weather forecasts: the shorter the prediction period, the better the forecast. This is the type of process where concepts like digital twins belong, and they are extremely helpful in managing complex systems.
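The effect of the prediction horizon can be illustrated with a toy simulation. The sketch below assumes a simple random walk as a stand-in for a non-deterministic system (it is not a weather model) and uses "no change" as the forecast; the function name and parameters are hypothetical.

```python
import random
import statistics

def mean_abs_error(horizon, trials=2000, seed=0):
    """Average forecast error when predicting `horizon` steps ahead of a random walk."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # The value now is 0 and the forecast is "no change".
        # The actual outcome is the sum of `horizon` random steps.
        outcome = sum(rng.gauss(0.0, 1.0) for _ in range(horizon))
        errors.append(abs(outcome))
    return statistics.fmean(errors)

errors_by_horizon = {h: mean_abs_error(h) for h in (1, 5, 25)}
for horizon, error in errors_by_horizon.items():
    print(f"horizon {horizon:>2}: mean absolute error {error:.2f}")
```

For this particular toy system the error grows roughly with the square root of the horizon; real systems will behave differently, but the direction is the same: the further ahead you look, the less the forecast is worth.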
About Erik Haahr
Erik Haahr has been a Managing Consultant at Capgemini Sogeti Denmark since 2015. In this role, he is improving local service offering descriptions, participating in pre-sales activities, mentoring graduates, and consulting with customers.