Over the holidays, we will repost the top ten most popular blog posts of the year. This is one of them:
We all know that the real added value of big data lies not in the volume of data itself, but in the ability to analyse it to optimize decision making in real time.
What does that mean?
Machine learning is a relatively new domain at the intersection of computer science and advanced mathematics. It is based on statistical algorithms that can analyse large volumes of diverse data sources (images, sound, video, social networks, geolocation, "traditional" structured databases, and so on) in near real time. Computers running these programs can learn from data and put that learning to use in the future.
Whether in health, education, trade or the environment, statistical machine learning makes it possible to analyse data and extract insight in ever more use cases. In fact, machine learning algorithms are used in very diverse contexts: recognizing handwritten text, extracting information from images, building automatic language-translation systems, predicting the behaviour of customers in an online shop, finding genes that might be related to a particular disease, and so on. Generally speaking, machine learning algorithms can be applied whenever we want to extract "patterns" from large volumes of complex data.
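To make "learning patterns from data" concrete, here is a minimal sketch of one of the simplest possible learning algorithms, a nearest-centroid classifier, written in plain Python. The toy data and labels are hypothetical, purely for illustration; real systems use far richer models and far more data.

```python
# A minimal sketch of "learning from data": a nearest-centroid
# classifier in plain Python. The toy data below is hypothetical.

def train(samples):
    """Compute one centroid (mean point) per class label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label of the closest centroid (squared distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))

# Toy "patterns": two clusters in a 2-D feature space.
training = [((1.0, 1.1), "A"), ((0.9, 1.0), "A"),
            ((5.0, 5.2), "B"), ((5.1, 4.9), "B")]
model = train(training)
print(predict(model, (1.2, 0.8)))  # a new point near cluster "A"
```

The "learning" here is simply averaging: the training phase summarizes each class by its mean point, and prediction generalizes to unseen data by proximity. This is the same extract-a-pattern-then-apply-it structure that far more sophisticated algorithms follow.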
In health, a lot of data about patients is already stored in various formats, and it represents a huge volume. In medical imaging, machine learning makes it possible to see things we could not see before. For example, coupling a visual-recognition device (such as Microsoft's Kinect) with these new ways of analysing big data helps doctors automatically monitor elderly people and detect whether they are about to fall (at home, in hospital, even in the street). In this case, big data analysis combines specific patient data with the localization of the body and its immediate environment, and the computer gradually learns to detect abnormal events (see Jamie Shotton's current work at http://jamie.shotton.org/work/ or Francis Bach's at http://www.di.ens.fr/~fbach/ ).
In order to ensure that the patterns extracted are "really there" and do not merely arise from noise in the data, it is important that the behaviour of machine learning algorithms is well understood, i.e. that their results can be controlled and reproduced. This is exactly what machine learning theory focuses on: analysing machine learning algorithms from a theoretical point of view in order to predict their results, determine which algorithms are best suited to a given kind of data, and so on.
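One small, practical facet of the control-and-reproduce point above is fixing the random seed of a randomized procedure: the same seed gives the same result on every run, so an experiment can be repeated exactly. The sketch below uses a hypothetical toy procedure (estimating a mean by random subsampling) just to illustrate the mechanism.

```python
# A minimal sketch of reproducibility: with a fixed random seed,
# a randomized procedure gives identical results on every run.
# noisy_mean is a hypothetical toy algorithm for illustration.
import random

def noisy_mean(values, n_samples, seed):
    """Estimate the mean of `values` by random subsampling."""
    rng = random.Random(seed)  # fixed seed => reproducible draws
    picks = [rng.choice(values) for _ in range(n_samples)]
    return sum(picks) / n_samples

data = [1, 2, 3, 4, 5, 6]
run1 = noisy_mean(data, 100, seed=42)
run2 = noisy_mean(data, 100, seed=42)
print(run1 == run2)  # True: same seed, same result
```

Seeding only makes individual runs repeatable, of course; the deeper guarantee the article refers to, that a detected pattern is not an artifact of noise, is what statistical learning theory addresses.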
Afraid of a world where humans will be overwhelmed by algorithms?
For Francis Bach, who works at INRIA, this is not an issue: user requirements are far beyond what machines can currently do, and there will always be interaction between machines and humans, with each greatly helping the other.