Everything is data! (3/5)


In part two of this series, we looked at how R can cut Big Data preparation from six months to one week. Today, we explain how R changes the game by making it easy to ingest images, videos, sounds and other kinds of data.

Everything! Even how you look and what you say. In 2014, John M. Chambers, creator of the S language on which R is based, summed up R with two slogans: "Everything that exists is an object. Everything that happens is a function call." Allow me to add one more to this quote from a visionary of statistical computing: every single object, even an abstract one, can be broken down into a data set.

Scientists, researchers and statisticians can no longer limit their work to numeric tables or text. To meet the ambitions of an information-driven world, they must be able to analyse complex objects such as sounds and images. While tools for analysing these objects are still relatively rare, R is already at the forefront of data analysis software: this language for statistical computing offers several packages that are well suited to processing such complex data.

R owes its strength to its users, who constantly improve and extend the range of functions it offers according to their needs. For example, Marcelo Araya-Salas, with the help of Grace Smith-Vidaurre, developed the R package warbleR to streamline high-throughput acoustic analysis of animal sounds. Two further packages, Rraven and NatureSounds, make R even easier to use for bioacoustics research[1]. These complement seewave, a package for sound analysis and synthesis developed by Jérôme Sueur, Thierry Aubin and Caroline Simonis[2], which includes more than a hundred functions for interfacing, editing, time, amplitude and frequency analysis, sound synthesis and modification, computing indices, and related tasks. In short, R users are more than well equipped.

After sound comes light. R also owes its best image analysis tools to the natural sciences. Packages such as EBImage let you read and write image files, and manipulate, filter and transform them. Beyond that, you can analyse images, detect objects in them and extract their features, which makes EBImage well suited to analysing cell images, for example. Those working with medical images, particularly in neuroimaging, may prefer the Neuroconductor[3] packages. Finally, geographers already have a multitude of packages for transforming and manipulating spatial data and, more interestingly still, packages such as RStoolbox, landsat and hsdar now make it possible to analyse remote sensing images in R.

For these practitioners and many others, the advantage of R does not lie in any single package or feature; other, perhaps more specialised, tools can perform these operations too. R's strength lies in combining these highly specialised packages with many others and with a wide range of general-purpose features. With R alone, a scientist can import an image or a sound, modify it, analyse it, detect patterns in it and then feed the resulting data into statistical and machine learning procedures. All that remains is to generate automated reports and to share the results through interactive visualisations or web applications built with Shiny (yet another great R package).

This blog has been co-authored by Paul Majerus and Kamel Abid.

Paul Majerus is a Data analyst – Statistician at Sogeti Luxembourg.

[1] https://marce10.github.io/

[2] http://rug.mnhn.fr/seewave/

[3] https://neuroconductor.org/

Sogeti Labs


SogetiLabs gathers distinguished technology leaders from around the Sogeti world. It is an initiative explaining not how IT works, but what IT means for business.
