Our personal computer: NEW tool to dramatically improve big data processing

Nov 15, 2013
Sogeti Labs

Today, visualizing billions of pieces of data in real time is important to understanding trends and behaviors, but doing it quickly currently requires a supercomputer. In practice, this type of visualization is either:

  • Available only to large organizations with substantial computing power,
  • Available with high latency, or
  • Available only for old data, because computing and visualizing it takes time.

A new technology that runs on an ordinary personal computer will soon change that situation.

New visualization technology takes just milliseconds to turn hundreds of millions of data points into maps and animations. This new software can use the graphics processors found in everyday computers to process huge amounts of data more quickly than is normally possible, opening up new ways to visually explore everything from social networks to other sources of data on the internet (including the internet of things).

Known as MapD[1], or massively parallel database, the new technology achieves its speed gains by storing the data in the onboard memory of graphics processing units (GPUs) rather than in the main system memory attached to central processing units (CPUs), as is conventional. A single high-performance GPU card can make data processing up to 70 times faster. The solution takes advantage of the computational power available in commodity-level, off-the-shelf GPUs, which were originally designed to accelerate the drawing of 3D graphics to a computer screen.

In short, MapD is a hyper-parallel SQL database that allows real-time querying, analysis and visualization of “big data” on hardware ranging from commodity laptops and desktops costing under 1,000 USD all the way to high-performance computing clusters with hundreds or thousands of nodes.

This changes what can be achieved in real-time big data visualization using cheap and common hardware. That is the cost aspect.

Much large-scale visualization, including animated maps and charts, takes several seconds or longer to process data before it can be displayed. With MapD, a user can adjust search terms and other parameters, such as time frame or geographical region, and see a new visualization instantly, without having to wait for each new map and animation to compute and load[2]. That is the real-time aspect.

Right now the prototype technology is being demonstrated on tweets[3]. This public web site (tweetmap) can be used to visualize 50 million geo-localized tweets posted between September 28 and October 6 this year. The tool allows users to explore different search terms, examine geographical trends, and zoom in on each tweet. MapD scans all the tweets that have been loaded on the GPUs, constructing visualizations such as maps of how word usage is propagating around the world in real time.
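To make the idea of scanning every loaded tweet more concrete, here is a minimal sketch in plain Python. It is not MapD's code or API: the names Tweet, scan_to_grid and cell_deg are hypothetical, the sample tweets are made up, and an ordinary sequential loop stands in for the scan that the post describes as running on GPUs. It only illustrates the shape of the work: test every row against the search term, time frame and geographical region, then count the matches per map cell.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    """Hypothetical record layout for one geo-localized tweet."""
    text: str
    lon: float
    lat: float
    timestamp: int  # seconds since some reference time


def scan_to_grid(tweets, term, t_start, t_end, bbox, cell_deg=1.0):
    """Scan every tweet and count the matches per map grid cell.

    bbox is (min_lon, min_lat, max_lon, max_lat) in degrees.
    A tweet matches when it falls inside the time window
    [t_start, t_end), inside the bounding box, and contains the
    search term (case-insensitive).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    needle = term.lower()
    counts = Counter()
    for tw in tweets:  # full scan: no index, every row is tested
        if not (t_start <= tw.timestamp < t_end):
            continue
        if not (min_lon <= tw.lon <= max_lon and min_lat <= tw.lat <= max_lat):
            continue
        if needle not in tw.text.lower():
            continue
        # Map the coordinates to an integer (column, row) grid cell.
        cell = (int((tw.lon - min_lon) // cell_deg),
                int((tw.lat - min_lat) // cell_deg))
        counts[cell] += 1
    return counts


# Made-up sample data, for illustration only.
tweets = [
    Tweet("Snow in Boston today", -71.06, 42.36, 100),
    Tweet("More snow near Boston", -71.10, 42.40, 200),
    Tweet("Sunny in Paris", 2.35, 48.86, 150),
    Tweet("Snow in Paris", 2.35, 48.86, 900),
]
world = (-180.0, -90.0, 180.0, 90.0)

grid = scan_to_grid(tweets, "snow", 0, 500, world, cell_deg=10.0)
print(dict(grid))
```

Changing the search term, the time window or the bounding box simply means running the scan again with new arguments. The post's point is that when the data sits in GPU memory and the rows are tested in parallel, that rescan is fast enough to feel instant.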

Nvidia, one of the leading manufacturers of GPUs, plans to demonstrate MapD on more than one billion tweets using eight GPUs at an upcoming conference. The researchers are also planning a joint demo with Gnip (http://gnip.com/), a leading reseller of social-media data from sources like Twitter, Foursquare, Facebook, YouTube and more.

Being able to visualize massive streams of geographically identifiable social-media and mobile-phone data in real time using cheap computer hardware will have powerful implications for many big data analytics applications, such as epidemiology and disaster response (see my previous blog post).

[1] http://geops.csail.mit.edu/docs/mapd_overview.pdf
[2] http://mapd.csail.mit.edu/tweetmap/
[3] http://sunlightfoundation.com/issues/

About the author

SogetiLabs gathers distinguished technology leaders from around the Sogeti world. It is an initiative explaining not how IT works, but what IT means for business.
