OUR PERSONAL COMPUTER: NEW TOOL TO DRAMATICALLY IMPROVE BIG DATA PROCESSING

November 15, 2013

Sogeti Labs

Today, visualizing billions of pieces of data in real time is important to understanding trends and behaviors, but doing it quickly currently requires a supercomputer. As a matter of fact, this type of visualization is either:

Available only to large organizations with large computing power,
Available with great latency, or
Available only on old data, because it takes time to compute and visualize it.

A new technology, available on our personal computers, will soon change that situation.

New visualization technology takes just milliseconds to turn hundreds of millions of data points into maps and animations. This new software can use the graphics processors found in everyday computers to process huge amounts of data more quickly than is normally possible, opening up new ways to visually explore everything from social networks to other sources of data on the internet (including the Internet of Things).

Known as MapD[1], or massively parallel database, the new technology achieves its speed gains by storing the data in the onboard memory of graphics processing units (GPUs) instead of in the main memory used by central processing units (CPUs), as is conventional. A single high-performance GPU card can make data processing up to 70 times faster. The solution takes advantage of the computational power available in commodity-level, off-the-shelf GPUs, which were originally designed to accelerate the drawing of 3D graphics on a computer screen.
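To make the "massively parallel" idea concrete, here is a minimal sketch in Python. It is not MapD code: it uses CPU threads as a stand-in for GPU cores, and the function names and sample tweets are invented for illustration. It only shows the general pattern of splitting a column into chunks, scanning each chunk independently, and merging the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(chunk, keyword):
    """Scan one chunk of the text column and count rows containing the keyword."""
    return sum(1 for text in chunk if keyword in text.lower())


def parallel_count(texts, keyword, workers=4):
    """Split the column into chunks, scan them concurrently, merge the partial counts."""
    # Ceiling division so every row lands in exactly one chunk.
    size = max(1, (len(texts) + workers - 1) // workers)
    chunks = [texts[i:i + size] for i in range(0, len(texts), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: count_matches(c, keyword), chunks))
    return sum(partials)


# Hypothetical sample data, invented for illustration.
tweets = [
    "Flu season is here",
    "Lovely weather today",
    "Got my flu shot",
    "Traffic again",
]
print(parallel_count(tweets, "flu"))  # prints 2
```

On a GPU, the same pattern would run across thousands of lightweight threads reading from the card's own memory, which is where the speed-up described above would come from.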

In short, MapD is a hyper-parallel SQL database that allows real-time querying, analysis and visualization of “big data” on hardware ranging from sub-1,000 USD commodity laptops and desktops all the way to high-performance computing clusters with hundreds or thousands of nodes.

This really changes the picture of what can be achieved in real-time big data visualization using cheap and common hardware. That is the cost aspect of things.

Many large-scale visualizations, including animated maps and charts, take several seconds or longer to process data before it can be displayed. With MapD, a user can adjust search terms and other parameters, like time frame or geographical region, and see a new visualization instantly, without having to wait for each new map and animation to compute and load[2]. That is the real-time aspect of things.
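As an illustration of the kind of query that sits behind such an adjustable view, here is a small sketch using Python's built-in sqlite3 module as a stand-in SQL engine. The table layout, column names, helper function and sample rows are all assumptions made up for this example; they are not MapD's actual schema or API.

```python
import sqlite3

# In-memory stand-in database; the schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (text TEXT, lat REAL, lon REAL, posted TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        ("flu season is here", 42.36, -71.06, "2013-09-29"),
        ("got my flu shot", 40.71, -74.01, "2013-10-02"),
        ("lovely weather", 48.85, 2.35, "2013-10-03"),
        ("flu again", 51.51, -0.13, "2013-10-05"),
    ],
)


def count_tweets(conn, term, start, end, south, north, west, east):
    """Count tweets matching a search term within a date range and bounding box."""
    row = conn.execute(
        "SELECT COUNT(*) FROM tweets "
        "WHERE text LIKE ? AND posted BETWEEN ? AND ? "
        "AND lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (f"%{term}%", start, end, south, north, west, east),
    ).fetchone()
    return row[0]


# Whole period, whole world.
print(count_tweets(conn, "flu", "2013-09-28", "2013-10-06", -90, 90, -180, 180))  # prints 3
# Same term, narrowed to a box roughly covering the US East Coast.
print(count_tweets(conn, "flu", "2013-09-28", "2013-10-06", 35, 45, -80, -65))  # prints 2
```

Each change to the search term, dates or map window simply re-runs the same query with new parameters. The point made above is that a GPU-backed engine could answer it fast enough for the map to update as the user interacts with it.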

Right now the prototype technology is being demonstrated on tweets[3]. This public web site (tweetmap) can be used to visualize 50 million geolocated tweets posted between September 28 and October 6 this year. The tool allows users to explore different search terms, examine geographical trends, and zoom in on individual tweets. MapD scans all the tweets that have been loaded on the GPUs, constructing visualizations such as maps of how word usage is propagating around the world in real time.

Nvidia, one of the leading manufacturers of GPUs, plans to demonstrate MapD on more than one billion tweets using eight GPUs at an upcoming conference. The researchers are also planning a joint demo with Gnip (http://gnip.com/), a leading reseller of social-media data from sources like Twitter, Foursquare, Facebook, YouTube and more.

Being able to visualize massive streams of geographically identifiable social-media and mobile-phone data in real time using cheap computer hardware will have powerful implications for many big data analytics applications, such as epidemiology and disaster response (see my previous blog post).

[1] http://geops.csail.mit.edu/docs/mapd_overview.pdf

[2] http://mapd.csail.mit.edu/tweetmap/

[3] http://sunlightfoundation.com/issues/

About the author

SogetiLabs gathers distinguished technology leaders from around the Sogeti world. It is an initiative explaining not how IT works, but what IT means for business.
