Datafying Bitcoin: can social media data predict the value of digital currencies?

Datafying, is that a word? Is this guy making up buzzwords just for fun? Well, sometimes yes. But not in this case.

I am currently reading a book by Viktor Mayer-Schönberger and Kenneth Cukier called Big Data: A Revolution That Will Transform How We Live, Work, and Think. The book describes how big data will become the basis of everyday life.

Datafying is one of the terms they coined and introduced in the book. While I’m not a big fan of the terminology, this term is actually a pretty good construct for grasping what big data and the internet of things are changing about the world:

“To datafy a phenomenon, is to put it in a quantified format so it can be tabulated and analyzed.”

So Twitter, with its data firehose being used for sentiment analysis, is the datafication (yes, this is the time to get out your buzzword bingo card) of sentiment: “Twitter enabled the datafication of sentiment by creating an easy way for people to record and share their stray thoughts, which had previously been lost to the winds of time.” (Someone pointed out to me that over at the New Republic they use the same quote to make a different, and also interesting, argument.)

So we at VINTlabs wrote an entire report (PDF) on the use of social media data to predict consumer/human behavior. Predictive and social analytics have always been about data but we argued that, following the terminology, using big data technologies we entered the era of the ultimate datafication of personalized marketing.

Yesterday I came across an interesting post on using social data to predict the value of digital currencies such as Bitcoin. Bitcoin is a decentralized digital currency that is currently a hot topic due to its bubble-like state: it has gone up, up, up and a long way down. There have already been projects that use Twitter data to predict the stock exchange, a Happiness Index and more, but letting social data fuel predictive analytics for digital currencies is a pretty good idea.

In a post on the FreshNetworks blog Rick Burgess explains why:

Bitcoin however has several characteristics which make it an ideal market for social data prediction:

  • The value of Bitcoins is determined almost solely on market demand, because the number of coins on the market is predictable and are not tied to any physical goods
  • Bitcoin traders tend to be in the same demographic as social media users, and so their attitudes, opinions and sentiment towards Bitcoin are well documented
  • Bitcoin is predominately traded by individuals rather than large institutions
  • Events that affect Bitcoin value are disseminated first and foremost on social media

So yes, if the demographic and context (use, types of content, information flows) of a medium are similar to those of a product’s user base, of course it makes sense to use this data to make predictions on how the product, Bitcoin in this case, will ‘act’.

Bitcoin is just a string of data, and each bitcoin has a unique code element to set it apart from all the other bitcoins. So in terms of datafying, bitcoin already is the datafication of money in a way. But what can we say about the future value of Bitcoin using Twitter? Here are 100,000+ tweets on bitcoin. Be sure to let me know what you find; then I can make a data-driven investment and maybe start stacking up bitcoins.
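For anyone who wants to start digging, here is a minimal sketch of one possible first step: score each day's tweets with a toy word list and check whether that score correlates with the next day's price change. Everything in it is an assumption made for illustration: the word lists, the function names, and the sample tweets and prices are invented, and a serious attempt would use a real sentiment model and far more data.

```python
from statistics import mean

# Hypothetical word lists; a real study would use a proper sentiment model.
POSITIVE = {"buy", "bullish", "moon", "up", "great"}
NEGATIVE = {"sell", "bearish", "crash", "down", "scam"}


def tweet_sentiment(text):
    """Score one tweet: number of positive words minus number of negative words."""
    words = [w.strip(".,!?#").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def daily_sentiment(tweets_by_day):
    """Average the tweet scores for each day, keeping the day order."""
    return [mean(tweet_sentiment(t) for t in tweets) for tweets in tweets_by_day]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(sentiment, prices, lag=1):
    """Correlate sentiment on day t with the price change from day t to t + lag."""
    changes = [prices[i + lag] - prices[i] for i in range(len(prices) - lag)]
    return pearson(sentiment[: len(changes)], changes)


# Invented sample data: four days of tweets and five daily closing prices.
tweets_by_day = [
    ["Bitcoin to the moon, buy now!", "feeling bullish"],
    ["crash incoming, sell", "total scam"],
    ["great news, buy", "bullish again"],
    ["prices down, bearish"],
]
prices = [100.0, 110.0, 95.0, 105.0, 98.0]

print(lagged_correlation(daily_sentiment(tweets_by_day), prices))
```

A high correlation on toy data like this proves nothing, of course. The point is only that the question "does today's mood say something about tomorrow's price?" can be put to the data in a few dozen lines.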

Image credits: Wired, TheUpStart

Big Data and Algorithms Rule!

The proper interpretation and linkage of data already leads to better decision-making, more sales, fewer risks and cost reduction. However . . .

‘We are still in the early, black-and-white-tv stage of Social Analytics . . .’

Big Social Today Is Still Like B&W TV
This is the view of Paul Barrett, the respected Customer Management Director at Teradata. Let us examine what it literally means. Black-and-white was the TV period in which there were few channels and we needed different antennas to receive them. We were troubled by ‘atmospheric disturbance’, and very often saw only ‘snow’ (or ‘noise’) if a strong wind had turned the aerial a little. It was far from the ideal situation, with only gray tints to represent a colorful world. And we could only see a given program at a fixed time.

The Commercial Big Data Challenge
It would be an exaggeration to denounce Big Social in a comparable way, because modern Social Analytics is in far better shape. But things could be better: a single antenna please, sharper picture, more details, more channels and sources, real time, various angles, more aggregation levels, pattern recognition and, above all, the ability to predict behavior. We need to know what people really want, serve them in a timely way, bind them to us, and build up relationships with and via them. This striking improvement is the commercial Big Data challenge for organizations.

A Faster and Clearer Picture
If a dataset is large enough, up to date and, above all, relevant, the empirical approach often works better than a formula. We could formulate a complex model to determine how many people will go down with flu, but investigating search results produces a faster and clearer picture. Gunther Eysenbach, a professor at the University of Toronto, was the first to do so, in 2006. His conclusion back then:

‘The Internet has made measurable what was previously immeasurable: the distribution of health information in a population, tracking (in real time) health information trends over time, and identifying gaps between information supply and demand.’

Most Answers Are Latent in Large Data Sets
The same applies to many other things, such as the best pricing strategy for selling second-hand articles. We find that immediately in eBay data, which gives better insight into matters such as inflation and consumer confidence. In short, all kinds of answers are latent in large data sets, and we can uncover these without having to concern ourselves with models.

The End of Theory
As far back as 2008, the start of the first Obama term, Chris Anderson of Wired magazine spoke provocatively about a Big Data vista, in which even theory-forming and the scientific method would become superfluous; but, of course, data cannot speak for itself. At most, empiricism and theory play leapfrog and, thanks to the data explosion, the emphasis currently lies on data and algorithms rather than on traditional models.

The Algorithm Is the Model
This development has been ongoing for a number of years now; compare, for example, the statistical approach to that of machine learning:

‘Statisticians emphasize probabilistic models for learning, and techniques for quantifying variation in the estimated model that results from variation in the learning sample. For many machine learners, the algorithm is the model, and emphasis is placed on developing interpretable yet flexible methods of learning in challenging context (computer vision, natural language).’

Download Our “Big Social” Report
Data and algorithms rule! Just read “Big Social: Predicting behavior with Big Data,” our second research report on Big Data, that is now available. It offers a multi-faceted orientation into next-generation Social Analytics and Social Media by presenting the rapid developments, the analysis of available tools, best practices and inspiring cases. [DOWNLOAD]

Big Social: Predicting behavior with Big Data – Download the new VINT research report


To Be or Not To Be: for centuries, this was our favorite existential question. But in our digital age, Big Existential themes have become Big Social. The flood of ones and zeroes shook up Shakespearean logic: from OR to AND and from Question to Answer.

Yes, we are talking Big Social here: Predicting behavior with Big Data. Our second research report on Big Data bears this title and is available to all as of today [DOWNLOAD]. Learn how To Be AND Not To Be is now holding new answers.

Overly Simple
The new Social Logic is overly simple: gather as many data points as possible, mix and match, and study the patterns. Discover, explore and develop fresh insight. More questions than ever can be put to the test. Digitally speaking, To Be and Not To Be are complementary, since only together can the ones and zeroes from disparate data sources contribute to smarter sensemaking and to better answers and decisions.

Organizational Change
All Big Data development corresponds to organizational change. Just like the emerging roles of Data Scientist and Chief Analytical Officer, Big Social, or hypertargeting with Big Data, underpins the importance of the brand new Chief Customer Officer role. According to the CCO Council, the CCO is “an executive who provides the comprehensive and authoritative view of the customer and creates corporate and customer strategy at the highest levels of the company to maximize customer acquisition, retention, and profitability.” Indeed, we are talking Big Social here: Predicting behavior with Big Data.

Three Things from Here
The Big Social trend is perfectly clear. In terms of predictive power and ambition, the latest developments based on Big Data and algorithms reach much further than traditional Web Analytics supplemented by dashboards to monitor Twitter and Facebook traffic and to give rapid response. It means at least three things:

1. Drawing conclusions from unrelated facts
Seemingly completely unrelated facts will increasingly turn out to be predictive in some way. For instance, the moment of the day at which you play a game of Angry Birds could indicate that you will be interested in a more expensive bottle of wine on your supermarket visit. It is not inconceivable that systems themselves will go looking for correlations and will present options accordingly.

2. From predicting to influencing
Subsequently, the following question arises: which minor and major impulses can we feed someone to ensure that this particular person enters a certain mental state, one in which he or she is quite happy, is open to experiment, and ready to spend? Perhaps through the Spotify playlist or by giving a Facebook like. Perhaps by routing our prospect through a tree-lined avenue or to the coffee corner in our store.

3. Really smart organizations
Behavioral prediction already is available as a service, for instance via the Big Data algorithm set of providers like MyBuys. This could be extended from consumer behavior to the optimization of work in organizations: ranging from business processes, work flows, customer services and risk policies to training courses, social recruitment and beyond.

Entering the Age of Prediction
If society, trade & industry and government authorities are all convinced of the importance of Big Data and Big Social, of the importance of searching for patterns and of better predictions about all kinds of topics, we might perhaps be able to look a few months ahead with reasonable certainty. Sci-fi? Think again. Dirk Helbing and colleagues already have received 1 billion euros from the European Union for their Living Earth Simulator or Future ICT Knowledge Accelerator and Crisis Relief System. This promising name says it all: we indeed are entering the Age of Prediction.

Read about and download the first Big Data report ‘Creating Clarity with Big Data’.

Big Social in Dutch
The Dutch edition will be out by November 26, marking the start of Sogeti’s two-day annual Business Intelligence Symposium.

Remarkable Big Data Pros & Cons

Technology Breakthroughs
In technological terms, a great deal is happening in the context of Big Data, such as the rise of the data-analysis language R. Another example is AMPLab at the University of California, Berkeley. With Big Data as its starting point, AMPLab orients itself to the combined forces of Algorithms, Machines & People. A Big Data milestone in the summer of 2012 was the appearance of the GraphChi software, developed at Carnegie Mellon University. This has enabled analyses on an ordinary PC, where previously large computer clusters were occupied for hours performing such tasks. With a Twitter dataset from March 2010 as a benchmark, one single GraphChi PC turned out to be able to analyze it in 59 minutes. On the previous occasion this was done, 1000 large computers spent 6.5 hours on the same task. The dataset in question is available free from the website and contains 40 million users, more than 1.5 billion tweets, and 1.2 billion connections between users.

Marketing Future
Regardless of how impressive this all may be, the big issue concerns the significance of it all. Not everyone is equally enthusiastic about Big Data; but many, including Jeff Dachis of the Dachis Group, hold the opinion that Big Data on social media forms the glorious hypertargeting future. Just think about it, says Dachis, hundreds of millions of people are busy on the social web, sharing unconcernedly their whole lives with one another. It easily adds up to 500 billion dollars in brand engagement value.

At the beginning of 2012, Twitter had a total of 225 million accounts, and almost 200 million tweets were sent every day. In comparison: Facebook has more than 800 million active users and LinkedIn has more than 135 million. This ‘consumerization’ of Big Data will only assume larger proportions in the future.

There is skepticism about the use of advertisements on Facebook in particular. Just before Facebook was launched on the stock market, General Motors slashed its advertising budget for the social network. But this same GM still spends three times that sum on engagement with people on Facebook. The major challenge is to measure the ROI of this action. For such tasks we now have advanced Social Analytics tools and new algorithms such as GraphChi.

The Hype of Big Social Data
One of the applications of Social Analytics is to gather as much information as possible on the online behavior of people (Big Social Data) with the aim of predicting what they are going to do next and what they are going to buy. Peter Fader, a Marketing professor at Wharton Business School and co-director of the Wharton Customer Analytics Initiative, inserts a few prominent question marks here. He compares the projected goldmine of Big Social Data to Customer Relationship Management, which made its breakthrough in the early 1990s. At first it was regarded as the Holy Grail, but nowadays a harsher evaluation is given: it causes huge frustration and is much too expensive; in short, the IT party has run out of control. Fader is afraid that things will turn out the same way with the current Big Data hype.

Dragging the entire Twitter or Facebook ‘fire hose’ through some Social Analytics refinery is simply nonsensical, says Fader. If you wish to get involved in hypertargeting, you have to look at tweets at the individual level and link them to the transactions that a person executes. But online and mobile do not jointly form the complete new world that the Big Social Data evangelists, in particular, would have us believe. Of course, more information can lead to new insights, but the question remains: how much data is needed for this? How interesting is it, actually, to know where someone is shopping at any given moment and what he/she is looking at? And which information on this subject should we retain?

Fader argues that the real golden age of ‘predictive behavior’ occurred about fifty years ago. At that time, consumer information was very scarce. In the 1960s, Lester Wunderman began what he called ‘direct marketing’. That was genuine ‘data science’. Everything that could be known about a customer was kept up to date. What the direct marketing pioneers eventually achieved was RFM: the relationship between Recency, Frequency and Monetary value. The effect of F upon M is evident. R was the great surprise: it is easy to convince people to repeat previous behavior, even if they only buy things sporadically. However, you have to reach them immediately. In the marketing business, everyone is familiar with RFM, but it often signifies little to e-commerce people. With lots of Big Data you will undoubtedly come to the same conclusion, but that is a bit of a waste of all the time and effort expended.
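To make RFM concrete, here is a minimal sketch that derives the three values per customer from a plain list of transactions. The customer names, dates, amounts and function names are all invented for illustration; real direct-marketing practice typically goes on to bin these values into scores, which this sketch leaves out.

```python
from datetime import date


def rfm(transactions, today):
    """Return {customer: (recency_in_days, frequency, monetary_total)}.

    transactions is an iterable of (customer, purchase_date, amount) tuples.
    """
    summary = {}
    for customer, day, amount in transactions:
        last, count, total = summary.get(customer, (day, 0, 0.0))
        summary[customer] = (max(last, day), count + 1, total + amount)
    return {
        customer: ((today - last).days, count, total)
        for customer, (last, count, total) in summary.items()
    }


def rank_by_recency(table):
    """List customers with the most recent buyers first."""
    return sorted(table, key=lambda customer: table[customer][0])


# Invented sample data.
transactions = [
    ("ann", date(2012, 10, 1), 40.0),
    ("bob", date(2012, 3, 15), 250.0),
    ("ann", date(2012, 11, 10), 25.0),
    ("cho", date(2012, 11, 18), 10.0),
]

table = rfm(transactions, today=date(2012, 11, 20))
for customer in rank_by_recency(table):
    print(customer, table[customer])
```

Ranking on recency first reflects the surprise described above: the customer who bought most recently is the one to approach, even if that customer spends little.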

How to Measure Anything
A good eye-opener in this context is the book How to Measure Anything: Finding the Value of “Intangibles” in Business by Douglas Hubbard, published in 2007. This is full of examples and tips to enable you to find out lots of things in a practical manner.

Big Social and Big Brother

The concrete adoption of plans in organizations for Big Data currently and predominantly covers the theme of Big Social: the customer side, inspired in particular by the social network activity of Web 2.0. But, if we take the concept of ‘social’ in a broader sense, an increasing amount of Big Data potential is released. This is more or less the route we have followed since the early nineties: first with Web Analytics, then with Social Analytics and now with Next-Generation Analytics. In this age of Big Data, further development is progressing toward Total Data Analytics and Total Data Management.

Scaling up occurs organically
An important part of the discussion revolves around the extent to which organizations should embrace Big Social Data. The answer is: only on the basis of a well-grounded policy. Smart entrepreneurship in the growing dataflow is the key to picking the cherries from the cake, so to speak. The question as to whether or not an organization is initially working with real Big Data (sets) is actually irrelevant. Scaling up will occur organically, and a good number of privacy issues are closely attached to this situation.

The organization of privacy
Modern Social Analytics applications enable organizations to understand the rhythms of human activity, to attach predictions to them, and to plan and implement corresponding actions: Understand, Predict & Act. The possibilities of personalization and hyper-targeting are steadily increasing, and the toolbox is bursting at the seams. But do customers want that? It gives many of us a somewhat uncomfortable feeling to realize what commercial organizations know about individuals and groups. The organization of privacy and the guarantee of our personal integrity is perhaps therefore the domain par excellence to which attention should be paid. Big Data, Big Social and Big Brother are not worlds apart – certainly not in our human perception.

Big Brother fear
These days, customers are alarmed by, for example, messages about rising premiums because they have directly or indirectly presented themselves on the Internet a bit too enthusiastically, participating recklessly in certain leisure time activities, or showing themselves to be great fans of cigarettes and beer, to name just a few minor ‘offenses’. Regardless of what organizations may think about Big Data and Big Social, customers’ Big Brother fear will force them to deal seriously with the situation, to adopt standpoints, and to express these vigorously.

Both feet on the ground
Technology is advancing rapidly, we can make ever-better predictions, and we can step effortlessly from Web and Social Analytics on to Next-Generation Analytics. The accent is increasingly being placed on data and algorithms rather than on models. In short: the commercial power of Big Social Data is undeniable and is growing. At the very least, this entails increasing guarantees and responsibilities where themes such as privacy, personal integrity and, above all, perception and sentiment are involved. This is perhaps the very first observation for organizations to make with both feet firmly on the ground.

Expert Talk: Anjul Bhambhri (VP of IBM Big Data) on social media and business performance

“We can understand the context of the data, which we were missing before”

“Because there is more information, so there will be more correlation and more patterns will emerge.”

Anjul Bhambhri has 23 years of experience in the database industry with engineering and management positions at IBM, Informix and Sybase. Bhambhri is currently IBM’s Vice President of Big Data Products, overseeing product strategy and business partnerships.

Opportunity for businesses: understanding decision makers


Fueling sales with social data: the story of Walmart Labs & the Social Genome

Walmart is one of the retailers that are really trying to fuel their business using big data. A big part of its efforts is based on the ‘social’ data we all share on networks like Twitter and Facebook. At its R&D division, Walmart Labs, they’re busy adding data to the Social Genome, a tool that helps Walmart reach its customers based on semantic analysis of real-time social media streams. The Social Genome provides Walmart with a layer of social metadata containing customers, topics, products, locations and events.

The image above is a visualisation of how this social genome works. It’s all about interpreting a network of relations: a person is interested in a topic, a person is attending an event, the event is related to a topic, a company is related to a topic and so on. Walmart is using public data, their own enterprise applications data (CRM-tools) and social data to understand these networks and act upon this understanding.
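The kind of network described here can be pictured as a set of (subject, relation, object) facts. The sketch below is only an illustration of that idea, not Walmart's implementation: the class, the relation names and the sample facts are all hypothetical.

```python
class RelationGraph:
    """A tiny in-memory store of (subject, relation, object) facts."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def match(self, subject=None, relation=None, obj=None):
        """Return all facts that agree with the given fields (None = any)."""
        return {
            (s, r, o)
            for s, r, o in self.triples
            if (subject is None or s == subject)
            and (relation is None or r == relation)
            and (obj is None or o == obj)
        }


def related_to_interests(graph, person):
    """Find events, companies, etc. related to a topic the person likes."""
    topics = {o for _, _, o in graph.match(subject=person, relation="interested_in")}
    return sorted(s for s, _, o in graph.match(relation="related_to") if o in topics)


# Invented sample facts.
graph = RelationGraph()
graph.add("hanna", "interested_in", "movies")
graph.add("juliana", "interested_in", "cooking")
graph.add("salt_premiere", "related_to", "movies")
graph.add("food_fair", "related_to", "cooking")
graph.add("hanna", "attending", "salt_premiere")

print(related_to_interests(graph, "hanna"))
```

Acting on such a network then comes down to following chains of relations: from a person to a topic, and from the topic to the events, companies or products attached to it.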

To understand how the social genome works, consider this example.

“I love salt!”, a user enthusiastically tweeted. Within a few seconds, the tiny tweet had arrived at WalmartLabs, where it was analyzed in a lightning-fast fashion. A few minutes later, a message arrived in a close friend’s mailbox: “Good morning, Juliana. You asked us to remind you. Hanna’s birthday is coming up. She’s just tweeted positively about SALT, a new Angelina movie. Would you like to buy something related for her? We have a few suggestions.”

As you can see the social genome is all about adding more context to transactions and offering more personalized offers to customers. Considering the ‘salt-tweet’, Walmart shows us how the network works in this case and how they know the tweet is about Salt the movie, not the condiment:

Walmart is building a Cinematch (the personal recommendation engine Netflix uses to recommend you movies to watch) for their products. Walmart is also deploying a new internally-built search engine to power and increase sales conversions from searches. The Polaris search engine, that also grew out of the semantic technology Walmart uses for the Social Genome, has been in use for the last few months on and has already boosted conversions to sales by 10-15 percent, the company said. If users search on with Polaris, they don’t just get a page full of results. For certain searches, they will get directed to a topic page that features specials and curated items as well as traditional search results. Another example of how big data is helping Walmart to serve their customers more relevant offers.

The Social Genome is a vast, constantly changing, up-to-date knowledge base, with hundreds of millions of entities and relationships. Walmart is using it to become a me-tailer instead of a retailer. If you have any ideas for Social Genome applications for Walmart, they are hiring.

What’s next in Big Data? And when exactly would that be?

Everyone involved in Big Data of course has read the seminal McKinsey report with the promising title Big Data: The Next Frontier for Innovation, Competition, and Productivity. In May 2012, exactly one year after publication, Michael Chui, one of the authors, was on stage at the MIT Sloan CIO Symposium. There we heard the following remarkable words:

‘There are no [Big Data] best practices.
I’d say there are emerging next practices.’

This seems to contradict the title of the aforementioned report, but the similarity and the elasticity are in that tiny word “next”. Big Data may indeed bring new prosperity to innovation, competition, and productivity, but hard proof is not yet on the table.

Work in progress
Organisations are experimenting but it remains too early for best practices that could simply be followed and tweaked by others. Big Data still is very much work in progress. Speaking of its predicted prosperous effect on innovation and competition, remaining in progress forever would even be logical, since both entail continuous development and dynamism.

Social Analytics
Social Analytics, the station in between Web Analytics and the so-called Next Generation Analytics predicted by Gartner, is now in particular at the forefront of Big Data development and dynamism. Here again we encounter this ominous and treacherous “next.” It signals this familiar discussion of the glass being half full or half empty. Sullivan McIntyre of Radian6, now part of Salesforce, is devoted to the first opinion. He is keen to stress that “it becomes increasingly possible to make guesses about future behavior.” Paul Barrett of Teradata rather cautiously characterizes the phase we are in as “the early, black and white TV stage of Social Analytics.”

Big Social
So the question remains exactly when we would be able to upgrade from black and white to color, and after that perhaps even HD or 3D. In our new Big Data research report, simply called “Big Social,” we will provide you with enough ammunition to judge for yourself and keep track of all rapid development in the realm of Big Social Data and Analytics.

Facebook needs big data to stay on top

There is a lot going on at Facebook HQ these days. We all witnessed a company struggling with its IPO and trying to come up with answers to its main challenge: increasing revenue with more (mobile) and better (more social) advertisements.

About 82% of the social network’s revenue comes from advertising. But as Michael Wolff has argued over at Technology Review, web advertising is decreasing in value every quarter. So Facebook is looking into new sources of income, such as adding a ‘want’ button to its network to monetize users’ intention to buy things.

Recently we have seen what the data of social networks might be capable of. Twitter data is used to predict when we get sick, which Hollywood movies will be a box office success, what the state of the economy is according to consumers and so on. Facebook’s 900+ million users, with half of them checking in almost every day, generate approximately 100 petabytes of data according to Sameet Agarwal, a director of engineering at Facebook.

Facebook (together with others) built Hive, a data warehouse system for Hadoop. Using Hive, the Data Sciences Team applies social science research techniques, running ad hoc queries and varied analyses on large datasets stored in Hadoop in order to mine the data. One potential use of this data is to sell insights into users’ (read: consumers’) intentions, insights into the collective mood in a specific segment of the world’s population, or perhaps a more social insight that could provide a new understanding of human interactions in our digital society.

Anyway, it looks to me like Facebook is sitting on a pile of data (or money) that’s waiting to get used for big new things. Please share your thoughts on how Facebook can take on their big data.


VINTlabs Big Data Bookmarks: @BLO2M

As researchers we do a lot of reading. Every week one of the researchers shares the most valuable articles he has read. Consider these posts as a curated reading experience. This week: Jaap Bloem

Currently, VINT is working on the 2nd Big Data Research report, starting from the #1 Big Data business focus: social analytics. I am happy to share with you the following three valuable selected insights:

The challenge for social media analytics is in going deeper not just broader or big.

Only 30% of companies are doing social media correctly. Socialbakers has launched a new online benchmark at to help organizations become more “socially devoted”.

The Altimeter Framework for Social Analytics