How big is Facebook’s Big Data pile? 500+ terabytes added every day

Sogeti Labs
August 27, 2012

I wrote about Facebook’s ‘Big Data Pile’ some weeks ago already, but at the end of last week Facebook’s VP of Engineering, Jay Parikh, showed some invited guests at Facebook HQ just how big this data pile actually is. And no surprise here: it is getting bigger. Fast. Big data means business for Facebook; it’s what provides insights. For instance, it enables the social network to understand user sentiment and modify designs accordingly in nearly real time. It also benefits advertisers, because Facebook can perform in-depth analysis of how ads are running across the platform and where they are most successful. But just how big is this pile of data? Over at TechCrunch a picture was posted showing some impressive numbers:

  • 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments)
  • 2.7 billion Likes per day
  • 300 million photos uploaded per day
  • 100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS) clusters
  • 105 terabytes of data scanned via Hive (Facebook’s Hadoop query language) every 30 minutes
  • 70,000 queries executed on these databases per day
  • 500+ terabytes of new data ingested into the databases every day
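
To put those daily totals in perspective, here is a rough back-of-the-envelope sketch that converts them into per-second rates. It only uses the figures from the list above; the decimal units (1 TB = 1,000 GB), the function names and the "days to fill" comparison are my own assumptions for illustration, not anything Facebook presented.

```python
# Back-of-the-envelope conversion of the daily figures into rates.
# Assumption: decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures taken from the list above ("500+" and "100+" are lower bounds).
INGESTED_TB_PER_DAY = 500
HIVE_TB_PER_HALF_HOUR = 105
QUERIES_PER_DAY = 70_000
PHOTOS_PER_DAY = 300_000_000
CLUSTER_PB = 100


def ingest_rate_gb_per_s(tb_per_day):
    """Average ingest rate in GB/s for a given number of TB per day."""
    return tb_per_day * 1000 / SECONDS_PER_DAY


def hive_scanned_tb_per_day(tb_per_half_hour):
    """TB scanned per day if the 30-minute figure held around the clock."""
    return tb_per_half_hour * 48  # 48 half-hour windows in a day


def per_second(count_per_day):
    """Average events per second for a daily count."""
    return count_per_day / SECONDS_PER_DAY


def days_of_ingest(cluster_pb, tb_per_day):
    """Days of raw ingest equal to the cluster size.

    For scale only: ignores replication, compression and deletion.
    """
    return cluster_pb * 1000 / tb_per_day


print(f"Ingest:  {ingest_rate_gb_per_s(INGESTED_TB_PER_DAY):.2f} GB/s")
print(f"Hive:    {hive_scanned_tb_per_day(HIVE_TB_PER_HALF_HOUR)} TB scanned/day")
print(f"Queries: {per_second(QUERIES_PER_DAY):.2f} per second")
print(f"Photos:  {per_second(PHOTOS_PER_DAY):.0f} per second")
print(f"100 PB equals {days_of_ingest(CLUSTER_PB, INGESTED_TB_PER_DAY):.0f} days of ingest")
```

Averaged over a day, that works out to nearly 6 GB of new data every second, about 5 petabytes scanned by Hive per day, and roughly 3,500 photos uploaded per second. Real traffic will of course peak well above these averages.
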
They also told attendees that log files keep track of who is accessing all this data and that only developers working on new products are granted access in the first place. Facebook also created an intensive training process around acceptable use of user data and maintains a zero-tolerance policy: snooping in data you don’t have permission for gets you fired.
For more coverage of the event, check out the post on TechCrunch for info on Project Prism and this picture that shows the life of data on Facebook.

About the author

SogetiLabs gathers distinguished technology leaders from around the Sogeti world. It is an initiative explaining not how IT works, but what IT means for business.
