Engineering Blog

Blog posts tagged 'Data'

Chris Piro, Engineering

Chat reaches 1 billion messages sent per day

Posted about 9 years ago
blog post · Web · Infra · Data · Compute · Storage · Chat · Messages

Facebook Chat usage has increased steadily since its launch last year, and this week we reached 1 billion messages sent per day. As a team we've been looking forward to this milestone; we track lots of statistics in the course of maintaining and improving Chat, but this number measures Chat's progress toward its ultimate goal: increasing communication between our users. We've invested a lot in making Chat stable and scalable in the past, and we continue making improvements even now. Read more...

Ashish Thusoo, Engineering at Facebook

Hive - A Petabyte Scale Data Warehouse using Hadoop

Posted about 9 years ago

A number of engineers from Facebook are speaking at the Yahoo! Hadoop Summit today about the ways we are using Hadoop and Hive for analytics. Hive is an open source, petabyte-scale data warehousing framework based on Hadoop that was developed by the Data Infrastructure Team at Facebook. In this blog post we'll talk more about Hive, how it has been used at Facebook, and its unique architecture and capabilities. Read more...

Peter Vajgel, Engineering

Needle in a haystack: efficient storage of billions of photos

Posted about 9 years ago
blog post · Infra · Data · Storage · Photos

The Photos application is one of Facebook’s most popular features. To date, users have uploaded over 15 billion photos, which makes Facebook the biggest photo sharing website. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates to a total of 60 billion images and 1.5PB of storage. Read more...
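The figures in this teaser can be checked with a quick back-of-envelope calculation. The sketch below is illustrative only: it assumes a decimal petabyte (10**15 bytes) and treats "over 15 billion" as exactly 15 billion, so the per-image average is a rough estimate rather than a number from the post.

```python
# Back-of-envelope check of the figures quoted in the teaser.
PHOTOS_UPLOADED = 15_000_000_000  # "over 15 billion photos", taken as exactly 15 billion
SIZES_PER_PHOTO = 4               # four stored images of different sizes per upload
TOTAL_STORAGE_BYTES = 1.5e15      # 1.5PB, assuming decimal petabytes (10**15 bytes)

total_images = PHOTOS_UPLOADED * SIZES_PER_PHOTO
avg_bytes_per_image = TOTAL_STORAGE_BYTES / total_images

print(f"{total_images:,} stored images")
print(f"about {avg_bytes_per_image / 1000:.0f} KB per stored image on average")
```

Under those assumptions the stored images average roughly 25 KB each, which is plausible given that most of the four sizes are thumbnails or reduced versions.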

Chris Piro, Engineering

Chat Stability and Scalability

Posted about 9 years ago
blog post · Web · Data · Compute · Languages · Chat · Messages · Performance · Optimization · User Experience

Almost ten months ago we launched Facebook Chat to 70 million users. We ventured into a lot of new territory with this product: not only were there tricky web design and product issues, but we also needed to develop and launch a trio of new backend services to support all of Chat's functionality. Read more...

Paul Saab, Engineering at Facebook

Scaling memcached at Facebook

Posted about 9 years ago
blog post · Data · Infra · Caching · Storage

If you've read anything about scaling large websites, you've probably heard about memcached. memcached is a high-performance, distributed memory object caching system. Here at Facebook, we're likely the world's largest user of memcached. Read more...
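For readers unfamiliar with object caching, here is a minimal sketch of the generic look-aside (cache-aside) pattern that systems like memcached are commonly used for. The teaser does not describe how Facebook's deployment works, so everything here is an assumption for illustration: the `CacheAside` class and its methods are hypothetical names, and a plain Python dict stands in for a memcached cluster.

```python
class CacheAside:
    """Look-aside cache: check the cache first, fall back to the backing store."""

    def __init__(self, load_from_store):
        self._cache = {}              # in-process dict standing in for memcached
        self._load = load_from_store  # hypothetical callable: key -> value
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: read from the slower backing store and remember the result.
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Drop the cached copy so the next read goes back to the store.
        self._cache.pop(key, None)


# Usage: a dict plays the role of the database.
database = {"user:1": "Alice"}
cache = CacheAside(database.__getitem__)

cache.get("user:1")          # miss, loads from the database
cache.get("user:1")          # hit, served from the cache
database["user:1"] = "Alicia"
cache.invalidate("user:1")   # writers invalidate so readers do not see stale data
print(cache.get("user:1"), cache.hits, cache.misses)
```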

Robert Johnson, Director, Software Engineering at Facebook

Facebook's Scribe technology now open source

Posted about 9 years ago
blog post · Web · Data · Infra · Open Source · Performance · Compute · Development Tools

Here at Facebook, we're constantly facing scaling challenges because of our enormous growth. One particular problem we encountered a couple of years ago was collection of data from our servers. We were collecting a few billion messages a day (which seemed like a lot at the time) for everything from access logs to performance statistics to actions that went to News Feed. We used a variety of different technologies for the different use cases, and all of them were bursting at the seams. We decided to build a unified system (called Scribe) to handle all of these cases, and do it in a way that would scale with Facebook's growth. Read more...

Doug Beaver, Engineering

10 billion photos

Posted about 9 years ago
blog post · Data · Photos

We recently hit a really cool milestone: our users have now uploaded over 10 billion photos to the site. Now, that’s a big number, but we actually store four image sizes for each uploaded photo, so that’s over 40 billion files. To celebrate, we got a bunch of cupcakes and handed them out to our engineering and operations groups. One of our engineers calculated that if we had gotten one cupcake for each of our photos, and lined them up side by side, the line could reach halfway to the moon. Here are some other interesting recent stats on photos: Read more...

Avinash Lakshman, Engineering

Cassandra – A structured storage system on a P2P Network

Posted about 9 years ago
blog post · Data · Compute · Open Source · Storage

When I joined Facebook I was eagerly looking forward to a new challenge. Fortunately, Facebook cannot be accused of a lack of challenging assignments. Prashant Malik, a colleague at Facebook from the Search team, was thinking about how to solve the Inbox Search problem. This challenge is about storing reverse indices of Facebook messages that Facebook users send and receive while communicating with their friends on the Facebook network. The amount of data to be stored, the rate of growth of the data and the requirement to serve it within strict SLAs made it very apparent that a new storage solution was absolutely essential. The solution needed to scale incrementally and in a cost-effective fashion. Traditional data storage solutions just wouldn’t fit the bill. The aim was to design a solution that not only solved the Inbox Search problem but also provided a system as a storage infrastructure for many problems of the same nature. Hence was born Cassandra. To keep up with Facebook tradition, Prashant and I started the implementation of Cassandra about a year ago in one of our Hackathons. Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure. Reliability at massive scale is a very big challenge. Outages in the service can have significant negative impact. Hence Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different datacenters). At this scale, small and large components fail continuously; the way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. Cassandra has achieved several goals: scalability, high performance, high availability and applicability. In many ways Cassandra resembles a database and shares many design and implementation strategies with databases.
Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. The rest of the material talks about the data model and the distributed properties provided by the system. Read more...
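The Inbox Search problem mentioned above centers on reverse indices over messages. As a rough illustration of the idea only, the sketch below builds an in-memory reverse (inverted) index mapping each term to the set of message IDs that contain it. It is not Cassandra's data model or API; the function names, the whitespace tokenizer, and the sample messages are all assumptions made for this example.

```python
from collections import defaultdict


def build_reverse_index(messages):
    """Map each lowercase term to the set of message IDs containing it."""
    index = defaultdict(set)
    for message_id, text in messages.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[term].add(message_id)
    return index


def search(index, query):
    """Return IDs of messages that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical inbox contents keyed by message ID.
inbox = {
    1: "lunch tomorrow at noon",
    2: "Lunch was great",
    3: "see you tomorrow",
}
index = build_reverse_index(inbox)
print(sorted(search(index, "lunch")))
print(sorted(search(index, "lunch tomorrow")))
```

A single-process dictionary like this does not address the scale, growth rate, or SLA requirements the post describes; those are the reasons given for building a distributed store.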

Jason Sobel, Engineer at Facebook

Scaling Out

Posted about 9 years ago
blog post · Data · Infra · Caching · Performance

I joined Facebook in April 2007 and, after getting settled over the course of a few weeks, my manager Robert Johnson approached me. We talked for a while but the conversation boiled down to: Bobby: "So, Jason, we're going to open a new datacenter in Virginia by 2008. Do you think you can help?" Me: "Uh.... yes?" Bobby: "Great!" My first project at Facebook was a tad more involved than I was expecting, but I think that is one reason why we have such a great engineering organization; we have a lot of hard problems to solve and everyone here is excited to jump in and tackle them. I set out to really understand why we were building a new datacenter and what problems we had to overcome to make it work. Read more...

Joydeep Sen Sarma, Tenured Engineer at Facebook

Hadoop

Posted about 10 years ago
blog post · Infra · Data · Open Source · Framework

With tens of millions of users and more than a billion page views every day, Facebook ends up accumulating massive amounts of data. One of the challenges that we have faced since the early days is developing a scalable way of storing and processing all these bytes, since using this historical data is a very big part of how we can improve the user experience on Facebook. This can only be done by empowering our engineers and analysts with easy-to-use tools to mine and manipulate large data sets. About a year back we began playing around with an open source project called Hadoop. Hadoop provides a framework for large-scale parallel processing using a distributed file system and the map-reduce programming paradigm. Read more...
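To make the map-reduce paradigm concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It does not use Hadoop or any of its APIs; the function names and the sample log lines are assumptions for illustration, and a real job would run these stages in parallel over data in a distributed file system.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input records, e.g. lines from an access log.
counts = run_job(["home profile home", "profile photos"])
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both stages can be spread across many machines, which is what makes the model suit large data sets.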

Aditya Agarwal, Director of Engineering at Facebook

Welcome to the Facebook Engineering Blog!

Posted about 10 years ago
blog post · Web · Mobile · Infra · Data · Culture

We are going to use this space to tell you a little about the code and systems that power Facebook. We thought it would be fun to share what goes on behind the scenes to ensure that the site scales smoothly and that we continue to provide the best overall user experience. Read more...

Facebook © 2017