Engineering Blog

Blog posts tagged 'Open Source'

Robert JohnsonDirector, Software Engineering at Facebook

Facebook's Scribe technology now open source

Posted about 9 years ago
blog post · Web · Data · Infra · Open Source · Performance · Compute · Development Tools

Here at Facebook, we're constantly facing scaling challanges because of our enormous growth. One particular problem we encountered a couple of years ago was collection of data from our servers. We were collecting a few billion messages a day (which seemed like a lot at the time) for everything from access logs to performance statistics to actions that went to News Feed. We used a variety of different technologies for the different use cases, and all of them were bursting at the seams. We decided to build a unified system (called Scribe) to handle all of these cases, and do it in a way that would scale with Facebook's growth. Read more...

Avinash LakshmanEngineering

Cassandra – A structured storage system on a P2P Network

Posted about 9 years ago
blog post · Data · Compute · Open Source · Storage

When I joined Facebook I was eagerly looking forward to a new challenge. Fortunately, Facebook cannot be accused of a lack of challenging assignments. Prashant Malik,a colleague in Facebook from the Search team, was thinking about how to solve the Inbox Search problem. This challenge is about storing reverse indices of Facebook messages that Facebook users send and receive while communicating with their friends on the Facebook network. The amount of data to be stored, the rate of growth of the data and the requirement to serve it within strict SLAs made it very apparent that a new storage solution was absolutely essential. The solution needed to scale incrementally and in a cost effective fashion. Traditional data storage solutions just wouldn’t fit the bill. The aim was to design a solution that not only solved the Inbox Search problem but also provided a system as a storage infrastructure for many problems of the same nature. Hence was born Cassandra. To keep up with Facebook tradition, Prashant and I started the implementation of Cassandra about a year ago in one of our Hackthons. Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure. Reliability at massive scale is a very big challenge. Outages in the service can have significant negative impact. Hence Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different datacenters). At this scale, small and large components fail continuously; the way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. Cassandra has achieved several goals – scalability, high performance, high availability and applicability. In many ways Cassandra resembles a database and shares many design and implementation strategies with databases. Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. The rest of the material talks about the data model and the distributed properties, provided by the system. Data Model. Read more...

David ReissEngineering

Thrift: (slightly more than) one year later

Posted about 10 years ago
blog post · Infra · Open Source · Languages

A little over a year ago, Facebook released Thrift as open source software. (See the original announcement.) Thrift is a lightweight software framework for enabling communication between programs written in different programming languages, running on different computers, or both. We decided to release it because we thought it could help other groups solve some of the same technical problems we have faced, and because we hoped that developers who found it useful would contribute improvements back to the project. A lot has happened in the year since we released Thrift. First and foremost, Thrift has gained a lot of cool features:. Read more...

Joydeep Sen SarmaTenured Engineer at Facebook


Posted about 10 years ago
blog post · Infra · Data · Open Source · Framework

With tens of millions of users and more than a billion page views every day, Facebook ends up accumulating massive amounts of data. One of the challenges that we have faced since the early days is developing a scalable way of storing and processing all these bytes since using this historical data is a very big part of how we can improve the user experience on Facebook. This can only be done by empowering our engineers and analysts with easy to use tools to mine and manipulate large data sets. About a year back we began playing around with an open source project called Hadoop. Hadoop provides a framework for large scale parallel processing using a distributed file system and the map-reduce programming paradigm. Read more...

Keep Updated

Stay up-to-date via RSS with the latest open source project releases from Facebook, news from our Engineering teams, and upcoming events.

Facebook © 2017