Engineering Blog

Blog posts tagged 'Data'

Jay Park, Engineering

Designing a Very Efficient Data Center

Posted about 7 years ago
blog post · Data · Compute · Hardware · Data Centers · Prineville · Open Source · Open Compute

When we started designing our Prineville, Ore., data center as part of the Open Compute Project, we did so with a "less is more" philosophy. We wanted a highly energy efficient, less costly, simpler and more reliable facility that could serve as a model for other data centers. Read more...

Amir Michael, Engineering

Inside the Open Compute Project Server

Posted about 7 years ago
blog post · Infra · Data · Data Centers · Hardware · Open Compute · Open Source · Optimization

Yesterday we launched the Open Compute Project, an effort to nurture industry collaboration on the best practices and implementation of power- and cost-efficient compute infrastructure. At the heart of the project lies the Open Compute server, a highly optimized server layout developed by Facebook engineers and industry partners. Read more...

Matt Millunchick, Technical Program Manager at Facebook

Facebook Hacker Cup Finals: A Champion is Crowned

Posted about 7 years ago
blog post · Culture · Web · Data · Compute · Hacking · Recruiting

The Facebook Hacker Cup started off with 11,768 people from around the world competing to solve some of the most difficult algorithmic coding challenges in three online elimination rounds. Twenty-five emerged as finalists and were flown to Facebook’s HQ in Palo Alto, California, to compete in today’s ultimate event. Read more...

Nagavamsi Ponnekanti, Engineering at Facebook

Hybrid Incremental MySQL Backups

Posted about 7 years ago
blog post · Infra · Web · Data · MySQL · PHP · Storage · Performance

This post discusses enhancements to our database backups. As we deploy these enhancements to production servers, we may write additional posts about other improvements made along the way. Read more...

Ken Deeter, Software engineer at Facebook

Live Commenting: Behind the Scenes

Posted about 7 years ago
blog post · Data · User Experience · News Feed · Backend · Hacking · Testing

Commenting on Facebook content has been an asynchronous form of communication. Until now. Live commenting, which we rolled out to all of our users a couple of weeks ago, creates opportunities for spontaneous online conversations to take place in real time, leading to serendipitous connections that might never have happened otherwise. Read more...

Donn Lee, Engineering at Facebook

World IPv6 Day: Solving the IP Address Chicken-and-Egg Challenge

Posted about 7 years ago
blog post · Infra · Data · Networking and Traffic

We’re announcing today our participation in World IPv6 Day, along with Google, Yahoo!, Akamai, Limelight Networks, and the Internet Society. June 8, 2011, will be the first global-scale "test flight" of IPv6, the next generation protocol for the Internet. And best of all, it’s open to everyone who’s interested in testing their IPv6 service. Read more...

Jason Evans, Engineering

Scalable memory allocation using jemalloc

Posted about 7 years ago
blog post · Infra · Data · Storage

The Facebook website comprises a diverse set of server applications, most of which run on dedicated machines with 8+ CPU cores and 8+ GiB of RAM. These applications typically use POSIX threads for concurrent computations, with the goal of maximizing throughput by fully utilizing the CPUs and RAM. This environment poses some serious challenges for memory allocation, in particular:. Read more...

Cameron Marlow, Data Scientist at Facebook

A New Year of Facebook Fellowships

Posted about 7 years ago
blog post · Data · Culture · Research · Recruiting

Last year we introduced the Facebook Fellowship program to help support the inventive research in the academic world. We were thrilled with the response to the program during its first year, both in terms of the number of applications and the range of applicable research areas. We have thoroughly enjoyed getting to know our inaugural class of Facebook Fellows, and it is clear that we have a lot to learn from the academy. Read more...

Liyin Tang, Software engineer at Facebook

Join Optimization in Apache Hive

Posted about 7 years ago
blog post · Infra · Data · Open Source · Optimization · Performance

With more than 500 million users sharing a billion pieces of content daily, Facebook stores a vast amount of data, and needs a solid infrastructure to store and retrieve that data. This is why we use Apache Hive and Apache Hadoop so widely at Facebook. Hive is a data warehouse infrastructure built on top of Hadoop that can compile SQL queries as MapReduce jobs and run the jobs in the cluster. Read more...
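To make the idea of compiling a SQL query into a MapReduce job more concrete, here is a minimal sketch of an equi-join expressed as a map phase, a shuffle, and a reduce phase. It is an illustration in plain Python, not Hive's planner and not the join optimization the post goes on to describe. The table names (`users`, `orders`), the tagging scheme, and every function name are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(users, orders):
    """Emit (join_key, (table_tag, row)) pairs, as a mapper would."""
    for user_id, name in users:
        yield user_id, ("users", name)
    for user_id, item in orders:
        yield user_id, ("orders", item)


def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every users row with every orders row."""
    joined = []
    for key in sorted(groups):
        names = [row for tag, row in groups[key] if tag == "users"]
        items = [row for tag, row in groups[key] if tag == "orders"]
        for name, item in product(names, items):
            joined.append((key, name, item))
    return joined


def reduce_side_join(users, orders):
    """Roughly: SELECT u.id, u.name, o.item FROM users u JOIN orders o ON u.id = o.id."""
    return reduce_phase(shuffle(map_phase(users, orders)))
```

Keys that appear in only one table produce no output rows, which matches inner-join semantics. In this sketch all rows for a key meet in the reducer, so every row of both tables passes through the shuffle step.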

Paul Butler, Engineering

Visualizing Friendships

Posted about 7 years ago
blog post · Data

Visualizing data is like photography. Instead of starting with a blank canvas, you manipulate the lens used to present the data from a certain angle. Read more...

Dhruba Borthakur, Engineering

Looking at the code behind our three uses of Apache Hadoop

Posted about 7 years ago
blog post · Data · Infra · Open Source · MySQL · Storage · Messages

The size of the data warehouse cluster at Facebook has been increasing tremendously over the past few years. We use several pieces of open source software in our data warehouse including Apache Hadoop, Apache Hive, Apache HBase, Apache Thrift and Facebook Scribe. Together they keep this data processing engine humming. Read more...

Tim Stanke, Engineering at Facebook

Announcing the Facebook 2011 Hacker Cup

Posted about 7 years ago
blog post · Data · Culture · Compute · Hacking · Recruiting

Hacking is a central part of Facebook's culture. Whether we're building the next big product at one of our Hackathons or creating a smarter search algorithm, we're always hacking to find a better way of doing things. Read more...

Carlos Bueno, Fixer at Facebook

The Full Stack, Part I

Posted about 7 years ago
blog post · Infra · Data · Storage · Networking and Traffic · Compute · Hardware

One of my most vivid memories from school was the day our chemistry teacher let us in on the Big Secret: every chemical reaction is a joining or separating of links between atoms. Which links form or break is completely governed by the energy involved and the number of electrons each atom has. The principle stuck with me long after I'd forgotten the details. There existed a simple reason for all of the strange rules of chemistry, and that reason lived at a lower level of reality. Maybe other things in the world were like that too. Read more...

Kannan Muthukkaruppan, Technical Lead at Facebook

The Underlying Technology of Messages

Posted about 7 years ago
blog post · Infra · Data · Messages

We're launching a new version of Messages today that combines chat, SMS, email, and Messages into a real-time conversation. The product team spent the last year building out a robust, scalable infrastructure. As we launch the product, we wanted to share some details about the technology. Read more...

Yan Yu, Engineering

Crowdsourcing Mobile Device Capabilities

Posted about 7 years ago
blog post · Mobile · Data · HTML5 · Performance · Optimization

Unlike desktop browsers, the capabilities of mobile browsers vary widely from phone to phone. This presents a number of challenges to large-scale mobile web development. For example, what can be fit on a 128x96 low-end phone is obviously different from what can be fit on an iPhone or a Nokia N900 with an 800x480 screen. In addition, only about 50% of smartphones today support JavaScript, let alone HTML5. Read more...

Robert Johnson, Director, Software Engineering at Facebook

Scaling Facebook to 500 Million Users and Beyond

Posted about 8 years ago
blog post · Infra · Data · Culture

Today we hit an important milestone for Facebook - half a billion users. It's particularly exciting to those of us in engineering and operations who build the systems to handle this massive growth. I started at Facebook four years ago when we had seven million users (which seemed like a really big number at the time) and the technical challenges along the way have been just as crazy as you might imagine. A few of the big numbers we deal with:

* 500 million active users
* 100 billion hits per day
* 50 billion photos
* 2 trillion objects cached, with hundreds of millions of requests per second
* 130TB of logs every day

Over the years we've written on this page about a number of the technical solutions we've used to handle these numbers. Today, I'd like to step back and talk about some of the general ways we think about scaling, and some of the principles we use to tackle scaling problems. Like Facebook itself, these principles involve both technology and people. In fact, only a couple of the principles below are entirely technical. At the end of the day it's people who build these systems and run them, and our best tools for scaling them are engineering and operations teams that can handle anything. The scaling statistic I'm most proud of is that we have over 1 million users per engineer, and this number has been steadily increasing. Read more...

Carlos Bueno, Fixer at Facebook

Internet Cartography

Posted about 8 years ago

A telegram from San Francisco to Hong Kong in 1901 must have taken many hops through British Empire cables to Europe, through the Middle East, and so on. London to New York was fast and direct. The vestiges of the Spanish and Portuguese Empires show up in the many links between South America, the Caribbean archipelago, and the Iberian peninsula. A cool thing is that you can measure these relative latencies yourself, using the present-day internet. If you run a website with a decent amount of worldwide traffic, you can use that traffic to map out how the internet responds with regard to you, and see how that matches with the gross structure of the 'net. I wrote about a cheap and cheerful way to generate this data last year, and the code has since been open-sourced as part of Yahoo's Boomerang measurement framework. Read more...

Matt Jones, Engineering

Protecting Privacy with Referrers

Posted about 8 years ago
blog post · Web · Data · Security · User Experience · JavaScript

Late last week, we quickly fixed an issue after being contacted by a Wall Street Journal reporter regarding an unintentional oversight in the data shared with our advertisers by your browser when you click some ads on Facebook. This occurred in the referrer link visible to advertisers when someone clicked on an ad. A little background: In some cases the referrer could contain the user ID of a profile you visited, including your own, but we were not aware of any way that a user ID on the referrer could identify the person who clicked on the ad. We've been testing different solutions to remove user IDs completely from referrer URLs since their inclusion was first brought to our attention. However, in a rarely occurring case, advertisers knowledgeable about the structure of Facebook's URLs could use the referrer to determine when someone who clicked on an ad had been viewing his or her own profile, thus potentially enabling them to infer the user ID of that person. We have no reason to believe that any advertisers were exploiting this, and doing so would have been a violation of our terms. To our knowledge, none did. It's also important to point out that we don't share personal information with advertisers, and we never sell any of your information to anyone. Read more...

Keith Adams, Engineer at Facebook

The Life of a Typeahead Query

Posted about 8 years ago
blog post · Web · Infra · Data · Performance · User Experience · Backend · Graph · Testing

In fall of 2009, some of us at Facebook imagined a more interactive search experience, where high quality results would appear as the user typed. We summarized the vision in a blog post introducing our new search box. Making this system real for 400 million users has been challenging at every level of the software stack. On the front-end, code running in users' browsers must consume and render results quickly enough to not distract the user as they type. We set a strict goal for ourselves that the new typeahead can't be slower than the existing typeahead for finding your friends, so every millisecond delay matters. Despite these performance constraints, the UI cannot be too minimalist; each result needs enough contextual clues to convey its meaning and relevance. Because the new typeahead auto-selects the first result when you hit Enter in the search box, we need near-perfect relevance so that the first result is always the one you're looking for. To satisfy all these constraints, we designed an architecture composed of several different backend services that were optimized for specific types of results. Let's follow a session as the request leaves a user's browser and becomes a set of navigable results. Read more...
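The architecture described above, several backend services each specialized for one type of result, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Facebook's implementation: the `PrefixBackend` class, the `typeahead` function, the scoring values, and the choice of a simple prefix match are all hypothetical.

```python
import heapq


class PrefixBackend:
    """Toy backend holding (name, score) entries for one result type."""

    def __init__(self, result_type, entries):
        self.result_type = result_type
        self.entries = entries  # list of (name, score) pairs

    def search(self, prefix):
        """Return (score, name, result_type) for entries matching the prefix."""
        p = prefix.lower()
        return [
            (score, name, self.result_type)
            for name, score in self.entries
            if name.lower().startswith(p)
        ]


def typeahead(prefix, backends, limit=3):
    """Query every backend, merge the candidates, keep the highest scores."""
    candidates = []
    for backend in backends:
        candidates.extend(backend.search(prefix))
    top = heapq.nlargest(limit, candidates)
    return [(name, result_type) for _score, name, result_type in top]
```

A usage example with made-up data:

```python
friends = PrefixBackend("friend", [("Maria Lopez", 0.9), ("Mark Smith", 0.7)])
pages = PrefixBackend("page", [("Mars Rover", 0.8), ("Jupiter", 0.95)])
results = typeahead("mar", [friends, pages], limit=2)
```

Because the first result is the one selected on Enter, the ordering produced by the merge step matters as much as the matching itself. A real system would also have to query the backends concurrently and within a latency budget, which this sketch does not attempt.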

Akhil Wable, Engineering

Intro to Facebook Search

Posted about 8 years ago
blog post · Infra · Data · Web · Caching · User Experience

Connecting and sharing with others is Facebook’s primary value. That value necessitates having the ability to easily and efficiently find the people and information we care about. The search team at Facebook is focused on building a search product to enable our more than 400 million users to quickly find what they're looking for. In July 2007 we explained the complexities of serving one of the largest user bases in the world and the reasons for building our own in-house search service. Serving more than 150 million queries a day and supporting a user base that has grown by more than 10x since then both reinforce that decision. Read more...

Tao Stein, Engineer at Facebook

Facebook becomes a USENIX Patron

Posted about 8 years ago
blog post · Infra · Data · Web · Mobile · Compute · Open Source · Research · Graph · Languages · PHP · HipHop · Platform

From its beginnings, Facebook has had to solve novel systems challenges to help us scale and grow. The idea of the social graph, and its implementation as a web and mobile platform have repeatedly pushed our computer systems into uncharted territory. The workloads are non-traditional, graph-oriented and write-heavy, and the system has grown rapidly to a base of 350M users around the world. We have survived and thrived due to healthy innovation and creativity, but we haven't done it alone. We have benefited from innovation in both the open source and computer systems communities. The USENIX Association is an essential hub in the systems community and today we are pleased to announce that we are becoming a Patron of the USENIX Association. Read more...

Real-World Web Application Benchmarking

Posted about 8 years ago
blog post · Data · Infra

In order to adequately forecast compute capacity and financial expense, application developers must benchmark application performance. Three factors are important to consider when analyzing hardware performance: maximum throughput, acquisition cost and operating cost. Industry-standard benchmarks, such as those published by The Standard Performance Evaluation Corporation (SPEC) can be reasonable indicators of maximum throughput for certain workloads. At Facebook, we recognized these benchmarks wouldn’t necessarily represent our application behavior under real-world conditions and developed a proprietary analysis methodology. In this paper, we contrast the conventional wisdom of using industry-standard synthetic benchmarks as guidelines for assessing web application performance with organic traffic measurements performed in a controlled environment. Read more...
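One simple way to combine the three factors named above (maximum throughput, acquisition cost, and operating cost) is a cost-per-unit-of-throughput figure over an assumed service life. The sketch below is only an illustration of that arithmetic. It is not the proprietary methodology the paper refers to, and the `ServerProfile` type, the `cost_per_rps` function, the three-year default, and all numbers are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ServerProfile:
    name: str
    max_throughput_rps: float       # measured peak requests per second
    acquisition_cost: float         # one-time purchase cost
    operating_cost_per_year: float  # e.g. power and cooling


def cost_per_rps(profile, years=3.0):
    """Total cost over the service life divided by peak throughput."""
    total = profile.acquisition_cost + profile.operating_cost_per_year * years
    return total / profile.max_throughput_rps


def cheapest(profiles, years=3.0):
    """Pick the profile with the lowest cost per request-per-second."""
    return min(profiles, key=lambda p: cost_per_rps(p, years))
```

With hypothetical inputs, a server that costs more up front can still come out ahead if its measured throughput is high enough:

```python
a = ServerProfile("a", 1000.0, 3000.0, 1000.0)
b = ServerProfile("b", 1500.0, 4500.0, 1000.0)
best = cheapest([a, b])  # compares 6.0 for "a" against 5.0 for "b"
```

The result is only as good as the throughput number fed into it, which is why the choice between a synthetic benchmark and measured production-like traffic matters.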

Zizhuang Yang, Engineering

Every Millisecond Counts

Posted about 9 years ago

Site speed has always been an important factor in the development of Facebook, even as the site evolves over time to become more feature-rich and complex. As we grow beyond the 250 million user mark, every small change to the site causes a huge ripple, affecting throngs of web surfers and their experience on Facebook. My project this summer as an engineering intern on the Infrastructure team involved tackling this imposing fact by exploring data and finding out how various changes to fundamental parts of the user experience impacted and changed user behavior. Read more...

Designing the Facebook username land rush

Posted about 9 years ago
blog post · Web · Data · Caching · User Experience · Performance · Optimization · Testing

A few weeks ago we hit a milestone of 50MM usernames, just over a month after we launched usernames on June 12. Ever since the launch, we’ve had a lot of people express interest in understanding how we designed the system and prepared for this big event. In a recent post, my colleague Tom Cook wrote about the site reliability and infrastructure work that we did to ensure a smooth launch. As an extension to that post, I’ll discuss some specific application and system design issues here. Launching usernames to allow over 200 million people (at the time; we’re now over 250 million) to get a username at the same time presented some really interesting performance and site reliability challenges. The two main parts of the system that needed to scale were (1) the availability checker and (2) the username assigner. Since we were pre-generating suggestions for users, we needed to check availability of all the suggested names, which placed extra load on the availability checker. Read more...
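The split between an availability checker and a username assigner can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration: the `UsernameRegistry` class, the case-insensitive comparison, and the single lock are hypothetical and say nothing about how the production system was built.

```python
import threading


class UsernameRegistry:
    """Toy registry separating the read path from the write path."""

    def __init__(self):
        self._owners = {}
        self._lock = threading.Lock()

    def is_available(self, username):
        """Availability checker: a cheap, read-only lookup."""
        return username.lower() not in self._owners

    def assign(self, username, user_id):
        """Username assigner: claim the name atomically; True on success."""
        key = username.lower()
        with self._lock:
            if key in self._owners:
                return False  # someone else claimed it first
            self._owners[key] = user_id
            return True

    def available_suggestions(self, suggestions):
        """Filter pre-generated suggestions down to names still free."""
        return [s for s in suggestions if self.is_available(s)]
```

The point of the split is that the checker can be served from fast, possibly slightly stale data because it is only advisory, while the assigner must re-check under mutual exclusion so that two people racing for the same name cannot both win. Filtering every pre-generated suggestion multiplies the number of checker calls per user, which is the extra load mentioned above.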

Matthew Welty, Engineering

10th Annual System Administrator Appreciation Day

Posted about 9 years ago
blog post · Data · Culture · Compute

Today we celebrate the 10th Annual System Administrator Appreciation Day. Sysadmins work throughout Facebook Ops, IT and Engineering 24 hours a day, 7 days a week to keep the critical elements of site services up and running. They make an impact just about everywhere, from internal systems and tools to production applications like Photos, Facebook Connect, and News Feed. Sysadmins are often the invisible heroes behind a company's success. A salesperson might get a bonus for exceeding sales goals, a software engineer might be featured in a magazine or a newspaper for a breakthrough product, but a system administrator...well, they usually just equate success with not getting paged at 2 in the morning. At Facebook, a dedicated group of sysadmins have labored tirelessly to scale our website to serve over 250 million users, others have built out the infrastructure that supports our network of employees across the globe, and altogether they've made a substantial contribution to Facebook's mission to give people the power to share and make the world more open and connected. Please take some time today to thank a sysadmin you know for the work that they do to keep things like your email, fileservers, and favorite websites running at peak performance. If you’re a system administrator yourself and the idea of supporting the infrastructure behind one of the most trafficked sites on the Internet makes your mouth water, be sure to check out our open positions at To learn more about System Administrator Appreciation Day, visit Read more...

Eric Sun, Engineering Manager at Facebook

A New Look at the Path to Popularity

Posted about 9 years ago
blog post · Data · Research · Analytics

[N.B.: The note below profiles some research that was conducted last year at Facebook based on the old News Feed. The resulting paper was presented at the International AAAI Conference on Weblogs and Social Media (ICWSM) in May 2009, where it received the Best Paper award. The full paper can be found at]. Read more...

Facebook © 2018