January 5, 2012 · Data · Infra · Web · Timeline · User Experience

Building Timeline: Scaling up to hold your life story

Ryan Mack

Timeline isn’t just a bold new look for Facebook; it’s also the product of a remarkably ambitious engineering effort. While our earlier profile pages surfaced a few days or weeks of activity, from the outset we knew that with Timeline we had to think in terms of years and even decades. At a high level we needed to scan, aggregate, and rank posts, shares, photos, and check-ins to surface the most significant events over years of Facebook activity.

The schedule for Timeline was very aggressive. When we sat down to build the system, one of our key priorities was eliminating technical risk by keeping the system as simple as possible and relying on internally-proven technologies. After a few discussions we decided to build on four of our core technologies: MySQL/InnoDB for storage and replication, Multifeed (the technology that powers News Feed) for ranking, Thrift for communications, and memcached for caching. We chose well-understood technologies so we could better predict capacity needs and rely on our existing monitoring and operational tool kits.

Denormalizing the data

Before we began Timeline, our existing data was highly normalized, which required many round trips to the databases. Because of this, we relied on caching to keep everything fast. When data wasn’t found in cache, it was unlikely to be clustered together on disk, which led to lots of potentially slow, random disk IO. To support our ranking model for Timeline, we would have had to keep the entire data set in cache, including low-value data that wasn’t displayed.

A massive denormalization process was required to ensure all the data necessary for ranking was available in a small number of IO-efficient database requests.

Once denormalized, each row in the database contained both information about the action and enough ranking metadata so it could be selected or discarded without additional data fetches. Data is now sorted by (user, time) on disk and InnoDB does a great job of streaming data from disk with a primary key range query.
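
As a rough illustration of that layout (not our actual schema), here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB; the table and column names are hypothetical:

```python
import sqlite3

# Illustrative sketch only: Timeline's real storage is MySQL/InnoDB, but the
# (user, time) clustering idea can be shown with sqlite3. WITHOUT ROWID stores
# rows in primary-key order, loosely analogous to an InnoDB clustered index.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE timeline_story (
        user_id     INTEGER NOT NULL,
        created_at  INTEGER NOT NULL,   -- unix timestamp
        story_id    INTEGER NOT NULL,
        story_type  INTEGER NOT NULL,   -- denormalized ranking metadata
        rank_score  REAL    NOT NULL,
        payload     BLOB,
        PRIMARY KEY (user_id, created_at, story_id)
    ) WITHOUT ROWID
""")

def stories_for_range(user_id, start_ts, end_ts):
    """One primary-key range scan returns a contiguous slice of a user's
    activity, already time-ordered, with enough metadata on each row to rank
    or discard it without additional fetches."""
    return conn.execute(
        "SELECT story_id, story_type, rank_score "
        "FROM timeline_story "
        "WHERE user_id = ? AND created_at BETWEEN ? AND ? "
        "ORDER BY created_at",
        (user_id, start_ts, end_ts),
    ).fetchall()
```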

Some of the specific challenges of the denormalization process were:

  1. Dozens of legacy data formats that evolved over years. Peter Ondruška, a Facebook summer intern, defined a custom language to concisely express our data format conversion rules and wrote a compiler to turn this into runnable PHP. Three “data archeologists” wrote the conversion rules.
  2. Non-recent activity data had been moved to slow network storage. We hacked a read-only build of MySQL and deployed hundreds of servers to exert maximum IO pressure and copy this data out in weeks instead of months.
  3. Massive join queries that did tons of random IO. We consolidated join tables into a tier of flash-only databases. Traditionally PHP can perform database queries on only one server at a time, so we wrote a parallelizing query proxy that allowed us to query the entire join tier in parallel (a simplified sketch of this fan-out pattern follows this list).
  4. Future-proofing the data schema. We adopted a data model that’s compatible with Multifeed. It’s more flexible and provides more semantic context around data with the added benefit of allowing more code reuse.
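
As a rough sketch of the fan-out pattern behind that query proxy (the real one was driven from PHP against a tier of MySQL databases), here is an illustrative Python version; run_query is a hypothetical stand-in for a single-shard query:

```python
from concurrent.futures import ThreadPoolExecutor

def run_query(shard, sql, params):
    """Hypothetical stand-in for executing one query against one database shard."""
    # A real implementation would connect to `shard` and run `sql` with `params`.
    return []

def query_all_shards(shards, sql, params, max_workers=32):
    """Issue the same query against every shard concurrently and merge the rows."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_query, shard, sql, params) for shard in shards]
        rows = []
        for future in futures:
            rows.extend(future.result())  # re-raises any per-shard error
        return rows
```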

Timeline aggregator

We built the Timeline aggregator on top of the database. It started its life as a modified version of the Multifeed Aggregator that powers News Feed, but now it runs locally on each database box, allowing us to max out the disks without sending any data over the network that won’t be displayed on the page.

The aggregator provides a set of story generators that handle everything from geographically clustering nearby check-ins to ranking status updates. These generators are implemented in C++ and can run all these analyses in a few milliseconds, much faster than PHP could. Much of the generator logic is decomposed into a sequence of simple operations that can be reused to write new generators with minimal effort.
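
To give a flavor of one such simple operation (the real generators are C++ inside the aggregator), here is an illustrative Python sketch of greedy distance-threshold clustering of check-ins; the 25 km threshold and the input shape are assumptions made for the example:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def cluster_checkins(checkins, threshold_km=25.0):
    """checkins: iterable of (lat, lon, story_id) tuples. Each check-in joins the
    first existing cluster within threshold_km, otherwise it starts a new one."""
    clusters = []
    for lat, lon, story_id in checkins:
        for cluster in clusters:
            clat, clon = cluster["center"]
            if haversine_km(lat, lon, clat, clon) <= threshold_km:
                cluster["members"].append(story_id)
                break
        else:
            clusters.append({"center": (lat, lon), "members": [story_id]})
    return clusters
```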

Caching is an important part of any Facebook project. One of the nice properties of Timeline is that the results of big queries, such as ranking all your activity in 2010, are small and can be cached for a long period without cache invalidations. A query result cache is of huge benefit and memcached is an excellent solution.
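
The pattern itself is simple; here is an illustrative sketch in Python, with a plain dict standing in for memcached and a hypothetical key format:

```python
# A completed year of activity is effectively immutable, so its ranked result
# can sit under a (user, year) key for a long time without invalidation.
cache = {}

def ranked_year(user_id, year, rank_year_from_database):
    """rank_year_from_database is a stand-in for the expensive ranking query."""
    key = "timeline:ranked:%d:%d" % (user_id, year)
    result = cache.get(key)
    if result is None:
        result = rank_year_from_database(user_id, year)
        cache[key] = result  # in real memcached this would get a long TTL
    return result
```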

Recent Activity changes frequently, so a query cache is frequently invalidated, but regenerating the summary of Recent Activity is quite fast. Here a row cache helps further boost query performance. We rely on the InnoDB buffer pool in RAM and our own Flashcache kernel driver to expand the OS cache onto a flash device.

Developing in parallel

Timeline started as a Hackathon project in late 2010 with two full-time engineers, an engineering intern, and a designer building a working demo in a single night. The full team ramped up in early 2011, and the development team was split into design, front-end engineering, infrastructure engineering, and data migrations. By doing staged and layered prototyping, we achieved an amazing amount of development parallelism and rarely was any part of the team blocked by another. Early on in the project we were simultaneously:

  1. Designing UI prototypes with our pre-existing but non-scalable backend,
  2. Building production frontend code on a simulation of the scalable backend,
  3. Building the scalable backend using samples of denormalized data from a prototype of the denormalization migration,
  4. Building the framework to run the full-scale denormalization process,
  5. Collecting and copying the data necessary for the denormalization,
  6. Performing simulated load testing to validate our capacity planning estimates.

In retrospect, that’s pretty crazy. We had to move a lot of mountains to go from the initial infrastructure review meeting to successfully turning on the 100% backend load test in just six months. Done another way, this project could have taken twice as long, and that’s being generous.

Timeline has been a wonderful opportunity to work closely with the product team. Our constant collaboration was critical to ensuring the infrastructure we built supported their product goals while simultaneously guiding the product toward features that we could implement efficiently.

As millions of users enable Timeline, it is wonderfully exciting to see all the positive feedback and even more exciting to see our performance graphs look just like our simulations.

Ryan Mack, an infrastructure engineer, looks forward to rediscovering this blog post on his timeline a decade from now.

Related Posts

Building Paper

Scott Goodson

A few weeks ago we launched Paper, a new app to explore content from your friends and the world around you—with built-in access to all the core features of Facebook. Through the process of implementing the fresh design and creating about 20 new categories of content, the team has developed new frameworks and architectural approaches to address the toughest challenges we encountered. In a series of blog posts, tech talks, and open source releases, we hope to give a comprehensive overview of the key parts of Paper's implementation and share some of our most valuable lessons learned while building this interaction-rich app on iOS.

In setting out to create Paper, our goal was not to pick apart past assumptions, but to start clean. We did establish one new assumption, however: the product should be designed and optimized for mobile devices at every level, from performance to user experience. The constraints and capabilities of smartphones with multitouch displays are quite unique, and it soon became apparent that fulfilling this charter would require not just a new set of interface designs; it would also demand rethinking the engineering approach in several major ways to successfully build out the vision.

Constraints and capabilities

Perhaps the most obvious constraint of a mobile device is its limited display size. Wasting space with unnecessary indentation, tab bars, or bordering of elements crowds the content being presented and causes text to wrap sooner and images to be smaller. Paper addresses this by removing virtually all of the traditional navigation elements, ensuring that the screen is used edge-to-edge with useful and beautiful material. While Paper was developed in isolation from iOS 7, the system's all-new design has broadly introduced similar underlying principles (both in first-party apps, and as a new set of design guidelines for third parties).

While visible area is a limitation, these displays have an important advantage over earlier generations of devices: multitouch sensing hardware. This input modality is extremely powerful, yet still in the early stages of its use in the market. Still, we can’t simply remove standard navigational elements like buttons without replacing their functionality with another mechanism. This combination of facts led to an early decision that the app would be gesture-driven to a degree we had not built before, with nearly everything onscreen reacting to people’s touches. Disambiguating these gestures and allowing them to seamlessly interrupt in-flight animations became an important, unsolved engineering problem.

Engineering a new level of interactivity

When we decided to minimize application chrome and replace it with large surfaces that respond to touch input, we observed an essential detail that seems to apply to gestures in general: they feel best when the force of a person’s finger seems to carry through the glass and play out in an action animated naturally by a physics simulation. The familiar inertial scrolling and edge-bouncing effects seen when scrolling content on iOS are examples of these velocity-sensitive interactions, but surprisingly, this successful paradigm has rarely been extended further. Paper’s user interface is made more intuitive and realistic by consistent, pervasive use of these physics-simulating animations in new contexts, such as interacting with photos and links or dragging elements in a list that follow your finger.
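
To illustrate the general idea (this is not Paper’s implementation, just a toy sketch of a velocity-seeded, damped-spring animation), the fragment below feeds the finger’s release velocity into a simple physics simulation; the stiffness and damping constants are arbitrary values chosen for the example:

```python
def animate_spring(position, target, velocity, stiffness=180.0, damping=20.0,
                   dt=1.0 / 60.0, rest_threshold=0.001):
    """Yield successive positions of a damped spring, one value per 60 Hz frame,
    starting from the gesture's release velocity and settling on the target."""
    while abs(position - target) > rest_threshold or abs(velocity) > rest_threshold:
        spring_force = -stiffness * (position - target)
        damping_force = -damping * velocity
        velocity += (spring_force + damping_force) * dt
        position += velocity * dt
        yield position

# Example: a card flicked from 1.0 toward 0.0 with the gesture's release velocity.
frames = list(animate_spring(position=1.0, target=0.0, velocity=-3.5))
```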

The stories that follow will all be about how the team implemented the user interface of Paper, including the new techniques we developed along the way and how we confronted the platform’s constraints while making maximum use of the powerful frameworks already available (in the iOS SDK, and at Facebook). Here are some of the topics we’re planning to discuss:

  • Building asynchronous user interfaces: How Paper keeps gestures and animations smooth
  • Implementing physics-simulating animations: Going beyond Core Animation
  • Advanced interactions on iOS: Coordinating animations with touch input
  • Advanced techniques with asynchronous UI: Preloading content and GPU optimization

The first technical post will dive into how Paper’s architecture ensures that the application’s main thread is significantly less burdened than in a typical UIKit application, enabling high responsiveness and frame rates, especially for an app that makes heavy use of physics-simulating animations. We’re excited to explore other topics in the coming months!
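
As a generic illustration of that pattern (not Paper’s code, which the upcoming posts will cover), the sketch below runs expensive work on worker threads and queues only a cheap apply step back to a stand-in for the main thread:

```python
from concurrent.futures import ThreadPoolExecutor
import queue

main_thread_queue = queue.Queue()   # stands in for the UI run loop
workers = ThreadPoolExecutor(max_workers=4)

def measure_layout(item):
    """Hypothetical expensive step (layout, text sizing, decoding); runs off the main thread."""
    return {"item": item, "height": 40 + 10 * len(str(item))}

def display(layout):
    """Cheap step that a real UI framework would require on the main thread."""
    print("render", layout["item"], "at height", layout["height"])

def request_render(item):
    future = workers.submit(measure_layout, item)
    future.add_done_callback(lambda f: main_thread_queue.put(lambda: display(f.result())))

for item in ["story-1", "story-2"]:
    request_render(item)
workers.shutdown(wait=True)
while not main_thread_queue.empty():    # the "main thread" drains only light work
    main_thread_queue.get()()
```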

Sharing what we’ve learned

Facebook has a strong tradition of sharing our engineering work; examples I'm personally proud of include Open Compute's innovative hardware designs, major open source projects like React and HHVM, and engineering events like hackathons and F8. The Paper team is eager to participate in the community by sharing our experiences across several mediums. In addition to our planned series of engineering blog posts, we’ve already open sourced two small components, KVOController and Shimmer, and are working on more. We're also starting to plan some tech talks, and we look forward to posting videos of them here on code.facebook.com and the Facebook Engineering Page. Each effort will be led by engineers from the Paper team and will reflect a specialty they developed while working on the project. We’ll be listening carefully to the community throughout, so please let us know which topics and formats are the most interesting and useful!

It was possible for the Paper team to focus so deeply on these challenges of interactivity and performance only because of the infrastructure we were able to build upon. iOS has a rich set of APIs that give us key functionality to use and extend with our own. Far more significantly, being situated inside Facebook, surrounded by people of consistently humbling intelligence and skill, gave us access to both client and server capabilities which proved essential for allowing our team to remain focused on the areas where we could contribute back new innovations. The Paper team is genuinely grateful for this opportunity, and we're excited to share what we’ve learned with people both inside and out of Facebook.

Thanks to all the engineers who put significant full-time effort into shipping Paper v1.0: Kimon Tsinteris, Tim Omernick, Brian Amerige, Jason Prado, Ben Cunningham, Grant Paul, Li Tan, Andrew Pouliot, Andrew Wang, Nadine Salter, and more!

More to Read

Making Facebook’s software infrastructure more energy efficient with Autoscale

Qiang Wu

Improving energy efficiency and reducing environmental impact as we scale is a top priority for our data center teams. We’ve talked a lot about our progress on energy-efficient hardware and data center design through the Open Compute Project, but we’ve also started looking at how we could improve the energy efficiency of our software. We explored multiple avenues, including power modeling and profiling, peak power management, and energy-proportional computing.

One particularly exciting piece of technology that we developed is a system for power-efficient load balancing called Autoscale. Autoscale has been rolled out to production clusters and has already demonstrated significant energy savings.

Energy efficient load balancing

Every day, Facebook’s web clusters handle billions of page requests, and server utilization rises and falls with that traffic, peaking during the busiest hours.

The default load-balancing policy at Facebook is based on a modified round-robin algorithm. This means every server receives roughly the same number of page requests and utilizes roughly the same amount of CPU. As a result, during low-workload hours, especially around midnight, overall CPU utilization is not as efficient as we’d like. For example, a particular type of web server at Facebook consumes about 60 watts of power when it’s idle (0 RPS, or requests-per-second). The power consumption jumps to 130 watts when it runs at low-level CPU utilization (small RPS). But when it runs at medium-level CPU utilization, power consumption increases only slightly to 150 watts. Therefore, from a power-efficiency perspective, we should try to avoid running a server at low RPS and instead try to run at medium RPS.

To tackle this problem and utilize power more efficiently, we changed the way that load is distributed to the different web servers in a cluster. The basic idea of Autoscale is that instead of using a purely round-robin approach, the load balancer will concentrate workload onto a server until it has at least a medium-level workload. If the overall workload is low (like at around midnight), the load balancer will use only a subset of servers. Other servers can be left running idle or be used for batch-processing workloads.
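
A back-of-the-envelope example, using the wattages quoted above and a made-up per-server RPS target, shows why concentrating load pays off:

```python
import math

IDLE_W, LOW_W, MEDIUM_W = 60, 130, 150   # figures quoted above for one server type

def cluster_power(total_rps, num_servers, target_rps_per_server=100):
    """Compare round-robin (every server lightly loaded) with concentrating
    traffic onto just enough servers to reach medium load. The RPS target and
    cluster size are illustrative assumptions."""
    round_robin = num_servers * (LOW_W if total_rps > 0 else IDLE_W)
    active = min(num_servers, max(1, math.ceil(total_rps / target_rps_per_server)))
    autoscale = active * MEDIUM_W + (num_servers - active) * IDLE_W
    return round_robin, autoscale

# Midnight-style load: 2,000 RPS spread across a 100-server pool.
print(cluster_power(total_rps=2000, num_servers=100))
# -> (13000, 7800): roughly 40% less power by running 20 servers at medium load
#    and leaving the other 80 idle (or repurposing them for batch work).
```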

Though the idea sounds simple, it is a challenging task to implement effectively and robustly for a large-scale system.

Overall architecture

In each frontend cluster, Facebook uses custom load balancers to distribute workload to a pool of web servers. Following the implementation of Autoscale, the load balancer now uses an active, or “virtual,” pool of servers, which is essentially a subset of the physical server pool. Autoscale is designed to dynamically adjust the active pool size such that each active server will get at least medium-level CPU utilization regardless of the overall workload level. The servers that aren’t in the active pool don’t receive traffic.

Figure 1: Overall structure of Autoscale

We formulate this as a feedback loop control problem, as shown in Figure 1. The control loop starts with collecting utilization information (CPU, request queue, etc.) from all active servers. Based on this data, the Autoscale controller makes a decision on the optimal active pool size and passes the decision to our load balancers. The load balancers then distribute the workload evenly among the active servers. It repeats this process for the next control cycle.
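
Schematically, the cycle looks something like this (all function names are hypothetical stand-ins):

```python
import time

def control_loop(collect_metrics, decide_pool_size, apply_to_load_balancers,
                 cycle_seconds=60):
    """One pass per control cycle: measure, decide, apply, wait."""
    while True:
        metrics = collect_metrics()              # CPU, request queue length, RPS, ...
        pool_size = decide_pool_size(metrics)    # e.g. PI-style logic, sketched below
        apply_to_load_balancers(pool_size)       # only active servers receive traffic
        time.sleep(cycle_seconds)
```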

Decision logic

A key part of the feedback loop is the decision logic. We want to make an optimal decision that will adapt to the varying workload, including workload surges or drops due to unexpected events. On one hand, we want to maximize the energy-saving opportunity. On the other, we don’t want to over-concentrate the traffic in a way that could affect site performance.

To achieve this, we employ classic control theory and a PI controller to get fast reaction times and small overshoots. To apply the control theory, we first need to model the relationship between key factors such as CPU utilization and requests per second (RPS). To do this, we conduct experiments to understand how they correlate and then estimate the model from the experimental data. For example, Figure 2 shows the experimental results of the relationship between CPU and RPS for one type of web server at Facebook. In the figure, the blue dots are the raw data points, while the red dashed line is the estimated model (piecewise linear). With the models in hand, the controller is then designed using classic stability analysis to pick the best control parameters.

Figure 2: Experimental results of the relationship between CPU and RPS for one type of web server; the red dashed line is the estimated piece-wise linear model
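
A heavily simplified sketch of PI-style decision logic is shown below; the target utilization, gains, and bounds are illustrative assumptions rather than our tuned values, which come from the model and stability analysis described above:

```python
class ActivePoolController:
    """Proportional-integral control of the active pool size."""

    def __init__(self, total_servers, target_cpu=0.55, kp=80.0, ki=10.0):
        self.total_servers = total_servers
        self.target_cpu = target_cpu   # desired average utilization of active servers
        self.kp = kp                   # proportional gain (illustrative)
        self.ki = ki                   # integral gain (illustrative)
        self.integral = 0.0

    def update(self, measured_cpu, current_pool_size):
        """measured_cpu: average utilization (0..1) across the active pool."""
        error = measured_cpu - self.target_cpu       # positive => servers too busy
        self.integral += error
        adjustment = self.kp * error + self.ki * self.integral
        new_size = round(current_pool_size + adjustment)
        return max(1, min(self.total_servers, new_size))
```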

Deployment and preliminary results

Autoscale has been deployed to production web clusters at Facebook and has so far proven to be successful.

At this stage, we’ve decided either to leave "inactive" servers running idle to save energy or to repurpose the inactive capacity for batch-processing tasks. Both approaches improve our overall energy efficiency.

Figure 3 shows the results from one production web cluster at Facebook. The y-axis is the normalized number of servers put into inactive mode by Autoscale during a 24-hour cycle. The numbers are normalized by the maximum number of inactive servers, which occurred around midnight. Also, as we expected, none of the servers in this cluster could be put into inactive mode around peak hours.

Figure 3: Normalized number of servers in a web cluster put into inactive mode by Autoscale during a 24-hour window

In terms of energy savings from putting inactive servers into power-saving mode, Figure 4 shows the total power consumption for one of our production web clusters, normalized to the daily maximum power draw. The red line is the best we could do without Autoscale; the blue line shows the power draw with Autoscale. In this cluster, Autoscale led to a 27% power savings around midnight (and, as expected, the power saving was 0% around peak hours). The average power saving over a 24-hour cycle is about 10-15%, depending on the cluster. In a system with a large number of web clusters, Autoscale can save a significant amount of energy.

Figure 4: Normalized power consumption for a production web cluster with and without Autoscale

Next steps

We are still in the early stages of optimizing our software infrastructure for power efficiency, and we’re continuing to explore opportunities in different layers of our software stack to reduce data center power and energy usage. We hope that through continued innovation, we will make Facebook’s infrastructure more efficient and environmentally sustainable.

Special thanks to all contributors to Autoscale at Facebook.
