Engineering Blog

Blog posts tagged 'Data Infrastructure'

Yoshinori Matsunobu, Database Engineer at Facebook

Migrating a database from InnoDB to MyRocks

Posted about 7 months ago

Moving one of Facebook's main databases to MyRocks cut storage usage in half.

Mark Marchukov, Engineering

LogDevice: a distributed data store for logs

Posted about 8 months ago
blog post · Data · @Scale · Data Infrastructure

LogDevice helps ensure we can replicate data between distributed data stores while maintaining high write availability, durability, and consistency.

Relay Modern: Simpler, faster, more extensible

Posted about a year ago

The new version of Relay is designed from the ground up to be easier to use, more extensible, and optimized for mobile devices.

Faiss: A library for efficient similarity search

Posted about a year ago

Vector representation allows for fast, large-scale image searches where traditional key/value queries fall short.

Using Apache Spark for large-scale language model training

Posted about a year ago
blog post · Data · Data Infrastructure · Analytics

The pipeline is modular, readable, and more maintainable, with reductions in both resource usage and data landing time.

Justin Teller, Engineering

Beringei: A high-performance time series storage engine

Posted about a year ago

Beringei powers most of the performance and health monitoring at Facebook while enabling engineers and analysts to make decisions quickly with accurate, real-time data.

Facebook Open Source 2016 year in review

Posted about a year ago

Over the past few years, Facebook's Open Source program has grown into one of the largest and most active portfolios in the industry.

Miki Friedmann, Engineering

Made in NY: The engineering behind social recommendations

Posted about a year ago

The product incorporates machine learning and client-side caching to identify relevant posts and dynamically update attachments as new recommendations are added.

Divij Rajkumar, Production Engineer at Facebook

Continuous MySQL backup validation: Restoring backups

Posted about a year ago

Our system continuously tests our ability to restore our databases from backups, ensuring that we can quickly and reliably recover from an outage.

A comparison of state-of-the-art graph processing systems

Posted about 2 years ago
blog post · Data · Graph · Data Infrastructure · Performance · Backend

The study compared how well two systems handle large graphs, focusing on performance and usability.

Angelo Failla, Engineering

DHCPLB: An open source load balancer

Posted about 2 years ago

From hackathon prototype to internship project, the new load balancer is now deployed across Facebook's server fleet to manage DHCP traffic.

Apache Spark @Scale: A 60 TB+ production use case

Posted about 2 years ago
blog post · Data · Infra · Data Infrastructure · Analytics · Backend · Open Source

Through a series of performance and reliability improvements, we were able to scale Spark to handle a TB-scale entity ranking system in production.

Yoshinori Matsunobu, Database Engineer at Facebook

MyRocks: A space- and write-optimized MySQL database

Posted about 2 years ago
blog post · Data · Infra · Storage · MySQL · Backend · Data Infrastructure

Deploying MyRocks to a database tier in one of our data center regions enabled a 50 percent reduction in storage requirements.

Facebook Seattle moves into Dexter Station

Posted about 2 years ago
blog post · Infra · Culture · Seattle · Data Infrastructure · Storage · Platform

The open layout fosters Facebook's open and transparent culture, helping connect teams as they work together to connect the world.

Arun Sharma, Engineering

Dragon: A distributed graph query engine

Posted about 2 years ago
blog post · Data Infrastructure · Backend · Caching · Graph

Dragon monitors real-time updates to the social graph and creates several different types of indices that improve the efficiency of fetching, filtering, and reordering the data.

Shaohua Li, Software Engineer at Facebook

Improving software RAID with a write-ahead log

Posted about 2 years ago

Software RAID has some drawbacks that can be problematic at Facebook's scale. Using a write-ahead log can address some of these issues and improve the reliability of the array.

Erin Green, Engineering

Using ISC Kea DHCP in our data centers

Posted about 3 years ago

Inside Facebook's transition to ISC Kea.

Under the hood: Facebook’s cold storage system

Posted about 3 years ago

Finding a place for images to live so they can be instantly available is a recurring scale challenge for Facebook.

Ashish Thusoo, Engineering at Facebook

Hive: A Petabyte-Scale Data Warehouse Using Hadoop

Posted about 9 years ago

A number of engineers from Facebook are speaking at the Yahoo! Hadoop Summit today about the ways we are using Hadoop and Hive for analytics. Hive is an open source, petabyte-scale data warehousing framework based on Hadoop that was developed by the Data Infrastructure Team at Facebook. In this blog post we'll talk more about Hive, how it has been used at Facebook, and its unique architecture and capabilities.


Facebook © 2018