Engineering Blog

Blog posts tagged 'Data'

Ming Hua, Engineering at Facebook

#bewhoyoucansee: Ming Hua

Posted about 4 years ago
blog post · Culture · Data · Women in Tech

As more and more people from underrepresented groups lift their voices and launch their careers in tech fields, it is more important than ever that we share their stories widely. We all need visible role models who can challenge, inspire, and motivate us. With this in mind, we’re starting the #bewhoyoucansee series. Every week, we'll profile someone from an underrepresented group working in tech to learn how they got started, what they're passionate about, and what advice they have for other people pursuing a technical career. If you're a role model, or want to share who and what inspires you, follow along with us on Facebook and Instagram by tagging your posts #bewhoyoucansee. Read more...

James Pearce, Engineering at Facebook

9.9 million lines of code and still moving fast - Facebook open source in 2014

Posted about 4 years ago
blog post · Mobile · Culture · Data · Infra · @Scale · Open Source

The first six months of 2014 have been very busy for our open source program. In the spirit of the World Cup, we thought it was time for a half-time review of some of the highlights so far. Read more...

Nick Petro, Engineering

F8 Developer Conference - Hacker Way Recap

Posted about 4 years ago
blog post · Data · Infra · Mobile · Web · Performance · Optimization · Open Source · Hack · Messages · Design Tools · Android · iOS

Over 1,700 developers traveled to the Concourse Exhibition Center in San Francisco for Facebook’s F8 Developer Conference last week. Read more...

Open-sourcing Haxl, a library for Haskell

Posted about 4 years ago
blog post · Infra · Data · Web · Backend · Open Source · Caching · Languages · Security · Data Science · Analytics

Today we're open-sourcing Haxl, a Haskell library that simplifies access to remote data, such as databases or web-based services. Read more...

HydraBase – The evolution of HBase@Facebook

Posted about 4 years ago
blog post · Data · Infra · Messages · Analytics · Storage · Platform · Open Source

When we revamped Messages in 2010 to integrate SMS, chat, email and Facebook Messages into one inbox, we built the product on open-source Apache HBase, a distributed key-value data store running on top of HDFS, and extended it to meet our requirements. At the time, HBase was chosen as the underlying durable data store because it provided the high write throughput and low-latency random-read performance necessary for our Messages platform. In addition, it provided other important features, including horizontal scalability, strong consistency, and high availability via automatic failover. Since then, we’ve expanded the HBase footprint across Facebook, using it not only for point-read, online transaction processing workloads like Messages, but also for online analytical processing workloads where large data scans are prevalent. Today, in addition to Messages, HBase is used in production by other Facebook services, including our internal monitoring system, the recently launched Nearby Friends feature, search indexing, streaming data analysis, and data scraping for our internal data warehouses. Read more...

Saving capacity with HDFS RAID

Posted about 4 years ago
blog post · Data · Infra · Production Engineering

As we continue to evolve our data infrastructure, we’re constantly looking for ways to maximize the utility and efficiency of our systems. One technology we’ve deployed is HDFS RAID, an implementation of Erasure Codes in HDFS to reduce the replication factor of data in HDFS. We finished putting this into production last year and wanted to share the lessons we learned along the way and how we increased capacity by tens of petabytes. Read more...

Scaling the Facebook data warehouse to 300 PB

Posted about 4 years ago
blog post · Data · Infra · Production Engineering

At Facebook, we have unique storage scalability challenges when it comes to our data warehouse. Our warehouse stores upwards of 300 PB of Hive data, with an incoming daily rate of about 600 TB. In the last year, the warehouse has seen a 3x growth in the amount of data stored. Given this growth trajectory, storage efficiency is and will continue to be a focus for our warehouse infrastructure. Read more...

Large-scale graph partitioning with Apache Giraph

Posted about 4 years ago
blog post · Infra · Data · Open Source · Graph · Graph Search · Performance · Optimization

Facebook’s architecture relies on various services that answer queries about people and their friends. Because of the size of the dataset, number of queries per second, and latency requirements, many of these systems cannot run on a single machine. Instead, people and their metadata are sharded across several machines. In such a distributed environment, answering queries might require communication among all these servers. Read more...

Yael Maguire, Engineering

Announcing The Connectivity Lab at Facebook

Posted about 4 years ago
blog post · Infra · Data · Connectivity

Today we're announcing the Connectivity Lab at Facebook, a team that is working on new aerospace and communication technologies to advance the mission of improving and extending internet access. The Lab, which includes some of the world's top experts from Ascenta, NASA’s Jet Propulsion Laboratory, NASA’s Ames Research Center, and the National Optical Astronomy Observatory, is already working on new delivery platforms including planes and satellites to provide connectivity. Read more...

Steaphan Greene, Engineering

WebScaleSQL: A collaboration to build upon the MySQL upstream

Posted about 4 years ago
blog post · Data · MySQL · Production Engineering

To help the more than 1.23 billion people who use Facebook to share and connect with each other, we’ve had to build an expansive and incredibly advanced infrastructure -- including one of the largest deployments of MySQL in the world. Along the way, we’ve learned and benefited from code changes made by the MySQL community. Today we’re announcing WebScaleSQL, a collaboration among engineers from several companies that face similar challenges in running MySQL at scale and seek greater performance from a database technology tailored to their needs. Read more...

Looking back on “Look Back” videos

Posted about 4 years ago

Facebook’s mission is to help people connect with one another, and as our 10th anniversary approached last month, we wanted to do something that would let everyone participate in the event together. After some discussion, we settled on the Look Back feature, which allows people to generate one-minute videos that highlight memorable photos and posts from their time on Facebook. Read more...

A Decade of Building Facebook

Posted about 4 years ago
blog post · Culture · Web · Data · Mobile · Menlo Park · Open Source · Open Compute · Data Centers · Luleå

Today we're celebrating Facebook's 10th anniversary. Check out a timeline of the engineering milestones that have built the infrastructure supporting 1.23 billion users, 201.6 billion friend connections, 400 billion shared photos, and 7.8 trillion messages sent since the start of 2012. Read more...

Subodh Iyengar, Software engineer at Facebook

Introducing Conceal: Efficient storage encryption for Android

Posted about 4 years ago
blog post · Web · Infra · Data · Security · Open Source · Android · Java · Development Tools · Caching · Storage · Performance

Caching and storage are tricky problems for mobile developers because they directly impact performance and data usage on a mobile device. Caching helps developers speed up their apps and reduce network costs for the device owner by storing information directly on the phone for later access. However, internal storage capacity on Android phones is often limited, especially with lower- to mid-range phone models. A common solution for Android is to store some data on an expandable SD card to mitigate the storage cost. What many people don't realize is that Android's privacy model treats the SD card storage as a publicly accessible directory. This allows data to be read by any app (with the right permissions). Thus, external storage is normally not a good place to store private information. Read more...

James Pearce, Engineering at Facebook

2013: A Year of Open Source at Facebook

Posted about 4 years ago
blog post · Data · Mobile · Web · Infra · Open Source · Languages

Open source has always been a huge part of the Facebook engineering philosophy. 2013 has been a great year for our open source program, with a significant number of new projects that we're really proud of, a renewed commitment to run and maintain them actively, and a desire to work with the vibrant communities that have built up around them. Read more...

Dhruba Borthakur, Engineering

Under the Hood: Building and open-sourcing RocksDB

Posted about 4 years ago
blog post · Data · Infra · Backend · Production Engineering · Open Source · Storage

Every time one of the 1.2 billion people who use Facebook visits the site, they see a completely unique, dynamically generated home page. There are several different applications powering this experience--and others across the site--that require global, real-time data fetching. Read more...

Carlos Bueno, Fixer at Facebook

The Mature Optimization Handbook

Posted about 4 years ago
blog post · Data · Culture · Testing · Performance · Optimization · Languages

I spent a good chunk of the past year working on an internal training class and a short book about performance measurement and optimization. You can download it here. Below is an excerpt. Read more...

Martin Traverso, Engineering

Presto: Interacting with petabytes of data at Facebook

Posted about 4 years ago
blog post · Data · Infra · Backend · Performance

Facebook is a data-driven company. Data processing and analytics are at the heart of building and delivering products for the 1 billion+ active users of Facebook. We have one of the largest data warehouses in the world, storing more than 300 petabytes. How do we query it all? Read more...

Ashoat Tevosyan, Software engineer at Facebook

Under the Hood: Building posts search

Posted about 4 years ago
blog post · Infra · Data · Production Engineering · Graph Search

Last week we added the ability to search posts using Graph Search, a feature that has been two years in the making. With one billion new posts added every day, the posts index contains more than one trillion total posts, comprising hundreds of terabytes of data. Indexing these posts and building a system to return real-time results has been a significant engineering challenge, and this is just the beginning. Read more...

Shlomo Priymak, Engineering

Under the hood: MySQL Pool Scanner (MPS)

Posted about 4 years ago
blog post · Data · Infra · MySQL

Facebook has one of the largest MySQL database clusters in the world. This cluster comprises many thousands of servers across multiple data centers on two continents. Read more...

Domas Mituzas, Infrastructure Engineer at Facebook

Flashcache at Facebook: From 2010 to 2013 and beyond

Posted about 4 years ago
blog post · Infra · Data · Storage · Caching · Performance · Optimization

We recently released a new version of Flashcache, kicking off the flashcache-3.x series. We’ve spent the last few months working on this new version, and our work has resulted in some performance improvements, including increasing the average hit rate from 60% to 80% and cutting the disk operation rate nearly in half. Read more...

Avery Ching, Engineering

Scaling Apache Giraph to a trillion edges

Posted about 5 years ago
blog post · Data · Infra · Java · @Scale

Graph structures are ubiquitous: they provide a basic model of entities with connections between them that can represent almost anything. Flight routes connect airports, computers communicate to one another via the Internet, webpages have hypertext links to navigate to other webpages, and so on. Facebook manages a social graph that is composed of people, their friendships, subscriptions, and other connections. Read more...

Lachlan Mulcahy, Engineering

Windex: Automation for database provisioning

Posted about 5 years ago
blog post · Data · Infra · Production Engineering · Storage

Windex was originally developed to wipe data from hosts coming out of production, reinstall everything from the OS through to MySQL, and then configure them so they could be placed back into the spares pool all shiny and new. Now, Windex has expanded its role to cover all provisioning of MySQL DB hosts, whether they are freshly racked and set up by our site operations team or taken out of production for an offline repair like RAM replacement. Read more...

Mark Marchukov, Engineering

TAO: The power of the graph

Posted about 5 years ago
blog post · Data · Infra · Caching · Production Engineering

Facebook puts an extremely demanding workload on its data backend. Every time any one of over a billion active users visits Facebook through a desktop browser or on a mobile device, they are presented with hundreds of pieces of information from the social graph. Users see News Feed stories; comments, likes, and shares for those stories; photos and check-ins from their friends -- the list goes on. The high degree of output customization, combined with a high update rate of a typical user’s News Feed, makes it impossible to generate the views presented to users ahead of time. Thus, the data set must be retrieved and rendered on the fly in a few hundred milliseconds. Read more...

Laurent Demailly, Software engineer at Facebook

Wormhole pub/sub system: Moving data through space and time

Posted about 5 years ago
blog post · Data · Infra · Production Engineering · Caching

Over the last couple of years, we have built and deployed a reliable publish-subscribe system called Wormhole. Wormhole has become a critical part of Facebook's software infrastructure. At a high level, Wormhole propagates changes issued in one system to all systems that need to reflect those changes – within and across data centers. Read more...

Scaling memcache at Facebook

Posted about 5 years ago
blog post · Data · Infra · Caching · Production Engineering · Storage

Facebook started using memcached in August 2005 when Mark Zuckerberg downloaded it from the Internet and installed it on our Apache web servers. At that time, Facebook was starting to make increasingly sizable database queries on every page load, and page load times were significantly increasing. Providing a fast, snappy user experience has always been a high priority for Facebook, and memcached came to the rescue. Read more...

Tim Armstrong, Engineering

LinkBench: A database benchmark for the social graph

Posted about 5 years ago
blog post · Data · Infra · Graph · MySQL · Performance · Optimization · Open Source · Testing · Storage

MySQL offers a good mix of flexibility, performance, and administrative ease, but the database engineering team continues to explore alternatives to MySQL for storing social graph data. There are several generic open-source benchmarks that could provide a starting point for comparing database systems. However, the gold standard for database benchmarking is to test the performance of a system on the real production workload, since synthetic benchmarks often don't exercise systems in the same way. When making decisions about a significant component of Facebook's infrastructure, we need to understand how a database system will really perform in Facebook's production workload. Read more...

Sriram Sankar, Software engineer at Facebook

Under the Hood: Indexing and ranking in Graph Search

Posted about 5 years ago
blog post · Data · Infra · Graph Search

Search Ranking.

Facebook © 2018