Engineering Blog

Blog posts tagged 'Analytics'

Using Apache Spark for large-scale language model training

Posted about a year ago
blog post · Data · Data Infrastructure · Analytics

The pipeline is modular, readable, and more maintainable, with reductions in both resource usage and data landing time.

Justin Teller, Engineering

Beringei: A high-performance time series storage engine

Posted about a year ago

Beringei powers most of the performance and health monitoring at Facebook while enabling engineers and analysts to make decisions quickly with accurate, real-time data.

Apache Spark @Scale: A 60 TB+ production use case

Posted about 2 years ago
blog post · Data · Infra · Data Infrastructure · Analytics · Backend · Open Source

Through a series of performance and reliability improvements, we were able to scale Spark to handle a TB-scale entity ranking system in production.

Jay Tang, Engineering

Building the Presto community

Posted about 2 years ago
blog post · Data · Infra · Analytics · Performance · Open Source

When we launched Presto, we saw dramatic query performance improvement across multiple internal Hadoop clusters.

Ari Entin, Communications at Facebook

Facebook expands AI research team to Paris

Posted about 3 years ago

The researchers will work with FAIR's teams in the U.S. on projects in image and speech recognition and natural language processing, among other areas of study.

How RocksDB is used in osquery

Posted about 3 years ago
blog post · Infra · Data · Backend · Security · Framework · Analytics · Storage · Open Source

Using RocksDB as osquery's embedded database allows osquery to store and access data in a fast, persistent way, enabling our team to solve some technical problems we'll detail in this blog.

Dain Sundstrom, Engineering

Even faster: Data at the speed of Presto ORC

Posted about 3 years ago
blog post · Data · Backend · Open Source · Analytics · Performance · Testing

The Presto ORC reader is available in open source, and it's being used at Facebook, showing good results.

Audience Insights query engine: In-memory integer store for social analytics

Posted about 3 years ago
blog post · Web · Data · Infra · Production Engineering · Analytics · Data Science

A query engine with a hybrid integer store that organizes data in memory and on flash disks so that a query can process terabytes of data in real time.

Alex Sourov, Software Engineer / Engineering Manager / Product Manager at Facebook

Improving Facebook on Android

Posted about 4 years ago

In an effort to connect the next five billion, Facebook began shifting toward becoming a mobile-first company about two years ago. We trained hundreds of employees on mobile development, restructured internal teams to build for all platforms, and moved to a fast-paced release cycle.

Open-sourcing Haxl, a library for Haskell

Posted about 4 years ago
blog post · Infra · Data · Web · Backend · Open Source · Caching · Languages · Security · Data Science · Analytics

Today we're open-sourcing Haxl, a Haskell library that simplifies access to remote data, such as databases or web-based services.

HydraBase – The evolution of HBase@Facebook

Posted about 4 years ago
blog post · Data · Infra · Messages · Analytics · Storage · Platform · Open Source

When we revamped Messages in 2010 to integrate SMS, chat, email and Facebook Messages into one inbox, we built the product on open-source Apache HBase, a distributed key-value data store running on top of HDFS, and extended it to meet our requirements. At the time, HBase was chosen as the underlying durable data store because it provided the high write throughput and low-latency random-read performance necessary for our Messages platform. In addition, it provided other important features, including horizontal scalability, strong consistency, and high availability via automatic failover. Since then, we’ve expanded the HBase footprint across Facebook, using it not only for point-read, online transaction processing workloads like Messages, but also for online analytics processing workloads where large data scans are prevalent. Today, in addition to Messages, HBase is used in production by other Facebook services, including our internal monitoring system, the recently launched Nearby Friends feature, search indexing, streaming data analysis, and data scraping for our internal data warehouses.

Carlos Bueno, fixer at Facebook

Meet a Facebook Engineer: Carlos Bueno

Posted about 6 years ago
blog post · Culture · Performance · Optimization · Analytics · Tooling

At Facebook, our engineers collaborate to create an open environment where ideas win and are executed quickly. Beginning this week, our engineers will give you a look into what it's like to ideate and build at Facebook in our new "Meet a Facebook Engineer" Q&A series. Check back every week to hear from different engineers about what problems they're passionate about solving right now, what they're up to at Facebook and what advice they have for you.

Alex Himel, Vice President, Local at Facebook

Building Realtime Insights

Posted about 7 years ago
blog post · Web · Compute · Platform · Analytics · Optimization

Social plugins have become an important and growing source of traffic for millions of websites over the past year. We released a new version of Insights for Websites last week to give site owners better analytics on how people interact with their content and to help them optimize their websites in real time.

Jason Sobel, Engineering

Making Facebook 2x Faster

Posted about 8 years ago

Everyone knows the internet is better when it's fast. At Facebook, we strive to make our site as responsive as possible; we've run experiments that prove users view more pages and get more value out of the site when it runs faster. Google and Microsoft presented similar conclusions for their properties at the 2009 O'Reilly Velocity Conference. So how do we go about making Facebook faster? The first thing we have to get right is a way to measure our progress. We want to optimize for users seeing pages as fast as possible so we look at the three main components that contribute to the performance of a page load: network time, generation time, and render time.

Eric Sun, Engineering Manager at Facebook

A New Look at the Path to Popularity

Posted about 9 years ago
blog post · Data · Research · Analytics

[N.B.: The note below profiles some research that was conducted last year at Facebook based on the old News Feed. The resulting paper was recently presented at the International AAAI Conference on Weblogs and Social Media (ICWSM) in May 2009, where it received the Best Paper award. The full paper can be found at].

Ashish Thusoo, Engineering at Facebook

Hive - A Petabyte Scale Data Warehouse using Hadoop

Posted about 9 years ago

A number of engineers from Facebook are speaking at the Yahoo! Hadoop Summit today about the ways we are using Hadoop and Hive for analytics. Hive is an open source, petabyte-scale data warehousing framework based on Hadoop that was developed by the Data Infrastructure Team at Facebook. In this blog post we'll talk more about Hive, how it has been used at Facebook, and its unique architecture and capabilities.

Facebook © 2018