Operating large-scale, globally distributed services requires accurate monitoring of the health and performance of our systems to identify and diagnose problems as they arise. Facebook uses a time series database (TSDB) to track and store system measurements such as product stats (e.g., how many messages are being sent per minute), service stats (e.g., the rate of queries hitting the cache tier vs the MySQL tier), and system stats (e.g., CPU, memory, and network usage), so that we can see the real-time load on our infrastructure and make decisions about how to allocate resources. Each system and service at Facebook writes hundreds to thousands of counters to our storage engine, and this data is available to query in real time by engineers looking at dashboards and doing ad-hoc performance analysis.
In early 2013, our monitoring team realized that our existing HBase-backed TSDB would not scale to handle future read loads as our systems and company grew. Our average read latency was acceptable for looking at a small amount of data, but trying to visualize more data through interactive dashboards resulted in a poor experience. The 90th percentile query time had increased to multiple seconds, which is unacceptable for automated tools that may need to issue hundreds or thousands of queries to do an analysis. Medium-sized queries of a few thousand time series took tens of seconds to execute, and larger queries executing over sparse datasets would time out since the HBase data store was tuned to prioritize writes.
In general, large-scale monitoring systems cannot handle large-scale analysis in real time because the query performance is too slow. After evaluating and rejecting several disk-based and existing in-memory cache solutions, we turned our attention to writing our own in-memory TSDB to power the health and performance monitoring system at Facebook. We presented “Gorilla: A Fast, Scalable, In-Memory Time Series Database” at VLDB 2015 and recently open-sourced Beringei, a high-performance time series storage engine based on this work.
Beringei is different from other in-memory systems, such as memcache, because it has been optimized for storing time series data used specifically for health and performance monitoring. We designed Beringei to have a very high write rate and a low read latency, while being as efficient as possible in using RAM to store the time series data. In the end, we created a system that can store all the performance and monitoring data generated at Facebook for the most recent 24 hours, allowing for extremely fast exploration and debugging of systems and services as we encounter issues in production.
Data compression was necessary to help reduce storage overhead. We considered several existing compression schemes and rejected the techniques that applied only to integer data, used approximation techniques, or needed to operate on the entire dataset. Beringei uses a lossless streaming compression algorithm to compress points within a time series with no additional compression used across time series. Each data point is a pair of 64-bit values representing the timestamp and value of the counter at that time. Timestamps and values are compressed separately using information about previous values. Timestamp compression uses a delta-of-delta encoding, so regular time series use very little memory to store timestamps.
From analyzing the data stored in our performance monitoring system, we discovered that the value in most time series does not change significantly when compared to its neighboring data points. Further, many data sources only store integers (despite the system supporting floating point values). Knowing this, we were able to adapt previous academic work into a scheme that is cheaper to compute: we XOR the current value with the previous value and store only the bits that changed. Ultimately, this algorithm resulted in compressing the entire data set by at least 90 percent.
We foresee two primary use cases for Beringei. First, we have created a simple, sharded service and reference client implementation that can store and serve time series query requests. Second, Beringei can be used as an embedded library to handle the low-level details of efficiently storing time series data. Using Beringei in this way is similar to RocksDB — Beringei can be the high-performance storage system underlying other performance monitoring solutions.
While embedding Beringei directly into another TSDB is one way to use it, we have also included a reference service implementation. This all-in-one implementation allows you to stand up a horizontally scalable, sharded service.
Beringei is part of our monitoring infrastructure at Facebook. Having a fast and scalable storage tier is important to powering the real-time response our engineers expect from the monitoring system. Data is available to query instantly after a successful put request: the delay between a counter being written to Beringei and becoming available for reads is approximately 300 microseconds, and our p95 server response time for a read request is approximately 65 microseconds. Compared with our old disk-based system, Beringei’s in-memory design is several orders of magnitude faster in both read and write performance. Additionally, Beringei works with our automated detection system, which observes many millions of time series to detect anomalies and raise alerts.
Beringei currently stores up to 10 billion unique time series and serves 18 million queries per minute, powering most of the performance and health monitoring at Facebook while enabling our engineers and analysts to make decisions quickly with accurate, real-time data.
By sharing this technology more widely, we hope to collaborate more closely with industry and academia and integrate new solutions into our monitoring systems.