Under the hood: Improving messages

Starting today, we’re rolling out improved features and a new look for messages. Behind the scenes, we’ve also been improving the reliability of messages across the site. To address issues with disconnection, incorrect message counts, and missed and duplicated messages, we recently undertook an effort we called the “mercury project.”

The mercury project

We knew the mercury project would be a significant effort because Facebook messages, and especially its chat interface, is embedded in a web browsing experience. As such, it poses some novel challenges as a browser application: it must be highly reliable, yet still be able to load and initialize quickly and incrementally.

A small team set out to tackle these problems, which required some changes not just to Facebook messages, but also to the way that we develop and deliver our software. Eventually, we reached the first milestone for the mercury project: replacing the chat and messages jewel implementations with a more reliable version. With other teams such as the real time infrastructure and messaging infrastructure teams also contributing substantial improvements, we were able to improve reliability and grow usage.

We didn’t set out to build or adopt a framework; the mercury project has always been about product. But we did have some guiding principles that informed our approach. Chief among these were modular design, test-oriented development, and a functional programming style.

Modularity

Before we started the mercury project, we would often accidentally introduce regressions into messages while trying to make improvements. These regressions would usually trace back to the way that we were managing JavaScript dependencies. At the time, Facebook had a very simple system for managing JavaScript dependencies: you would say that a given file provides a component and then say that another file requires that component.

Now, the meaning of “requires” in this system was not intuitive. It did not mean that the required file would load before your file. It just meant that your file and the other file would both be loaded by the time the document was ready. There was not even a guarantee that a given file would run only once. Moreover, everything was bound together in the global scope.

The worst thing about this system was that developers would make mistakes because it was easy to write code that would sometimes work if dependent files happened to be evaluated in the assumed order. In fact, because of the way we package script resources, it was especially likely to work on a developer’s machine and break when it hit production! This made it difficult to find these bugs before they were already in front of people on Facebook.

So the first step for the mercury project was to adopt a system similar to CommonJS, where every file is a module, and where a module is not evaluated until the modules it depends on have been defined. We had to port a bunch of code to the new module system for mercury, but this was well worth it, and proved to be a game-changer.

Test-oriented development

The flip side of our embrace of modularity was our emphasis on unit testing. But our test-oriented development style did more than just help clarify the lines between modules—it also allowed us to focus on code quality by making it easy for us to refactor without causing regressions.

At this point, everyone knows they need to incorporate unit testing into their software development process, but it can still feel like eating your vegetables. Well, the mercury team likes vegetables, especially when properly prepared. Every engineer on the team not only wrote tests but also worked to improve the testing framework and the test scaffolding for the mercury code, and for some this was their primary focus. Usage of the mercury test framework quickly spread across Facebook engineering, and now we are working to contribute it to open source.

The mercury codebase has over 85% test coverage. We’re still making ambitious architectural changes even as we start our public release. Since we first shipped mercury chat, we’ve introduced very few regressions as we’ve added new features, fixed bugs, and made changes to support this new version of messages.

The team only accepts patches that are accompanied by unit test changes. This isn’t a rule or a procedure; it’s just common sense. The mercury team’s consensus about the importance of testing reflects our culture of thoughtful diff review and constant evolution of our architecture.

Functional vs OOP

Finally, the mercury team chose to embrace the benefits of functional programming. This style is best embodied by the great node.js project, and we made a point of emulating that structure where appropriate. By replacing object-oriented programming (OOP) patterns with functional ones, the mercury team was able to build a highly decoupled system capable of fast, incremental loading. The change that the mercury team embraced was to switch from modeling domain objects as instances of a class prototype to modeling them more like structured data returned from a database. For instance, in conventional JavaScript, you often see stuff like this:

    var thread = Threads.get(id);
    var participantNames = thread.getParticipantNames();
    // Render participantNames

This seems fine at first, but in real-world apps like Facebook messages, we need to be able to receive a message without necessarily knowing whom it’s from. So this means that at the moment the thread exists, we may not know the names of the participants. In the end, the code above would actually need to be rewritten like:

    var thread = Threads.get(id);
    thread.on('load', function() {
      var participantNames = thread.getParticipantNames();
      // Render participantNames
    });

with the usual hand-wringing about what should happen if the thread has already loaded by the time the callback is registered.
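The awkward case is easy to reproduce. The sketch below uses a made-up `createThread` helper with a one-shot 'load' event (all names here are hypothetical, not the pre-mercury code) to show how a listener that registers after loading has finished simply never runs.

```javascript
// Hypothetical thread object with a one-shot 'load' event, for illustration only.
function createThread() {
  const listeners = [];
  const thread = {
    participantNames: null,
    on(eventName, listener) {
      if (eventName === 'load') {
        listeners.push(listener);
      }
    },
    // Called when the data arrives; notifies whoever is registered right now.
    finishLoading(names) {
      thread.participantNames = names;
      listeners.forEach((listener) => listener());
    },
    getParticipantNames() {
      return thread.participantNames;
    },
  };
  return thread;
}

const calls = [];

// Listener registered before the data arrives: it runs.
const early = createThread();
early.on('load', () => calls.push('early'));
early.finishLoading(['Ada']);

// Listener registered after the data arrived: it is silently skipped.
const late = createThread();
late.finishLoading(['Grace']);
late.on('load', () => calls.push('late'));

console.log(calls); // [ 'early' ]
```

Every caller then has to check whether the thread is already loaded before deciding whether to listen or to read directly, which is exactly the branching the continuation style avoids.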

Instead the mercury code looks like this:

    Threads.get(id, function(thread) {
      Participants.getNames(
        thread.participantIDs,
        function(participantNames) {
          // Render participantNames
        }
      );
    });

The parameter that’s passed to each continuation is just plain JavaScript data without any behavior. This pattern may seem like a simple syntactic transformation at first, but in the end, it changes everything. It trivializes caching, and it makes it so that initialization and update share the same code path. Ultimately, this is the hallmark of a realtime system: there’s no special treatment for initial data. Anything can be updated incrementally.
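Here is a rough sketch of why caching becomes trivial under this pattern. It is not the mercury implementation: `participantCache`, `fetchNamesFromServer`, `Participants.update`, and the sample names are assumptions made up for illustration, and the server call is a synchronous stand-in for what would really be a network round trip.

```javascript
// Hypothetical sketch; none of these names come from the mercury codebase.
const participantCache = new Map(); // id -> name, plain data with no behavior
let fetchCount = 0;                 // counts simulated server round trips

// Synchronous stand-in for a server request; a real one would be asynchronous.
function fetchNamesFromServer(ids, callback) {
  fetchCount += 1;
  const directory = { 1: 'Ada', 2: 'Grace', 3: 'Linus' };
  const namesById = {};
  ids.forEach((id) => {
    namesById[id] = directory[id];
  });
  callback(namesById);
}

const Participants = {
  // Callers always pass a continuation, whether or not the data is cached.
  getNames(ids, callback) {
    const missing = ids.filter((id) => !participantCache.has(id));
    const finish = () => callback(ids.map((id) => participantCache.get(id)));
    if (missing.length === 0) {
      finish();
      return;
    }
    fetchNamesFromServer(missing, (namesById) => {
      Participants.update(namesById);
      finish();
    });
  },

  // Initial loads and later realtime updates both write through this function.
  update(namesById) {
    for (const [id, name] of Object.entries(namesById)) {
      participantCache.set(Number(id), name);
    }
  },
};

const rendered = [];
const render = (names) => rendered.push(names.join(', '));

Participants.getNames([1, 2], render);  // first call fetches
Participants.getNames([1, 2], render);  // second call is served from the cache
Participants.update({ 2: 'Grace H.' }); // an incremental update, same write path
Participants.getNames([1, 2], render);

console.log(rendered);   // [ 'Ada, Grace', 'Ada, Grace', 'Ada, Grace H.' ]
console.log(fetchCount); // 1
```

The caller’s code is identical in all three cases, so it never needs to know whether the names came from the initial load, the cache, or a later update.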

1% done

A lot of thought and care went into this new version of messages, and we hope you’ll like it. There’s so much more we can do to make the world more open and connected, and to make our messaging system more reliable and more flexible. In the meantime, we’d love to hear your feedback, so please use the Feedback link in the Actions menu to let us know what you think.