
Facebook explains the backbone shutdown behind its global outage on Monday

And why it took hours to fix

Illustration by Alex Castro / The Verge
Richard Lawler
is a senior editor following news across tech, culture, policy, and entertainment. He joined The Verge in 2021 after several years covering news at Engadget.

The massive outage that took down Facebook, its associated services (Instagram, WhatsApp, Oculus, Messenger), its platform for businesses, and the company’s own internal network all started with routine maintenance.

According to infrastructure vice president Santosh Janardhan, a command issued during maintenance inadvertently shut down the backbone that connects all of Facebook’s data centers worldwide.

That by itself is bad enough, but as we’ve already explained, the reason you couldn’t use Facebook is that the DNS and BGP routing information pointing to its servers suddenly disappeared. According to Janardhan, that was a secondary effect: Facebook’s DNS servers detected the loss of connection to the backbone and stopped advertising the BGP routes that help every computer on the internet find its servers. The DNS servers were still working, but they were unreachable.
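The withdraw-on-health-check-failure behavior described above can be sketched in a few lines. This is an illustrative model only, not Facebook’s actual tooling; the function name and the example prefix are assumptions made for the sake of the sketch.

```python
# Sketch of the safety behavior described above: a DNS edge node
# withdraws its BGP route announcements when it can no longer reach
# the backbone, on the assumption that it can't serve correct answers.
# In this outage, every node made that call at once, taking all of
# Facebook's DNS off the internet even though the servers kept running.

def advertised_prefixes(backbone_reachable: bool, prefixes: list) -> list:
    """Return the prefixes a DNS edge node should keep advertising."""
    if not backbone_reachable:
        # Withdraw everything: the node vanishes from the internet's
        # routing tables, so no one can even ask it a DNS question.
        return []
    return prefixes

# Healthy: routes stay advertised.
print(advertised_prefixes(True, ["129.134.30.0/24"]))   # ['129.134.30.0/24']
# Backbone down: all routes withdrawn; the server is up but unreachable.
print(advertised_prefixes(False, ["129.134.30.0/24"]))  # []
```

The design choice is sound in isolation (don’t advertise routes to a server that may be giving stale answers), but as the outage showed, a correlated failure of the backbone turns it into a global withdrawal.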

The lack of network connections and loss of DNS cut off the servers from engineers trying to fix the issue and disabled many of the tools they normally use for repair and communication — just as we heard yesterday.


The blog post notes that the engineers faced additional hurdles due to the physical and system security around this crucial hardware. Once they did “activate the secure access protocols” (apparently not a code word for “cut open the server door with an angle grinder”), they were able to bring the backbone online and restore services in gradually increasing loads. That’s part of the reason it took some people longer to regain access yesterday: the power and computing demands of turning everything on at once might have caused more crashes.

So that’s it. No conspiracy theories, and no techs taking axes to secure facilities to turn Mark Zuckerberg’s baby back on. Just a bug in a command that an audit tool missed, and for six hours, services that connect billions of people disappeared.
