Cloudflare was down

claudex · 2025-06-12T20:05:06 1749758706

> Cloudflare’s critical Workers KV service went offline due to an outage of a 3rd party service that is a key dependency.

So they depend on GCP for (some of) their services

its-kostya · 2025-06-12T21:41:08 1749764468

If that is true, and there are no other BGP shenanigans, then I suspect this dependency will not be around for long

yencabulator · 2025-06-13T02:35:46 1749782146

From the article:

> Workers KV is in the process of being transitioned to significantly more resilient infrastructure for its central store: regrettably, we had a gap in coverage which was exposed during this incident.

beastman82 · 2025-06-12T21:54:51 1749765291

My WAG is it comprises 95% of the company infrastructure

IX-103 · 2025-06-12T23:12:47 1749769967

I heard that it was a "mandatory dependency" to mitigate "insider risk" or something. There's no way it's going anywhere. Odds are they'll just enforce even slower rollouts "to catch things early".

SahAssar · 2025-06-13T12:55:14 1749819314

WAG = wild-assed guess?

tempaccount420 · 2025-06-13T15:21:17 1749828077

Maybe beastman82 is wagging his tail?

pizzafeelsright · 2025-06-12T22:45:59 1749768359

CEO just said not for long

asteroidburger · 2025-06-12T21:48:36 1749764916

Sub-processor pages are an easy way to verify that sort of thing.

https://www.cloudflare.com/gdpr/subprocessors/cloudflare-ser...

reimertz · 2025-06-12T20:10:56 1749759056

wrote a similar comment - good to know for the future.

voxadam · 2025-06-12T23:01:41 1749769301

> So they depend on GCP for (some of) their services

Google denies they had any outages.

https://x.com/Google/status/1933246051512644069

https://nitter.net/Google/status/1933246051512644069

IX-103 · 2025-06-12T23:09:53 1749769793

They can say that, but any one of their customers knows it's not true.

hinkley · 2025-06-13T01:26:35 1749777995

Is this the ol' "100% down for 3% of your customers" scenario?

yencabulator · 2025-06-13T02:36:37 1749782197

https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...

mirashii · 2025-06-13T01:43:52 1749779032

Come on, linking four hour old tweets instead of their actual service dashboard where they clearly state there was an outage.

voytec · 2025-06-13T01:17:06 1749777426

Weaseling out of SLA/SLO payments.

koliber · 2025-06-12T19:07:39 1749755259

https://downdetector.com/ is showing outages at many major companies including Google, CloudFlare, AWS and more.

Word on the street is that there are large BGP routing issues behind all of this.

cogman10 · 2025-06-12T19:13:04 1749755584

Would make sense. I think the last time I saw this sort of thing it was BGP causing a bunch of traffic to route through Iran or China IIRC.

nijave · 2025-06-12T19:28:36 1749756516

There was also an older instance with China https://www.cyberdefensemagazine.com/experts-detailed-how-ch...

koliber · 2025-06-12T19:16:39 1749755799

I vaguely recall that incident. But it did not feel like it affected this many services.

At the same time I have not noticed anything being down firsthand. I am in Europe.

cogman10 · 2025-06-12T19:23:37 1749756217

Here's the case [1]. Looks like they targeted a single /24 so that's likely why it wasn't a bigger issue.

[1] https://bishopfox.com/blog/bgp-hijacking-technical-post-mort...

NooneAtAll3 · 2025-06-12T19:23:06 1749756186

so this is related to Israel's escalation that everyone is expecting?

CoopaTroopa · 2025-06-12T19:41:32 1749757292

The Pentagon Pizza Report has been having a lot of activity the past 24 hours. Maybe just a coincidence

Animats · 2025-06-12T20:03:42 1749758622

Internet Health Report is reporting "No data to show".

[1] https://www.ihr.live/

ramesh31 · 2025-06-12T19:16:32 1749755792

Anthropic down/degraded as well. Time to go for a walk.

jerrygoyal · 2025-06-12T18:49:42 1749754182

GCP is also down https://news.ycombinator.com/item?id=44260810

tete · 2025-06-12T19:47:48 1749757668

When being down scales. :D

ipsum2 · 2025-06-12T18:56:33 1749754593

Odd coincidence. Wonder if Cloudflare uses GCP?

ikiris · 2025-06-12T19:00:43 1749754843

It's likely their auth infra based on what the Google outage is

devmor · 2025-06-12T19:05:42 1749755142

What do you mean by this? The Google outage is a widespread outage of most GCP services.

pageandrew · 2025-06-12T19:08:12 1749755292

Google is claiming the root cause is with some of their central IAM services, which would have a cascading effect to the rest of their services.

devmor · 2025-06-12T19:11:53 1749755513

Where did you see this information? Was it on a social media channel? I do see the IAM services in the list of affected services in the incident report.

tom1337 · 2025-06-12T19:13:12 1749755592

check https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...

> Multiple GCP products are experiencing impact due to Identity and Access Management Service Issue

ikiris · 2025-06-12T20:31:42 1749760302

Scroll up. It's literally in this HN comment section, highly upvoted.

devmor · 2025-06-13T06:50:04 1749797404

It was not when I posted the reply that you are replying to 2 hours after I posted it.

ikiris · 2025-06-12T20:33:17 1749760397

The comment was self-explanatory, and no, it wasn't a widespread GCP outage. Most everything was up except for GCS and Firebase, and later on identity stuff started causing cascading issues, but not when this was posted.

zerd · 2025-06-12T21:08:33 1749762513

> it wasn't a widespread GCP outage.

If this wasn't widespread, what is?

Incident affecting API Gateway, Agent Assist, AlloyDB for PostgreSQL, Apigee, Apigee Edge Private Cloud, Apigee Edge Public Cloud, Apigee Hybrid, Cloud Data Fusion, Cloud Firestore, Cloud Logging, Cloud Memorystore, Cloud Monitoring, Cloud Run, Cloud Security Command Center, Cloud Shell, Cloud Spanner, Cloud Workstations, Contact Center AI Platform, Contact Center Insights, Data Catalog, Database Migration Service, Dataform, Dataplex, Dataproc Metastore, Datastream, Dialogflow CX, Dialogflow ES, Google App Engine, Google BigQuery, Google Cloud Bigtable, Google Cloud Composer, Google Cloud Console, Google Cloud DNS, Google Cloud Dataflow, Google Cloud Dataproc, Google Cloud Pub/Sub, Google Cloud SQL, Google Cloud Storage, Google Compute Engine, Identity Platform, Identity and Access Management, Looker Studio, Managed Service for Apache Kafka, Memorystore for Memcached, Memorystore for Redis, Memorystore for Redis Cluster, Persistent Disk, Personalized Service Health, Pub/Sub Lite, Speech-to-Text, Text-to-Speech, Vertex AI Search

ikiris · 2025-06-13T00:09:19 1749773359

Our entire infra in GCP stayed up just fine, we just couldn't manage anything. IDK what to tell you. Many of the things you list here were not down at all.

mirashii · 2025-06-13T01:46:22 1749779182

That it wasn’t down for you does not mean it wasn’t down for others or even almost everyone. Certainly, Google wouldn’t have listed the services as having an outage if nobody was impacted. You can’t extrapolate from “works for me” to “it must have been working for everyone”.

ikiris · 2025-06-13T04:36:53 1749789413

Dude, I literally was an SRE there. I'm well aware of how this stuff works.

If some of those things listed had actual widespread outages, it would have been much much worse.

solardev · 2025-06-13T09:35:42 1749807342

I don't understand your argument? Wasn't GCP's own status page calling them outages? Some of our upstream providers (who use GCP) were definitely affected and down.

As a former SRE there, is "widespread outage" a specific, special kind of classification that's not obvious to the public just by looking at the status page...? Or what do you mean?

doritosfan84 · 2025-06-13T01:49:39 1749779379

So weird to argue when google themselves listed these as having an outage.

artursapek · 2025-06-12T22:54:26 1749768866

Their KV store was definitely down.

neo_doom · 2025-06-12T18:57:25 1749754645

Yeah this is going to be a problem. I haven't seen an issue this widespread across so many services in a while.

tete · 2025-06-12T19:48:30 1749757710

Seems to be semi regular now that everyone puts all their eggs in only a few baskets.

solardev · 2025-06-13T09:38:27 1749807507

I gotta say, it's kinda nice when that happens... work just kinda pauses for everyone, from providers to customers. It kinda feels like a national holiday, and everyone downstream from the affected cloud can just kinda sit back and relax cuz there's nothing they can do anyway except wait.

When it's your own outage, it's all-hands-on-deck panic mode. When it's half the internet down, it's no longer your problem, lol

prauscher · 2025-06-13T13:48:18 1749822498

I guess it depends on what your company's acceptable level of downtime is. If you're like Cloudflare (who handled this well), you take this as a sign to build fault tolerance around your 3rd party providers.

If your application is mission-critical, downtime is anything but a holiday.
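
As a rough illustration of what "fault tolerance around a 3rd party provider" can mean, here is a minimal sketch in Python. It is not how Cloudflare or anyone in this thread does it; `FallbackReader` and `fetch_primary` are hypothetical names, and the idea shown is simply serving the last known good value when the upstream call fails.

```python
from typing import Callable, Dict, Optional


class FallbackReader:
    """Read from a primary store; serve the last known good value if it fails."""

    def __init__(self, fetch_primary: Callable[[str], str]) -> None:
        # fetch_primary is a hypothetical stand-in for a third-party lookup.
        self._fetch_primary = fetch_primary
        # Local copy of every value that was read successfully.
        self._last_good: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        try:
            value = self._fetch_primary(key)
        except Exception:
            # Primary is unavailable: degrade to a possibly stale local value,
            # or None if this key was never read successfully.
            return self._last_good.get(key)
        self._last_good[key] = value
        return value
```

The trade-off is staleness: during an upstream outage readers get old data instead of errors, which is acceptable for some config-style lookups and not for others.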

paxys · 2025-06-12T18:55:35 1749754535

Let me guess, someone pushed out a bad BGP config?

CSMastermind · 2025-06-12T19:20:51 1749756051

For an outage this large and widespread that would have to be the main culprit.

tete · 2025-06-12T19:44:45 1749757485

Big blog post about how they saved the internet upcoming. ;)

Currently down, but reference: https://blog.cloudflare.com/the-ddos-that-almost-broke-the-i...

aranchelk · 2025-06-12T18:26:48 1749752808

Seems to be affecting functionality of their "Verify you are human" dialogs as well as Workers.

clairegraham · 2025-06-12T18:32:02 1749753122

Yep, KV is broken too. Any worker that depends on KV is throwing exceptions. I was able to get into the dash, but it's very slow. Error rates started to go up significantly around 18:00 UTC.

Edit: The CF status page has acknowledged it's a broad outage across many services: https://www.cloudflarestatus.com/incidents/25r9t0vz99rp

aranchelk · 2025-06-12T18:35:11 1749753311

After many tries I also got into the dashboard, but it's not that usable, constant error pop-ups.

bgwalter · 2025-06-12T19:20:22 1749756022

It does. Another question is why do we get these dialogues always from Cloudflare and never from Akamai in the first place?

bgwalter · 2025-06-12T19:29:05 1749756545

Downvoting this comment and flagging the submission does not address the serious issue. These verification dialogues make the Internet unusable.

perching_aix · 2025-06-12T19:47:22 1749757642

Nor does venting about it in unrelated threads, or asserting your opinion as fact.

scubbo · 2025-06-12T20:00:13 1749758413

It's not much of a reach to go from "discussion about impact on human-verification dialogs" to "discussion about human-verification dialog policy". This isn't an incident-management channel, it's a discussion forum - tangents are fine!

bgwalter · 2025-06-12T20:00:29 1749758429

I complained in the apnews.com thread, because the apnews.com verification, which is annoying by itself, did not work at all this time. That is hardly unrelated.

pier25 · 2025-06-12T18:51:40 1749754300

They've changed the title to "Broad Cloudflare service outages"

ourmandave · 2025-06-12T19:04:45 1749755085

Is it coincidence that there's a Scheduled Maintenance in Tokyo for 18:00 UTC in progress, and the problems started at 18:19 UTC?

alexcroox · 2025-06-12T19:54:06 1749758046

Unrelated, they have a few services that rely on GCP which is down. Still, I imagine the people working on the maintenance for Tokyo turned white during that job worried it was caused by them...

perching_aix · 2025-06-12T19:06:32 1749755192

Guess we'll find out from the postmortem. Always the silver lining with these, get to learn from and enjoy a good writeup.

solarmist · 2025-06-12T19:44:15 1749757455

Do these get posted publicly?

solardev · 2025-06-13T09:41:43 1749807703

Yeah. Cloudflare writes some of the best ones in the industry, and they're very enjoyable to read: https://blog.cloudflare.com/tag/post-mortem/

I really do appreciate the transparency and ownership that comes with these. We all fuck up, but a lot of companies would rather hide their mistakes than own up to them. Cloudflare's approach makes me trust them more.

perching_aix · 2025-06-12T19:44:37 1749757477

> Do these get posted publicly?

Yes.

jonfw · 2025-06-12T19:34:28 1749756868

There is always scheduled maintenance on that page, so that's not much of a signal in my experience

bhaney · 2025-06-12T19:06:46 1749755206

Probably

sidcool · 2025-06-12T23:18:12 1749770292

Cloudflare's lava lamps are dimming.

poorman · 2025-06-12T20:52:33 1749761553

Can’t wait to read this post-mortem. Seems odd that a Google Cloud outage would bring down Cloudflare services.

doritosfan84 · 2025-06-12T18:41:40 1749753700

They updated the incident noting that it's not just authentication affected.

pier25 · 2025-06-12T20:10:38 1749759038

Our Workers apps are up again

edit:

It works in the US but EU customers are still reporting our services as down.

edit:

EU customers are reporting ok

pier25 · 2025-06-12T18:42:57 1749753777

Workers KV has been down for like 30+ mins. This is impacting us seriously.

Their API is down too.

Amazing that something can impact their whole infrastructure like this given how much redundancy they have.

kenhwang · 2025-06-12T20:07:54 1749758874

From their incident page (https://www.cloudflarestatus.com/incidents/25r9t0vz99rp):

> Cloudflare’s critical Workers KV service went offline due to an outage of a 3rd party service that is a key dependency.

I bet that 3rd party service is GCP.

I would be pretty pissed if I were a CF customer that used Workers KV for redundancy because it was heavily marketed as running on CF data centers.

nijave · 2025-06-12T19:26:33 1749756393

>can impact their whole infrastructure

CDN and WAF seem to be working fine. I think CF rushed a lot of newer services out without the reliability some of their older/core services enjoy

stri8ted · 2025-06-12T19:13:27 1749755607

The same is true for Google.

PeterStuer · 2025-06-13T06:41:52 1749796912

So both Cloudflare authentication and Google's identity systems suffered major downtime yesterday. Are there technical dependencies between these?

tom1337 · 2025-06-13T08:42:23 1749804143

Cloudflare doesn't say this directly but in their blog they've written

> The cause of this outage was due to a failure in the underlying storage infrastructure used by our Workers KV service, which is a critical dependency for many Cloudflare products and relied upon for configuration, authentication and asset delivery across the affected services. Part of this infrastructure is backed by a third-party cloud provider, which experienced an outage today and directly impacted availability of our KV service.

vimwizard · 2025-06-12T19:03:32 1749755012

proxy seems available in general, must just be local to workers because only one of my sites going thru ZT tunnel with identity access rules is affected

ineedaj0b · 2025-06-12T18:51:24 1749754284

solar flare?

CoopaTroopa · 2025-06-12T19:09:31 1749755371

No, Cloudflare.

joduplessis · 2025-06-12T18:59:30 1749754770

Hopefully they also publish the prompt that did this.

daxfohl · 2025-06-12T19:09:43 1749755383

They should make the AI lead the postmortem.

tough · 2025-06-12T19:01:55 1749754915

I was thinking about this too

vsgherzi · 2025-06-12T20:03:29 1749758609

They're just moving fast and breaking things 100x faster. Who cares what code does just vibe it all away /s

artursapek · 2025-06-12T22:57:54 1749769074