As a former CF employee, I'd say it's a mixed bag.
There are plenty of resources, yet it's somehow never enough. You do tons of pretty amazing things with pretty amazing tools that also have notable shortcomings.
You're surrounded by smart people who do lots of great work, but you also end up in incident reviews where you find facepalm-y stuff. Sometimes you even find out it was a known corner case that was deemed too unlikely to prioritize.
The last incident I remember my team dealing with there ended with my coworker and me realizing that the staging environment we'd taken down hours earlier was actually the data source for a production dashboard, so we'd lost some visibility and monitoring for a bit.
I've also worked at Facebook (pre-Meta days) and at Datadog, and I'd say it was about the same. Most things are done quite well, but so much stuff is happening that you still end up with occasional incidents that feel like they shouldn't have happened.