
Does Firebase allow you to set up a billing limit, or is it one of those exciting cloud services where no matter what you do you're always one mistake away from losing your house?


I understand joking about this and the possible downsides, but there are good reasons why cloud services are set up this way: 1) it's impossible to bill at scale and exactly cut off service usage globally when a target is hit, and 2) most companies don't want a single point of failure like a misconfigured budget to bring down their production services.

The answer is probably quota management: a limit on the number of VMs, or the size of a database, or similar, caps the worst-case scenario, and it's arguably easier to monitor an approach to a quota because it's more granular than billing.
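
Roughly what that looks like, as a sketch (the names are made up, not any real provider's SDK):

    # Hypothetical quota check: illustrative names only, not a real cloud SDK.
    MAX_VMS = 10  # hard cap on instance count, set per account

    def create_vm(current_vm_count: int) -> None:
        # A quota check compares against a counter the provider already tracks,
        # unlike a billing cap, which needs a consistent running total of spend.
        if current_vm_count + 1 > MAX_VMS:
            raise RuntimeError("quota exceeded: request a quota increase")
        # ...provision the VM...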

Personally I think cloud providers could have an explicit "hobby mode" that limits certain things so that spend can't run away like this, with the trade-off that such accounts aren't really production grade, but then again those accounts probably aren't worth anything to the provider, so I understand not building that out. That said, whenever I've seen one of these things happen, it always ends with "FooCloud said that as a one-time gesture of good will they would write off this accidental usage", so while briefly scary, maybe this is the system working fine overall?


Azure does actually have the ability to force-kill your resources when you hit a certain billing threshold, but only for things like free trials and student accounts. The instant you switch to a regular pay-as-you-go account that functionality disappears for uh... reasons.

https://learn.microsoft.com/en-us/azure/cost-management-bill...


That's good, and it makes sense that it would only be for those sorts of accounts. Imagine building a billing system that could always do that: any time you accept a request, you need to check against a database whether the user has hit their billing limit before you can charge them for it. It obviously can't work at scale. I imagine the way MS makes it work is probably to slow down resource consumption for those accounts dramatically, and then just take the hit on overspend, knowing that it will be almost nothing.
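
To make the point concrete, here's roughly what the naive version looks like as a sketch (the names are hypothetical, not any provider's internals): every billable request blocks on round trips to a consistent central billing store.

    # Hypothetical synchronous per-request check; illustrative only.
    def handle_request(billing_db, customer_id: str, request_cost: float):
        spend = billing_db.get_spend(customer_id)   # potentially cross-region read
        limit = billing_db.get_limit(customer_id)   # another read
        if spend + request_cost > limit:
            raise RuntimeError("billing limit reached")
        billing_db.add_spend(customer_id, request_cost)  # consistent write, contended per customer
        # ...only now can the request actually be served...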


I understand that a precise billing limit is probably impossible at that scale, but if they have the ability to send you an email some time after you go over a limit then they have the ability to automatically pull the plug at the same time rather than waiting for you to notice the email and do it yourself. They just don't want to.

I'm sure people would accept a best-effort system where setting a billing limit for $100 means you may be billed $140 because your spending overshot the limit before the system noticed. It still beats the alternative of waking up to a $20,000 bill out of nowhere.


So I've stored some data in a bucket, the storage time goes over my budget, what do you do now? Just delete the data? I think most people would think that's a terrible outcome.

Let's say the answer is yes, you just delete all the data as I can no longer afford it... delete operations are billable, so do you charge the user for those deletes or not?

Let's say the answer is yes, and you bill the deletes. What happens if too many deletes are required and suddenly you're at 2x the bill cap? Now you can't even document the cap as "may be exceeded by up to 1.5x". This may be unlikely, but customers use cloud services in weird and wonderful ways.

This is just one resource type. There are many different resource types on a typical cloud provider, each with multiple axes of billing, and each axis has hard decisions that need to be not just made, but documented and communicated to customers in such a way that they understand the impact. Oh, and it's the "I just put my credit card in and go" crowd you have to explain it to, who aren't engaging in sales conversations, not those on business contracts who might actually listen or read the documentation.

It's not at all obvious to me that this is preferable to just having someone look at these incidents on a case by case basis and seeing who should be refunded.


This is an insane take; there are many options here. The idea that the reason to structure billing this way is anything OTHER than that it's the most profitable way to do things is ludicrous. It doesn't matter if you can construct some other plausible reason: as long as it's the most profitable way to operate, why would I believe the cause to be anything other than profit maximization?


What alternative do you suggest? I'm not saying providers should delete data when hitting a cap, but rather that this is one example of why you can't cap spend.

It is possible to combine all the billing axes into a single capped product, but when you do you get Dropbox, or Google Drive. The explicit value proposition of cloud hosting is paying for what you use, and the granular services are generally lower margin and more commoditised than the higher-level ones.


You could just throw an error, freeze services and do whatever "let someone handle this on a case by case basis" means. It's absurd to suggest that the customer doesn't want the option to prevent their spend from growing by multiple orders of magnitude without a human in the loop. Sure, maybe deletes are a billable action (also absurd and fake), but having the option to say "hey, I can't spend more than X, cut my service if my spend + cleanup would cost more than X" is absolutely doable and something many people would want.


If you think that deleting all data (blob storage, block storage, VM states, caches, etc) is preferable to a surprise bill, then I don't think there's anything we can debate here.


I agree that there isn't anything we can debate here, but it's because you're making straw men. There's a middle ground here between getting charged 4 orders of magnitude more than you expected and having all of your data deleted that you're obtusely refusing to consider.


If writing data to a bucket would push the monthly cost for data storage over the budget, the write should fail, not succeed and then delete something else to get the data back under the limit. Why would you even consider doing it that way unless you're specifically going out of your way to write the worst possible form of billing cap?
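
As a sketch of that rule (made-up names and price, and it ignores races between concurrent writers):

    # Hypothetical pre-write budget check: reject the write rather than delete other data.
    PRICE_PER_GB_MONTH = 0.02   # made-up storage price

    def put_object(bucket, key: str, data: bytes,
                   projected_monthly_cost: float, monthly_budget: float) -> None:
        added_cost = (len(data) / 1e9) * PRICE_PER_GB_MONTH
        if projected_monthly_cost + added_cost > monthly_budget:
            # The caller sees an error instead of a surprise bill.
            raise RuntimeError("write rejected: monthly storage budget would be exceeded")
        bucket.put(key, data)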


That's clearly not user friendly. Users cannot predict the amount of overshoot. So we are back to square one. The user could wake up to a $1,000 bill or $10,000 bill and all data deleted, and the cloud provider can just say "oh we run our billing limit enforcer job on an hourly cron schedule but your account requested many machines very quickly so you still incurred $10,000 worth of charges." A precise billing limit is impossible, and a fuzzy billing limit with a precise error bound is the same thing and also impossible. Now we are back to a fuzzy billing limit with an unknown error bound.


I strongly disagree that this is in any way defensible, despite that both Google and AWS do it. You should be able to set a limit, and even if they can't cut off exactly at the limit, they should be able to cut off when you hit (say) 2x your limit, or in the absence of a limit, perhaps issue progressively louder warnings until you hit 10x your usual spend, with the ability for you to give affirmative consent to uncap if you're hoping your app will suddenly take off. Nobody needs to lose their house or their life savings to some teracorp just because "it's hard."
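
Something along these lines would already cover most of the horror stories; a sketch with invented thresholds and names:

    # Hypothetical escalation policy: warn early, cut off late, allow explicit opt-out.
    def evaluate_spend(spend: float, limit: float | None,
                       usual_spend: float, uncap_confirmed: bool) -> str:
        if limit is not None and spend >= 2 * limit and not uncap_confirmed:
            return "shutdown"        # hard stop, even if it comes late
        if limit is None and spend >= 10 * usual_spend and not uncap_confirmed:
            return "urgent_warning"  # progressively louder alerts, ask for consent to uncap
        if limit is not None and spend >= limit:
            return "warning"         # notify, take no action yet
        return "ok"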


I totally agree. You should be able to choose to have things shut down rather than go past a limit. I don’t have a single client that would choose $70k over, say, a couple hours of downtime while we figure out what is happening with a resource that is going crazy.

This is where I question the risk of serverless. Although, now that I think about it, while my one client’s EC2 instances are essentially capped in terms of capacity and spend, we also use S3 etc. I suppose it would be entirely possible to accidentally write a huge amount of data to S3. But again I would rather get warnings from the app that writing to S3 has failed due to limits than get a huge bill!


But you also pay for data at rest in S3. Should S3 stop storing that data while you figure things out? Should they bill the customer for the deletion of that data as they normally would?

I don't disagree on wanting this feature, but it's just not something that's possible to implement in totality when you dig into the details.


Could it work based on deviation from the norm? Like a setting where you say, I’m okay with up to two standard deviations from our typical write volume, but that’s it?

Or, alternatively, just let me set the size of the volume. Treat it like a hard drive?
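
For the first option, something like this sketch (purely illustrative: a rolling mean and standard deviation over hourly write counts):

    # Hypothetical anomaly check: flag write volume more than two standard
    # deviations above the recent norm.
    import statistics

    def writes_look_anomalous(hourly_write_counts: list[int],
                              current_hour_writes: int,
                              allowed_sigma: float = 2.0) -> bool:
        mean = statistics.mean(hourly_write_counts)
        stdev = statistics.pstdev(hourly_write_counts) or 1.0  # guard against zero variance
        return current_hour_writes > mean + allowed_sigma * stdev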

In terms of your point about the data at rest, part of the issue is that we get a bill once per month, and that’s probably when we would notice it. Of course there is probably a CloudWatch alarm or something we could set (I assume) but there’s so many damn services…


This is sort of what quotas enforce, and most cloud services have quotas. You ask for an increased quota if you reach the current one, or the system might sometimes automatically increase it if it thinks you will need it.

All of this trades off, though, against the possibility of bringing down one of your customers just as they're hitting peak sales on their website, which is a very bad look.


Hobby mode is using something like Hetzner instead of messing about with "cloud native" nonsense. For the $50/mo they expected, they could get an absolute monster like a Ryzen 5 3600. Cloud native stuff is for when you need to be SOC2 compliant or something and want to minimize access to everything. Clouds charge you worse than enterprise pricing (enterprises negotiate discounts).


> That said, whenever I've seen one of these things happen, it always ends with "FooCloud said that as a one-time gesture of good will they would write off this accidental usage", so while briefly scary, maybe this is the system working fine overall?

I suspect that if they tried to sue you over an unpaid bill, they'd lose over issues of proper notice to you. You can't actually bill someone for a service they didn't want and didn't ask for; that's why the wash-your-car-in-traffic people are considered scammers.


I'm not a lawyer, maybe it wouldn't hold up in court, but I imagine that good will is the driving force, even if way down the line they may not manage to actually reclaim the money. My experience of working with cloud providers is that they are almost always happy to take a short term loss (extended trial, more test resources, refunds for accidental usage, etc) to get an account that is going to grow and stick with the provider. This does make sense, customer acquisition and churn are expensive, and recurring revenue is great.


> it's impossible to bill at scale and exactly cut off service usage globally when a target is hit,

How much does the problem change if you remove the word 'exactly' from here, though?

Like, I don't mind if I end up paying a couple of extra bucks. Or even tens of bucks! Some people might not mind hundreds or thousands, or even more depending on their scale.

But blowing out several orders of magnitude past my usual monthly spend is the problem I'd like to avoid.


> it's impossible to bill at scale and exactly cut off service usage globally when a target is hit

This seems unlikely to me. What is the technical reason for this?


To do this you would need to check in with a central billing service every time you want to charge, and that central billing service must keep a consistent view per customer to ensure it can't spend over the cap.

This is not too hard if the billable event is, say, creating a VM. Creating the VM takes a few seconds, it's easy to add a quick API call to check billing. But what about the next hour, when you charge for the VM again? You now have the number of VMs checking in every hour (worst case, at the same time), and you need to handle all those hourly checkins consistently per customer.

That's still probably easy enough, but what if it's not a VM hour, but an API gateway request? Now on every single request you have to check in with the billing service. API gateways need to be very low latency, but you've just added a request to that process that possibly needs to go across the world to check in with the billing service running on another continent.

What if the billable resource is "database query seconds", and now you need to know how many seconds a query is going to take before you start it? Oh, and add the check in time to every database query. What if the billable resource is streaming video, do you check in on every packet you send out? What if it's CDN downloads, do you have every one of thousands of points of presence around the world all check in, even though the point of the product is to be faster than having a single far away delivery node?

There are bad workarounds for each of these, but they all mean either the cloud provider losing money (which, assuming a certain scale of customer, is too expensive), the customer over-spending (which assuming a certain scale, could still be waaay over their budget), or slowing down most services to the point that they functionally don't work anymore.
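
To put just the VM-hour case into code, a sketch (the billing store and its lock are stand-ins, not how any provider actually does this): the pain is that every VM's hourly charge has to land on one consistent per-customer total.

    # Hypothetical hourly check-in: at the top of the hour this becomes a burst
    # of contended writes against the same per-customer counter.
    def record_vm_hour(billing_store, customer_id: str, hourly_rate: float) -> bool:
        with billing_store.lock(customer_id):    # serialises all of this customer's VMs
            spend = billing_store.get_spend(customer_id)
            if spend + hourly_rate > billing_store.get_cap(customer_id):
                return False                     # and now what? stop the VM mid-hour?
            billing_store.add_spend(customer_id, hourly_rate)
        return True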


What I see in this thread is tons of people saying "the ideal of perfect billing cutoffs with no over-runs is impossible, which is why there are no billing cutoffs" even though I've also seen lots of people point out that - to simplify - something is better than nothing, here.

A $1k overrun past your billing cap is still way better than a $50k overrun - the cloud vendor is more likely to get paid in the end, and the customer is more likely to come away from the experience looking at it as an 'oops' instead of a catastrophic, potentially-finances-ruining 'i'm never touching this service again' incident.

There are plenty of really challenging problems in computer science and we solve them with compromises every day while hitting demanding targets. If an SSL certificate expires we expect it to stop working, and if it's revoked we expect the revocation to take effect eventually. But when the guarantees would benefit small companies and independent developers, suddenly we can't solve similar problems?

Fundamentally speaking if you can't afford to check against the billing cap every request, check every 10 requests. If 10 is too often, every 100. If 100 is too often, every 1000. Or check once per minute, or once per hour. Or compute a random number and check if it exceeds a threshold. The check can even be asynchronous to avoid adding intermittent latency to requests.
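
Concretely, the sampled variant could be as small as this sketch (illustrative names, not a real billing API):

    # Hypothetical sampled budget check: only ~1 in 1000 requests pays for the
    # round trip to the billing service; the last answer is cached in between.
    import random

    CHECK_PROBABILITY = 0.001
    _cached_over_budget = False

    def request_allowed(billing_client, customer_id: str) -> bool:
        global _cached_over_budget
        if random.random() < CHECK_PROBABILITY:
            # This call could also be fired asynchronously to keep it off the hot path.
            _cached_over_budget = billing_client.is_over_budget(customer_id)
        return not _cached_over_budget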

Any of these are better than nothing and would stop the bleeding of a runaway service incident. It's unrealistic to expect small companies and independent developers to have someone on-call 24/7 and it's also unrealistic to expect that if you sell them $100k worth of stuff they can't pay for that they'll actually pay you somehow.


All these arguments seem very much like throwing the baby out with the bathwater - I don't think we should pretend it makes sense to say "if we can't have perfect billing cutoff down to each individual api call we shouldn't have a billing cutoff at all". You've listed super achievable ways to prevent a $50/mo spend from ballooning to $70,000.

Additionally, it feels hollow that billing can't cut me off as quickly as authorization would if they shut off my account.


I understand, and I do think an approximate cut-off would be good for some users, but I don't think it solves this problem, for a few reasons. What constitutes bill shock is wildly different between users. Is a $50 bill a shock? It is to me, with an average AWS bill of $0. I don't think you can set absolute or percentage values that make sense, and you can't make it configurable either, because then you run into issues like the SLAs on billing logs arriving: the overspend ends up being the margin of error in the cloud provider's systems.

The other main issue is documenting this. Google AdWords, I believe, has an overspend concept: if you limit your billing to $100, they might still go over it. The problem is that the overspend is only bounded at 2x your budget, which still bites people. I only know about this from reading HN and Reddit posts complaining about it!


You don't have the individual VMs check in. You have the VM coordinator report how many VMs are running and get back an affirmation, which it can cache until the next reporting period, that the total is not over budget. If it's over budget, the coordinator begins halting services.

API gateways are similarly sending metrics somewhere. The coordinator can be the place to ingest that data and send the aggregated info to billing. If it gets back over budget, start halting endpoints. etc.

Or do it within the billing service, but fire off a shutdown notification to the coordinator of whatever service created a billing record if over budget. Same idea.

Basically, batch, amortize and cache work. Same as every computer problem. You establish some SLO for how much time your services can continue running after an overage has occurred, and if that's a couple minutes or whatever that will cut out like 99.99% of the impact in these stories.
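
A sketch of that loop (the coordinator, services and billing client are all stand-ins):

    # Hypothetical coordinator loop: report aggregate usage once per period,
    # and start halting services if the account is over budget.
    import time

    REPORT_INTERVAL_SECONDS = 60

    def coordinator_loop(billing_client, services, customer_id: str) -> None:
        while True:
            usage = sum(s.usage_since_last_report() for s in services)
            if billing_client.report_and_check(customer_id, usage) == "over_budget":
                for s in services:
                    s.begin_halt()    # stop accepting new work, wind down gracefully
            time.sleep(REPORT_INTERVAL_SECONDS)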


Solving this for any one resource type, or one billing axis, is absolutely achievable in the ways you've suggested.

Solving this across all resource types and billing axes, however, is a different problem. You can't cache the notion that a VM is under the billing cap for an hour if another service could push spend over the cap within that hour.

You're right that in theory you could establish SLOs across everything and minimise the amount of monetary loss, but at scale (where some resource types necessarily bill infrequently, and customers are spending more per hour) I suspect even this breaks down.

Then there's still the issue of billing at rest. Do you shut off VMs? That might be an easy question to answer. Do you delete their storage though? Harder. Do you delete blob storage? Probably not, but you've got to swallow the cost of that until the customer decides to pay again.


AWS has a "hobby mode": they call it Lightsail.


Apparently she had a $20 limit, but, if my calculations are correct, $70k is more than that, which seems odd.


Somebody else pointed out that it's likely just an alert, not a hard limit, which checks out given Firebase documentation (https://firebase.google.com/docs/projects/billing/avoid-surp...), which has no mention of hard limits and explicitly warns you that an alert won't stop anything.


Ah, oops...


They have a documentation page titled “Avoid surprise bills”[1] but I imagine it’s easy for some developers to skip over that.

[1]: https://firebase.google.com/docs/projects/billing/avoid-surp...


> A budget alert sends an email whenever your project's spending level hits a threshold that you've set. Budget alerts do NOT turn off services or usage for your app.

It's not an automatic hard-stop so you could still screw yourself over pretty badly with runaway spending.
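
If you want the alert to actually stop anything, you have to wire that up yourself; a sketch of the general shape (the alert payload and admin client here are hypothetical, not Firebase's actual API):

    # Hypothetical glue: react to a budget alert notification by disabling the
    # project's paid services, since the alert itself only sends email.
    def on_budget_alert(payload: dict, admin_client) -> None:
        cost = payload.get("costAmount", 0.0)
        budget = payload.get("budgetAmount", 0.0)
        if budget and cost >= budget:
            # Trades downtime for a bounded bill, which is the whole debate upthread.
            admin_client.disable_billable_services(payload["projectId"])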


https://firebase.google.com/docs/projects/billing/avoid-surp...

> We don't turn off services and usage because although you might have a bug in your app causing an increase in spend, you might just be experiencing unexpected positive growth of your app. You don't want your app to shut down unexpectedly when you need it to work the most.

Frankly, I don't see anything on that page that would actually prevent a surprise bill.


E.g. OpenAI enforces strict limits until you spend a significant amount of money.


Once you do spend a significant amount of money, you'll find that the limits you can set yourself might not even work.

We have a multi-organization OpenAI account and I had set a $4k/month limit on one of the child orgs that was being used for an R&D project. Got billed ~$20k for the project one month and complained that it clearly allowed us to exceed our soft and hard limits. We were told that the limits (which you can set, and which act like they are a real thing) don't do anything if you have a child organization set up. They refunded us after some persuading, and I know it's just normal growing pains for an organization that is undergoing rapid growth and maturity, but it was still a little surprising that even the "hard limit" didn't do anything for us!

(Note that this was last year, so this bug is probably long fixed as they have redesigned their portal multiple times now)


OpenAI functionally have one API, it's easy to limit spend on one API. It's much less easy to limit spend across hundreds of APIs, and resources that have ongoing cost (like VM hours).


My guess is that OpenAI’s margins are much lower so they aren’t in a position to forgive or have people skip out on big bills.

For Firebase, their costs are probably pretty marginal.


From my experience, cloud LLMs are still being used by most systems as an "additional feature" with fallback to alternative basic functionality, or some other form of graceful degradation. On the other hand, there typically isn't a good fallback to having your main DB go down.



