The idea that an app will experience stratospheric growth so suddenly that costs...

chuckadams · 2025-01-17T01:28:22 1737077302

That's every cloud provider. At this point, I think they're actively conspiring to not implement billing caps.

jjnoakes · 2025-01-17T03:26:20 1737084380

I use fly.io. I pre-paid for credits and if they run out, things shut down.

No affiliation, just happy to use a provider with actual caps.

danpalmer · 2025-01-17T01:49:47 1737078587

I used to think this until I tried architecting out how you'd build a billing cap. I recommend it as a design exercise. It's easy to build a bad billing cap that would slow down services and cause outages, but it's basically impossible to build a good billing cap.

ryao · 2025-01-17T04:33:10 1737088390

Oddly, they have no problem shutting things off when the limits of their free plan are exceeded:

https://firebase.google.com/docs/projects/billing/firebase-p...

I am not sure how they can do that, but cannot let people set their own limits on their paid plans.

chen_dev · 2025-01-17T04:45:31 1737089131

Limits Reached -> PubSub Notification -> Shutdown Sequence.

Because it's a free plan, the delay between 'limits reached' and actual shutdown only incurs the cost of providing the service during that brief period, not the potential liability of overcharging that might exist on a paid plan.
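A rough sketch of why the notification delay matters, assuming a hypothetical service that serves a fixed number of units per tick and shuts down a fixed number of ticks after the limit is reached. The function name and parameters are made up for illustration and are not Firebase's actual mechanism:

```python
def simulate_cutoff(cap_units, units_per_tick, notify_delay_ticks):
    """Return (total_units_served, overage_units) for a delayed shutdown.

    All parameters are hypothetical; real pipelines are not tick-based.
    """
    served = 0
    ticks_since_limit = None  # None until the limit is first reached
    while True:
        served += units_per_tick
        if ticks_since_limit is None and served >= cap_units:
            ticks_since_limit = 0  # "Limits Reached" fires here
        elif ticks_since_limit is not None:
            ticks_since_limit += 1  # notification still in flight
        if ticks_since_limit is not None and ticks_since_limit >= notify_delay_ticks:
            break  # "Shutdown Sequence" completes
    return served, max(0, served - cap_units)
```

With a cap of 100 units, 10 units per tick and a 3-tick delay, 30 units are served past the cap. On a free plan the provider absorbs that; on a paid plan someone has to decide who pays for it.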

ffsm8 · 2025-01-17T06:05:15 1737093915

Is that really a problem though? Just don't bill beyond the cap then and leave the last few requests free, too.

Or write a disclaimer that the billing cap doesn't necessarily cut off at exactly that amount and that there might be an overcharge.

I am pretty sure most people would be okay with either of these options. We don't need a perfect system, just one that works well enough.

pclmulqdq · 2025-01-17T04:39:57 1737088797

That cutoff is rarely truly a hard cutoff. The limits are often too low to have a natural test of that, though.

ryao · 2025-01-17T16:51:59 1737132719

They could always make any overage that results from their less-than-perfect cutoff enforcement free, as it likely already is on the free plan. That would avoid the risk of unbounded bills associated with going on a paid plan.

pclmulqdq · 2025-01-17T17:22:05 1737134525

Most of the cloud providers have a less-than-perfect cutoff. It's worse than the cutoff of the free plan, though, because the free plan can be slowed down to have better enforcement, while the commercial plans have performance SLAs to hit.

maeil · 2025-01-17T04:25:04 1737087904

> cause outages

That's fine. The major LLM providers work like this. If you're out of credit, or hit your monthly recharge limit, it stops working, bringing down prod with it if your product relies on it. Not heard anyone complain about the concept.

If it's really a problem for you, you can be all enterprisey and contact sales, then they'll be very excited to offer you extremely high limits and post billing.

This way everyone gets what they want.

danpalmer · 2025-01-18T01:09:52 1737162592

To be clear, the outages I'm referring to are not when you hit your billing cap. Try designing a billing system for a cloud provider that implements caps, while still retaining the performance necessary for the services you're providing to make sense, and without introducing huge, common, failure modes.

Havoc · 2025-01-17T10:45:59 1737110759

You solve this by opt in, not fancy engineering. There are two classes of customers - those that absolutely can't afford services be cut, and those that absolutely can't afford a 50k bill.

So you deploy an advanced technology known as a radio button to toggle which they want, throw a bunch of ToS & consent agreements about data loss / deletion at the ones opting for hard caps... and done.

Also, a reminder that Azure has hard caps for certain account types. This is not a technical problem. They can do this, they just don't want to.

hackingonempty · 2025-01-17T03:12:01 1737083521

How is the service being able to answer the question "is there budget available for this action?" different from "is there authorization for this action?"

williamstein · 2025-01-17T03:39:31 1737085171

One example - Google Cloud network egress charges aren’t known until up to ~2 days after they happen. Since they can be obscenely expensive (eg $0.23/GB), they can make budget computation difficult.

stefanfisk · 2025-01-17T06:15:57 1737094557

What is the cause of this delay?

danpalmer · 2025-01-18T01:11:28 1737162688

I don't know for sure, but based on my knowledge as a user, I'd guess it could be something like delivering usage logs from points of presence in the CDN. PoPs can go offline regularly and are highly dependent on other people's networks, so 2 days might be the arbitrary line that has been drawn to get enough of them in most circumstances while not being too annoying for customers.

danpalmer · 2025-01-17T03:50:28 1737085828

Authorisation is much more cacheable than a value that inherently changes every single time you check it.

Also authorisation revocation is relatively uncommon, which means you can have a fast-path for approval, and then push only the revoked key IDs to just frontend servers.
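A minimal sketch of the contrast, under the assumption that each frontend keeps a local set of revoked key IDs and that budget checks would naively reuse a cached balance. The class and function names are hypothetical and do not describe any provider's real design:

```python
class Frontend:
    """Hypothetical frontend server holding a local revocation list."""

    def __init__(self):
        self.revoked_key_ids = set()

    def push_revocation(self, key_id):
        # Revocations are rare, so pushing only the revoked IDs is cheap.
        self.revoked_key_ids.add(key_id)

    def is_authorized(self, key_id, signature_ok):
        # Fast path: no network call, only a local set lookup.
        return signature_ok and key_id not in self.revoked_key_ids


def balance_after_stale_approvals(remaining, num_frontends, cost_each):
    """Each frontend approves one request against the same cached balance."""
    cached = remaining  # every frontend sees the same stale snapshot
    approvals = sum(1 for _ in range(num_frontends) if cached >= cost_each)
    return remaining - approvals * cost_each  # negative means overspend
```

The authorisation answer stays valid until a rare revocation arrives, while the cached balance is wrong as soon as any other frontend approves a request.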

josephg · 2025-01-17T04:17:26 1737087446

> It's easy to build a bad billing cap that would slow down services and cause outages

When you've exhausted your billing cap, what else could it do? Either shutting off services or blowing past the cap seem like the only serious options.

For a lot of small businesses, I suspect that an outage is often better than risking a surprise 6 figure bill. But it depends on what your software does.

Also if the system shut down automatically when the budget got exhausted, there's a risk that a runaway backup process or something might accidentally eat through your regular budget and get the site shut down. For that, it might make sense to assign different resources into different budgetary buckets or something.

I'm surprised firebase doesn't implement something like that.
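One way the budgetary buckets idea could look, as a sketch only. The bucket names and the `BudgetBuckets` class are invented for illustration, and nothing here reflects an actual Firebase feature:

```python
class BudgetBuckets:
    """Hypothetical per-resource budgets so one runaway job cannot starve the rest."""

    def __init__(self, limits_cents):
        self.limits_cents = dict(limits_cents)
        self.spent_cents = {name: 0 for name in self.limits_cents}

    def try_spend(self, bucket, cost_cents):
        # Only the named bucket is checked; other buckets are unaffected.
        if self.spent_cents[bucket] + cost_cents > self.limits_cents[bucket]:
            return False
        self.spent_cents[bucket] += cost_cents
        return True

    def is_exhausted(self, bucket):
        return self.spent_cents[bucket] >= self.limits_cents[bucket]
```

A runaway backup process would exhaust only the `backups` bucket in this sketch, leaving the bucket that serves the site untouched.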

pclmulqdq · 2025-01-17T04:20:54 1737087654

I am sitting on an algorithm for hard billing caps right now that seems it may have some holes, but gets close based on several very tricky distributed systems problems. Making a billing cap that doesn't amount to "just use one single gateway server" (serializing everything and introducing tons of latency) seems to be harder than building a database or a filesystem, and most programmers would never attempt even those.

chuckadams · 2025-01-17T06:48:54 1737096534

> it's basically impossible to build a good billing cap.

They don't have a problem implementing caps on a free tier. No one's asking for perfect, but they don't seem to care about even getting to the ballpark.

mortehu · 2025-01-17T04:26:35 1737087995

Seems pretty similar to distributed rate limiting. But it's much simpler to solve the common case of overspending on a single API: give each API the same daily limit with no communication between APIs.
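A sketch of the per-API daily limit, assuming each API keeps its own counter and never talks to the others. The class name and the use of cents are illustrative choices, not part of any real provider's API:

```python
from datetime import date


class ApiDailyLimiter:
    """Hypothetical independent daily spend limit for a single API."""

    def __init__(self, daily_limit_cents):
        self.daily_limit_cents = daily_limit_cents
        self._day = None
        self._spent_cents = 0

    def try_charge(self, cost_cents, today):
        if today != self._day:
            # New day: reset the counter locally, no coordination needed.
            self._day = today
            self._spent_cents = 0
        if self._spent_cents + cost_cents > self.daily_limit_cents:
            return False
        self._spent_cents += cost_cents
        return True


def worst_case_daily_spend_cents(num_apis, daily_limit_cents):
    # With no communication between APIs, the overall bound is just the sum.
    return num_apis * daily_limit_cents
```

The trade-off is that the total bound is loose (number of APIs times the per-API limit), but overspending on any single API is capped without any distributed coordination.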

Ekaros · 2025-01-17T10:18:49 1737109129

Either you want automatic scalability or you want caps. Scalability is hard with caps. Say your site selling stuff sees a spike, scales up and hits the cap: should the service degrade in a way you did not plan for? Or go past the cap, as you are still making money?

iterateoften · 2025-01-17T11:07:06 1737112026

Look at how insane twilio is.

I set up automatic recharge of $20. A small amount because not much traffic. A bad actor got ahold of our API that didn’t have a rate limit yet and started spamming Africa.

Twilio had zero issue charging my credit card every second. Literally I was getting a hundred emails and bank notifications a minute. Brex didn’t stop anything.

Twilio responded that it was my fault. Yeah, sure, I 100% probably should have put in that Cloudflare rate limit first. But…

How easy would it be for Twilio to prevent this on any level? I need rate limits? How about you rate limit credit card charges. Putting in a $20 recharge limit should mean $20/day or $20/hr, not a literal unmetered right to charge as much as possible in $20 increments.

Twilio support sent me all this info about protecting myself from African spammers who use the technique to make money from SMS charges. You know what’s more responsible than informing me of this? How about blocking sending SMS to country codes known for this from the get-go and requiring opt-in to send to them.

The perverse incentives were clear: Twilio benefits massively from being insecure and easily exploitable by spammers.

Ended up costing almost $3k after bill adjustment when our usual spend was $5/mo. Not bankruptcy level, so after fighting with support I just took it as is and learned my lesson. But Twilio made *50 years* of revenue in about 10 minutes from their own negligence.
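The "$20 recharge should mean $20/day" idea above can be sketched as a sliding-window limit on card charges. This is a hypothetical guard, not Twilio's API; the class name, the cents unit and the window length are assumptions:

```python
from collections import deque


class RechargeGuard:
    """Hypothetical cap on total auto-recharge within a sliding time window."""

    def __init__(self, max_cents_per_window, window_seconds):
        self.max_cents_per_window = max_cents_per_window
        self.window_seconds = window_seconds
        self._charges = deque()  # (timestamp_seconds, amount_cents)

    def try_recharge(self, amount_cents, now):
        # Forget charges that have aged out of the window.
        while self._charges and now - self._charges[0][0] >= self.window_seconds:
            self._charges.popleft()
        charged = sum(amount for _, amount in self._charges)
        if charged + amount_cents > self.max_cents_per_window:
            return False  # refuse to hit the card again; service pauses instead
        self._charges.append((now, amount_cents))
        return True
```

With a $20/day guard, the second $20 recharge within the same day is refused, so a spam burst pauses the account instead of charging the card every second.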

nothercastle · 2025-01-17T13:58:25 1737122305

It’s probably part of the business model. They rely on the African spammers to improve profits

tasuki · 2025-01-18T08:09:16 1737187756

I use digital ocean (a cloud provider) droplets. I know exactly what my bill will be at the end of the month.

edoceo · 2025-01-17T04:07:49 1737086869

Wait till we see the crazy/unmetered AI bills.

These folk can't even get a stable billing process; the coming surprises will be awesome.

amazingamazing · 2025-01-17T01:51:54 1737078714

There really isn’t a conspiracy. A hard billing cap is at least one of: very difficult (even for FAANG) to implement without incurring unacceptable performance regressions; impractical, as downtime has worse optics than high spend (given that high spend in this case is correlated with traffic, which is good); or unnecessary for those customers who represent most revenue.

itake · 2025-01-17T02:57:04 1737082624

Setting max auto scaling is doable. Even if that doesn’t translate directly to a hard billing cap, it would still help.

yadaeno · 2025-01-17T16:47:10 1737132430

Almost all GCP services have quota by default that you need to manually increase.

itake · 2025-01-17T02:55:37 1737082537

Imagine if LinkedIn wanted to use Firebase to launch a new product.

I could easily imagine a 1,000x hour over hour growth as the social media grows.

If I was LinkedIn, I’d be very upset if Firebase pulled the cord when everyone was looking at the new launch.

collingreen · 2025-01-17T03:35:19 1737084919

Would you be upset if they pulled the cord at the limit you explicitly set beforehand? If not, then that's not really the thing the people asking for a billing cap are talking about.

itake · 2025-01-17T04:37:24 1737088644

What does "pulled the cord" mean?

I pay for storage. Does this mean they delete my production database? They delete my s3 buckets? They delete my server logs? My message queue?

Then yes! I would be extremely upset. How do you expect the cloud provider to magically know what is safe to "pull the plug" and what is not safe?

like_any_other · 2025-01-17T10:19:31 1737109171

Why do you pick unreasonable interpretations, when there are so many trivial, obvious, reasonable options? Even the literal interpretation of "pull the cord", meaning the power cord, would mean the servers shut down, but nothing is deleted. And why do you claim the provider would "magically" have to know these things? These can be communicated through user settings: "if bill exceeds X, do one of the following:

[ ] continue billing me [ ] throttle bandwidth [ ] gracefully shut down servers (data is deleted after 30 days of non-payment)"

This is so painfully obvious even for someone that never deals with cloud vendors, that I just don't understand why you would pretend otherwise.
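The checkbox setting described above could be modelled roughly like this. The enum values and the returned action strings are invented for illustration and are not any cloud vendor's real configuration:

```python
from enum import Enum


class OverCapAction(Enum):
    """Hypothetical user setting for what happens once the bill exceeds the cap."""

    CONTINUE_BILLING = "continue_billing"
    THROTTLE_BANDWIDTH = "throttle_bandwidth"
    GRACEFUL_SHUTDOWN = "graceful_shutdown"


def decide(bill_cents, cap_cents, action):
    """Return the provider-side behaviour as a plain string."""
    if bill_cents <= cap_cents:
        return "serve"  # under the cap: nothing special happens
    if action is OverCapAction.CONTINUE_BILLING:
        return "serve"  # the customer chose availability over cost
    if action is OverCapAction.THROTTLE_BANDWIDTH:
        return "throttle"
    return "shutdown_keep_data"  # servers stop, data is retained for now
```

The point of the sketch is that the provider does not need to guess what is safe: the customer's chosen `OverCapAction` carries that decision.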

stvltvs · 2025-01-17T07:49:02 1737100142

More like make your servers publicly inaccessible rather than delete them. Or throttle bandwidth. Or provide a warning when the cap is approaching. Etc.

jlcummings · 2025-01-17T09:43:01 1737106981

Exactly. Why the fixation on one strategy for handling this not-so-uncommon scenario? It is so common that handling it should be de facto.

This isn’t exactly a pre-paid gas pump, but that could be one way to present it. We all want to fill as fast as possible. And if your fill spout can handle top rates, you get top fill rates, until you close in on the hard limit. Then it trickles down to the metered drop. Then stops precisely where it needs to.

By accepting/requesting a hard cap, the provider can make clear that in order to be precise, soft caps will go into effect earlier and induce progressive throttling where applicable. If the throttle doesn’t catch the final milliliter or two of gasoline, before the pump shuts off, the provider can and should just let it go. It’s a loss, but comparatively a figurative drop in the bucket.

The other obvious route is predictive, where prior usage guides the guardrails. Ordering two eggs is typical for a single meal. Ordering twelve is not. Ordering three or four is unusual for most but if you are a regular diner your habits will be observable.

All of this is predicated on the provider wanting to do something. They seem to lack incentives at this point for making it easy. It is because of stories like OP's that I avoid well-known problematic providers like Firebase, who don’t respect and foster long-term relationships.
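The gas-pump throttling described above can be sketched as a rate that tapers linearly between a soft cap and the hard cap. The linear taper and every parameter name here are assumptions for illustration; a real provider could pick any curve:

```python
def allowed_rate(spent, soft_cap, hard_cap, full_rate):
    """Hypothetical request rate allowed at a given spend level.

    Full speed below the soft cap, zero at the hard cap, linear in between.
    Assumes soft_cap < hard_cap.
    """
    if spent <= soft_cap:
        return full_rate
    if spent >= hard_cap:
        return 0.0
    remaining_fraction = (hard_cap - spent) / (hard_cap - soft_cap)
    return full_rate * remaining_fraction
```

Slowing down before the hard cap shrinks the amount that can slip through during any enforcement delay, which is the "final milliliter" the provider would write off.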