artur44's comments | Hacker News

SCADAbreach is a browser-based interactive simulation focused on SCADA / ICS security. The goal was to model industrial systems and attack/defense mechanics inspired by real-world incidents, while keeping it accessible directly in the browser.

I’d love feedback.


I keep wondering about one thing: maybe Disney isn’t paying for the technology at all; maybe they’re paying for a spot in the future. If generative video becomes as common as social media, AI models will be the new TV channels, and whoever controls the prime shelf space wins. In that sense, this billion isn’t a fee for Sora, it’s the price of having Disney’s front-row booth in a new world of storytelling. So the real question isn’t “why is Disney paying?” but “who’s going to own the shelves in this new story marketplace?”


A simple way is to split the model’s output stream before TTS. Reasoning/structured tokens go into one bucket, actual user-facing text into another. Only the second bucket is synthesized. Most “thinking out loud” issues come from feeding the whole stream directly into audio.
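Something like this, as a rough sketch; the <think>/</think> markers and the speak() callback are placeholders here, not any particular model's API:

    def split_stream(tokens):
        # Yield only user-facing tokens, dropping anything inside <think>...</think>.
        in_reasoning = False
        for tok in tokens:
            if tok == "<think>":
                in_reasoning = True
            elif tok == "</think>":
                in_reasoning = False
            elif not in_reasoning:
                yield tok

    def synthesize(tokens, speak):
        # Only the user-facing bucket ever reaches the audio layer.
        for tok in split_stream(tokens):
            speak(tok)

    # synthesize(["<think>", "plan the reply", "</think>", "Here", " you", " go."], print)
    # prints "Here", " you", " go." on separate lines; the <think> block never reaches speak()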


There is no TTS here. It's a native audio output model which outputs audio tokens directly. (At least, that's how the other real-time models work. Maybe I've misunderstood the Qwen-Omni architecture.)


True, but even with native audio-token models you still need to split the model’s output channels. Reasoning/internal tokens shouldn't go into the audio stream; only user-facing content should be emitted as audio. The principle is the same whether the last step is TTS or audio-token generation.


There's an assumption there that the audio stream contains an equivalent of the <think>/</think> tokens. Every reason to think it should, but without seeing the tokeniser config it's a bit of a guess.


A lot of the debate here swings between extremes. Claims like “AI writes most of the code now” are obviously exaggerated, especially coming from a nontechnical author, but acting like any use of AI is a red flag is just as unrealistic. Early stage teams do lean on LLMs for scaffolding, tests and boilerplate, but the hard engineering work is still human. Is there a bubble? Sure, valuations look frothy. But as in the dotcom era, a correction doesn’t invalidate the underlying shift; it just clears out the noise. The hype is inflated, the technology is real.


I think some wires got crossed. My point wasn’t that LLMs can’t produce useful infra or complex code; clearly they can, as many examples here show. It’s just that neither extreme narrative, “AI writes everything now” vs. “you can’t trust it for anything serious”, reflects how teams actually work. LLMs are great accelerators for boilerplate, declarative configs, and repetitive logic, but they don’t replace engineering judgement; they shift where that judgement is applied. That’s why I see AI as real, transformative tech inside an overhyped investment cycle, not as magic that removes humans from the loop.


> Early stage teams do lean on LLMs for scaffolding, tests and boilerplate, but the hard engineering work is still human.

I no longer believe this. A friend of mine just did a stint at a startup doing fairly sophisticated finance-related coding, and LLMs allowed them to bootstrap a lot of new code, get it up and running on scalable infra with Terraform, onboard new clients extremely quickly, and write docs for them based on specs and plans elaborated by the LLMs.

This last week I extended my company's development tooling by adding a new service in a k8s cluster with a bunch of extra services, shared variables and configmaps, and new helm charts that did exactly what I needed after asking nicely a couple of times. I have zero knowledge of k8s, helm or configmaps.


If you are in charge of that tooling, how do you ensure the correctness of the work? Or is it that, at this point, the responsibility just moves one level up, where implementation details are no longer important or relevant and all that matters is that it behaves as described?


Just look at what they are stating:

> that did exactly what I needed

> I have zero knowledge of k8s, helm or configmaps.

Obviously this is not anything resembling engineering, or anything a self-respecting programmer would do. An elevator that is cut loose when you press 0 also works very well, until you press 0. The claims of AI writing significant chunks of code come from these sorts of people with little experience in programming or engineering in general, SPA vibe coders and what not. You should tremble at the thought of using any of the resulting systems in production, and certainly not try to replicate that workflow yourself. Which gives you a sense of how overblown these claims are.


> The claims of AI writing significant chunks of code come from these sorts of people with little experience in programming or engineering in general, SPA vibe coders and what not.

I'm sorry man but I've been doing this for 25 years and I've worked and studied with some extremely bright and productive engineers. I vouch for the code that I write or that I delegate to an LLM, and believe it or not it doesn't take a magician to write a k8s spec file, just patience to write 10 levels of nested YAMLs to describe the most boring, normal and predictable code to tell your cluster what volume mounts and env variables to load.


> I have zero knowledge of k8s, helm or configmaps

> I vouch for the code that I write or that I delegate to an LLM, and believe it or not it doesn't take a magician to write a k8s spec file…

I have been writing code since 1995.

That has zero relevance to my skill at rolling out deployments in a technology I know nothing about.

One of the two things you’ve said is false:

Either a) you do know what you’re talking about, or b) you are not confident in the results.

It can’t be both.

It sounds to me like you’re subscribed heavily to a hype train; that’s fine, but your position, as described, leaves a lot to be desired if you’re trying to describe some wide trend.

Here's my anecdote: the major Cloudflare outages.

Hard things are hard. AI doesn’t solve that. Scaffolding is easy; AI can solve that.

Scaffolding is something you can reliably lean on AI for.

Doing it for K8s configuration, if you don’t know k8s, is stupid. I know what I’m talking about when I say that. Having it help you if you do know what you’re doing is perfectly legit.

Claiming it helped while also claiming you have, and I quote, “zero knowledge” (when you actually do) is hype. Leave it on LinkedIn, dude. :(


> Either a) you do know what you’re talking about, or b) you are not confident in the results. It can’t be both.

You've been coding for a lifetime, yet you don't seem to get that certainty in software is a spectrum? I have sufficient confidence in the output of LLMs to sign my name under the code they write when putting up a PR for a specialist to read. That's good enough for 90% of the work that we do day-to-day. You think that's not hype-worthy?

> Doing it for K8s configuration, if you don’t know k8s, is stupid. I know what I’m talking about when I say that. Having it help you if you do know what you’re doing is perfectly legit.

"Knowing" k8s is an oxymoron. K8s is a profoundly complicated piece of tech that can don insanely complicated things while also serving as a replacement for docker-compose or basic services that could have been hosted on ECR. The concepts behind basic k8s functionality are not difficult, but I saved myself two weeks of reading how to write helm spec files, a piece of knowledge I have no interest in learning because it doesn't add any appreciable value to the software I produce, and was instead able to focus on getting what I needed out of my cluster automation scripts.

This really isn't that complicated to understand. I don't care about being a k8s expert and I don't care about the syntactical minutiae behind it. It isn't hype that I now only need to understand the essential conceptual basics behind the software to get it working for what I need, instead of doing a deep dive like I had to do years ago when reading similar docs for similar IaC products to get less functionality going.


Because after 25 years of coding and a dozen infrastructure description languages I know that you test your code and you get someone expert in the field to look at your PRs.

LLMs are _really_ good at writing infra code if you know how infra works, believe it or not. And the ultimate responsibility still lies in human beings for code ownership.


It depends on the task though, right? I promise I'm not in denial; I use these things all the time. Sometimes it works immediately; sometimes it doesn't. I have no way of predicting when it will or won't.


* Infra description languages like Terraform and k8s/helm spec files are like magic; they get 90% of the code right 90% of the time. In my experience that's about half of the work; the other half is spent debugging and correcting details that matter, but that applies just as much to the code I write myself.

* SQL works almost as well. It's especially useful when you need to generate queries with long lists of fields and complex query criteria. Give it a schema and let it rip.

* Python code works reasonably well. If your description is terse and clear, it will generally do the right thing. It has a knack for excessive comments and will sometimes do things in ways that feel unnatural, but business code will be as good as the context that surrounds it. For boring, repetitive tasks like setting up program args, annotating types, and writing generic request/response cycles with common frameworks, it will produce boring old vanilla code. You'll likely want to touch it up and adapt it to your personal preference.

* Debugging is very much hit or miss. It has been absolutely fantastic at troubleshooting failed and stuck k8s jobs and service configuration issues, having no qualms about creating its own shell or Python scripts to investigate ports or logs, and writing JSON-parsing scripts that are a snoozefest for a human to write. The regexes that I'd barely be arsed to write to parse enormous logs, it writes trivially. For business logic, the more convoluted your logic, the harder a time it will have, and for most debugging issues I prefer to let it run and list some hypotheses and potential issues; my intent is to learn and understand the problem deeply myself before committing to a fix.


It sounds like it works better for declarative schemas than for imperative scripting/debugging (speaking loosely here). Do you agree? Seems like a good heuristic for me to keep in mind.


Very much so.


The thing to remember about the dotcom era was that while there were a lot of bad companies at the time with a lot of clueless investors behind them, quite a few companies made it through the implosion of that bubble and then prospered. Amazon, Google, eBay, etc. are still around.

More importantly, the web is now dominant for enterprise SaaS applications, which is a category of software that did not really exist before the web. And the web post–dot-com bubble spawned a lot of unicorns.

In short, there was an investment bubble. But the core tech was fine.

AI feels like one of those things where the tech is similarly transformational (even more so, actually). It’s another investment bubble predicated on the price of GPUs, which is mostly making Nvidia very rich right now.

Right now the model makers are getting most of the funding and then funneling non-trivial amounts to Nvidia (and their competitors). But actually the value creation is in applications using the models these companies create. And the innovation for that isn’t coming from the likes of Anthropic, OpenAI, Mistral, X.ai, etc. They are providing core technology, but they seem to be struggling to do productive things in terms of UX and use cases. Most of the interesting things in this space are coming from smaller companies figuring out how to use the models these companies produce. Models and GPUs are infrastructure, not end-user products.

And with the rise of open-source models, open algorithms, and exponentially dropping inference costs, the core infrastructure technology is not as much of a moat as it may seem to investors. OpenAI might be well funded, but their main UI (ChatGPT) is surprisingly limited and riddled with bugs. That doesn’t look like the polished work of a company that knows what they are doing. It’s all a bit hesitant and copycat. It’s never going to be a magic solution to everyone’s problems.

From where I’m sitting, there is clear untapped value in the enterprise space for AI to be used. And it’s going to take more than a half-assed chat UI to unlock that. It’s actually going to be a lot of work to build all of that. Coding tools are, so far, the most promising application of reasoning models. It’s easy to see how that could be useful in the context of ERP/manufacturing, CRM, traditional office applications, and the financial world.

Those each represent verticals with many established players trying to figure out how to use all this new stuff — and loads more startups eager to displace them. That’s where the money is going to be post-bubble. We’ve seen nothing yet. Just like after the dot-com bubble burst, all the money is going to be in new applications on top of the new infrastructure. It’s untapped revenue. And it’s not going to be about buying GPUs or offering benchmark-beating models. That’s where all the money is going currently. That’s why it is a bubble.


Interesting experiment. Using modern LLMs to retroactively grade decade-old HN discussions is a clever way to measure how well our collective predictions age. It’s impressive how little time and compute it now takes to analyze something that would’ve required days of manual reading. My only caution is that hindsight grading can overvalue outcomes instead of reasoning — good reasoning can still lead to wrong predictions. But as a tool for calibrating forecasting and identifying real signal in discussions, this is a very cool direction.


Honestly, this feels like another case where the headline sounds bold, but the real impact will be minimal. Any age-based restriction ends up in the same place: platforms are forced to collect more data just to “prove” someone’s age. When the target group is teenagers, that’s basically a privacy disaster waiting to happen.

From a technical perspective, this is impossible to enforce cleanly. Anyone with even basic internet literacy can bypass it with a VPN + fresh account + throwaway email. And of course, the teens most determined to get around it will be the ones the policy is supposedly protecting. The bigger issue is the false sense of security. Parents and politicians get to feel like something has been “done,” while the actual online risks don’t disappear — they just move somewhere less visible. If the goal is genuinely improving teen mental health, digital literacy and real support systems work far better than regulations that will inevitably leak.



What, because they used a single em—dash?


Got it, thanks for pointing it out.


The “9x faster than Unity” line also jumped out at me. Empty-scene benchmarks are basically a measurement of how thin your abstraction layer is, not how the engine behaves under actual game workloads.

What is interesting, though, is that engines like this often reveal how much overhead comes from tooling, scene graph complexity, editor integrations, GC pressure, etc. Sometimes a very lean engine feels “faster” simply because it avoids all the layers that a mature engine needs to support large teams.

I’d love to see a demo that stresses real systems — entity updates, materials, batching, physics, etc. That would say far more about the architecture than raw FPS of drawing nothing.


I always find it interesting how often the simplest hash table layouts end up performing best in real workloads. Once you avoid pointer chasing and keep everything in a compact array, CPU caches do most of the heavy lifting.

It’s also a good reminder that clarity of layout often beats more “clever” designs, especially when the dataset fits comfortably in memory.


Until you get high memory contention from the rest of the code. Once eviction gets high you get some pretty counterintuitive improvements by fixing things that seem like they shouldn’t need to be fixed.

My best documented case was a 10x speed up from removing a double lookup that was killing caches.


My best improvement was just bit-interleaving both axes of a 2×32-bit integer coordinate (aka a z-curve). I got a factor of ~100x (yes, factor, not percent) throughput improvement over having locality in only one dimension. All it took was ~10 lines of bit twiddling. The runtime went from a bit above 300ms to slightly less than 3ms.
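The interleave itself is just the standard Morton bit-spread; roughly this (a Python sketch for illustration, not my original code, assuming 32-bit coordinates):

    def spread_bits(v):
        # Spread the 32 bits of v into the even bit positions of a 64-bit word.
        v &= 0xFFFFFFFF
        v = (v | (v << 16)) & 0x0000FFFF0000FFFF
        v = (v | (v << 8))  & 0x00FF00FF00FF00FF
        v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0F
        v = (v | (v << 2))  & 0x3333333333333333
        v = (v | (v << 1))  & 0x5555555555555555
        return v

    def z_curve_key(x, y):
        # Interleave the two coordinates into one 64-bit Morton/z-curve key.
        return spread_bits(x) | (spread_bits(y) << 1)

Sorting the data by z_curve_key(x, y) keeps 2-D neighbours close together in memory, which is where the locality win comes from.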


End to end gets weird. I was asked to look at an admin page; nobody could figure out why it took 30s. Literally the first thing I tried got it under 4 seconds, and the second got it down to 3. It was pulling the same list of rows twice, applying two filters, and then looking at the intersection. I changed the signature to send the list as input instead of the query constraints. Then I changed them to avoid the intersect.

If you had asked me to bet on which one would have the bigger impact, I would have split the difference.

My second favorite was similar. Two functions making a call instead of sharing the answer. Profiler said 10% cumulative. I removed half. Instead of 5% I got 20%. Which just demonstrates how much data a profiler cannot show you.


I'm wondering how you folks even come up with these kinds of optimisations.


Sheer stubbornness.

Profilers lie, and some lie more than others. I’ve gotten a 3x speedup from code with a perfectly rectangular profile output.

Part of it comes down to a trick game devs steal from finance: give each task a budget and keep it to the budget even if it’s not the tall tent pole.

You should not spend 10% of your response time on telemetry and logging combined. Yet I pulled 10% of TTFB out of just the logging and telemetry code on a project. It was a frog-boiling situation: every new epic used the new code, and determining the cumulative cost wasn’t easy.


To me, these sorts of examples always seem contrived. To first order, I've never had a real hash table problem that was on machine-word keys.

I've nearly always had a variable-length string or other complex structure that was being hashed, not its handle.

Back in my early career in C, this would be a generic API to hash and store void pointers, but the pointers were not being hashed. The domain-specific hash function needed to downcast and perform the appropriate remote memory access to fetch the variable-length material that was actually being hashed.


I'm a big fan of the basic power of two choices hash table design. It's simple to understand and implement, has reasonable constant factors, and hits high load factors on real world datasets.

You can use more elaborate probe and relocation schemes, but just choosing the less full bucket and resizing if both choices are full gets you surprisingly far.
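To make it concrete, here is a toy sketch of that insert policy (Python just for illustration; the bucket capacity and hash salts are arbitrary choices, not from any particular implementation):

    class TwoChoiceTable:
        # Each key hashes to two buckets; inserts go to the less full one,
        # and the table doubles in size when both candidates are full.

        BUCKET_SLOTS = 8  # arbitrary small per-bucket capacity for the sketch

        def __init__(self, n_buckets=16):
            self.n = n_buckets
            self.buckets = [[] for _ in range(self.n)]

        def _choices(self, key):
            # Two bucket choices via different salts; anything independent-ish works.
            return hash((0, key)) % self.n, hash((1, key)) % self.n

        def insert(self, key, value):
            h1, h2 = self._choices(key)
            b1, b2 = self.buckets[h1], self.buckets[h2]
            target = b1 if len(b1) <= len(b2) else b2
            if len(target) >= self.BUCKET_SLOTS:  # the emptier choice is full => both are
                self._grow()
                self.insert(key, value)
                return
            target.append((key, value))

        def get(self, key, default=None):
            for idx in self._choices(key):
                for k, v in self.buckets[idx]:
                    if k == key:
                        return v
            return default

        def _grow(self):
            items = [kv for bucket in self.buckets for kv in bucket]
            self.n *= 2
            self.buckets = [[] for _ in range(self.n)]
            for k, v in items:
                self.insert(k, v)

Lookups probe at most two buckets, and a resize is only triggered when both of a key's candidate buckets are saturated, which is why the scheme holds up at high load factors.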


Power-of-two length is also the natural choice for a growable linear array where the designer has no idea how many elements there will be.


The hardware story is interesting, but I’m curious how much of the real-world adoption will depend on the maturity of the compiler stack. Trainium2 already showed that good silicon isn’t enough if the software layer lags behind.

If AWS really delivers on open-sourcing more of the toolchain, that could be a much bigger signal for adoption than raw specs alone.

