
Yeah that's basically what was discussed here: https://lobste.rs/s/xz6fwz/unix_find_expressions_compiled_by...

And then I pointed to this article on databases: https://notes.eatonphil.com/2023-09-21-how-do-databases-exec...

Even MySQL, DuckDB, and CockroachDB apparently use tree-walking to evaluate expressions, not bytecode!

Probably for the same reason - many parts are dominated by I/O, so the optimization work goes elsewhere

And MySQL is a super-mature codebase
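
To make "tree-walking" concrete, here's a minimal sketch of the technique in C (node names and shapes are invented for illustration, not taken from any of these engines):

    #include <stdio.h>
    #include <stddef.h>

    typedef enum { LIT, ADD, MUL } Kind;

    typedef struct Node {
        Kind kind;
        long value;                 /* used when kind == LIT */
        struct Node *left, *right;  /* used when kind == ADD or MUL */
    } Node;

    /* Evaluate by walking the tree directly -- no bytecode step. */
    long eval(const Node *n) {
        switch (n->kind) {
        case LIT: return n->value;
        case ADD: return eval(n->left) + eval(n->right);
        case MUL: return eval(n->left) * eval(n->right);
        }
        return 0;  /* unreachable */
    }

    int main(void) {
        Node two = {LIT, 2, NULL, NULL};
        Node three = {LIT, 3, NULL, NULL}, four = {LIT, 4, NULL, NULL};
        Node mul = {MUL, 0, &three, &four};
        Node add = {ADD, 0, &two, &mul};  /* 2 + 3 * 4 */
        printf("%ld\n", eval(&add));      /* prints 14 */
        return 0;
    }

Each node just dispatches on its kind; the cost is pointer-chasing per node, which is exactly the overhead that matters less when you're dominated by I/O.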


I was just reading a paper about compiling SQL queries (actually about a fast compilation technique that allows for full compilation to machine code that is suitable for SQL and WASM): https://dl.acm.org/doi/pdf/10.1145/3485513

Sounds like many DBs do some level of compilation for complex queries. I suspect this is because SQL has primitives that actually compute things (e.g. aggregations, sorts, etc.). But find does basically none of that. find is completely I/O-bound.


Virtually all databases compile queries in one way or another, but they vary in the nature of their approaches. SQLite, for example, uses bytecode, while Postgres and MySQL both compile queries to a computation tree, which basically takes the query AST and substitutes in different table/index operations according to the query planner.

SQLite talks about the reasons for each variation here: https://sqlite.org/whybytecode.html


Thanks for the reference.

Without being glib, I honestly wonder if Fabrice Bellard has started using any LLM coding tools. If he could be even more productive, that would be scary!

I doubt he is ideologically opposed to them, given his work on LLM compression [1]

He codes mostly in C, which I'm sure is mostly "memorized". That is, if you have been programming in C for a few decades, you almost certainly have a deep bench of your own code that you routinely go back to / copy and modify

In most cases, I don't see an LLM helping there. It could be "out of distribution", similar to what Karpathy said about writing his end-to-end pedagogical LLM chatbot

---

Now that I think of it, Bellard would probably train his own LLM on his own code! The rest of the world's code might not help that much :-)

He has all the knowledge to do that ... I could see that becoming a paid closed-source project, like some of his other ones [2]

[1] e.g. https://bellard.org/ts_zip/

[2] https://bellard.org/lte/


What I wonder is: are current LLMs even good for the type of work he does (novel, low-level, extremely performance-sensitive)?

I'm writing C for microcontrollers and ChatGPT is very good at it. I don't let it write any code (because that's the fun part, why would I), but I discuss with it a lot, asking questions and asking it to review my code, and it does a good job. I also love to use it to explain assembly.

It's also the best way to use LLMs in my opinion: for idea generation and snippets, and then doing the thing "manually". Much better mastery of the code, no endless loop of "this creates that bug, fix it", and they come up with plenty of feedback and gotchas when used this way.

This is how I used LLMs to learn and at the same time build an application using Tkinter.

As a professional C programmer, the answer seems to be no; they are not good enough.

They are absolutely good at reviewing C code. To catch stupid bugs and such. Great for pair programming type use.

This is a funny one. On the one hand the answer is obviously no: it's very fiddly stuff that requires a lot of umming and ahhing. But then, weirdly, they can be absurdly good in these kinds of highly technical domains, precisely because the problems are often simple enough to pose to the LLM that any help it gives is immediately applicable, whereas a comparatively boring/trivial enterprise application comes with a vast amount of external context to grapple with.

If Fabrice explained what he wanted, I expect the LLM would respond in kind.

If Fabrice explained what he wanted the LLM would say it's not possible.

When the coding assistant LLMs load for a while it's because they are sending Fabrice an email and he corrects it and replies synchronously.


From my experience, it's just good enough to give you a code overview of a codebase you don't know, and enough implementation suggestions to work from there.


I doubt it, although LLMs seem to do well on low-level work (assembly-level instructions).

I think it's the opposite: LLMs ask Fabrice Bellard instead

Congrats, the Chuck Norris meme has finally made its way onto HN.

Fabrice Bellard is far more deserving of the honor than ol’ Chucky.

Tough choice: Knuth, Bellard, Norvig...

They're trained on his code for sure. Every time I ask about ffmpeg internals, I know it's Fabrice's training data.

He has in fact written one: https://bellard.org/ts_server/

Yeah I've seen that, but it looks like it's the inference side only?

Maybe that is a hint that he does use off-the-shelf models as a coding aid?

There may be no need to train your own, on your own code, but it's fun to think about


Are you saying a LFM could be a good idea? A Large Fabrice Model?

Why does every single post on HN have to come down to talking about AI slop...

> Without being glib, I honestly wonder if Fabrice Bellard has started using any LLM coding tools

I doubt it. I follow him and look at the code he writes and it's well thought out and organized. It's the exact opposite of AI slop I see everywhere.

> He codes mostly in C, which I'm sure is mostly "memorized". i.e. if you have been programming in C for a few decades,

C I think he memorized a long time ago. It's more like he keeps the whole structure and setup of the program (the context) in his head and is able to "see it" all and operate on it. He is so good that people are insinuating he is actually "multiple people" or he uses an LLM and so on. I imagine he is quite amused reading those comments.


Still, humans can only type so quickly. It's not hard to imagine how even a flawless coder could benefit from an LLM.

> humans can only type so quickly

Real programming is 0.1% typing. Typing speed is not a limiting factor for any serious development.


You're conflating typing with programming. Typing is in fact the limiting factor to serious development.

Typing would not make the top-100 list of “limiting factors” for serious development.

It is for AI users who can't type code.

I am a heavy AI user and have been typing code for 3 decades :)

Ok, if you have such insight into development, why not leverage agents to type for you? What sort of problems have you faced that you are able to code against faster than you can articulate to an agent?

I have of course found some problems like this myself. But it's such a tiny portion of coding I really question why you can't leverage LLMs to make yourself more productive


Do you feel called out?

not at all, can’t feel called out by people who don’t have a clue what they are talking about :)

Why do you waste your time with people who don't have a clue what they're talking about, and rush to reply to them?

You replied 2 min after my comment... I am sorry you are that lonely on Christmas Day


thanks, bored at the airport :)

Most coding is better done with agents than with your hands. Coding labor is the main financial cost of development. Yes, actually articulating what you want is the hard problem. Yes, there are technical problems that demand real analytical insight and real motivation. But refusing to use agents because you think you can type faster is mistaking typing for your actual skill: reasoning and interpretation.

Keep in mind that even if someone writes their own code, an LLM is great for accelerating tests, makefiles, docs, etc.

Or it can review for any subtle bugs too. :)


Some talented people (mitsuhiko, Evan You) seem to leverage LLMs in their own way. Probably mostly as legwork.

Is Fabrice like the Chuck Norris of programming?

Hopefully without the politics…

In Soviet Russia, politics find you.

In 2025, there is no shame in using an LLM. For example, he might use it to get help debugging, or ask if a block of code can be written more clearly or efficiently.

> I honestly wonder if Fabrice Bellard has started using any LLM coding tools. If he could be even more productive, that would be scary!

That’s kind of a weird speculation to make about creative people and their processes.

If Caravaggio had had a computer with Photoshop, if Einstein had had a computer with Matlab, would they have been more productive? Is it a question that even makes sense?


> Is it a question that even makes sense?

Absolutely. It's a very intriguing thought invoking the opposite of the point you're trying to make.


Maybe today Bellard uses LLMs though

Matlab has proven to be an indispensable tool in many fields.

AI is the same, for example creating slop or virtual girlfriends.


There is a bunch of AI slop in there ... It does seem like the author probably knows what he's talking about, since there is seemingly good info in the article [1], but there's still a lot of slop

Also, I think the end should be at the beginning:

> Know when your indexes are actually sick versus just breathing normally - and when to reach for REINDEX.

> VACUUM handles heap bloat. Index bloat is your problem.

The intro doesn't say that, and just goes on and on about "lies" and stupid stuff like that.

This part also feels like AI:

> Yes. But here's what it doesn't do - it doesn't restructure the B-tree.

> What VACUUM actually does

> What VACUUM cannot do

I don't necessarily think this is bad, since I know writing is hard for many programmers. But I think we should also encourage people to improve their writing skills.

[1] I'm not an SQL expert, but it seems like some of the concrete examples point to some human experience


Author here – it’s actually funny, as you pointed out parts that are my own (TM) attempts to make it a bit lighthearted.

An LLM is indeed used for correcting and improving some sentences, but the rest is my honest attempt at making the writing approachable. If you’re willing to invest the time, you can see my fight with technical writing over time by going through my blog.

(Writing this in the middle of a car wash on my iPhone keyboard ;-)


Yeah, I get accused of being an LLM all the time as well, best to ignore that kind of slop... (which, ironically, goes both ways!)

Yeah my eyes glaze over when I see the familiar tone.

If it's not worth writing it sure ain't worth reading.


Sorry, you lost at the Turing test

A better title might have been "VACUUM addresses heap bloat; REINDEX addresses index bloat"

Similar to a recent story, "Go is portable, until it isn't" -- the better title is "Go is portable until you pull in C dependencies"

https://lobste.rs/s/ijztws/go_is_portable_until_it_isn_t


There was also "boringcc"

https://gcc.gnu.org/wiki/boringcc

> As a boring platform for the portable parts of boring crypto software, I'd like to see a free C compiler that clearly defines, and permanently commits to, carefully designed semantics for everything that's labeled "undefined" or "unspecified" or "implementation-defined" in the C "standard" (DJ Bernstein)

And yeah I feel this:

> The only thing stopping gcc from becoming the desired boringcc is to find the people willing to do the work.

(Because OSH has shopt --set strict:all, which is "boring bash". Not many people understand the corners well enough to disallow them - https://oils.pub/ )

---

And Proposal for a Friendly Dialect of C (2014)

https://blog.regehr.org/archives/1180
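
To make the stakes concrete, here's the classic signed-overflow example of the kind of undefined behavior these proposals want to pin down (a sketch; exact optimizer behavior varies by compiler and flags):

    #include <limits.h>
    #include <stdio.h>

    /* Signed overflow is undefined in standard C, so an optimizing
       compiler may assume x + 1 > x always holds and fold this test
       to 0. A boringcc would commit to one defined result (e.g.
       wrapping) instead. */
    int will_overflow(int x) {
        return x + 1 < x;
    }

    int main(void) {
        printf("%d\n", will_overflow(INT_MAX));  /* often 1 at -O0, 0 at -O2 */
        return 0;
    }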


It is kind of ironic, given the existence of Orthodox C++, and it kind of proves the point: C isn't as simple as people think it is when they've only read the K&R book and nothing else.


> in the C "standard"

Oof, those passive-aggressive quotes were probably deserved at the time.


It's still not really wrong though. The C standard is just the minimal common feature set guaranteed by different C compilers, and even then there are significant differences between how those compilers implement the standard (e.g. the new C23 auto behaves differently between gcc and clang - and that's fully sanctioned by the C standard).

The actually interesting stuff happens outside the standard in vendor-specific language extensions (like the clang extended vector extension).
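
For instance, here's a minimal sketch of the clang extension mentioned above (clang-specific, not standard C; gcc spells a similar but not identical feature __attribute__((vector_size(16)))):

    /* Element-wise SIMD arithmetic via clang's ext_vector_type attribute. */
    typedef float float4 __attribute__((ext_vector_type(4)));

    float4 madd(float4 a, float4 b, float4 c) {
        return a * b + c;  /* lowers to SIMD instructions where available */
    }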


Off topic, but if you're the author of sokol, I'm so thankful, because it led to my re-learning the C language in the most enjoyable way. Started to learn Zig these days and I see you're active in the community too. Not sure if it's just me, but I feel like there's a renaissance of old-school C: the language, but even more the mentality of minimalism in computing, which Zig also embodies.


Yes it's a me :D Thanks for the kind words. And yeah, Zig is pretty cool too.


Android was also an acquisition by Google, run relatively separately, and it grew into something huge


Uh weird that I got downvotes

https://en.wikipedia.org/wiki/Android_(operating_system)#His...

> Android Inc. was founded in Palo Alto, California, in October 2003 by Andy Rubin and Chris White

> Google acquired the company in July of [2005] for at least $50 million

It was ad-supported of course, but it's definitely not similar to IBM acquisitions


I wasn't really familiar with this term, but as another comment here said, the only language I use that doesn't have such late binding/dynamic dispatch is C

i.e. it seems natural in Python and C++ (and Java and Rust …)

But I did notice the term "open recursion" in Siek's Essentials of Compilation - https://mitpress.mit.edu/9780262048248/essentials-of-compila...

> To make our interpreters extensible we need something called "open recursion", in which the tying of the recursive knot is delayed until the functions are composed. Object-oriented languages provide open recursion via method overriding

---

I mentioned that here too, on a thread about a type checker: https://news.ycombinator.com/item?id=45151620

To me the open recursion style clearly seems like a better default than VISITORS?

You can still REUSE traversal logic, and you don't "lose the stack", as I pointed out in the comment below: https://news.ycombinator.com/item?id=45160402

Am I missing something? I noticed there is a significant disagreement about style, which seems to not have a clear rationale: MyPy uses visitors all over, while TypeScript uses switch statements

This is a big difference! It affects nearly every line of code, and these projects have a ton of code ...


> the only language I use that doesn't have such late binding/dynamic dispatch is C

It's not that it doesn't support this, it is just explicit.
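
For example, here's a minimal sketch of hand-rolled dynamic dispatch in C (the Shape/Square names are invented for illustration):

    #include <stdio.h>

    typedef struct Shape Shape;
    struct Shape {
        double (*area)(const Shape *self);  /* the "virtual method" slot */
    };

    typedef struct {
        Shape base;  /* first member, so a Square * can be viewed as a Shape * */
        double side;
    } Square;

    static double square_area(const Shape *self) {
        const Square *sq = (const Square *)self;
        return sq->side * sq->side;
    }

    int main(void) {
        Square sq = { { square_area }, 3.0 };
        Shape *s = &sq.base;
        printf("%f\n", s->area(s));  /* late-bound call, wired up by hand */
        return 0;
    }

This is essentially what a C++ compiler generates for you with virtual functions; in C you just see (and maintain) the machinery yourself.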


Also, I’m not 100% sure, but maybe Standard ML doesn’t support the open recursion pattern while, say, OCaml does (?). So it could be a relevant distinction in that respect


Somehow this reminded me of a similar rant about devops:

It’s the Future - https://blog.paulbiggar.com/its-the-future/

And now I see that's from June 2015 -- it's over 10 years old now! Wow

I'm not sure we're really in a better place in the cloud now ... The article says

> I’m going back to Heroku

and in 2025, I think people still want that


A chef is not a niche profession - everyone knows what a chef does, and has consumed what they make

A niche profession is, say, artistic cycling

People talk about Bourdain 7 years later for the same reason that they talk about musicians, actors, and painters 7 or 70 years later


I read Deep Learning: A Critical Appraisal in 2018, and just went back and skimmed it

https://arxiv.org/abs/1801.00631

Here are some of the points:

Is deep learning approaching a wall? - He doesn't make a concrete prediction, which seems like a hedge to avoid looking silly later. Similarly, I noticed a hedge in this post:

> Of course it ain’t over til it’s over. Maybe pure scaling ... will somehow magically yet solve ...

---

But the paper isn't wrong either:

Deep learning thus far is data hungry - yes, absolutely

Deep learning thus far is shallow and has limited capacity for transfer - yes, Sutskever is saying that deep learning doesn't generalize as well as humans

Deep learning thus far has no natural way to deal with hierarchical structure - I think this is technically true, but I would also say that a HUMAN can LEARN to use LLMs while taking these limitations into account. It's non-trivial to use them, but they are useful

Deep learning thus far has struggled with open-ended inference - same point as above -- all the limitations are of course open research questions, but it doesn't necessarily mean that scaling was "wrong". (The amount of money does seem crazy though, and if it screws up the US economy, I wouldn't be that surprised)

Deep learning thus far is not sufficiently transparent - absolutely, the scaling has greatly outpaced understanding/interpretability

Deep learning thus far has not been well integrated with prior knowledge - also seems like a valuable research direction

Deep learning thus far cannot inherently distinguish causation from correlation - ditto

Deep learning presumes a largely stable world, in ways that may be problematic - he uses the example of Google Flu Trends ... yes, deep learning cannot predict the future better than humans. That is a key point in the book "AI Snake Oil". I think this relates to the point about generalization -- deep learning is better at regurgitating and remixing the past, rather than generalizing and understanding the future.

Lots of people are saying otherwise, and then when you call them out on their predictions from 2 years ago, they have curiously short memories.

Deep learning thus far works well as an approximation, but its answers often cannot be fully trusted - absolutely, this is the main limitation. You have to verify its answers, and this can be very costly. Deep learning is only useful when verifying say 5 solutions is significantly cheaper than coming up with one yourself.

Deep learning thus far is difficult to engineer with - this is still true, e.g. deep learning failed to solve self-driving ~10 years ago

---

So Marcus is not wrong, and has nothing to apologize for. The scaling enthusiasts were not exactly wrong either, and we'll see what happens to their companies.

It does seem similar to the dot-com bubble - when the dust cleared, real value was created. But you can also see that the marketing was very self-serving.

Stuff like "AGI 2027" will come off poorly -- it's an attempt by people with little power to curry favor with powerful people. They are serving as the marketing arm, and oddly not realizing it.

"AI will write all the code" will also come off poorly. Or at least we will realize that software creation != writing code, and software creation is the valuable activity


I think it would help if either side could be more quantitative about their claims, and the problem is both narratives are usually rather weaselly. Let's take this section:

> Deep learning thus far is shallow and has limited capacity for transfer - yes, Sutskever is saying that deep learning doesn't generalize as well as humans

But they do generalize to some extent, and my limited understanding is that they generalize way more than expected ("emergent abilities") from the pre-LLM era, when this prediction was made. Sutskever pretty much starts the podcast saying "Isn’t it straight out of science fiction?"

Now Gary Marcus says "limited capacity for transfer" so there is wiggle room there, but can this be quantified and compared to what is being seen today?

In the absence of concrete numbers, I would suspect he is wrong here. I mean, I still cannot mechanistically picture in my head how my intent, conveyed in high-level English, can get transformed into working code that fits just right into the rather bespoke surrounding code. Beyond coding, I've seen ChatGPT detect sarcasm in social media posts about truly absurd situations. In both cases, the test data is probably outside the distribution of the training data.

At some level, it is extracting abstract concepts from its training data, as well as from my prompt and the unusual test data, even applying appropriate value judgements to those concepts where suitable, and combining everything properly to generate a correct response. These are much higher-level concepts than the ones Marcus says deep learning has no grasp of.

Absent quantifiable metrics, on a qualitative basis at least I would hold this point against him.

On a separate note:

> "AI will write all the code" will also come off poorly.

On the contrary, I think it is already true (cf. agentic spec-driven development). Sure, there are the hyper-boosters who were expecting software engineers to be replaced entirely, but looking back, claims from Dario, Satya, Pichai and their ilk were all about "writing code" and not "creating software." They understand the difference and in retrospect were being deliberately careful in their wording while still aiming to create a splash.


clap clap clap clap

Agreed on all points. Let's see some numerical support.


I have definitely experienced the sycophancy ... and LLMs sometimes repeat talking points from real estate agents, like "you, the buyer, don't pay for an agent; the seller pays".

I correct it, and it says "sorry you're right, I was repeating a talking point from an interested party"

---

BUT actually a crazy thing is that -- with simple honest questions as prompts -- I found that Claude is able to explain the 2024 National Association of Realtors settlement better than anyone I know

https://en.wikipedia.org/wiki/Burnett_v._National_Associatio...

I have multiple family members with Ph.D.s, and friends in relatively high level management, who have managed both money and dozens of people

Yet they somehow don't agree that there was collusion between buyers' and sellers' agents? They weren't aware it happened, and they also don't seem particularly interested in talking about the settlement

I feel like I am taking crazy pills when talking to people I know

Has anyone else experienced this?

Whenever I talk to agents in person, I am also flabbergasted by the naked self-interest and self-dealing. (I'm on the east coast of the US btw)

---

Specifically, based on my in-person conversations with people I have known for decades, they don't see anything odd about this kind of thing, and basically take it at face value.

NAR Settlement Scripts for REALTORS to Explain to Clients

https://www.youtube.com/watch?v=lE-ESZv0dBo&list=TLPQMjQxMTI...

https://www.nar.realtor/the-facts/nar-settlement-faqs

They might even say something like "you don't pay; the seller pays". However, Claude can explain the incentives very clearly, with examples


The agent is there to skim 3% of the sale price in exchange for doing nothing. Now you know all there is to know about realtors.


Most people conduct very few real estate transactions in their life, so maybe they just don’t care enough to remember stuff like this.


People don't care if they're colluded against for tens of thousands of dollars? 6% of an American house is a lot of money (on a $400,000 house, that's $24,000)

Because it's often spread over many years of a mortgage, I can see why SOME people might not. It is not as concrete as someone stealing your car, but the amount is in the same ballpark

But some people should care - these are the same people who track their stock portfolios closely, have college funds for their kids, etc.

A mortgage is the biggest expense for many people, and generally speaking I've found that people don't like to get ripped off :-)


A mortgage is the biggest expense for many people, and generally speaking I've found that people have no idea what they are doing and don't want to fuck it up so will happily pay lots of "professionals" whatever they say are "totally normal fees that everyone pays"


It's as simple as this: successful people are selected for their proud and energetic obedience to authority and institutions. It's tautological - the reason their opinions are respected is that institutions have approved them as people (PhDs! Management!). Authority is anyone wearing a white coat (or really any old white man in a suit with an expensive haircut), and an institution is anybody with serif letterhead.*

People are only aware of the deceit of their own industry, but still work to perpetuate it with varying levels of upset; they 1) just don't talk about how obviously evil what they do is; 2) talk about it, wish that they had chosen another industry, and maybe set deadlines (after we pay off the house, after the kids move out) to switch industries, or 3) overcompensate in the other direction and joke about what suckers the people they're conning are.

I can tell you first-hand that this is exactly what happened inside NAR. At the top it was entirely 3) - it couldn't be anything else - because they were actively lobbying for agents to have no fiduciary duty to their clients. They were targeting politicians who seemed friendly to the idea, and simply paying them to have a different opinion, or threatening to pay their opponents. If you look at how NAR (or any of these groups) actually, materially lobby, it's clear that they have exactly the same view of their industry as their worst critics.

* And by this I mean that if you are white, try to look older (or be old), buy a nice tailored suit, get an expensive haircut, incorporate with a name that sounds institutional, get letterhead (including envelopes) with a professional logo with serifs and an expensive business card with raised print, and you can con your way into anything. You don't have to be handsome or thin or articulate, but you can't have any shame because people will see it.

