red75prime's comments

> Mercedes Drive Pilot is classified as L3 which is better than Tesla or Rivian.

Try finding videos of people actually using it. There's only a handful of one-minute promotional clips and car reviewers' videos. It's mostly a marketing move.


I can't find a clear description of how it operates. Will it detect a situation where the lane markings aren't painted clearly enough, warn you, and disengage? Or will you need to notice that it's starting to behave erratically because the markings aren't clear enough, and take over yourself? I suspect it's the latter. That's hands-off, not eyes-off.

Also, I would like to see even a car company that is further down the road to full autonomy clearly describe all the long-tail scenarios. It's just impossible.


The current Gen 1 cars will start beeping at you if they can't see the lines. If you don't take over quickly, they will start slowing down and beeping very insistently.

So you're betting that a) transformers can't be a load-bearing part of AI, and b) whatever replaces them won't be able to make use of TPUs. Do you have any reasons for those assumptions?

Looking at your history, it's something like "I tried them and they hallucinate" and, possibly, you've read an article about the inevitability of hallucinations. Correct? What's your reason for thinking that the hallucination rate can't be lowered to or below the human rate ("Damn! What was I thinking about?")?


...this is where I randomly decided to remember this particular day of my life. Yep, I indeed did it because why not. No, it didn't work particularly well, but I do remember some things about that day.

I mean, it's not just an automatic thing with no higher-level control.


Does it generalize, though? What can a bag-of-words metaphor say about a question like "How many reinforcement learning training examples does an LLM need to significantly improve performance on mathematical questions?"

A country: a collective of people with a dedicated structure to represent the interests and enforce the strategies of said collective as a whole.

We need a difference to discover what it is. How can we know that all LLMs don't?

If you tediously work out the LLM math by hand, are the pen and paper conscious too?

Consciousness is not computation. You need something else.


The pen and paper are not the actual substrate of entropy reduction, so not really.

Consciousness is what it "feels like" when a part of the universe is engaged in local entropy reduction. You heard it here first, folks!


This comment here is pure gold. I love it.

On the flip side: If you do that, YOU are conscious and intelligent.

Would it mean that the machine that did the computation became conscious when it did it?

What is consciousness?


Even if they do, it can only happen transiently, during the inference process. Unlike a brain, which is constantly undergoing dynamic electrochemical processes, an LLM is just an inert pile of data except when the model is being executed.

My first language is Russian. I can't fully understand this dreaded "doctor's cursive", but I can see that some parts of Gemini's text are probably wrong.

It's most likely "но кашель сохр-ся лающий" ("but the barking cough is still present"), not "кожные покровы чистые" ("the skin is clean"). The diagnosis is probably wrong too. Judging by the symptoms it should be "ОРЗ" (acute respiratory infection), but I have no idea what's actually written there.

Still, it's very, very impressive.


Ask them to mark low confidence words.

Do they actually have access to that info "in-band"? I would guess not. OTOH it should be straightforward for the LLM program to report this -- someone else commented that you can do this when running your own LLM locally, but I guess commercial providers have incentives not to make this info available.

Naturally, their "confidence" is represented as activations in layers close to the output, so they might be able to use it. Research ([0], [1], [2], [3]) shows that the results of prompting LLMs to express their confidence correlate with their accuracy. The models tend to be overconfident, but in my anecdotal experience the latest models are passably good at judging their own confidence. (A quick sketch of the token-probability route on a local model follows the links below.)

[0] https://ieeexplore.ieee.org/abstract/document/10832237

[1] https://arxiv.org/abs/2412.14737

[2] https://arxiv.org/abs/2509.25532

[3] https://arxiv.org/abs/2510.10913
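
To illustrate the "run your own LLM locally" route mentioned above, here is a minimal sketch of my own (not taken from the papers) using the Hugging Face transformers library: it reads the probability the model assigned to each generated token and flags low-probability ones. The model name ("gpt2") and the 0.5 threshold are arbitrary choices for illustration.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The capital of Australia is", return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=True,
    )

    # Log-probability the model assigned to each generated token.
    logprobs = model.compute_transition_scores(
        out.sequences, out.scores, normalize_logits=True
    )

    gen_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
    for token_id, logprob in zip(gen_tokens, logprobs[0]):
        p = torch.exp(logprob).item()
        flag = "  <-- low confidence" if p < 0.5 else ""
        print(f"{tok.decode(token_id)!r}: p={p:.2f}{flag}")

Some commercial APIs expose per-token log probabilities too; whether a chat product surfaces them to the end user is a separate product decision.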


interesting... I'll give that a shot

Do children at Piaget’s preoperational stage (ages 2-7) think?

Yes, to a limited extent, in line with their brains' development. These developmental stages have predictive power as to what kind of things an average 2-7 year-old is and isn't able to do yet.

Are there any discrete stages of LLM performance that can explain why free ChatGPT is unable to notice the absolute nonsense in saying «The surgeon could be the son's mother, which explains why she would say, "He’s my father."», and what kind of model would not be expected to fall for that trap?


If we think of them in generations, it seems free ChatGPT is a generation or two behind. I gave a modified river-crossing problem to ChatGPT-3, and it failed in the same way, but paid 5.1 doesn't get caught out. Exactly where along the way that changed, I'd have to do some digging, but I feel like it was 4.5. The other problem, of course, is that now that you've given that question to free ChatGPT, it'll be used as training data, so the next version won't get tripped up the same way.
