Yes, to a limited extent, in line with their brains' development. These developmental stages have predictive power as to what kinds of things an average 2-7 year-old is and isn't yet able to do.
Are there any discrete stages of LLM performance that could explain why free ChatGPT fails to notice the absolute nonsense of saying «The surgeon could be the son's mother, which explains why she would say, "He’s my father."», and what kind of model would be expected not to fall for that trap?
If we think of them in generations, free ChatGPT seems to be a generation or two behind. I gave a modified river crossing problem to ChatGPT-3 and it failed in the same way, but paid 5.1 doesn't get caught out. Exactly where along the way it improved, I'd have to do some digging, but I feel like it was 4.5. The other problem, of course, is that now you've given that question to free ChatGPT, it'll be used as training data, so the next version won't get tripped up the same way.