Rant. I do not see any improvement in the resulting code. Let's not nitpick the fast parsing and just scroll through the unnecessary work the code does.
> ... u8line(move(line))
We are not reusing the parsed line object between iterations, which forces a fresh allocation per line.
> auto words = ...
Fresh allocation per line.
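To make the two points above concrete, here is a rough sketch in plain standard C++ (not the library under discussion; `split_words` and `count_words` are names I made up). It assumes words are separated by spaces or tabs. The line buffer and the word vector live outside the loop, so their capacity carries over from one line to the next.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // only needed if you feed it an in-memory stream
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `line` on spaces/tabs into views pointing into `line`.
// `out` is cleared, not recreated, so its capacity is reused across calls.
void split_words(std::string_view line, std::vector<std::string_view>& out) {
    out.clear();
    std::size_t i = 0;
    while (i < line.size()) {
        while (i < line.size() && (line[i] == ' ' || line[i] == '\t')) ++i;
        const std::size_t start = i;
        while (i < line.size() && line[i] != ' ' && line[i] != '\t') ++i;
        if (i > start) out.push_back(line.substr(start, i - start));
    }
}

// Hypothetical driver: counts words while reusing one line buffer and one word vector.
std::size_t count_words(std::istream& in) {
    std::string line;                     // reused by every std::getline call
    std::vector<std::string_view> words;  // reused: clear() keeps the capacity
    std::size_t total = 0;
    while (std::getline(in, line)) {
        split_words(line, words);
        total += words.size();
    }
    return total;
}
```

After the first few lines both buffers have usually grown to fit the longest line seen so far, and the loop stops allocating.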
> lookup/insert
Lookup and hashing are done twice for each word. Each unique word is individually allocated on the heap.
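For comparison, a minimal sketch of the single-lookup version with the standard containers (C++17; `Counts` and `bump` are hypothetical names, and this is not the API of the library being discussed). `try_emplace` inserts the key only if it is missing and hands back an iterator either way, so the word is hashed and looked up once.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

using Counts = std::unordered_map<std::string, std::size_t>;

// Hypothetical helper: increments the count for `word` with a single lookup.
// try_emplace inserts {word, 0} only when the key is absent; otherwise it
// leaves the map untouched and returns an iterator to the existing entry.
std::size_t bump(Counts& counts, const std::string& word) {
    auto [it, inserted] = counts.try_emplace(word, 0);
    (void)inserted;  // not needed here; the count is incremented in both cases
    return ++it->second;
}
```

This only addresses the double lookup. Each new key is still its own `std::string`, so the per-word heap allocation remains for words too long for the small-string buffer.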
> stats.push_back
Not preallocated. Likely doing a full allocate + copy for each word.
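The number of unique words is already known when the stats are built, so the fix is one line. A sketch with standard containers, assuming the stats are (word, count) pairs (`Stat` and `to_stats` are my names, not the original code's):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Counts = std::unordered_map<std::string, std::size_t>;
using Stat = std::pair<std::string, std::size_t>;

// Hypothetical helper: copies the map entries into a vector for sorting.
std::vector<Stat> to_stats(const Counts& counts) {
    std::vector<Stat> stats;
    stats.reserve(counts.size());  // one allocation up front, no regrowth
    for (const auto& [word, n] : counts) {
        stats.emplace_back(word, n);
    }
    return stats;
}
```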
> sort_relocatable
Could have been faster if given additional memory. But this is minor, because sorting was probably not the ideal approach in the first place.
And the icing on the cake:
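One guess at what "not ideal" could mean: if only the top N entries get printed (an assumption on my part, the quoted code does not show it), a full sort does more work than needed. A sketch with `std::partial_sort`; `Stat` and `top_n` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Stat = std::pair<std::string, std::size_t>;

// Hypothetical helper: keeps only the `n` most frequent entries,
// ordered by descending count, with ties broken by word.
void top_n(std::vector<Stat>& stats, std::size_t n) {
    n = std::min(n, stats.size());
    const auto middle = stats.begin() + static_cast<std::ptrdiff_t>(n);
    std::partial_sort(stats.begin(), middle, stats.end(),
        [](const Stat& a, const Stat& b) {
            if (a.second != b.second) return a.second > b.second;
            return a.first < b.first;
        });
    stats.resize(n);  // drop everything past the first n
}
```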
> printf("%d ... (int)count ...
As the old saying goes, "One can write a Fortran program in any language". There are zero reasons to write non-type-safe text output in C++ in 2025, but here we are.
TLDR. One can give their foundation library any name and use any namespace; it does not change much about how the code is written. Right?
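For reference, a type-safe equivalent using nothing newer than iostreams. The output layout (word, tab, count) is my assumption, since the original format string is elided, and `print_stat` is a made-up name. The compiler picks the overload for `std::size_t`, so there is no format specifier to get wrong and no cast to `int` that truncates large counts.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>   // only needed to capture the output in a string
#include <string_view>

// Hypothetical helper: writes "word<TAB>count\n".
// operator<< is resolved from the argument types at compile time.
void print_stat(std::ostream& out, std::string_view word, std::size_t count) {
    out << word << '\t' << count << '\n';
}
```

With a recent toolchain, `std::format` (C++20) or `std::print` (C++23) give the same type checking with printf-like format strings.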