
What is the point of simulating 650GB of data with ~40 columns if you are only going to use a single column for testing? Is that even 16GB?


It's a strided array, which slows down memory access.


It's a parquet file. Column data is stored in contiguous pages (and that's how duckdb and polars read them).
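
A toy sketch of the difference, using only the Python standard library. This is not Parquet's actual on-disk format (no pages, encodings, or compression). The table shape, `value`, and both reader functions are made up for illustration, and the size estimate assumes all ~40 columns are the same width, which the thread does not say.

```python
import struct

NUM_ROWS = 1000   # hypothetical toy table, not the benchmark's real shape
NUM_COLS = 40
VALUE_SIZE = 8    # bytes per little-endian float64


def value(row, col):
    """Deterministic cell value so both layouts can be compared."""
    return float(row * NUM_COLS + col)


# Row-major layout: all columns of row 0, then all columns of row 1, ...
row_major = b"".join(
    struct.pack("<d", value(r, c))
    for r in range(NUM_ROWS)
    for c in range(NUM_COLS)
)

# Column-major layout: all rows of column 0, then all rows of column 1, ...
col_major = b"".join(
    struct.pack("<d", value(r, c))
    for c in range(NUM_COLS)
    for r in range(NUM_ROWS)
)


def read_column_row_major(buf, col):
    """Strided access: one value every NUM_COLS * VALUE_SIZE bytes."""
    stride = NUM_COLS * VALUE_SIZE
    return [
        struct.unpack_from("<d", buf, r * stride + col * VALUE_SIZE)[0]
        for r in range(NUM_ROWS)
    ]


def read_column_col_major(buf, col):
    """Contiguous access: the whole column is one slice of the buffer."""
    start = col * NUM_ROWS * VALUE_SIZE
    chunk = buf[start:start + NUM_ROWS * VALUE_SIZE]
    return list(struct.unpack(f"<{NUM_ROWS}d", chunk)), len(chunk)


strided_values = read_column_row_major(row_major, 3)
contiguous_values, bytes_touched = read_column_col_major(col_major, 3)

# Rough share of the 650GB dataset one column would occupy,
# assuming equal-width columns (an assumption, not from the thread).
estimated_single_column_gb = 650 / NUM_COLS
```

Both readers return the same values, but the column-major reader only touches `NUM_ROWS * VALUE_SIZE` bytes, 1/40 of the buffer. Under the equal-width assumption, the same ratio applied to 650GB gives about 16GB for a single column.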


Okay, wasn't aware of that.



