Working memory requirements:

1. Assume a date is 8 bytes.
2. Assume 64-bit counters.
So for each distinct date in the dataset we need 16 bytes to accumulate the result. That's ~180 years of daily post counts per MB of RAM (~180,000 years per GB), and the dataset in the post covered only about a year, so the accumulator state is a few KB at most.
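As a concrete illustration, here is a minimal sketch of that accumulator in Python: a hash map from day to a 64-bit count. The sizing below only counts the raw key-plus-counter payload and ignores per-entry container overhead.

```python
# Minimal sketch of the accumulator: one 8-byte date per distinct day
# plus one 8-byte (64-bit) counter; container overhead is ignored.
from collections import defaultdict
from datetime import date

counts: dict[date, int] = defaultdict(int)

def accumulate(post_dates):
    # One increment per post; the map only grows with the number of distinct days.
    for d in post_dates:
        counts[d] += 1

# Back-of-envelope sizing for one year of daily buckets.
days = 365
payload_bytes = days * (8 + 8)
print(payload_bytes)  # 5840 bytes, i.e. a few KB for a one-year dataset
```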
This problem should be mostly network-limited in the OP's context; decompressing Snappy-compressed Parquet runs at circa 1 GB/s. The "work" of parsing a string to a date and accumulating a count isn't expensive compared to the Snappy decompression.
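If you want to sanity-check the ~1 GB/s figure on your own hardware, a rough micro-benchmark of raw Snappy decompression looks like this (assumes the python-snappy package; real Parquet decoding adds more work on top of this):

```python
# Rough check of raw Snappy decompression throughput.
# Assumes the python-snappy package (pip install python-snappy).
import time
import snappy

raw = b"0123456789abcdef" * (4 * 1024 * 1024)   # ~64 MB of compressible data
compressed = snappy.compress(raw)

start = time.perf_counter()
out = snappy.decompress(compressed)
elapsed = time.perf_counter() - start

assert out == raw
print(f"{len(out) / elapsed / 1e9:.2f} GB/s decompressed")
```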
I don't have a good explanation for the ~33% runtime gap between DuckDB and Polars here.
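For reference, the kind of query being compared is roughly the following; the file name posts.parquet and the created_at column (an ISO date string) are placeholders I'm assuming, not names from the post:

```python
# Hedged sketch of the two equivalent daily-count aggregations.
# "posts.parquet" and "created_at" are assumed names, not from the post.
import duckdb
import polars as pl

# DuckDB: cast the string to a date and count rows per day.
duck = duckdb.sql("""
    SELECT CAST(created_at AS DATE) AS day, COUNT(*) AS n
    FROM 'posts.parquet'
    GROUP BY day
    ORDER BY day
""").df()

# Polars: the same aggregation via a lazy scan of the Parquet file.
pol = (
    pl.scan_parquet("posts.parquet")
    .group_by(pl.col("created_at").str.to_date().alias("day"))
    .agg(pl.len().alias("n"))
    .sort("day")
    .collect()
)
```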