1. An important point is that the query is a projection that returns only a fraction of the 650GB, and that fraction fits in memory. DuckDB is good at streaming larger-than-memory queries; Polars is less mature there. That difference would show in the results.

2. S3 defaults shouldn't prevent all available threads/CPUs from reading the files in parallel, so I would assume that the network bandwidth of the VM (or container) is the bottleneck.
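
To put rough numbers on both points, here is a back-of-the-envelope sketch in Python. It assumes the network link is the only bottleneck and that a projection reads bytes in proportion to the fraction of data it selects. The 650GB figure is from the discussion; the 5% projected fraction, the 10 Gbit/s link, and the function name `estimate_scan_seconds` are made up for illustration and are not measurements from the benchmark.

```python
def estimate_scan_seconds(total_bytes, projected_fraction, bandwidth_bits_per_s):
    """Lower bound on scan time if the network link is the only bottleneck.

    Ignores request latency, decompression, and CPU cost, so real runs
    will be slower than this estimate.
    """
    if not 0 < projected_fraction <= 1:
        raise ValueError("projected_fraction must be in (0, 1]")
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth_bits_per_s must be positive")
    bytes_read = total_bytes * projected_fraction
    # Convert bytes to bits before dividing by the link speed.
    return bytes_read * 8 / bandwidth_bits_per_s


TOTAL_BYTES = 650 * 10**9       # dataset size mentioned in the thread
PROJECTED_FRACTION = 0.05       # hypothetical: share of bytes the projection reads
BANDWIDTH_BITS = 10 * 10**9     # hypothetical: 10 Gbit/s VM network link

projected_seconds = estimate_scan_seconds(TOTAL_BYTES, PROJECTED_FRACTION, BANDWIDTH_BITS)
full_seconds = estimate_scan_seconds(TOTAL_BYTES, 1.0, BANDWIDTH_BITS)

print(f"projection only: {projected_seconds:.1f} s")
print(f"full scan:       {full_seconds:.1f} s")
```

If the measured runtimes sit close to the projection-only estimate for a given VM, that would be consistent with the network being the limit rather than the engine.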