c5 is such a bad instance type; m6a would be so much better and is even cheaper.
I would love to see this on an m8a.2xlarge (the 7th- and 8th-generation instances don't use SMT); it's cheaper still and has up to 15 Gbps of network bandwidth.
Actually, for this kind of workload 15 Gbps is still mediocre. What you want is the `n` variant of the instance types, which have much higher NIC capacity.
In the c6in and m6in families (and the upper-end 5th-gen c5n/m5n) you can get 100 Gbps NICs, and if you look at 8th-gen instances like the c8gn family, you can even get instances with 600 Gbps of bandwidth.
A Samsung 990 Pro reads at something like 60 Gbps, close to saturating its PCIe 4.0 x4 link (~64 Gbps). You can get that speed with a queue depth that isn't crazy, and you can have multiple NVMe operations in flight reading the same large Parquet file. Latency is in the tens of microseconds.
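To make "multiple operations in flight against the same file" concrete, here's a minimal Python sketch that issues concurrent reads with `os.pread` from a thread pool. This is just illustrative (a serious high-queue-depth reader would use io_uring or a native NVMe path, and the scratch file stands in for a large Parquet file):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_ranges(path, ranges, max_in_flight=8):
    """Issue several reads against the same file concurrently.

    Each worker opens its own fd and uses os.pread, which takes an
    explicit offset, so the reads don't race on a shared file position.
    """
    def one(rng):
        offset, length = rng
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.pread(fd, length, offset)
        finally:
            os.close(fd)

    with ThreadPoolExecutor(max_workers=max_in_flight) as ex:
        return list(ex.map(one, ranges))

# Demo on a 1 MiB scratch file standing in for a large Parquet file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(range(256)) * 4096)
    path = f.name

# 16 non-overlapping 64 KiB reads, all in flight at once.
chunks = read_ranges(path, [(i * 65536, 65536) for i in range(16)])
os.unlink(path)
```

On a real NVMe device the point of doing this is keeping enough requests queued that the drive stays busy; a single synchronous reader leaves most of that 60 Gbps on the table.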
The consensus seems to be that S3 can read one object at somewhat under 1 Gbps per connection. You can probably scale that to the full speed of your NIC by reading multiple objects at once, but you may not be able to scale as well by reading one object in multiple overlapping ranges. Latency is in the milliseconds.
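The fan-out pattern looks roughly like this. A hedged sketch: `fetch_range` stands in for whatever client call you'd actually use (with boto3 that would wrap `get_object` with a `Range="bytes=start-end"` header); the local byte-string stand-in is only there so the sketch runs without credentials:

```python
from concurrent.futures import ThreadPoolExecutor

def split_ranges(size, part_size):
    """Byte ranges covering an object of `size` bytes, with S3-style
    inclusive end offsets."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]

def parallel_get(fetch_range, size, part_size=8 * 1024 * 1024, workers=16):
    """fetch_range(start, end) -> bytes for that inclusive range.

    ex.map preserves order, so the parts join back in sequence.
    """
    ranges = split_ranges(size, part_size)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = ex.map(lambda r: fetch_range(*r), ranges)
    return b"".join(parts)

# Local stand-in for an S3 object so this runs offline.
blob = bytes(100_000)
data = parallel_get(lambda s, e: blob[s:e + 1], len(blob), part_size=16_384)
```

Whether those ranges hit one object or sixteen different objects is exactly the distinction in the comment above: spreading the same worker pool across many objects is the reliable way to fill the NIC.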
So, sure, an EC2 instance with a fast NIC and massive multi-object parallelism can have 10x the bandwidth of an NVMe device, but the parallelism and latency tolerance needed are a couple of orders of magnitude beyond what NVMe requires. Meanwhile that NVMe device does not charge for read operations and costs a couple hundred dollars, once.
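Rough arithmetic behind the "couple orders of magnitude" claim, using the approximate figures from this thread (every number here is an assumption, not a benchmark):

```python
# Back-of-the-envelope figures from the discussion above; all approximate.
nvme_gbps = 60          # Samsung 990 Pro sequential read, ~7.4 GB/s
s3_stream_gbps = 0.8    # one S3 GET, "somewhat under 1 Gbps"
nic_gbps = 600          # c8gn-class instance

streams_to_match_nvme = nvme_gbps / s3_stream_gbps  # ~75 concurrent GETs
streams_to_fill_nic = nic_gbps / s3_stream_gbps     # ~750 concurrent GETs

nvme_latency_us = 50      # tens of microseconds
s3_latency_us = 20_000    # milliseconds, call it ~20 ms to first byte
latency_ratio = s3_latency_us / nvme_latency_us     # ~400x
```

So matching one consumer SSD already takes on the order of 75 concurrent streams, each of which takes a few hundred times longer to start returning bytes.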
If you are so inclined, you can build an NVMe-oF setup (at much, much higher cost) that separates compute from storage and has excellent performance, but that is a nontrivial undertaking.