> > You can force an fsync after each messsage [sic] with always, this will slow down the throughput to a few hundred msg/s.
Is the performance warning in the NATS possible to improve on? Couldn't you still run fsync on an interval and queue up a certain number of writes to be flushed at once? I could imagine latency suffering, but batches throughput could be preserved to some extent?
> Is the performance warning in the NATS possible to improve on? Couldn't you still run fsync on an interval and queue up a certain number of writes to be flushed at once? I could imagine latency suffering, but batches throughput could be preserved to some extent?
Yes, and you shouldn't even need a fixed interval. Just queue up any writes while an `fsync` is pending; then do all those in the next batch. This is the same approach you'd use for rounds of Paxos, particularly between availability zones or regions where latency is expected to be high. You wouldn't say "oh, I'll ack and then put it in the next round of Paxos", or "I'll wait until the next round in 2 seconds then ack"; you'd start the next batch as soon as the current one is done.
Yes, this is a reasonably common strategy. It's how Cassandra's batch and group commit modes work, and Postgres has a similar option. Hopefully NATS will implement something similar eventually.
This will block threads while waiting for other threads to write. That might work great for your threading model but I usually end up putting the writer in one thread and then other threads send writes to the writer thread.
Oh nice, yes I think your threads should be able to perform reads concurrently when the write lock is not held. Would make sure you are in WAL mode as well, since I think that will improve your concurrency.
Just that row should be locked since it's: "for update skip locked".
I agree the concurrency limitation is kind of rough, but it's kind of elegant because you don't have to implement some kind of timeout/retry thing. You're certainly still exposed to the possibility of double-sending, so yes, probably much nicer to update the row to "processing" and re-process those rows on a timeout.
Not directly DuckDB (though I think it might be able to be connected to that), but I think Apache Datafusion Ballista[0] would be a typical modern open source benchmark here.
reply