maxmcd's comments | Hacker News

"Reply in the tone of Wikipedia" has worked pretty well for me

> > You can force an fsync after each messsage [sic] with always, this will slow down the throughput to a few hundred msg/s.

Is the performance warning in the NATS docs something that could be improved on? Couldn't you still run fsync on an interval and queue up a certain number of writes to be flushed at once? I could imagine latency suffering, but batch throughput could be preserved to some extent.


> Is the performance warning in the NATS docs something that could be improved on? Couldn't you still run fsync on an interval and queue up a certain number of writes to be flushed at once? I could imagine latency suffering, but batch throughput could be preserved to some extent.

Yes, and you shouldn't even need a fixed interval. Just queue up any writes while an `fsync` is pending; then do all those in the next batch. This is the same approach you'd use for rounds of Paxos, particularly between availability zones or regions where latency is expected to be high. You wouldn't say "oh, I'll ack and then put it in the next round of Paxos", or "I'll wait until the next round in 2 seconds then ack"; you'd start the next batch as soon as the current one is done.
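
A minimal sketch of that pattern in Go (the write type and channel are illustrative, not anything from NATS): drain whatever has queued up while the previous fsync was running, write it all, fsync once, then ack the whole batch.

    // Sketch: batch writes so a single fsync covers everything that
    // arrived while the previous fsync was in flight.
    package groupcommit

    import "os"

    type write struct {
        data []byte
        done chan error // receives the result once the write is durable
    }

    func writer(f *os.File, ch <-chan write) {
        for w := range ch {
            batch := []write{w}
        drain: // grab everything already queued, without blocking
            for {
                select {
                case next := <-ch:
                    batch = append(batch, next)
                default:
                    break drain
                }
            }
            var err error
            for _, b := range batch {
                if _, werr := f.Write(b.data); werr != nil && err == nil {
                    err = werr
                }
            }
            if serr := f.Sync(); serr != nil && err == nil {
                err = serr
            }
            for _, b := range batch {
                b.done <- err // ack the whole batch after one fsync
            }
        }
    }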


Yes, this is a reasonably common strategy. It's how Cassandra's batch and group commit modes work, and Postgres has a similar option. Hopefully NATS will implement something similar eventually.


This will block threads while waiting for other threads to write. That might work great for your threading model, but I usually end up putting the writer in one thread and having the other threads send writes to the writer thread.
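
As a sketch of that pattern in Go (the job/Writer names are made up, and it assumes a database/sql handle for the write connection): one goroutine owns the writes and everyone else hands it work over a channel.

    // Sketch: a single writer goroutine owns the write connection;
    // other goroutines submit statements over a channel instead of
    // blocking on the write lock themselves.
    package writequeue

    import "database/sql"

    type job struct {
        query string
        args  []any
        done  chan error
    }

    type Writer struct {
        jobs chan job
    }

    func NewWriter(db *sql.DB) *Writer {
        w := &Writer{jobs: make(chan job, 128)}
        go func() {
            for j := range w.jobs {
                _, err := db.Exec(j.query, j.args...)
                j.done <- err
            }
        }()
        return w
    }

    // Exec hands a statement to the writer goroutine and waits for it.
    func (w *Writer) Exec(query string, args ...any) error {
        done := make(chan error, 1)
        w.jobs <- job{query: query, args: args, done: done}
        return <-done
    }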


I do open 2 connections:

First one for writing with flags:

    SQLITE_OPEN_CREATE | SQLITE_OPEN_READWRITE | SQLITE_OPEN_FULLMUTEX
Second one for reading with flags:

    SQLITE_OPEN_READONLY | SQLITE_OPEN_FULLMUTEX
As you can see, I have SQLITE_OPEN_FULLMUTEX on both of them. Should I only have it on the writing one?


Oh nice, yes, I think your threads should be able to perform reads concurrently when the write lock is not held. I would make sure you are in WAL mode as well, since I think that will improve your concurrency.
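
For reference, a sketch of enabling it (assuming a database/sql handle; the pragma itself is standard SQLite and only needs to run once per database):

    // Switch the journal to WAL so readers don't block the writer
    // (and the writer doesn't block readers).
    package walmode

    import "database/sql"

    func enableWAL(db *sql.DB) error {
        _, err := db.Exec("PRAGMA journal_mode=WAL;")
        return err
    }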

Just that row should be locked, since it's "FOR UPDATE SKIP LOCKED".

I agree the concurrency limitation is rough, but it's also somewhat elegant because you don't have to implement a timeout/retry mechanism. You're certainly still exposed to the possibility of double-sending, so yes, it's probably much nicer to update the row to "processing" and re-process those rows on a timeout.
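
A rough sketch of that "mark as processing" variant in Go, assuming a hypothetical jobs table with id, status, and updated_at columns behind a Postgres database/sql handle:

    // Sketch: claim one pending job. SKIP LOCKED makes concurrent
    // workers skip rows another transaction already holds instead of
    // blocking, so each worker claims a different row.
    package jobs

    import "database/sql"

    func claimJob(db *sql.DB) (int64, error) {
        tx, err := db.Begin()
        if err != nil {
            return 0, err
        }
        defer tx.Rollback() // no-op if the commit below succeeds

        var id int64
        err = tx.QueryRow(`
            SELECT id FROM jobs
            WHERE status = 'pending'
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED`).Scan(&id)
        if err != nil {
            return 0, err // sql.ErrNoRows: nothing to claim right now
        }
        _, err = tx.Exec(
            `UPDATE jobs SET status = 'processing', updated_at = now() WHERE id = $1`,
            id,
        )
        if err != nil {
            return 0, err
        }
        return id, tx.Commit()
    }

A separate sweep can then reset rows stuck in "processing" past some timeout back to "pending" so they get re-processed.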



Are there any open sourced sharded query planners like this? Something that can aggregate queries across many duckdb/sqlite dbs?


Not DuckDB directly (though I think it could be connected to it), but I think Apache DataFusion Ballista[0] would be a typical modern open source benchmark here.

[0]: https://datafusion.apache.org/ballista/contributors-guide/ar...


DeepSeek released smallpond

0 - https://github.com/deepseek-ai/smallpond

1 - https://www.definite.app/blog/smallpond (overview for data engineers, practical application)


There are a few different styles: https://github.com/orgs/community/discussions/16925


Thanks for the link. These are nice additions.


Do they mention transactions anywhere? Maybe it will be OLAP?


I think this is the comparable view: https://www.energydashboard.co.uk/live

