I agree that this is an anti-pattern for training.
In training, you are often I/O bound over S3 - high-bandwidth networking alone doesn't fix it (.safetensors files are typically ~4GB each). You need NVMe and high-bandwidth networking along with a distributed file system.
We do this with tiered storage over S3 using HopsFS, which exposes an HDFS API and a FUSE client, so training can just read data (from the HopsFS datanodes' NVMe cache) as if it were local, while it is actually pulled from NVMe disks over the network.
In contrast, writes go straight to S3 via HopsFS's write-through NVMe cache.
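To make the read path concrete, here is a minimal sketch, assuming HopsFS is FUSE-mounted at a hypothetical /hopsfs path (the mount point, project layout, and file name below are made up for illustration):

    # Minimal sketch: read a checkpoint shard as if it were a local file.
    # The FUSE client serves the blocks from the datanodes' NVMe cache over
    # the network, rather than going to S3 for every read.
    from safetensors import safe_open  # pip install safetensors torch

    SHARD = "/hopsfs/Projects/my_project/Resources/model-00001-of-00008.safetensors"

    with safe_open(SHARD, framework="pt", device="cpu") as f:
        for name in f.keys():
            tensor = f.get_tensor(name)
            print(name, tuple(tensor.shape))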
I have one (top of the line!).
Here's how bad the engineers were: for the last 6 months, the device has emitted 10 audible beeps every 6 hours. I do a lot of customer meetings and public speaking. People would sometimes ask, "What is that noise?" I would say, "No idea, but if you wait 8 seconds, it will stop!"
Also, my heart rate would sometimes drop below 40 bpm. Then it would start pacing, which I didn't want and which was extremely uncomfortable.
P.S. The reason the battery ran out is that I found a treatment for my condition that works really well, by talking to experts globally (I am a computer scientist). I wrote a case study paper about my condition to help others, co-authored with my doctors.
https://www.slideshare.net/slideshow/arvc-and-flecainide-cas...
16 years later, the device is still in place, but I will have it removed early next year.
Anything on sovereign AI or the like is gone immediately when the mods wake up.
Got an EU cloud article? Publish it at 11am CET and it disappears around 12:30.
See, Peter Thiel is smart. There are enough idiots who will buy his shtick - it's not just MAGA who get pointed in the direction he wants society to go (serfdom).
Cloudflare tried to build their own feature store, and they get a grade F.
I wrote a book on feature stores for O'Reilly. The bad query they wrote in ClickHouse could also have been caused by another, subtler error: duplicate rows in materialized feature data. For example, Hopsworks prevents duplicate rows by building on primary-key uniqueness enforcement in Apache Hudi. In contrast, Delta Lake and Iceberg do not enforce primary key constraints, and neither does ClickHouse. So they could hit the same bug again via a bug in feature ingestion - and given they hacked together their feature store, it is not beyond the bounds of possibility.
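A toy illustration (not Cloudflare's actual schema - the tables and columns are made up) of how duplicate rows in materialized feature data silently corrupt downstream results, and why primary-key upserts matter:

    import pandas as pd

    # Feature group keyed on customer_id; the second c1 row is an accidental
    # duplicate left behind by a buggy ingestion run.
    features = pd.DataFrame({
        "customer_id": ["c1", "c1", "c2"],
        "clicks_7d":   [10,   10,   3],
    })
    labels = pd.DataFrame({"customer_id": ["c1", "c2"], "churned": [0, 1]})

    # Without primary-key enforcement the join fans out: c1 appears twice in
    # the training set, skewing class balance and any aggregates computed on it.
    training_df = labels.merge(features, on="customer_id", how="left")
    print(training_df)

    # Upserting on the primary key (what Hudi-style enforcement gives you)
    # collapses the duplicate before any query ever sees it.
    deduped = features.drop_duplicates(subset=["customer_id"], keep="last")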
I got downvoted, and you replied this... when tuition is not free in most European countries, at least not for higher education. In my country, med school used to cost over $18k, and it is probably more now.
Also, the well-hidden caveat of low or no tuition fees in some other countries is that you study in their language, not in English.
He's scarily sane. I am not joking when I say he invented the antichrist thing as a way to entrench the billionaire class. He's basically saying the antichrist will be somebody who wants to solve the world's problems through collective action. So collective action is bad. We are in such an unbalanced world that this is the best argument he can come up with for why we should allow such wealth inequality in society.
I'm pretty sure the antichrist is that sports betting guy, Dave Portnoy? The good news is that he's low-key; it's kind of a more casual antichrist this time around, getting everyone addicted to gambling.
Yeah, it's funny when you read his antichrist ramblings: essentially, the antichrist will come in the form of a leader arguing for wealth redistribution via collective action. That is what instills sheer terror in the billionaire class.
1. An important point is that the query is a projection, so it only returns a fraction of the 650GB - and that fraction fits in memory. DuckDB is good at streaming larger-than-memory queries; Polars is less mature there. That would show in the results.
2. S3 defaults shouldn't prevent all available threads/CPUs from reading the files in parallel, so I would assume the network bandwidth of the VM (or container) would be the bottleneck. A rough sketch of what I mean is below.
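Sketch of both points, assuming DuckDB with the httpfs extension, a hypothetical bucket/prefix, and S3 credentials already configured (not the benchmark's actual code):

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs; LOAD httpfs;")  # S3 support
    con.execute("SET threads TO 16;")            # match the VM's cores, not a low default

    # Only the projected columns are read from the Parquet files, and DuckDB
    # streams the scan, so the result never needs the full 650GB in memory.
    df = con.execute("""
        SELECT user_id, amount
        FROM read_parquet('s3://my-bucket/events/*.parquet')
        WHERE amount > 100
    """).df()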
I still haven't got my head around how OTEL fits into a good open-source monitoring stack. Afaik, it is a protocol for metrics, traces, and logs, and we want our open-source monitoring services/DBs to support it so they become pluggable. But, afaik, there's no single good DB for both logs and metrics, so most of us use Prometheus for metrics and OpenSearch for logs.
Does OTEL mean we just need to replace all our collectors (like Logstash for logs, and all the native metrics collectors and Pushgateway crap) and then reconfigure Prometheus and OpenSearch?
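To make the question concrete, here is roughly what I imagine the app side looks like - a sketch assuming the Python OTel SDK and a local OTLP collector endpoint (service and metric names are made up); the collector, not the app, would then be configured to fan out to Prometheus, OpenSearch, etc.:

    from opentelemetry import metrics
    from opentelemetry.sdk.metrics import MeterProvider
    from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
    from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

    # Push metrics over OTLP/gRPC to a collector instead of exposing a
    # Prometheus scrape endpoint or pushing to the Pushgateway.
    reader = PeriodicExportingMetricReader(
        OTLPMetricExporter(endpoint="http://localhost:4317", insecure=True)
    )
    metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

    meter = metrics.get_meter("my-service")
    requests_counter = meter.create_counter("http_requests_total")
    requests_counter.add(1, {"route": "/health"})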
Logs, spans, and metrics are stored as time-stamped stuff. Sure, simple fixed-width columnar storage is faster, and it makes sense to special-case numbers (add downsampling, aggregations, histogram maintenance, and whatnot), but any write-optimized storage engine can handle this; it's not the hard part (basically LevelDB, and if there's a need for scaling out it'll look like Cassandra, Aerospike, ScyllaDB, or ClickHouse ... see also https://docs.greptime.com/user-guide/concepts/data-model/ and specialized storage engines https://docs.greptime.com/reference/about-greptimedb-engines... )
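A toy sketch of that shared shape - logs, spans, and metrics all reduce to time-stamped records with tags and a payload (the names here are made up, not any particular engine's schema):

    import time
    from dataclasses import dataclass

    @dataclass
    class Point:
        ts_ns: int    # event timestamp
        tags: dict    # e.g. {"service": "api", "host": "a1"}
        fields: dict  # a numeric sample, a log body, or span attributes

    buffer: list[Point] = []

    def write(tags: dict, fields: dict) -> None:
        # Append-only writes, flushed/compacted later: the "easy part" above.
        # The hard part is the query and aggregation layer on top.
        buffer.append(Point(time.time_ns(), tags, fields))

    write({"service": "api"}, {"metric": "latency_ms", "value": 12.3})
    write({"service": "api"}, {"log": "GET /health 200"})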
I think the answer is that it doesn't fit into any definition of a _good_ monitoring stack, but we are stuck with it. It has largely become the blessed protocol, specification, and standard for OSS monitoring, along every axis (logging, tracing, collecting, instrumentation, etc)... it's a bit like the efforts that resulted in J2EE and EJBs back in the day, only more diffuse and with more varied implementations.
And we don't really have a simpler alternative in sight... at least in the Java days there was disgust and a reaction via Struts, Spring, EJB3+, and of course other languages and communities.
Not sure exactly how we got into such an over-engineered mono-culture in terms of operations, monitoring, and deployment for 80%+ of the industry (k8s + Grafana/Loki/Tempo + endless supporting tools or flavors), but it is really a sad state.
Then you have endless implementations handling bits and pieces of various parts of the spec, and of course you have the tools to actually ingest and analyze and report on them.