
Exo-Labs is an open source project that allows this too (pipeline parallelism, I mean, not the latter). It's device-agnostic, meaning you can daisy-chain anything you have that has memory, and the implementation will intelligently shard model layers across those devices. It's slow, but it scales linearly with concurrent requests.

Exo-Labs: https://github.com/exo-explore/exo
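To make the layer-sharding idea concrete, here is a rough sketch of splitting a model's layers across devices in proportion to their memory. This is not exo's actual code: `Device`, `shard_layers`, and the device names are all made up for illustration, and the real project may well partition differently.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical description of one machine in the chain.
    name: str
    memory_gb: float


def shard_layers(num_layers: int, devices: list[Device]) -> dict[str, range]:
    """Assign contiguous layer ranges to devices, proportional to memory.

    Illustrative assumption only: each device gets a slice of layers whose
    size tracks its share of the total memory in the chain.
    """
    total = sum(d.memory_gb for d in devices)
    shards: dict[str, range] = {}
    start = 0
    cumulative = 0.0
    for i, dev in enumerate(devices):
        cumulative += dev.memory_gb
        # The last device takes whatever remains so every layer is assigned.
        if i == len(devices) - 1:
            end = num_layers
        else:
            end = round(num_layers * cumulative / total)
        shards[dev.name] = range(start, end)
        start = end
    return shards


# Example chain: one 16 GB machine followed by two 8 GB machines.
devices = [Device("desktop", 16), Device("laptop", 8), Device("mini-pc", 8)]
shards = shard_layers(32, devices)
```

With a 32-layer model, the 16 GB machine ends up with layers 0-15 and each 8 GB machine gets 8 layers. At inference time, activations pass from one device to the next in order. That hand-off is why a single request is slow, while several requests in flight can keep every stage busy.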




