
Curious what hardware you use? And is any of this runnable on an M1 laptop?


Absolutely: a 7B model will run comfortably on 16GB of RAM and most consumer-level hardware. Some 40B models run on 32GB, but I've found it depends on the model (GGUF quantization helps; fingers crossed).
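
As a rough sanity check (my own back-of-envelope math, not from the thread), you can estimate the RAM a quantized GGUF model needs from its parameter count and bits per weight:

    // Rough RAM estimate for a quantized GGUF model: the weights dominate,
    // at (bits per weight / 8) bytes per parameter, plus some overhead for
    // the KV cache and runtime. The 4.5 bits/weight (roughly a Q4 quant)
    // and 1.5 GB overhead figures are assumptions, not measurements.
    func estimatedRAMGB(paramsInBillions: Double, bitsPerWeight: Double,
                        overheadGB: Double = 1.5) -> Double {
        paramsInBillions * bitsPerWeight / 8.0 + overheadGB
    }

    print(estimatedRAMGB(paramsInBillions: 7,  bitsPerWeight: 4.5))  // ~5.4 GB: fits easily in 16GB
    print(estimatedRAMGB(paramsInBillions: 40, bitsPerWeight: 4.5))  // ~24 GB: needs a 32GB machine

Those numbers line up with the experience above: 7B is comfortable on 16GB, while 40B is tight even on 32GB.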

I originally ran this on an M1 with 32GB; now I run it on an M2 Air with 16GB (and an M2 Mac mini with 32GB) with no problems.

I use llama.cpp with my own SwiftUI interface, all native, no Python/JS/web scripts.
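
For anyone curious what "native, no scripts" might look like, here is a minimal SwiftUI sketch of a chat front end. `LlamaSession` is a hypothetical wrapper around llama.cpp's C API, not the commenter's actual app:

    import SwiftUI

    // Minimal sketch of a native llama.cpp chat UI. `LlamaSession` is a
    // hypothetical wrapper around llama.cpp's C API (bridged into Swift);
    // a real one would load a GGUF model and stream tokens back via
    // llama_tokenize/llama_decode.
    final class LlamaSession: ObservableObject {
        @Published var reply = ""

        func complete(_ prompt: String) async {
            // Placeholder so the sketch runs: a real implementation would
            // feed `prompt` to the loaded model and append tokens as they
            // are generated.
            await MainActor.run { self.reply = "(model output for: \(prompt))" }
        }
    }

    struct ChatView: View {
        @StateObject private var session = LlamaSession()
        @State private var prompt = ""

        var body: some View {
            VStack(spacing: 12) {
                ScrollView {
                    Text(session.reply)
                        .frame(maxWidth: .infinity, alignment: .leading)
                }
                HStack {
                    TextField("Ask something", text: $prompt)
                        .textFieldStyle(.roundedBorder)
                    Button("Send") {
                        Task { await session.complete(prompt) }
                    }
                }
            }
            .padding()
        }
    }

Keeping everything in-process like this (rather than shelling out to a Python server) is what makes the instant response on Apple Silicon possible.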

7B is obviously less capable, but the instant response makes it worth exploring. For general questions it's a Google-search replacement that's instantly more valuable than dealing with the hellscape of blog spam ruling Google at the moment.

Note: for my complex code queries at $dayjob, where time is of the essence, I still use GPT-4 (Plus), which is still unmatched IMHO, at least without running special hardware.


I've been occasionally using a 7B Q4 quant on llama.cpp on an 8GB M1. It's usable, if not amazing.


Depends on your M1 specs, but you should definitely be able to run a 7B model (at least with some quantization).

