• Altima NEO@lemmy.zip · 12 days ago

    I’ve got a 3090, and I feel ya. Even 24 gigs hits the cap pretty often and slows to a crawl once system RAM starts being used.

    • brucethemoose@lemmy.world · 12 days ago (edited)

      You can’t let it overflow into system RAM if you’re running LLMs on Windows. There’s a toggle for it in the NVIDIA Control Panel (the CUDA sysmem fallback policy), and you can get llama.cpp to offload layers through its own settings (or better yet, use exllama instead).
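
      A minimal sketch of the offload side via llama-cpp-python (the model file name and numbers are placeholders, not a recommendation):

      ```python
      # Hypothetical example: load a local GGUF and control how many layers go to the GPU.
      from llama_cpp import Llama

      llm = Llama(
          model_path="qwen2.5-32b-instruct-q4_k_m.gguf",  # placeholder file name
          n_gpu_layers=-1,  # -1 tries to put every layer on the GPU; lower it if you hit the 24GB cap
          n_ctx=8192,       # the KV cache for the context also eats VRAM, so keep this in budget
      )

      out = llm("Hello!", max_tokens=64)
      print(out["choices"][0]["text"])
      ```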

      But… yeah. Qwen 32B fits in 24GB perfectly, and it’s great, but 72B really feels like the intelligence tipping point where I could ditch so many API models, and that just barely won’t fit in 24GB.
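
      Quick back-of-envelope on why 72B just misses (assumed bits-per-weight figures, weights only, no KV cache or overhead):

      ```python
      # Rough VRAM estimate for model weights alone, in GB (assumptions, not measurements).
      def weight_gb(params_billions, bits_per_weight):
          return params_billions * bits_per_weight / 8

      print(weight_gb(32, 4.5))  # ~18.0 GB -> fits in 24GB with room for context
      print(weight_gb(72, 4.5))  # ~40.5 GB -> nowhere close
      print(weight_gb(72, 2.5))  # ~22.5 GB -> weights alone nearly fill 24GB, little room left for the KV cache
      ```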