• Altima NEO@lemmy.zip · 12 days ago

    I’ve got a 3090, and I feel ya. Even 24 gigs hits the cap pretty often and slows to a crawl once system RAM starts being used.

    • brucethemoose@lemmy.world · 12 days ago (edited)

      You can’t let it overflow into system RAM if you’re running LLMs on Windows. There’s a toggle for it in the NVIDIA Control Panel (the CUDA sysmem fallback policy), and you can get llama.cpp to offload layers through its own settings (or better yet, use exllama instead).
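
      A minimal sketch of the offload side via llama-cpp-python (the model file name and numbers are placeholders, not a recommendation):

      ```python
      # Hypothetical example: load a local GGUF and control how many layers go to the GPU.
      from llama_cpp import Llama

      llm = Llama(
          model_path="qwen2.5-32b-instruct-q4_k_m.gguf",  # placeholder file name
          n_gpu_layers=-1,  # -1 tries to put every layer on the GPU; lower it if you hit the 24GB cap
          n_ctx=8192,       # the KV cache for the context also eats VRAM, so keep this in budget
      )

      out = llm("Hello!", max_tokens=64)
      print(out["choices"][0]["text"])
      ```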

      But… yeah. Qwen 32B fits in 24GB perfectly, and it’s great, but 72B really feels like the intelligence tipping point where I could ditch so many API models, and that just barely won’t fit in 24GB.
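
      Quick back-of-envelope on why 72B just misses (assumed bits-per-weight figures, weights only, no KV cache or overhead):

      ```python
      # Rough VRAM estimate for model weights alone, in GB (assumptions, not measurements).
      def weight_gb(params_billions, bits_per_weight):
          return params_billions * bits_per_weight / 8

      print(weight_gb(32, 4.5))  # ~18.0 GB -> fits in 24GB with room for context
      print(weight_gb(72, 4.5))  # ~40.5 GB -> nowhere close
      print(weight_gb(72, 2.5))  # ~22.5 GB -> weights alone nearly fill 24GB, little room left for the KV cache
      ```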