• Hackworth@lemmy.world
    12 days ago

    Yeah, I’ve had decent results running the 7B/8B models, particularly the fine-tuned ones for specific use cases. But as ya mentioned, they’re only really good in their scope for a single prompt or maybe a few follow-ups. I’ve seen little improvement with the 13B/14B models and find them mostly not worth the performance hit.

    • brucethemoose@lemmy.world
      12 days ago

      Depends which 14B. Arcee’s 14B SuperNova Medius model (which is a Qwen 2.5 with some training distilled from larger models) is really incredible, but old Llama 2-based 13B models are awful.

      • Hackworth@lemmy.world
        12 days ago

        I’ll try it out! It’s been a hot minute, and it seems like there are new options all the time.

        • brucethemoose@lemmy.world
          12 days ago

          Try a new quantization as well! Something like an IQ4-M, depending on the size of your GPU, or even better, a 4.5bpw exl2 with Q6 cache if you can manage to set up TabbyAPI.
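          For a rough sense of why bits per weight (bpw) and cache quantization matter for fitting a 14B model on a GPU, here's a back-of-the-envelope sketch. It only counts the quantized weights (params × bpw / 8) and the KV cache (keys + values per layer per token); real loaders add overhead for activations, the CUDA context, and quantization metadata, and Q6 cache is treated as exactly 6 bits per value, which is an approximation. The layer/head numbers in the example are assumptions meant to resemble a Qwen 2.5 14B-class model; check the model's own config before trusting them.

```python
# Back-of-the-envelope VRAM estimate for a quantized LLM (a sketch, not exact).
# Ignores activations, framework overhead, and quantization metadata.


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bits_per_value: float) -> float:
    """Approximate KV-cache size: one key and one value vector
    per KV head, per layer, per token in the context."""
    values = 2 * layers * kv_heads * head_dim * context_len
    return values * bits_per_value / 8 / 1e9


if __name__ == "__main__":
    # Assumed architecture, roughly Qwen 2.5 14B-like (verify against config.json).
    LAYERS, KV_HEADS, HEAD_DIM, CTX = 48, 8, 128, 16384

    for size in (8, 14):
        print(f"{size}B weights @ 4.5bpw: {weight_gb(size, 4.5):.2f} GB")

    fp16_cache = kv_cache_gb(LAYERS, KV_HEADS, HEAD_DIM, CTX, 16)
    q6_cache = kv_cache_gb(LAYERS, KV_HEADS, HEAD_DIM, CTX, 6)
    print(f"KV cache @ FP16: {fp16_cache:.2f} GB, @ Q6: {q6_cache:.2f} GB")
```

          Under those assumptions, a 14B model at 4.5bpw needs a bit under 8 GB for weights, and quantizing a 16k-token cache to Q6 saves roughly 2 GB versus FP16, which is often the difference between fitting on a 12 GB card or not.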