• brucethemoose@lemmy.world
    12 days ago

    Try a different quantization as well! Something like an IQ4-M, depending on how much VRAM your GPU has, or even better, a 4.5bpw exl2 with Q6 cache if you can manage to set up TabbyAPI.
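
If it helps to see whether a given bits-per-weight will fit, here is a rough back-of-the-envelope sketch. The function names, the 2 GiB headroom figure, and the 32B / 24 GiB example are my own assumptions for illustration, not anything from TabbyAPI or exllamav2, and it only counts the weights: the KV cache and runtime overhead come on top.

```python
def weight_size_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB.

    Ignores the KV cache, activations, and framework overhead.
    """
    total_bits = n_params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1024**3


def fits_in_vram(n_params_billion: float, bits_per_weight: float,
                 vram_gib: float, headroom_gib: float = 2.0) -> bool:
    """Rough check: weights plus an assumed headroom must fit in VRAM.

    headroom_gib is a guess standing in for cache and overhead; tune it
    for your context length and cache quantization.
    """
    needed = weight_size_gib(n_params_billion, bits_per_weight) + headroom_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Hypothetical example: a 32B model at 4.5bpw on a 24 GiB card.
    size = weight_size_gib(32, 4.5)
    print(f"weights: ~{size:.1f} GiB, fits: {fits_in_vram(32, 4.5, 24)}")
```

A quantized cache shrinks the part this estimate leaves out, so treat the headroom number as something to adjust rather than a fixed rule.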