Just a stranger trying things.

  • 0 Posts
  • 11 Comments
Joined 1 year ago
cake
Cake day: July 16th, 2023

help-circle





  • I didn’t say it can’t. But I’m not sure how well it is optimized for it. From my initial testing it queues queries and submits them one after another to the model, I have not seen it batch compute the queries, but maybe it’s a setup thing on my side. vLLM on the other hand is designed specifically for the multi co current user use case and has multiple optimizations for it.


  • The Hobbyist@lemmy.ziptoSelfhosted@lemmy.worldSelf-hosting LLMs
    link
    fedilink
    English
    arrow-up
    0
    ·
    edit-2
    16 days ago

    I run the Mistral-Nemo(12B) and Mistral-Small (22B) on my GPU and they are pretty code. As others have said, the GPU memory is one of the most limiting factors. 8B models are decent, 15-25B models are good and 70B+ models are excellent (solely based on my own experience). Go for q4_K models, as they will run many times faster than higher quantization with little performance degradation. They typically come in S (Small), M (Medium) and (Large) and take the largest which fits in your GPU memory. If you go below q4, you may see more severe and noticeable performance degradation.

    If you need to serve only one user at the time, ollama +Webui works great. If you need multiple users at the same time, check out vLLM.

    Edit: I’m simplifying it very much, but hopefully should it is simple and actionable as a starting point. I’ve also seen great stuff from Gemma2-27B

    Edit2: added links

    Edit3: a decent GPU regarding bang for buck IMO is the RTX 3060 with 12GB. It may be available on the used market for a decent price and offers a good amount of VRAM and GPU performance for the cost. I would like to propose AMD GPUs as they offer much more GPU mem for their price but they are not all as supported with ROCm and I’m not sure about the compatibility for these tools, so perhaps others can chime in.

    Edit4: you can also use openwebui with vscode with the continue.dev extension such that you can have a copilot type LLM in your editor.




  • I don’t care which is better. But I can share certain unique features which make me personally chose GrapheneOS over all other options I know of:

    • it is possible to relock the bootloader
    • you can disable the internet permission
    • the location service is independent on google services, even if you install them
    • you can use mutliple profiles and pipe notifications from one profile to another
    • you control native app debugging (and its off by default)
    • you have storage scope (as well as contacts scope)
    • you get all the latest security patches and really fast
    • and more…

    1. There is no GrapheneOS account.
    2. GrapheneOS has some built in apps, namely for SMS, gallery viewer, camera, PDF reader, calculator, contacts, files, phone and web browser (vanadium, based on chromium). GrapheneOS offers no cloud. You are responsible for using the service of your choice to manage and backup your data. It is currently undergoing a transition for backup management, but otherwise you can make use of a selfhosted service like nextcloud.
    3. GrapheneOS does come preinstalled with its own app store but that it is reserved to GrapheneOS apps and the distribution of certain google services which can be optionally installed using their sandbox. Besides that, you can indeed install the aurora store to get access to the free apps on the google play store, or actually use the google play store. They can all be installed and used simultaneously. Though you might want to be mindful of you install an app on one store to not update it on another as the two versions could work differently (e.g. an app installed on f-droid might have a different notification system than one on the google play store). You do not need to use nextcloud if you don’t want to. GrapheneOS has no dependencies on any other additional app. It is a standalone OS. Once you install it, you use it however you want.

    Edit: one key advantage of GrapheneOS is the possibility of using multiple users. You can (and I recommend it) separate apps into different user profiles. You can for instance dedicate one user profile to apps requiring Google services, let’s call it Gapps. GrapheneOS then allows you to then pipe your notifications between user accounts, so if you are in your main user profile you can get notifications from apps running in Gapps in the background. Very convenient.