Oh, 16GB should be plenty for SDXL.
For Flux, I actually use a script that quantizes it down to 8-bit (not FP8, but true quantization with Hugging Face's quanto), but I'd also highly recommend checking this project out. It should fit everything in VRAM and be dramatically faster: https://github.com/mit-han-lab/nunchaku
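For context, the quantization step looks roughly like this (a minimal sketch using diffusers + optimum-quanto, which is where Hugging Face's quanto lives these days; the model ID and the qint8 choice are illustrative, not my exact script):

```python
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qint8, quantize

# Load in bf16 first, then quantize the transformer to true int8
# weights (actual quantization, not just an FP8 cast).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
quantize(pipe.transformer, weights=qint8)
freeze(pipe.transformer)

# The T5 text encoder is the other big memory hog worth quantizing.
quantize(pipe.text_encoder_2, weights=qint8)
freeze(pipe.text_encoder_2)

pipe.to("cuda")
image = pipe("a lighthouse at dusk", num_inference_steps=28).images[0]
image.save("out.png")
```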
I just run SD 1.5 models; my process involves a lot of upscaling since things come out around 512 base size. I don't really fuck with SDXL because generating at 1024 halves, and halves again, the number of images I can get out of any pass (and I have a lot of 1.5-based LoRA models). I do really like SDXL's general capabilities, but I rarely dip into that world (I feel like I locked in my process 1.5 years ago and it works for me; don't know what you kids are doing with your fancy pony diffusions 😃)
Oh you should be able to batch the heck out of that on a 4080. Are you not using HF diffusers or something?
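Something like this, if you do go the diffusers route (just a sketch; the model ID, prompt, and settings are placeholders for whatever you actually run):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# One pass of 16 images at a typical SD 1.5 portrait size.
images = pipe(
    "a watercolor fox in a snowy forest",
    height=768,
    width=512,
    num_images_per_prompt=16,
    num_inference_steps=25,
).images

for i, img in enumerate(images):
    img.save(f"batch_{i:02d}.png")
```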
I’d check out stable-fast if you haven’t already:
https://github.com/chengzeyi/stable-fast
VoltaML is also old at this point, but it has a really fast AITemplate implementation for SD 1.5: https://github.com/VoltaML/voltaML-fast-stable-diffusion
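From memory, wiring stable-fast into a diffusers pipeline looks roughly like this (sketch based on its README; the import path has moved around between versions, so check the repo):

```python
import torch
from diffusers import StableDiffusionPipeline
from sfast.compilers.diffusion_pipeline_compiler import CompilationConfig, compile

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Turn on whichever optional backends you have installed.
config = CompilationConfig.Default()
config.enable_xformers = True
config.enable_triton = True
config.enable_cuda_graph = True
pipe = compile(pipe, config)

# The first call is slow while it compiles; later calls are where the speedup shows.
image = pipe("a watercolor fox in a snowy forest", height=768, width=512).images[0]
```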
I usually run batches of 16 at 512x768 at most; going bigger than that causes bottlenecks, and I feel like I was able to do the same on the 3070 Ti. I'll look into those other tools when I'm home, thanks for the resources. (HF diffusers? I'm still using A1111.)
(ETA: I have written a bunch of unreleased plugins to make A1111 work better for me, like VS Code-style editing for special symbols such as (/[, plus a bunch of other optimizations. I haven't released them because they're not "perfect" yet and I have other projects to work on, but there are reasons I haven't left A1111.)
Eh, that's the problem: the A1111 "engine" is messy and unoptimized. You could at least try switching to the "reForge" fork, which might preserve extension compatibility while letting you run features like torch.compile.
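If you do ever step outside the webui, the torch.compile piece is basically just this on the UNet (sketch in diffusers terms; the forks expose it as a setting instead):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Compile the UNet once; the first generation pays the compile cost,
# subsequent generations run noticeably faster.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("a watercolor fox in a snowy forest").images[0]
```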
Hmm, maybe in the new year I'll try to update my process. I'm in the middle of a project, though, so right now it's more about reliability than optimization. Thanks for the info!
Yep. I didn’t mean to process shame you or anything, just trying to point out obscure but potentially useful projects most don’t know about :P
It’s sort of a niche within a niche and I appreciate your sharing some knowledge with me, thanks!