The master-omnibus image bundles all that into a single container and is MUCH simpler to deploy.
Literally just used the compose file they provide at https://github.com/AnalogJ/scrutiny/blob/master/docker/example.omnibus.docker-compose.yml, added in the device names, and was done. Roughly what that ends up looking like is sketched below.
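(A trimmed sketch only, not the upstream file verbatim; the volume paths and /dev/sdX device names here are just examples, so check the linked compose file for the current layout.)

```yaml
version: '3.5'
services:
  scrutiny:
    container_name: scrutiny
    image: ghcr.io/analogj/scrutiny:master-omnibus
    cap_add:
      - SYS_RAWIO          # lets smartctl inside the container talk to the drives
    ports:
      - "8080:8080"        # web UI
      - "8086:8086"        # bundled InfluxDB
    volumes:
      - /run/udev:/run/udev:ro
      - ./scrutiny/config:/opt/scrutiny/config      # example host paths
      - ./scrutiny/influxdb:/opt/scrutiny/influxdb
    devices:
      # one line per disk you want monitored (example device names)
      - /dev/sda
      - /dev/sdb
```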
Depending on the exact flags, some workloads will be faster, some will be identical, and some will be slower. Compiler optimization is dark magic that depends on a ton of factors; you can't just assume that going from -O2 to -O3 will give better performance, because what those optimizations actually do depends on the underlying code. That's also why, for the most part, everyone suggests stopping at -O2: the further up the curve you go, the more likely you are to hit unexpected behavior.
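As a toy illustration (a hypothetical hot loop, nothing from the OP's setup), this is the kind of code where -O3's extra passes might help, do nothing, or even hurt by bloating the binary:

```c
/* saxpy.c -- hypothetical example, for illustration only */
#include <stddef.h>

/* On top of -O2, -O3 enables extra passes (e.g. -fipa-cp-clone, and on
 * older GCC releases loop vectorization via -ftree-loop-vectorize; the
 * exact set depends on the GCC version). Whether they help this loop
 * depends on n, alignment, cache behavior, and the surrounding code. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Build both and benchmark with YOUR workload instead of assuming:
 *   gcc -O2 -c saxpy.c
 *   gcc -O3 -c saxpy.c
 * or diff the enabled passes directly:
 *   gcc -Q --help=optimizers -O2 > o2.txt
 *   gcc -Q --help=optimizers -O3 > o3.txt
 *   diff o2.txt o3.txt
 */
```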
And we’re talking low-single-digit performance improvements at best; unless you’re running benchmarks 24/7, you’d never notice the difference in real-world use.
Disclaimer: some workloads will show different uplifts, but we’re talking Firefox, KDE, and games here, per the OP’s comments.
They also default to a different scheduler, which is almost certainly why anyone using it notices it feels “faster”, but that scheduler is mainlined in the kernel, so it’s not like you can’t use it anywhere else.