With Examples of Offline and Online Inference
Not sure if you know this: llama.cpp now offers various backends (e.g. Vulkan, OpenCL, SYCL) that let you use iGPUs, and they can run pretty fast even on 16 GB laptops.
That's good to know! llama.cpp is becoming compatible with most hardware configurations. I'll try to benchmark this.