Discussion about this post

Giampiero Recco:

Hi Benjamin, thanks for the valuable index and content! Is there a note/notebook describing how to serve each model in the index? More specifically, I'm having trouble serving google/gemma-3-27b-it-qat-q4_0-gguf on vLLM, and it would be convenient to know what configuration/version you used, or whether there is any preprocessing you perform. Thank you!
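For reference, GGUF serving on vLLM generally follows the pattern sketched below. This is a minimal illustration, not the configuration used for the index: the local file name and paths are assumptions, vLLM's GGUF support is experimental, and older vLLM versions may not support the gemma3 architecture at all, which could itself be the source of the trouble.

```python
# Minimal sketch of vLLM's (experimental) GGUF loading: pass a local .gguf
# file as the model and point the tokenizer at the original base-model repo.
# The file name and paths are assumptions, not the author's confirmed setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/gemma-3-27b-it-q4_0.gguf",  # local GGUF file, downloaded beforehand
    tokenizer="google/gemma-3-27b-it",         # tokenizer from the base (non-GGUF) repo
    max_model_len=8192,                        # modest context length to fit memory
)

outputs = llm.generate(
    ["Explain quantization in one sentence."],
    SamplingParams(temperature=0.7, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```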

Max:

Awesome; this is perfect! More than a year ago we started with your guidance for local hardware configs, and your work has become foundational to our knowledge base for deploying LLMs. I just browsed your index, and I'm in total agreement: the models we selected for app development as best performing for our requirements are all on your index! I definitely agree that google/gemma-3-27b-it-qat-q4_0-gguf is a good one. We also chose the Unsloth quantized versions as credible for fine-tuning, so seeing them on your list affirms our decision. Credibility behind the quantizing is extremely important, including protecting IP locally by ensuring a model doesn't introduce security concerns, i.e., make unexpected outbound or telemetry connections. A “Quantization Fidelity” metric would be valuable.
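On the outbound-connections point, one common mitigation is to pin the stack to local files and opt out of telemetry before any model library is imported. A minimal sketch, assuming a vLLM-based stack and hypothetical local paths; this is an illustration, not a complete network policy:

```python
# Hardening sketch (assumptions, not a vetted policy): force offline mode and
# disable telemetry via environment variables, set before any imports.
import os

os.environ["HF_HUB_OFFLINE"] = "1"        # Hugging Face Hub: no network lookups
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: local files only
os.environ["VLLM_NO_USAGE_STATS"] = "1"   # vLLM: disable usage stats reporting
os.environ["DO_NOT_TRACK"] = "1"          # generic opt-out that vLLM also honors

from vllm import LLM  # import only after the environment is locked down

llm = LLM(
    model="/models/gemma-3-27b-it-q4_0.gguf",  # hypothetical local paths
    tokenizer="/models/gemma-3-27b-it",        # local tokenizer copy, no Hub fetch
)
```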

