transformers.js: Run Phi-3.5 & Llama 3.2 in Your Browser
Let's try the JavaScript version of Transformers!
Python is the go-to language for working with large language models (LLMs), typically through Hugging Face Transformers, which supports most of them. There is, however, a JavaScript alternative: transformers.js. Although it currently supports fewer models, it does include very good ones, such as Phi-3.5 and Llama 3.2.
With transformers.js, you can easily create a web app that runs LLMs directly in your browser. In this article, I'll show you how to use Phi-3.5 Mini with transformers.js, tested on Google Chrome with a 12 GB GPU (it also runs on a CPU, albeit more slowly). We'll cover installation, loading the model, processing inputs, generating responses, and post-processing the output, step by step. I'll also show how to use your own LoRA adapter, fine-tuned for Phi-3.5, with transformers.js, and how to run Llama 3.2.
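To make those steps concrete before we dive in, here is a minimal sketch using the transformers.js pipeline API. The package name (@huggingface/transformers, the v3 release with WebGPU support) and the model repo (onnx-community/Phi-3.5-mini-instruct-onnx-web) are assumptions on my part; check the Hub for the exact names you want to use.

```js
// Install the library first: npm i @huggingface/transformers
import { pipeline } from "@huggingface/transformers";

// Load the model once. device: "webgpu" targets the GPU; omit it (or use
// "wasm") to run on the CPU instead. The quantized dtype is an assumption
// chosen to keep memory well within a 12 GB GPU.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Phi-3.5-mini-instruct-onnx-web", // assumed repo name
  { device: "webgpu", dtype: "q4f16" }
);

// Input processing: the pipeline applies the model's chat template
// to a messages array for us.
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain ONNX in one sentence." },
];

// Generation and post-processing: generated_text holds the full
// conversation, so we keep only the new assistant turn.
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
```

We'll unpack each of these stages, and the model-specific details behind them, in the sections that follow.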
I've created a notebook demonstrating how to merge the LoRA adapter and export the model to ONNX format for use with transformers.js: