Sitemap - 2023 - The Kaitchup – AI on a Budget
unsloth: Faster and Memory-Efficient QLoRA Fine-tuning
Behind the OpenLLM Leaderboard: The Evaluation Harness
Mixtral-8x7B: Understanding and Running the Sparse Mixture of Experts by Mistral AI
Fine-tune Better Chat Models with Distilled Identity Preference Optimization (IPO)
The Kaitchup's Table of Contents
LQ-LoRA: Jointly Fine-tune and Quantize Large Language Models
Combine Multiple LoRA Adapters for Llama 2
Simple, Fast, and Memory-Efficient Inference for Mistral 7B with Activation-Aware Quantization (AWQ)
Use FlashAttention-2 for Faster Fine-tuning and Inference
Don't Merge Your LoRA Adapter Into a 4-bit LLM
A Cheap Zephyr 7B Beta: Distilled DPO on Consumer Hardware
Zephyr 7B Beta: A Good Teacher Is All You Need
Llama 2 MT: Turn Llama 2 into a Translation System with QLoRA
Fine-tune Your Own Instruct Version of Mistral 7B with Direct Preference Optimization (DPO)
Mistral 7B: Recipes for Fine-tuning and Quantization on Your Computer
Hardware: What Do You Need to Run LLMs with Billions of Parameters?
Fine-tune Quantized Llama 2 on Your GPU with QA-LoRA
QA-LoRA: Quantization-Aware Fine-tuning for Large Language Models
Fast and Small Llama 3 with Activation-Aware Quantization (AWQ)
How to Fine-tune, Quantize, and Run Microsoft phi-1.5
Run Llama 2 70B on Your GPU with ExLlamaV2
Safe, Fast, and Memory Efficient Loading of LLMs with Safetensors
Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model
Falcon 180B: Can It Run on Your Computer?
LoRA Adapters: When a Naive Merge Leads to Poor Performance
Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #1: Supervised Fine-tuning
Quantize and Fine-tune LLMs with GPTQ Using Transformers and TRL
GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2
Platypus: Dataset Curation and Adapters for Better Large Language Models
Serve Large Language Models from Your Computer with Text Generation Inference
Fine-tune Llama 2 on Your Computer with QLoRa and TRL
Llama 2 and SFTTrainer: 5 Quick Tips to Get Started
What You Cannot Do With Llama 2
Quantization of Llama 2 with GPTQ for Fast Inference on Your Computer
Run Llama 2 Chat Models on Your Computer
ReLoRa: Pre-train a Large Language Model on Your GPU
Device Map: Avoid Out-of-Memory Errors When Running Large Language Models
Fine-tune a Chat Model on Your Data with QLoRA
Can You Use the Falcon Models For Commercial Applications?
vLLM: PagedAttention for 24x Faster LLM Inference
Most LLMs Don’t Comply with the Draft of the EU AI Act
Lightweight Inference with Large Language Models Using QLoRa
Simple and Quick Fine-Tuning of Falcon Models with QLoRA
High-Speed Inference with llama.cpp and Vicuna on CPU
Behind the Hype: Models based on T5 (2019) Still Better than Vicuna, Alpaca, MPT, and Dolly
Introduction to the Open LLM Falcon-40B: Performance, Training Data, and Architecture
Fine-tune Falcon-7B on Your GPU with TRL and QLoRa
QLoRA: Fine-Tune a Large Language Model on Your GPU
GPT-3.5 Translates Paragraphs Better
Meta MMS Better than OpenAI Whisper? Not So Sure…
PaLM 2 Evaluation: Automatic Summarization
PaLM 2 Evaluation: Is Google Translate Getting Worse?
Do Bigger Evaluation Datasets Make Your Results More Significant?
Scientific Credibility in Machine Translation Research: Pitfalls and Promising Trends
Run ChatGPT and GPT Models on Your Website with PHP
OpenAI Account: Documentation, Playground, and Models’ Hyperparameters
Deploy Your Local GPT Server With Triton
A Gentle Introduction to GPT Models
Italy Bans ChatGPT, Europe May Follow
The Decontaminated Evaluation of GPT-4
ChatGPT to Evaluate Generated Text
Traditional Versus Neural Metrics for Machine Translation Evaluation
Data Preprocessing for Machine Translation
Datasets to Train, Validate, and Evaluate Machine Translation