Table of Contents
Fine-tuning
LoRA and QLoRA
Fine-tune Mixtral-8x7B Quantized with AQLM (2-bit) on Your GPU*
QA-LoRA: Quantization-Aware Fine-tuning for Large Language Models
LQ-LoRA: Jointly Fine-tune and Quantize Large Language Models*
LoRA Adapters: When a Naive Merge Leads to Poor Performance*
Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO)
Fine-tune Better Chat Models with Distilled Identity Preference Optimization (IPO)
Fine-tune Your Own Instruct Version of Mistral 7B with Direct Preference Optimization (DPO)*
Reinforcement Learning with Human Feedback (RLHF)
Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #1: Supervised Fine-tuning
Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model*
Optimization
Quantization
AQLM
GPTQ
From 16-bit to 2-bit: Finding the Best Trade-off Between Memory-Efficiency and Accuracy*
Quantization of Llama 2 with GPTQ for Fast Inference on Your Computer*
GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2*
Quantize and Fine-tune LLMs with GPTQ Using Transformers and TRL*
AWQ
Fast and Small Llama 2 with Activation-Aware Quantization (AWQ)*
Simple, Fast, and Memory-Efficient Inference for Mistral 7B with Activation-Aware Quantization (AWQ)
bitsandbytes NF4
ExLlama
SqueezeLLM
Efficient Loading and Inference
GGUF Quantization for Fast and Memory-Efficient Inference on Your CPU*
vLLM: Serve Fast Mistral 7B and Llama 2 Models from Your Computer*
Speculative Decoding for Faster Inference with Mixtral-8x7B and Gemma*
Device Map: Avoid Out-of-Memory Errors When Running Large Language Models
Safe, Fast, and Memory Efficient Loading of LLMs with Safetensors*
Serve Large Language Models from Your Computer with Text Generation Inference
Pre-training
Merging and Mixture of Experts
The Mayonnaise: Rank First on the Open LLM Leaderboard with TIES-Merging*
Run Mixtral-8x7B on Consumer Hardware with Expert Offloading*
Mixtral-8x7B: Understanding and Running the Sparse Mixture of Experts by Mistral AI*
Benchmarking
LLM Focus
Llama 2
Falcon
Mistral 7B
Microsoft phi-1.5 and phi-2
Google Gemma
Qwen
Machine Translation
Fine-tuning
GPT
Evaluation
Traditional Versus Neural Metrics for Machine Translation Evaluation
Scientific Credibility in Machine Translation Research: Pitfalls and Promising Trends