The Kaitchup's Table of Contents
The Kaitchup now has more than 100 articles. Most of them are still up-to-date, but the older ones are getting difficult to find.
I created a table of contents to organize them by topic. Have a look below; you might find interesting articles that you missed.
This table of contents is also available on this page, which will be updated regularly:
Fine-tuning
LoRA
QA-LoRA: Quantization-Aware Fine-tuning for Large Language Models
LQ-LoRA: Jointly Fine-tune and Quantize Large Language Models*
LoRA Adapters: When a Naive Merge Leads to Poor Performance*
Direct Preference Optimization (DPO)
Reinforcement Learning with Human Feedback (RLHF)
Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #1: Supervised Fine-tuning
Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model*
Optimization
Quantization
GPTQ
Quantization of Llama 2 with GPTQ for Fast Inference on Your Computer*
GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2*
Quantize and Fine-tune LLMs with GPTQ Using Transformers and TRL*
AWQ
Fast and Small Llama 2 with Activation-Aware Quantization (AWQ)*
Simple, Fast, and Memory-Efficient Inference for Mistral 7B with Activation-Aware Quantization (AWQ)
bitsandbytes NF4
ExLlama
Efficient Loading and Inference
Device Map: Avoid Out-of-Memory Errors When Running Large Language Models
Safe, Fast, and Memory Efficient Loading of LLMs with Safetensors*
Serve Large Language Models from Your Computer with Text Generation Inference
Pre-training
LLM Focus
Llama 2
Falcon
Mistral 7B
Microsoft phi-1.5
Machine Translation
Fine-tuning
GPT
Evaluation
Traditional Versus Neural Metrics for Machine Translation Evaluation
Scientific Credibility in Machine Translation Research: Pitfalls and Promising Trends