Sitemap - 2023 - The Kaitchup – AI on a Budget
unsloth: Faster and Memory-Efficient QLoRA Fine-tuning
Behind the OpenLLM Leaderboard: The Evaluation Harness
Mixtral-8x7B: Understanding and Running the Sparse Mixture of Experts by Mistral AI
Fine-tune Better Chat Models with Distilled Identity Preference Optimization (IPO)
The Kaitchup's Table of Contents
LQ-LoRA: Jointly Fine-tune and Quantize Large Language Models
Combine Multiple LoRA Adapters for Llama 2
Simple, Fast, and Memory-Efficient Inference for Mistral 7B with Activation-Aware Quantization (AWQ)
Use FlashAttention-2 for Faster Fine-tuning and Inference
Don't Merge Your LoRA Adapter Into a 4-bit LLM
A Cheap Zephyr 7B Beta: Distilled DPO on Consumer Hardware
Zephyr 7B Beta: A Good Teacher Is All You Need
Llama 2 MT: Turn Llama 2 into a Translation System with QLoRA
Fine-tune Your Own Instruct Version of Mistral 7B with Direct Preference Optimization (DPO)
Mistral 7B: Recipes for Fine-tuning and Quantization on Your Computer
Hardware: What Do You Need to Run LLMs with Billions of Parameters?
Fine-tune Quantized Llama 2 on Your GPU with QA-LoRA
QA-LoRA: Quantization-Aware Fine-tuning for Large Language Models
Fast and Small Llama 3 with Activation-Aware Quantization (AWQ)
How to Fine-tune, Quantize, and Run Microsoft phi-1.5
Run Llama 2 70B on Your GPU with ExLlamaV2
Safe, Fast, and Memory Efficient Loading of LLMs with Safetensors
Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model
Falcon 180B: Can It Run on Your Computer?
LoRA Adapters: When a Naive Merge Leads to Poor Performance
Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #1: Supervised Fine-tuning
Quantize and Fine-tune LLMs with GPTQ Using Transformers and TRL
GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2
Platypus: Dataset Curation and Adapters for Better Large Language Models
Serve Large Language Models from Your Computer with Text Generation Inference
Fine-tune Llama 2 on Your Computer with QLoRa and TRL
Llama 2 and SFTTrainer: 5 Quick Tips to Get Started
What You Cannot Do With Llama 2
Quantization of Llama 2 with GPTQ for Fast Inference on Your Computer
Run Llama 2 Chat Models on Your Computer
ReLoRa: Pre-train a Large Language Model on Your GPU
Device Map: Avoid Out-of-Memory Errors When Running Large Language Models
Fine-tune a Chat Model on Your Data with QLoRA
Can You Use the Falcon Models For Commercial Applications?
vLLM: PagedAttention for 24x Faster LLM Inference
Most LLMs Don’t Comply with the Draft of the EU AI Act
Lightweight Inference with Large Language Models Using QLoRa
Simple and Quick Fine-Tuning of Falcon Models with QLoRA
High-Speed Inference with llama.cpp and Vicuna on CPU
Behind the Hype: Models based on T5 (2019) Still Better than Vicuna, Alpaca, MPT, and Dolly
Introduction to the Open LLM Falcon-40B: Performance, Training Data, and Architecture
Fine-tune Falcon-7B on Your GPU with TRL and QLoRa
QLoRA: Fine-Tune a Large Language Model on Your GPU
GPT-3.5 Translates Paragraphs Better
Meta MMS Better than OpenAI Whisper? Not So Sure…
PaLM 2 Evaluation: Automatic Summarization
PaLM 2 Evaluation: Is Google Translate Getting Worse?
Do Bigger Evaluation Datasets Make Your Results More Significant?
Scientific Credibility in Machine Translation Research: Pitfalls and Promising Trends
Run ChatGPT and GPT Models on Your Website with PHP
OpenAI Account: Documentation, Playground, and Models’ Hyperparameters
Deploy Your Local GPT Server With Triton
A Gentle Introduction to GPT Models
Italy Bans ChatGPT, Europe May Follow
The Decontaminated Evaluation of GPT-4
ChatGPT to Evaluate Generated Text
Traditional Versus Neural Metrics for Machine Translation Evaluation
Data Preprocessing for Machine Translation
Datasets to Train, Validate, and Evaluate Machine Translation