4 Comments
Mandy Liu's avatar

Thank you for sharing, Cornellius! One to bookmark for myself!

Cornellius Yudha Wijaya's avatar

You are welcome, Mandy. I am glad you find it useful.

Meng Li's avatar

The training process for large models mainly includes the following stages:

1. Pretraining Stage

2. Tokenizer Training

3. Language Model Pretraining

4. Dataset Cleaning

5. Model Performance Evaluation

6. Instruction Tuning Stage

7. Open Source Dataset Organization

8. Model Evaluation Methods

Cornellius Yudha Wijaya's avatar

Thank you for the input! It's certainly true that the cycle for training an LLM follows the steps you pointed out.