Mixtral 8x7b Instruct
Mistral-7B (MoE)
Sparse MoE of Mistral 7B

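A sparse MoE layer of the Mixtral kind routes each token through only the top-2 of its experts and mixes their outputs with renormalized router weights. A minimal sketch under assumed details (the expert functions, logits, and softmax renormalization here are illustrative, not Mixtral's actual code):

```python
import math

def top2_moe(x, experts, router_logits):
    """Sparse MoE sketch: evaluate only the 2 highest-scoring experts
    on x and combine their outputs with softmax-renormalized weights.
    Compute scales with the chosen experts, not the total expert count."""
    top2 = sorted(range(len(experts)), key=lambda i: router_logits[i])[-2:]
    weights = [math.exp(router_logits[i]) for i in top2]
    total = sum(weights)
    return sum((w / total) * experts[i](x) for w, i in zip(weights, top2))
```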
Yi |
Llama 2 |
Llama 2 architecture with some modified parameter settings
3T tokens (source is not mentioned) |
Tulu 2 |
Llama 2 |
Claims a performance gap between QLoRA and full finetuning at around the 7B scale
Finetuned as allenai/tulu-2-dpo-70b: DPO on openbmb/UltraFeedback
WizardLM |
Llama |
Evol-Instruct: an algorithm that expands a dataset from a seed dataset using a high-performing LLM
Evol-Instruct run on Alpaca seed data (synthetic, not open)
Vicuna |
Llama |
Cheaply trained LLM: FlashAttention, gradient checkpointing, and SkyPilot's managed spot feature
ShareGPT.com |
Starling LM |
OpenChat 3.5 (based on Mistral-7B-0.1) |
RLAIF using a policy-optimization method called APA; reward model is berkeley-nest/Starling-RM-7B-alpha
APA on berkeley-nest/Nectar |
Llama 2 |
|
GQA, SwiGLU, RoPE
(2T tokens)
OpenHermes-2.5-Mistral-7B |
Mistral 7B |
|
GPT-4 generated data |
Llama2-70B-SteerLM-Chat |
Llama 2 |
SteerLM: train an attribute-prediction model, use it to build an annotated dataset, SFT on that dataset, then bootstrap
trained on OASST, HH-RLHF, M-SID |
OpenChat-3.5 |
|
C-RLFT: class-conditioned rewards (1 if the response is from GPT-4, a lower weight if from GPT-3.5)
|
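C-RLFT treats the data source as a coarse reward and weights each example's SFT loss by it. A toy sketch of that weighting (the 0.1 weight for GPT-3.5 data is an illustrative placeholder, not the paper's tuned value):

```python
def crlft_weight(source):
    # Coarse-grained reward: expert (GPT-4) data gets full weight,
    # sub-optimal (GPT-3.5) data a smaller one. 0.1 is illustrative.
    return 1.0 if source == "gpt-4" else 0.1

def weighted_sft_loss(examples):
    """examples: list of (nll, source) pairs; returns the
    source-weighted mean negative log-likelihood."""
    return sum(crlft_weight(s) * nll for nll, s in examples) / len(examples)
```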
pplx-70b-online |
Llama 2 |
Online LLM (isn't that closer to RAG, then?)
|
Dolphin-2.2.1-Mistral-7B |
Mistral 7B |
SFT on the Dolphin dataset, built following Microsoft's Orca methodology
dolphin + jondurbin/airoboros-2.2.1 + subset of WizardLM + subset of Samantha
SOLAR-10.7B-Instruct |
Mistral 7B |
Depth Up-scaling (Passthrough in mergekit) |
Instruction tuning on OpenOrca (Alpaca format), alpaca-gpt4, and synthetic math instruct; alignment tuning on Orca DPO pairs, UltraFeedback, and synthetic math instruct
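Depth Up-scaling concatenates two overlapping layer ranges of the base model into a deeper one: SOLAR stacks layers 0–23 on top of layers 8–31 of the 32-layer Mistral to get 48 layers, then continues pretraining. A sketch of the layer arithmetic, with indices standing in for the actual transformer blocks:

```python
def depth_upscale(layers, keep=24):
    """Stack the first `keep` layers on top of the last `keep` layers.
    For a 32-layer base with keep=24 this yields a 48-layer model, as
    in SOLAR-10.7B; in mergekit this is a `passthrough` merge of two
    overlapping layer_range slices."""
    assert keep <= len(layers)
    return layers[:keep] + layers[-keep:]
```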
Zephyr-7b |
Mistral 7B |
Distilled SFT and distilled DPO (teacher-student) |
distilled SFT on ultrachat_200k and distilled DPO on ultrafeedback_binarized |
MPT-30B-chat |
|
8k context-length pretraining; longer context via ALiBi; FlashAttention
Pretrained on mC4, C4, RedPajama, and The Stack (1T tokens; see HF card); finetuned on ShareGPT-Vicuna, Camel_AI, GPTeacher, Guanaco, and Baize (see HF card)
Qwen-14B-Chat |
Similar to Llama
Modified Llama: biases on the QKV layers; NTK-aware interpolation for long context
(3T tokens, including English and Chinese)
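NTK-aware interpolation stretches the context window by enlarging the RoPE base instead of linearly compressing positions, so the highest-frequency dimensions are barely perturbed while low-frequency ones are interpolated. A sketch of the frequency computation (the dim/base/scale values are illustrative defaults, not Qwen's actual config):

```python
def ntk_rope_inv_freq(dim, base=10000.0, scale=4.0):
    """RoPE inverse frequencies with NTK-aware base rescaling:
    base' = base * scale**(dim / (dim - 2)). Dimension i = 0 keeps
    frequency 1.0; lower-frequency dims are effectively stretched."""
    ntk_base = base * scale ** (dim / (dim - 2))
    return [ntk_base ** (-2 * i / dim) for i in range(dim // 2)]
```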
falcon-180b-chat |
|
|
Pretrained on RefinedWeb; finetuned on UltraChat, Platypus, and Airoboros
Guanaco-33B |
Llama |
QLoRA on Llama |
Finetuned on OASST1
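QLoRA freezes the base weights in 4-bit NF4 and trains only low-rank adapters, so the forward pass is y = W x + (α/r)·B A x. A plain-list sketch of the adapter math (shapes and the α/r scaling follow LoRA; the 4-bit quantization and dequantization of W are elided):

```python
def matvec(M, x):
    # Simple dense matrix-vector product over nested lists.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """y = W x + (alpha / r) * B (A x): frozen base weight W plus a
    trainable rank-r update B @ A. In QLoRA, W is stored 4-bit and
    dequantized on the fly; only A and B receive gradients."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]
```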
Mistral-7B-Instruct-v0.1 |
|
SWA, GQA, rolling buffer cache, pre-fill and chunking |
|
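With sliding-window attention, a query only ever attends to the last W positions, so the KV cache can be a fixed ring buffer of size W written at pos mod W. A minimal sketch of Mistral's rolling buffer idea (class and method names are illustrative, not Mistral's code):

```python
class RollingKVCache:
    """Fixed-size ring buffer for KV entries under sliding-window
    attention: position p overwrites slot p % window, so memory stays
    O(window) no matter how long the sequence grows."""

    def __init__(self, window):
        self.window = window
        self.slots = [None] * window

    def write(self, pos, kv):
        self.slots[pos % self.window] = kv

    def visible(self, pos):
        # KV entries a query at `pos` may attend to (the last `window`).
        first = max(0, pos - self.window + 1)
        return [self.slots[p % self.window] for p in range(first, pos + 1)]
```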
PaLM-Chat-Bison-001 |
|
SwiGLU, parallel layers, MQA, RoPE; no biases; shared input-output embeddings
780B tokens from webpages, social media conversations, books, GitHub, Wikipedia, etc.
Koala-13B |
Llama |
Llama 13B finetuned on dialogue datasets
ShareGPT, HC3, OIG, Stanford Alpaca, Anthropic HH, OpenAI WebGPT, OpenAI Summarization |
ChatGLM3-6B |
|
Bidirectional model (uses the encoder layers of a transformer); embedding-layer gradient shrink during training
Pretrained on 1.2T tokens of the Pile, 1.0T tokens of Chinese WudaoCorpora, and 250GB of Chinese corpora crawled from the web
GPT4All-13B-Snoozy |
Llama |
finetuning using synthetic data |
finetuned on nomic-ai/gpt4all-j-prompt-generations |
ChatGLM2-6B
|
Bidirectional model (uses the encoder layers of a transformer); FlashAttention, MQA
|
RWKV-4-Raven-14B |
|
Linear time complexity |
|
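RWKV's linear time complexity comes from replacing attention with a recurrence whose state is updated once per token, so cost is O(T) instead of attention's O(T²). A heavily simplified scalar sketch of the WKV-style weighted average (the decay constant is illustrative; real RWKV uses per-channel learned decays and a bonus term for the current token):

```python
import math

def wkv_scan(keys, values, decay=0.9):
    """Running exp(k)-weighted average of past values with exponential
    decay: one O(1) state update per token, O(T) total."""
    num = den = 0.0
    out = []
    for k, v in zip(keys, values):
        num = decay * num + math.exp(k) * v
        den = decay * den + math.exp(k)
        out.append(num / den)
    return out
```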
Alpaca-13B |
Llama |
SFT on instruction set built with GPT |
SFT on GPT-generated dataset using modified self-instruct |
OpenAssistant-Pythia-12B |
pythia-12b-deduped |
|
Finetuned on Open-Assistant data
ChatGLM-6B |
|
Bidirectional model (uses the encoder layers of a transformer)
|