| Model | Foundation | Info | Dataset |
| --- | --- | --- | --- |
| Mixtral 8x7b Instruct | Mistral-7B (MoE) | Sparse MoE of Mistral 7B (routing sketch below) | |
| Yi | Llama 2 | Llama 2 architecture with modified parameter settings | 3T tokens (source not disclosed) |
| Tulu 2 | Llama 2 | Claims a performance gap between QLoRA and full finetuning at around the 7B scale | Finetuned on allenai/tulu-2-dpo-70b; DPO on openbmb/UltraFeedback |
| WizardLM | Llama | Algorithm that expands a dataset from a seed set using a strong LLM | Evol-Instruct algorithm from Alpaca (synthetic, not open) |
| Vicuna | Llama | Cheaply trained LLM: FlashAttention, gradient checkpointing, SkyPilot managed spot instances | ShareGPT.com |
| Starling LM | OpenChat 3.5 (based on Mistral-7B-0.1) | RLAIF with a policy-optimization method called APA; reward model is berkeley-nest/Starling-RM-7B-alpha | APA on berkeley-nest/Nectar |
| Llama 2 | | GQA (sketch below), SwiGLU, RoPE | 2T tokens |
| OpenHermes-2.5-Mistral-7B | Mistral 7B | | GPT-4 generated data |
| Llama2-70B-SteerLM-Chat | Llama 2 | SteerLM: train an attribute-prediction model, use it to annotate a dataset, SFT on the result, then bootstrap | Trained on OASST, HH-RLHF, M-SID |
| OpenChat-3.5 | Mistral 7B | C-RLFT: coarse reward by data source, weighting GPT-4 responses above GPT-3.5 (sketch below) | |
| pplx-70b-online | Llama 2 | Online LLM (isn't that closer to RAG, then?) | |
| Dolphin-2.2.1-Mistral-7B | Mistral 7B | SFT on the Dolphin dataset, built following Microsoft's Orca recipe | dolphin + jondurbin/airoboros-2.2.1 + subset of WizardLM + subset of Samantha |
| SOLAR-10.7B-Instruct | Mistral 7B | Depth Up-scaling (Passthrough in mergekit; sketch below) | Instruction tuning on OpenOrca (Alpaca format), alpaca-gpt4, synthetic math instruct; alignment tuning on Orca DPO pairs, UltraFeedback, synthetic math instruct |
| Zephyr-7b | Mistral 7B | Distilled SFT and distilled DPO (teacher-student; DPO sketch below) | Distilled SFT on ultrachat_200k, distilled DPO on ultrafeedback_binarized |
| MPT-30B-chat | | 8k-context pretraining; longer context via ALiBi (sketch below); FlashAttention | Pretrained on mC4, C4, RedPajama, The Stack (see HF card; 1T tokens); finetuned on ShareGPT-Vicuna, Camel_AI, GPTeacher, Guanaco, Baize (see HF card) |
| Qwen-14B-Chat | Similar to Llama | Modified Llama: bias in QKV layers, NTK-aware interpolation for long context (sketch below) | 3T tokens of English and Chinese |
| falcon-180b-chat | | | Pretrained on RefinedWeb; finetuned on UltraChat, Platypus, and Airoboros |
| Guanaco-33B | Llama | QLoRA on Llama | Finetuned on OASST1 |
| Mistral-7B-Instruct-v0.1 | | SWA, GQA, rolling buffer cache, pre-fill and chunking | |
| PaLM-Chat-Bison-001 | | SwiGLU, parallel layers, MQA, RoPE, no biases, shared input/output embeddings | 780B tokens from webpages, social media conversations, books, GitHub, Wikipedia, etc. |
| Koala-13B | Llama | Llama 13B finetuned on dialogue datasets | ShareGPT, HC3, OIG, Stanford Alpaca, Anthropic HH, OpenAI WebGPT, OpenAI Summarization |
| ChatGLM3-6B | | Bidirectional model (uses transformer encoder layers); embedding-layer gradient shrink during training | Pretrained on 1.2T Pile, 1.0T Chinese WudaoCorpora, and 250GB of Chinese corpora crawled from the web |
| GPT4All-13B-Snoozy | Llama | Finetuning on synthetic data | Finetuned on nomic-ai/gpt4all-j-prompt-generations |
| ChatGLM2-6B | | Bidirectional model (uses transformer encoder layers); FlashAttention, MQA | |
| RWKV-4-Raven-14B | | Linear time complexity | |
| Alpaca-13B | Llama | SFT on a GPT-built instruction set | SFT on a GPT-generated dataset using modified self-instruct |
| OpenAssistant-Pythia-12B | pythia-12b-deduped | | Finetuned on Open-Assistant data |
| ChatGLM-6B | | Bidirectional model (uses transformer encoder layers) | |
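
A few of the techniques in the table are easier to grasp in code. First, Mixtral-style sparse MoE routing: a gate picks the top-2 of 8 expert FFNs per token and mixes their outputs with softmax weights renormalized over the chosen experts. A minimal sketch; the dimensions, expert FFN shape, and SiLU activation are illustrative assumptions, not Mixtral's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.gate(x)                  # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(4, 512)
print(SparseMoE()(x).shape)  # torch.Size([4, 512])
```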
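
GQA (Llama 2, Mistral) shares each key/value head across a group of query heads, shrinking the KV cache; MQA (PaLM, ChatGLM2) is the extreme case with a single KV head. A sketch, assuming the query head count divides evenly by the KV head count:

```python
import torch
import torch.nn.functional as F

def gqa(q, k, v, n_q_heads=8, n_kv_heads=2):
    # q: (B, n_q_heads, T, d); k, v: (B, n_kv_heads, T, d)
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)  # expand KV heads to match query heads
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

B, T, d = 1, 16, 64
q = torch.randn(B, 8, T, d); k = torch.randn(B, 2, T, d); v = torch.randn(B, 2, T, d)
print(gqa(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```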
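
ALiBi (MPT) drops positional embeddings and instead penalizes attention logits linearly with query-key distance, one slope per head; the geometric slope schedule below is the ALiBi paper's schedule for power-of-two head counts:

```python
import torch

def alibi_bias(n_heads, seq_len):
    # slope for head i: 2^(-8(i+1)/n_heads)
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])
    pos = torch.arange(seq_len)
    rel = (pos[None, :] - pos[:, None]).float()  # rel[i, j] = j - i, <= 0 for past keys
    # (n_heads, T, T): add to attention logits before the causal mask and softmax
    return slopes[:, None, None] * rel

bias = alibi_bias(n_heads=8, seq_len=4)
print(bias[0])  # the penalty grows linearly the further back the key is
```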
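
NTK-aware interpolation (Qwen) extends RoPE context not by compressing positions but by enlarging the RoPE base, so low frequencies stretch while high frequencies stay close to the original. A sketch assuming common Llama-style defaults (head_dim=128, base=10000); `scale` is the context extension factor:

```python
import torch

def rope_freqs(head_dim=128, base=10000.0, scale=4.0):
    ntk_base = base * scale ** (head_dim / (head_dim - 2))  # NTK-aware base rescaling
    inv_freq = ntk_base ** (-torch.arange(0, head_dim, 2).float() / head_dim)
    return inv_freq  # one frequency per rotated channel pair

def apply_rope(x, inv_freq):
    # x: (..., T, head_dim); rotate each (even, odd) channel pair by position * freq
    T = x.shape[-2]
    angles = torch.arange(T)[:, None] * inv_freq[None, :]  # (T, head_dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(1, 8, 16, 128)
print(apply_rope(q, rope_freqs()).shape)  # torch.Size([1, 8, 16, 128])
```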
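
Depth Up-scaling (SOLAR) duplicates the 32-layer Mistral base, drops the last 8 layers of one copy and the first 8 of the other, and stacks the rest into a 48-layer model that is then continually pretrained. The index arithmetic only; a real merge (e.g. mergekit's passthrough method) copies the weights:

```python
n_layers, overlap = 32, 8
copy_a = list(range(0, n_layers - overlap))  # layers 0..23 of copy A
copy_b = list(range(overlap, n_layers))      # layers 8..31 of copy B
merged = copy_a + copy_b                     # stacked depth-up-scaled model
print(len(merged))  # 48
```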
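
C-RLFT (OpenChat-3.5) treats data-source quality as a coarse reward, weighting GPT-4 demonstrations above GPT-3.5 ones. A sketch of the weighted SFT loss only; the 0.1 weight is illustrative rather than the paper's exact value, and the full method also conditions the prompt on the source class:

```python
import torch
import torch.nn.functional as F

SOURCE_WEIGHT = {"gpt4": 1.0, "gpt3.5": 0.1}  # illustrative coarse rewards

def crlft_loss(logits, targets, sources):
    # logits: (B, T, V); targets: (B, T); sources: list of B data-source tags
    per_tok = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")  # (B, T)
    w = torch.tensor([SOURCE_WEIGHT[s] for s in sources])  # (B,) per-sample weights
    return (w[:, None] * per_tok).mean()

logits = torch.randn(2, 5, 100)
targets = torch.randint(0, 100, (2, 5))
print(crlft_loss(logits, targets, ["gpt4", "gpt3.5"]))
```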
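
Finally, the DPO objective (Tulu 2, Zephyr's distilled DPO, SOLAR's alignment stage) pushes the policy to prefer the chosen response over the rejected one relative to a frozen reference model. A sketch over summed per-response log-probs; beta=0.1 is a common default:

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # each argument: (B,) summed log-probs of a response under policy / reference
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

lp = torch.tensor([-12.0])
print(dpo_loss(lp, lp - 2.0, lp - 0.5, lp - 1.0))  # small loss: margin is positive
```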

Takeaways