Types

  1. Baseline LLM - has not been trained on any of our* chemistry data
  2. PEFT (parameter-efficient fine-tuning) LLM - only a subset of parameters has been trained on our* chemistry data
  3. Fine-tuned LLM - all parameters have been trained on our* chemistry data
  4. ~~SFT + Instruction Tuned LLM - additional safety / reinforcement learning …? (tbd)~~

*It is likely that a pretrained model has already been exposed to some chemistry data during pretraining.
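
To make the distinction between the three active types concrete, here is a minimal sketch in plain Python that treats a model as a list of named parameter groups and reports what fraction of parameters each type would train. Everything in it is hypothetical and for illustration only: the `Variant`, `ParamGroup`, `configure`, and `trainable_fraction` names, the toy group names, and the sizes are not taken from any real model or library. It also assumes PEFT means unfreezing a subset of existing parameters; in practice many PEFT methods add a small number of new parameters instead.

```python
from dataclasses import dataclass
from enum import Enum


class Variant(Enum):
    BASELINE = "baseline"    # no parameters trained on our chemistry data
    PEFT = "peft"            # only some parameters trained
    FINETUNED = "finetuned"  # all parameters trained


@dataclass
class ParamGroup:
    name: str
    size: int                # number of scalar parameters in the group
    trainable: bool = False


def configure(groups, variant, peft_names=()):
    """Return a copy of `groups` with trainable flags set for `variant`."""
    configured = []
    for group in groups:
        if variant is Variant.BASELINE:
            trainable = False
        elif variant is Variant.PEFT:
            # Assumption: PEFT is modelled as unfreezing the named groups only.
            trainable = group.name in peft_names
        else:
            trainable = True
        configured.append(ParamGroup(group.name, group.size, trainable))
    return configured


def trainable_fraction(groups):
    """Fraction of all parameters that would be updated during training."""
    total = sum(g.size for g in groups)
    trained = sum(g.size for g in groups if g.trainable)
    return trained / total if total else 0.0


# Hypothetical toy model: group names and sizes are made up.
TOY_MODEL = [
    ParamGroup("embeddings", 1000),
    ParamGroup("block_0", 4000),
    ParamGroup("block_1", 4000),
    ParamGroup("lm_head", 1000),
]

if __name__ == "__main__":
    for variant in Variant:
        groups = configure(TOY_MODEL, variant, peft_names={"lm_head"})
        print(variant.value, trainable_fraction(groups))
```

The same bookkeeping could be used to sanity-check a real training configuration: the baseline should report no trainable parameters, the fine-tuned model should report all of them, and a PEFT run should fall somewhere in between.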

Sizes

This assumes it will be easiest to stay close to EleutherAI’s model artefacts, for integration with the https://github.com/EleutherAI/gpt-neox training codebase. The models listed below are a subset of the models EleutherAI has open-sourced on Hugging Face.

Small

Medium

Big

Others to consider