Types
- Baseline LLM - hasn’t been trained on any of our* chemistry data
- PEFT LLM - only a small subset of parameters (existing weights or added adapter weights) has been trained on our* chemistry data
- Finetuned LLM - all parameters have been trained on our* chemistry data
- ~~SFT + Instruction Tuned LLM - additional safety / reinforcement learning …? (tbd)~~
*A pretrained model has probably already been exposed to some chemistry data during pretraining.
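To make the difference between the three active types concrete, here is a minimal sketch that counts how many parameters would be updated on our chemistry data in each case. It rests on assumptions not in these notes: the PEFT method is taken to be LoRA-style low-rank adapters (no PEFT method is chosen here), and the `Layer` class, the `trainable_params` helper, and the toy layer shapes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """A single dense weight matrix of shape (d_out, d_in). Hypothetical helper."""
    d_out: int
    d_in: int

    @property
    def n_params(self) -> int:
        return self.d_out * self.d_in


def trainable_params(layers, mode, rank=8):
    """Count parameters that would be updated on our chemistry data.

    mode:
      "baseline"  -> nothing is trained
      "peft"      -> only LoRA-style adapters are trained (assumed PEFT method)
      "finetuned" -> every weight is trained
    """
    if mode == "baseline":
        return 0
    if mode == "finetuned":
        return sum(layer.n_params for layer in layers)
    if mode == "peft":
        # A LoRA-style adapter adds two low-rank matrices per layer:
        # one of shape (d_out, rank) and one of shape (rank, d_in).
        return sum(rank * (layer.d_out + layer.d_in) for layer in layers)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical toy model: four square layers of width 1024.
toy = [Layer(1024, 1024) for _ in range(4)]

for mode in ("baseline", "peft", "finetuned"):
    print(mode, trainable_params(toy, mode))
```

For real models the layer shapes and adapter placement differ, so the counts above are only meant to show the ordering: baseline trains nothing, PEFT trains a small fraction, and full finetuning trains everything.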
Sizes
This assumes it will be easiest to stay close to EleutherAI’s model artefacts, for integration with the https://github.com/EleutherAI/gpt-neox training codebase. The models listed below are a subset of the models open-sourced by EleutherAI on Hugging Face.
Small
Medium
Big
Others to consider