We fine-tuned EleutherAI/Pythia-1b models on datasets of 1B tokens.

The datasets were mixtures, in different proportions, of Wikipedia, PubMed abstracts, and SMILES strings.

We also did a grid search over learning rates [1e-4, 1e-5, 1e-6].
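Roughly, each run looks like the sketch below: this is a minimal sketch only, with placeholder data paths, mixture weights, and batch/step settings rather than the exact training script.

```python
# Minimal sketch of the fine-tuning setup (placeholder paths, mixture weights and
# batch/step settings; not the exact script used for these runs).
from datasets import load_dataset, interleave_datasets
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token  # Pythia has no pad token by default

# Hypothetical local text files for the three sources.
wiki = load_dataset("text", data_files="wikipedia.txt", split="train")
pubmed = load_dataset("text", data_files="pubmed_abstracts.txt", split="train")
smiles = load_dataset("text", data_files="smiles.txt", split="train")

def build_mixture(probs):
    """Interleave the three sources with the given sampling probabilities."""
    mixed = interleave_datasets([wiki, pubmed, smiles], probabilities=probs, seed=0)
    return mixed.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
        batched=True, remove_columns=["text"],
    )

for lr in [1e-4, 1e-5, 1e-6]:  # learning-rate grid
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b")
    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=f"pythia-1b-mix_lr{lr}",
            learning_rate=lr,
            lr_scheduler_type="constant",
            per_device_train_batch_size=8,
            max_steps=61_000,  # ~1B tokens at 8 x 2048 tokens/step (illustrative numbers)
            bf16=True,
        ),
        train_dataset=build_mixture([0.5, 0.25, 0.25]),  # example mixture, not the actual proportions
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
```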

— = random performance (dashed line in the plots below)

Scan over data mixtures

Constant learning rate: 1e-5
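The mixture scan itself is just the training loop above with the learning rate pinned at 1e-5 and the sampling proportions varied. The grid below is a sketch: the proportion values are placeholders, not the actual grid, and `train_one_run` is a hypothetical wrapper around the Trainer setup sketched earlier.

```python
# Sketch of the mixture scan: fixed lr = 1e-5, vary (wikipedia, pubmed, smiles) proportions.
# The proportion values are placeholders; the actual grid isn't recorded in this note.
mixture_grid = [
    (1.0, 0.0, 0.0),
    (0.5, 0.25, 0.25),
    (0.25, 0.25, 0.5),
    (0.0, 0.0, 1.0),
]

for probs in mixture_grid:
    run_name = "pythia-1b_wiki{:.2f}_pubmed{:.2f}_smiles{:.2f}".format(*probs)
    # train_one_run(): hypothetical wrapper around the Trainer setup sketched above;
    # build_mixture() is the helper from that same sketch.
    train_one_run(build_mixture(list(probs)), learning_rate=1e-5, output_dir=run_name)
```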

TL;DR

By mixing in Wikipedia we maintained natural-language performance on the LAMBADA task 🤖

The model learned a new task, CompleteSmile

We did not improve on the periodic-table task - not surprising, since we didn't train for it 🧪

We got worse on IsSmile - this is strange and needs looking into 🫣

Performance on most other tasks was maintained 👍

Some tasks show strange behaviour: n-shot prompting does worse than 0-shot; needs looking into 🧐 (eval sketch below)
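To chase the n-shot anomaly, the same checkpoint can be re-evaluated at several shot counts. The sketch below assumes the evals are run with EleutherAI's lm-evaluation-harness (2023-era Python API); CompleteSmile and IsSmile are our custom tasks and would have to be registered with the harness separately, so only LAMBADA is shown, and the checkpoint path is a placeholder.

```python
# Sketch: compare 0-shot vs n-shot on the same fine-tuned checkpoint.
# Assumes EleutherAI's lm-evaluation-harness; custom tasks (CompleteSmile, IsSmile)
# must be registered with the harness before adding them to the task list.
from lm_eval import evaluator

CHECKPOINT = "pythia-1b-mix_lr1e-05"  # placeholder path to a fine-tuned run

for num_fewshot in [0, 1, 5]:
    out = evaluator.simple_evaluate(
        model="hf-causal",
        model_args=f"pretrained={CHECKPOINT}",
        tasks=["lambada_openai"],  # add the custom SMILES tasks once registered
        num_fewshot=num_fewshot,
        batch_size=8,
    )
    for task, metrics in out["results"].items():
        print(num_fewshot, task, metrics)
```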

[Screenshots: per-task evaluation plots for the data-mixture scan (— = random performance)]