We finetuned EleutherAI/Pythia-1b models on datasets of 1B tokens. The datasets were mixtures of Wikipedia, PubMed abstracts, and SMILES strings in varying proportions. We also ran a grid search over the learning rates [1e-4, 1e-5, 1e-6].
[Figure: dashed line = random performance; constant learning rate of 1e-5]
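
For concreteness, the data mixing and learning-rate sweep looked roughly like the sketch below. The file paths, mixing proportions, sequence length, and batch/step counts are illustrative placeholders rather than our exact configuration; it assumes each corpus is a JSONL file with a single `text` field.

```python
# Rough sketch: mix three corpora and sweep learning rates.
# Paths, proportions, and hyperparameters are placeholders, not our exact config.
from datasets import load_dataset, interleave_datasets
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token  # Pythia has no pad token by default

# Each JSONL file is assumed to contain a single "text" column.
wiki = load_dataset("json", data_files="wikipedia.jsonl", split="train")
pubmed = load_dataset("json", data_files="pubmed_abstracts.jsonl", split="train")
smiles = load_dataset("json", data_files="smiles.jsonl", split="train")

# Interleave the corpora at the chosen proportions (placeholder values).
mixture = interleave_datasets([wiki, pubmed, smiles],
                              probabilities=[0.5, 0.3, 0.2], seed=42)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

mixture = mixture.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Grid search over learning rates with a constant schedule.
for lr in [1e-4, 1e-5, 1e-6]:
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b")
    args = TrainingArguments(
        output_dir=f"pythia-1b-mix-lr{lr}",
        learning_rate=lr,
        lr_scheduler_type="constant",
        per_device_train_batch_size=8,  # placeholder
        max_steps=1000,                 # placeholder; the real runs cover ~1B tokens
        logging_steps=100,
    )
    Trainer(model=model, args=args,
            train_dataset=mixture, data_collator=collator).train()
```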
[tldr]
By mixing in Wikipedia we maintained natural-language performance on the LAMBADA task 🤖
The model learned a new task, CompleteSmile
We did not improve on the periodic table task, which is not surprising because we didn't train for it 🧪
We got worse on IsSmile, which is strange and needs looking into 🫣
Performance on most other tasks was maintained 👍
Some tasks show strange behaviour: n-shot prompting is worse than 0-shot and needs looking into 🧐 (see the evaluation sketch below)
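
For reference, the LAMBADA and n-shot comparisons can be reproduced with something like EleutherAI's lm-evaluation-harness; the snippet below is a minimal sketch, not necessarily our exact pipeline. The checkpoint path is a placeholder, and the custom SMILES tasks (CompleteSmile, IsSmile, periodic table) need their own harness task definitions, which are not shown.

```python
# Minimal sketch of scoring a finetuned checkpoint with lm-evaluation-harness.
# The checkpoint path is a placeholder; custom tasks (CompleteSmile, IsSmile,
# periodic table) require their own task configs and are omitted here.
import lm_eval

for shots in [0, 5]:  # compare 0-shot vs n-shot prompting
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=./pythia-1b-mix-lr1e-05",
        tasks=["lambada_openai"],
        num_fewshot=shots,
    )
    print(shots, results["results"]["lambada_openai"])
```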