We fine-tuned EleutherAI/Pythia-1b models on datasets of 4B tokens each.
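The training code isn't included in these notes; as a rough sketch of the setup, assuming the HuggingFace Trainer API (batch size, context length, and output path are illustrative, not recorded here):

```python
# Sketch only: assumes HuggingFace transformers; hyperparameters other
# than the learning-rate grid below are illustrative, not from the notes.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")

# Pick max_steps so each run sees ~4B tokens.
seq_len = 2048                                  # Pythia context length
batch_size = 8                                  # assumed, not recorded
tokens_per_step = batch_size * seq_len
max_steps = 4_000_000_000 // tokens_per_step    # ~244k steps at these settings

args = TrainingArguments(
    output_dir="pythia-1b-mixture",             # hypothetical path
    per_device_train_batch_size=batch_size,
    learning_rate=1e-4,                         # one point of the grid below
    max_steps=max_steps,
)
```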

The datasets were mixtures, in varying proportions, of Wikipedia, PubMed abstracts, and SMILES strings.
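A minimal sketch of how one such mixture could be assembled with the HuggingFace datasets library; the SMILES dataset name is hypothetical, and the proportions are example values, not the ones we actually ran:

```python
# Sketch: build one weighted mixture of the three corpora.
from datasets import interleave_datasets, load_dataset

wiki = load_dataset("wikipedia", "20220301.en", split="train", streaming=True)
pubmed = load_dataset("pubmed", split="train", streaming=True)  # abstracts
smiles = load_dataset("my-org/smiles-corpus", split="train", streaming=True)  # hypothetical dataset name

# Sample the three sources at fixed proportions (example values).
mixture = interleave_datasets(
    [wiki, pubmed, smiles],
    probabilities=[0.5, 0.3, 0.2],
    seed=0,
)
```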

We also ran a grid search over the learning rates [1e-4, 1e-5].
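The grid is small enough to enumerate directly. A sketch, with example mixture proportions (the actual proportions aren't recorded in these notes) and a hypothetical launch_finetune helper:

```python
# Sketch: one fine-tuning run per (learning rate, mixture) pair.
from itertools import product

learning_rates = [1e-4, 1e-5]
mixtures = {                      # example proportions (wiki, pubmed, smiles)
    "wiki_heavy":   (0.8, 0.1, 0.1),
    "pubmed_heavy": (0.1, 0.8, 0.1),
    "smiles_heavy": (0.1, 0.1, 0.8),
}

for lr, (name, proportions) in product(learning_rates, mixtures.items()):
    print(f"run: lr={lr:.0e}, mixture={name}, proportions={proportions}")
    # launch_finetune(lr=lr, proportions=proportions)  # hypothetical helper
```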

In the plots below, — marks random performance.

Scan over learning rates and data mixtures

All evaluations use 0-shot prompting (n-shot = 0 throughout).
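These notes don't say which harness produced the numbers; assuming EleutherAI's lm-evaluation-harness, a 0-shot run looks roughly like this (the task list is a placeholder):

```python
# Sketch: 0-shot evaluation of one fine-tuned checkpoint.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-1b",  # or a fine-tuned checkpoint path
    tasks=["lambada_openai"],                      # placeholder task list
    num_fewshot=0,                                 # matches the 0-shot setting above
)
print(results["results"])
```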


[Nine result plots were attached here: the scan over learning rates and data mixtures.]