<aside> 📊 Highlights
All evaluation below is zero-shot.
</aside>
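For context, zero-shot scores like these are typically produced with EleutherAI's lm-evaluation-harness; the snippet below is a minimal sketch assuming that harness (the model name and task are illustrative, not necessarily the exact configuration used here).

```python
# Minimal zero-shot eval sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Model/task choices are assumptions for illustration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-1b",  # base checkpoint
    tasks=["lambada_openai"],                      # LAMBADA, as in the charts
    num_fewshot=0,                                 # zero-shot, as stated above
)
print(results["results"]["lambada_openai"])
```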
This model is currently being finetuned from a pretrained Pythia-1b checkpoint on 250B tokens (≈70k optimiser steps) of EuroPMC biomedical/scientific text. The model suffered loss spikes at approximately steps 10,500, 14,000, 20,000, and 21,000. The loss curve is shown below; each bar in the subsequent evaluation charts corresponds to a checkpoint saved at 5k-step intervals. See more details here.
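As a rough sanity check on those figures, here is a back-of-envelope sketch; only the token and step counts come from the run itself, and the batch geometry derived from them is an inference, not a reported training setting.

```python
# Back-of-envelope check: 250B tokens over 70k optimiser steps implies
# a global batch of ~3.6M tokens per step. seq_len=2048 is Pythia's
# context length; the sequences-per-step figure is derived, not reported.
total_tokens = 250e9
optimiser_steps = 70_000
tokens_per_step = total_tokens / optimiser_steps  # ≈ 3.57M tokens/step
seq_len = 2048
sequences_per_step = tokens_per_step / seq_len    # ≈ 1,744 sequences/step
print(f"{tokens_per_step:,.0f} tokens/step, ~{sequences_per_step:,.0f} seqs/step")
```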

A float in the legend denotes a checkpoint that has been finetuned for that many optimiser steps.
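To produce one bar per checkpoint as in the charts below, each saved checkpoint would be evaluated in turn. A hypothetical sweep is sketched here; the checkpoint paths are placeholders, and the results-key layout assumes lm-eval-harness 0.4.

```python
# Hypothetical sweep over the 5k-interval checkpoints; paths are placeholders.
import lm_eval

for step in range(5_000, 75_000, 5_000):
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained=checkpoints/step{step}",  # assumed local path
        tasks=["lambada_openai"],
        num_fewshot=0,
    )
    acc = results["results"]["lambada_openai"]["acc,none"]  # key per lm-eval 0.4
    print(f"step {step:>6}: LAMBADA acc = {acc:.3f}")
```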


We did see a drop in LAMBADA performance throughout training, and the model's accuracy falls to nearly 0% during the period of a loss spike (red bar at 20k optimiser steps).
Across most other benchmarks, performance either dropped slightly or stayed roughly equivalent.