Overview

<aside> 📊 Highlights

All evaluation below is zero-shot.

</aside>

Context

This model is being finetuned from a pretrained Pythia-1b checkpoint on 250B tokens (≈70k optimiser steps) of EuroPMC biomedical / scientific text. Training suffered loss spikes at roughly steps 10,500, 14,000, 20,000, and 21,000. The loss curve is shown below; each bar in the subsequent evaluation charts corresponds to a checkpoint saved at 5k-step intervals. See more details here.

Screenshot 2023-08-01 at 11.59.21.png (training loss curve)
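For intuition on the scale, the quoted 250B-token / 70k-step budget implies roughly 3.6M tokens per optimiser step. Below is a minimal sketch of that arithmetic plus loading the public Pythia-1b checkpoint with HuggingFace transformers; the sequence length matches Pythia's pretraining context, but the derived batch geometry is an illustrative assumption, not the actual run config.

```python
# Back-of-the-envelope check of the 250B tokens => ~70k steps figure,
# plus loading the public Pythia-1b checkpoint for continued pretraining.
# Assumes HuggingFace `transformers`; batch geometry is illustrative only.
from transformers import GPTNeoXForCausalLM, AutoTokenizer

TOTAL_TOKENS = 250e9
TOTAL_STEPS = 70_000
SEQ_LEN = 2048                                   # Pythia's pretraining context length

tokens_per_step = TOTAL_TOKENS / TOTAL_STEPS     # ~3.57M tokens per optimiser step
global_batch = round(tokens_per_step / SEQ_LEN)  # ~1744 sequences per step (assumed)
print(f"{tokens_per_step:.2e} tokens/step -> global batch ~{global_batch}")

# Start from the pretrained checkpoint and continue training on EuroPMC text.
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-1b")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
```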

Results

A float in each chart's legend denotes the number of optimiser steps for which that checkpoint had been finetuned.
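For reference, a loop like the following could produce the per-checkpoint zero-shot scores plotted below. This is a sketch assuming EleutherAI's lm-evaluation-harness (the v0.3-era `simple_evaluate` API); the checkpoint directory layout and the task subset are hypothetical, not the actual eval config.

```python
# Sketch: run every saved checkpoint through lm-evaluation-harness, zero-shot.
# Assumes checkpoints saved every 5k steps under checkpoints/step_<N> (hypothetical).
from lm_eval import evaluator

TASKS = ["lambada_openai", "piqa", "sciq"]       # illustrative task subset

scores = {}
for step in range(5_000, 75_000, 5_000):         # 5k, 10k, ..., 70k
    results = evaluator.simple_evaluate(
        model="hf-causal",
        model_args=f"pretrained=checkpoints/step_{step}",
        tasks=TASKS,
        num_fewshot=0,                           # all evaluation is zero-shot
    )
    scores[step] = {t: results["results"][t]["acc"] for t in TASKS}
```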

Good

Screenshot 2023-08-01 at 12.24.59.png (benchmarks that improved over finetuning)

Bad

Screenshot 2023-08-01 at 12.27.12.png (Lambada performance by checkpoint)

Lambada performance dropped throughout training, and the model falls to almost 0% accuracy at the checkpoint coinciding with a loss spike (red bar at 20k optimiser steps).
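One way to double-check the near-0% reading at the 20k checkpoint is a direct last-word accuracy probe on Lambada. The sketch below assumes HuggingFace `datasets`/`transformers`; the checkpoint path is hypothetical, and greedy single-token matching only approximates the harness's scoring.

```python
# Quick sanity check: greedy last-word accuracy for one checkpoint on Lambada.
# Approximation only; multi-token target words will be scored incorrectly.
import torch
from datasets import load_dataset
from transformers import GPTNeoXForCausalLM, AutoTokenizer

model = GPTNeoXForCausalLM.from_pretrained("checkpoints/step_20000").eval()
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")

correct = total = 0
for ex in load_dataset("EleutherAI/lambada_openai", split="test").select(range(200)):
    context, target = ex["text"].rsplit(" ", 1)          # predict the final word
    ids = tokenizer(context, return_tensors="pt").input_ids
    with torch.no_grad():
        next_id = model(ids).logits[0, -1].argmax()      # greedy next token
    correct += tokenizer.decode(next_id).strip() == target
    total += 1
print(f"last-token accuracy on 200 examples: {correct / total:.3f}")
```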

Indifferent

Most other benchmarks showed either slight drops in performance or performance roughly equivalent to the base checkpoint.

Screenshot 2023-08-01 at 13.28.36.png

Screenshot 2023-08-01 at 13.29.10.png

Screenshot 2023-08-01 at 13.28.00.png

Screenshot 2023-08-01 at 13.28.47.png

Screenshot 2023-08-01 at 13.29.25.png

Screenshot 2023-08-01 at 13.29.39.png

Screenshot 2023-08-01 at 13.29.53.png