Baseline results for StableLM-[3B, 7B] compared to Pythia-[1B]

— = Random performance
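For reference, a minimal sketch of where the random-performance line sits on a multiple-choice task, assuming uniform guessing over the answer options. The hendrycks-[subject] tasks are four-way multiple choice, so chance accuracy is 25%. `random_baseline` is a hypothetical helper written for illustration, not part of any evaluation harness.

```python
def random_baseline(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing over num_choices options."""
    if num_choices < 1:
        raise ValueError("num_choices must be a positive integer")
    return 1.0 / num_choices


# Assumption: each hendrycks-[subject] question has four answer options.
mmlu_baseline = random_baseline(4)
print(f"Random performance (4-way multiple choice): {mmlu_baseline:.2%}")
```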

STEM: hendrycks-[subject]

[Figures: 9 screenshots, 2023-04-24, 11:09 to 11:12]

LAMBADA

[Figures: 2 screenshots, 2023-04-24, 11:16 to 11:17]