Baseline results from StableLM-[3B, 7B] compared to Pythia-[1B]
— = Random performance
STEM: hendrycks-[subject]
LAMBADA