Baseline results for Pythia-1B compared with Pythia-1B fine-tuned on the chemrxiv dataset before evaluation, using n-shot in-context learning (ICL) with n = 0, 1, 2, 3.
— = Random performance