Results of 1B, 3B, and 7B models, each fully fine-tuned on 300M tokens of Elsevier, Wikipedia, and PubMed abstracts.
"—" denotes random performance.