Experiment details: https://www.notion.so/stable-society/1B-Model-45B-PubMed-v0-51be1c4d4d7542898d80dcbad5436321

TL;DR: Performance doesn’t change much, apart from statistically significant drops on sciq and lambada. We don’t improve on PubMedQA, but that’s potentially expected, since those questions are based only on paper abstracts rather than full papers. The lambada drop is small enough that a reasonable proportion of internet data in the training mix should be able to stabilise it (yet to be tested).

This is an ominous sign for our EuroPMC data and should probably temper expectations.
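
The note doesn’t say which test backs the “statistically significant” claim. Because the base model and the PubMed checkpoint answer exactly the same sciq and lambada items, a paired test such as McNemar’s is one natural choice. The sketch below assumes that test; it is not necessarily what was run here, and `mcnemar_exact` is a hypothetical helper that takes per-item correctness flags for each model.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(baseline: Sequence[bool], candidate: Sequence[bool]) -> tuple[int, int, float]:
    """Exact two-sided McNemar test on paired per-item correctness.

    baseline[i] / candidate[i] say whether each model answered item i correctly.
    Returns (b, c, p_value): b = items only the baseline got right,
    c = items only the candidate got right.
    """
    if len(baseline) != len(candidate):
        raise ValueError("both models must be scored on the same items")
    b = sum(1 for x, y in zip(baseline, candidate) if x and not y)
    c = sum(1 for x, y in zip(baseline, candidate) if y and not x)
    n = b + c
    if n == 0:
        # No disagreements at all, so no evidence of a difference.
        return b, c, 1.0
    # Under H0 each disagreement is equally likely to favour either model,
    # so min(b, c) ~ Binomial(n, 0.5); double the lower tail for a two-sided p.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

Only the items where the two models disagree carry information, which is why a paired test like this is usually more sensitive than comparing the two headline accuracies as if they were independent proportions.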

[Figures: four results screenshots dated 2023-07-12]