A sense-check run: if we train on 4B PubMed tokens with no other data distributions mixed in, does Lambada performance stay unchanged?

Screenshot 2023-06-30 at 11.47.37.png

Basically, yes: it looks like there is still no degradation.
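For anyone wanting to turn the "no degradation" call into a number rather than an eyeball read of the plot, here is a minimal sketch. It assumes Lambada is scored as exact-match accuracy on the final word, and that we already have each checkpoint's predicted last words as plain strings. The function names, the 0.01 tolerance, and the toy strings are all hypothetical placeholders, not values from this run.

```python
def last_word_accuracy(predictions, targets):
    """Fraction of examples where the predicted last word matches the target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one example")
    correct = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return correct / len(targets)


def has_degraded(baseline_acc, new_acc, tolerance=0.01):
    """True if accuracy dropped by more than `tolerance` (an assumed threshold)."""
    return (baseline_acc - new_acc) > tolerance


if __name__ == "__main__":
    # Toy placeholder data, purely illustrative (not real model outputs).
    targets = ["door", "night", "river", "letter"]
    before_pubmed = ["door", "night", "lake", "letter"]
    after_pubmed = ["door", "night", "lake", "letter"]

    base = last_word_accuracy(before_pubmed, targets)
    new = last_word_accuracy(after_pubmed, targets)
    print("degraded:", has_degraded(base, new))
```

The tolerance should really come from run-to-run noise on the eval, so treat the default here as something to replace.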