A sanity-check run: if we train on 4B PubMed tokens alone, with no other data distributions mixed in, does LAMBADA performance stay unchanged?
Basically, yes: it looks like there is still no degradation.
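
For reference, a minimal sketch of how "no degradation" could be made concrete, assuming LAMBADA is scored as accuracy in [0, 1] before and after the PubMed-only run. The function name and the 0.01 tolerance are hypothetical choices for illustration, not values from the actual run.

```python
def no_degradation(baseline_acc: float, finetuned_acc: float,
                   tolerance: float = 0.01) -> bool:
    """Return True if accuracy dropped by no more than `tolerance`.

    baseline_acc:  accuracy before the PubMed-only training (assumed in [0, 1])
    finetuned_acc: accuracy after the PubMed-only training (assumed in [0, 1])
    tolerance:     hypothetical allowance for run-to-run noise
    """
    for name, acc in (("baseline_acc", baseline_acc),
                      ("finetuned_acc", finetuned_acc)):
        if not 0.0 <= acc <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {acc}")
    # An improvement gives a negative drop, which also counts as no degradation.
    return baseline_acc - finetuned_acc <= tolerance
```

The tolerance should be picked from the observed variance between repeated evals, so that ordinary noise is not read as a regression.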