Summary

<aside> 📌 Experiment table for tracking runs as we iterate. Note: WandB automatically saves the commit hash and all configuration parameters.

</aside>

<aside> 💽 Checkpoints are now stored on S3. Some have been deleted where we believed the training was quick to reproduce or the checkpoints were no longer necessary.

</aside>

Finetuning experiment runs

Runtime Estimates

Dataset Management

Restart Policy

Manual S3 Checkpoint Transfer

When stuff goes wrong …

Appendix