Batching
- Very large batch sizes can slow down convergence. [https://arxiv.org/abs/1910.02054, Pg. 4, Footnote 1]
- A possible reason: large-batch training tends to settle into sharp minima of the loss landscape rather than flatter ones. (Gradient noise from smaller batches drives the model toward flatter/simpler solutions.)
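
The noise part of the argument above can be illustrated with a minimal sketch. It only shows that mini-batch gradients get noisier as the batch shrinks; it does not demonstrate that noise leads to flatter minima. The synthetic linear-regression data, the fixed evaluation point `w = 0`, and the function name `minibatch_gradient_noise` are all assumptions made for illustration, not taken from the cited paper.

```python
import numpy as np


def minibatch_gradient_noise(batch_sizes, n_samples=4096, n_trials=500, seed=0):
    """Return {batch_size: std of (mini-batch gradient - full-batch gradient)}."""
    rng = np.random.default_rng(seed)

    # Synthetic 1-D regression data, y = 3x + noise (illustrative assumption).
    x = rng.normal(size=n_samples)
    y = 3.0 * x + rng.normal(scale=0.5, size=n_samples)

    w = 0.0  # fixed point at which gradients are evaluated (assumption)

    # Per-example gradient of the loss 0.5 * (w * x - y)**2 with respect to w.
    per_example_grad = (w * x - y) * x
    full_grad = per_example_grad.mean()

    noise = {}
    for b in batch_sizes:
        deviations = []
        for _ in range(n_trials):
            # Draw one mini-batch without replacement and average its gradients.
            idx = rng.choice(n_samples, size=b, replace=False)
            deviations.append(per_example_grad[idx].mean() - full_grad)
        noise[b] = float(np.std(deviations))
    return noise


if __name__ == "__main__":
    for b, s in minibatch_gradient_noise([8, 64, 512]).items():
        print(f"batch size {b:4d}: gradient noise std = {s:.4f}")
```

The measured noise should shrink roughly like 1/sqrt(batch size), so a large batch gives an almost deterministic gradient while a small batch keeps the stochasticity that the flat-minima hypothesis relies on.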