Researchers at SAIL (the Stanford Artificial Intelligence Laboratory) have done it again. They have successfully used Jacket to speed up the training phase of Deep Learning algorithms. In their paper titled “On Optimization Methods for Deep Learning”, they experiment with several well-known training algorithms and demonstrate their scalability across parallel architectures (GPUs as well as multi-machine networks). The algorithms include SGD (Stochastic Gradient Descent), L-BFGS (Limited-memory BFGS, used for solving non-linear optimization problems), and CG (Conjugate Gradient). While SGD is easy to implement, it requires manual tuning, and its sequential nature makes it hard to scale and parallelize, which limits its usefulness for Deep Learning. L-BFGS and CG algorithms can be harder to implement and …
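To see why SGD is both easy to implement and hard to parallelize, here is a minimal sketch (a hypothetical toy example in Python, not the paper's MATLAB/Jacket code): every parameter update depends on the result of the previous one, so the inner loop is inherently sequential, and the step size `lr` must be tuned by hand.

```python
import random

def sgd(grad, w, data, lr=0.1, epochs=20):
    """Minimal SGD sketch: one example per update, updates applied in order."""
    for _ in range(epochs):
        random.shuffle(data)              # visit training examples in random order
        for x, y in data:
            w -= lr * grad(w, x, y)       # each step depends on the previous w
    return w

# Toy usage: fit y = 3x by least squares; gradient of (w*x - y)^2 w.r.t. w
grad = lambda w, x, y: 2.0 * (w * x - y) * x
data = [(x / 50.0, 3.0 * (x / 50.0)) for x in range(-50, 51)]
print(sgd(grad, 0.0, data))               # converges toward 3.0
```

Batch methods such as L-BFGS and CG, by contrast, compute gradients over many examples at once, which is exactly the kind of work that maps well onto GPUs and multiple machines.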