Gradient methods locally optimize an unknown differentiable function, and thus provide the engines that drive much machine learning. Here we'll take a look under the hood, beginning with brief overvie