Computer Science 2015
Saddle-free Hessian-free optimization for Deep Learning

Abstract: We develop a variant of the Hessian-free optimization method of Martens (2010) that implements the saddle-free Newton method (Dauphin et al., 2014) instead of classical Newton. It runs in time linear in the number of parameters of the network, which makes it scalable to very large problems. It is also easy to use, stable, and makes no low-rank approximation of the Hessian or any of its variants. Finally, it is memory efficient: it never stores a matrix explicitly, relying only on matrix-vector products to solve the problem at hand.
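
The claims above rest on two standard Hessian-free building blocks: Hessian-vector products computed by automatic differentiation (Pearlmutter's R-operator, costing roughly one extra gradient evaluation) and a conjugate-gradient solver that touches the curvature only through those products. The following is a minimal sketch of those two ingredients in JAX, not the paper's algorithm: in particular, the saddle-free Newton step of Dauphin et al. (2014) uses |H| (the Hessian with its eigenvalues replaced by their absolute values), and how this paper applies that operator in linear time is not shown here. The names hvp and cg_solve, the toy loss, and the damping constant are illustrative assumptions.

    import jax
    import jax.numpy as jnp

    def hvp(loss_fn, params, v):
        # Hessian-vector product H @ v via forward-over-reverse
        # autodiff (Pearlmutter's trick): about the cost of one
        # extra gradient; the Hessian is never materialized.
        return jax.jvp(jax.grad(loss_fn), (params,), (v,))[1]

    def cg_solve(matvec, b, iters=50, tol=1e-8):
        # Conjugate gradient for matvec(x) = b using only
        # matrix-vector products, as in Hessian-free optimization
        # (Martens, 2010). Assumes matvec is positive definite.
        x = jnp.zeros_like(b)
        r = b - matvec(x)
        p = r
        rs = jnp.dot(r, r)
        for _ in range(iters):
            Ap = matvec(p)
            alpha = rs / jnp.dot(p, Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            rs_new = jnp.dot(r, r)
            if rs_new < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    # Toy usage (illustrative only): one damped Newton-like step.
    loss = lambda w: jnp.sum(w ** 4 + w ** 2)   # convex toy loss
    w = jnp.array([1.0, -2.0, 0.5])
    g = jax.grad(loss)(w)
    damping = 1.0  # keeps the CG operator positive definite
    step = cg_solve(lambda v: hvp(loss, w, v) + damping * v, g)
    w = w - step

Because both ingredients need only matvec calls and a few vectors of working memory, the overall scheme stores no matrix and scales linearly in the number of parameters, matching the memory and time claims of the abstract.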