CISE Seminar: November 2, 2018, Andrew Gordon Wilson – Cornell University

8 St. Mary’s Street, PHO 211
3:00pm-4:00pm – Refreshments at 2:45pm

 

Andrew Gordon Wilson
Cornell University

 

Loss Valleys and Generalization in Deep Learning

In this talk, we present two surprising geometric discoveries, leading to two different practical methods for training deep neural networks. The first result shows that the optima of deep neural networks are not isolated, but can be connected along simple curves, such as a polygonal chain or quadratic Bézier curve, of near-constant accuracy. We present a new training procedure for finding such paths, and an ensembling algorithm, Fast Geometric Ensembling, inspired by this insight. This paper will appear at NIPS 2018: https://arxiv.org/abs/1802.10026
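
To make the curve-connection idea concrete, here is a minimal sketch (not the authors' code) of the quadratic Bézier parameterization in weight space: two independently trained solutions w1 and w2 are joined through a learned "bend" point theta, and the curve-finding procedure adjusts theta so that the loss stays low along the whole curve. The names w1, w2, theta, and loss_fn are illustrative placeholders.

```python
import numpy as np

def bezier_point(w1, theta, w2, t):
    """Quadratic Bezier curve phi(t) = (1-t)^2 * w1 + 2t(1-t) * theta + t^2 * w2."""
    return (1 - t) ** 2 * w1 + 2 * t * (1 - t) * theta + t ** 2 * w2

def expected_curve_loss(w1, theta, w2, loss_fn, n_samples=10):
    """Monte Carlo estimate of the loss averaged over t ~ Uniform(0, 1);
    the curve-finding procedure minimizes this quantity with respect to theta."""
    ts = np.random.uniform(0.0, 1.0, size=n_samples)
    return float(np.mean([loss_fn(bezier_point(w1, theta, w2, t)) for t in ts]))
```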

The next work helps advance the debate about the relationship between optimum width and generalization, and examines whether SGD does indeed converge to broad optima. In this paper, we provide a general procedure for training neural networks with greatly improved performance over standard SGD training (for essentially any architecture and any benchmark), and with no computational overhead. This work appears at UAI 2018: https://arxiv.org/abs/1803.05407
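
The linked UAI 2018 paper describes Stochastic Weight Averaging (SWA): keep a running average of the weights visited by SGD late in training and use the averaged weights at test time. Below is a minimal sketch of that idea, not the authors' implementation; `model`, `train_one_epoch`, and the PyTorch-style `parameters()` interface are assumptions for illustration.

```python
import copy

def stochastic_weight_averaging(model, train_one_epoch, swa_start, n_epochs):
    """Train normally, then average the weights reached at the end of each
    epoch from `swa_start` onward; returns a model copy with averaged weights."""
    swa_model = None
    n_averaged = 0
    for epoch in range(n_epochs):
        train_one_epoch(model)  # ordinary SGD training; no extra cost per step
        if epoch >= swa_start:
            if swa_model is None:
                swa_model = copy.deepcopy(model)
            else:
                # running average: w_swa <- (n * w_swa + w) / (n + 1)
                for p_swa, p in zip(swa_model.parameters(), model.parameters()):
                    p_swa.data.mul_(n_averaged).add_(p.data).div_(n_averaged + 1)
            n_averaged += 1
    return swa_model
```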

This is joint work with Pavel Izmailov (Cornell), Timur Garipov, Dmitrii Podoprikhin, and Dmitry Vetrov (Moscow State University and the Higher School of Economics).

Andrew Gordon Wilson is an assistant professor at Cornell University. Previously, he was a research fellow in the machine learning department at CMU with Eric Xing and Alex Smola. He completed his PhD with Zoubin Ghahramani at the University of Cambridge. Andrew’s interests include probabilistic modelling, numerical methods, stochastic MCMC, deep learning, Gaussian processes, and kernel methods. His webpage is: https://people.orie.cornell.edu/andrew

Faculty Host: Brian Kulis
Student Host: Rebecca Swaszek