PDE methods for deep learning analysis and optimization
Abstract
The objective of this dissertation is to use tools from PDEs to understand and construct deep learning algorithms. This research includes the theory-inspired design of optimization algorithms for deep learning, theoretical analysis of deep network training, and applications in computer vision. We introduce a recently developed framework, PDE acceleration (a variational approach to accelerated optimization via PDEs), into the optimization of deep networks, which leads to a novel and simple extension of SGD with momentum. We empirically validate the theory and evaluate our new algorithm on image classification, showing improvement over SGD. To further enhance the performance of deep learning algorithms, a better understanding of their stability and convergence properties is needed. We discover restrained numerical instabilities in current training practices for deep networks. To explain this phenomenon, we present a theoretical framework based on the numerical analysis of PDEs, analyzing the gradient-descent PDE of a simplified CNN. We also link restrained instabilities to the recently discovered Edge of Stability (EoS) phenomenon and provide new insights and predictions about the EoS. Further, this dissertation explores the special potential of "geometric" PDEs to advance deep learning applications. Within the geometric PDE framework, we provide a theoretical analysis of the instability caused by the Eikonal loss and explain how some existing approaches unknowingly mitigate it. These regularizations, in turn, enable the use of new neural networks with greater representational power that can capture finer-scale details of shape. In summary, we believe the tools we've introduced could improve deep learning practice.
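The connection between PDE acceleration and SGD with momentum mentioned above can be sketched as follows. Discretizing a damped-wave gradient flow of the form w_tt + a·w_t = -∇E(w) with a semi-implicit Euler step yields a momentum-style update. This is a minimal illustration on a toy quadratic objective; the damping coefficient `a`, step size `dt`, and function names are illustrative assumptions, not the dissertation's actual algorithm or tuned values.

```python
import numpy as np

def grad_E(w):
    # Toy quadratic objective E(w) = 0.5 * ||w||^2, so grad E(w) = w.
    return w

def pde_accelerated_descent(w0, a=3.0, dt=0.1, steps=500):
    """Semi-implicit Euler discretization of w_tt + a*w_t = -grad E(w).

    The velocity update has the same form as SGD with momentum:
    damping factor (1 - a*dt) plays the role of the momentum coefficient.
    """
    w = w0.copy()
    v = np.zeros_like(w0)
    for _ in range(steps):
        v = (1.0 - a * dt) * v - dt * grad_E(w)  # momentum-like velocity update
        w = w + dt * v                            # position update
    return w

w_final = pde_accelerated_descent(np.array([5.0, -3.0]))
print(np.linalg.norm(w_final))  # small: iterates settle near the minimizer at 0
```

With `a * dt = 0.3`, the velocity recurrence matches heavy-ball momentum with coefficient 0.7, which is one way to see why the discretized flow behaves like SGD with momentum.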
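For context on the Eikonal loss discussed above: when training a network to represent a signed distance function, a common penalty enforces the defining Eikonal PDE ||∇f(x)|| = 1 by penalizing (||∇f(x)|| - 1)². The sketch below computes this loss for a toy closed-form function standing in for a network; the function and names are illustrative assumptions, not the dissertation's model.

```python
import numpy as np

def f(x):
    # Toy stand-in for a neural SDF: f(x) = ||x||^2 (not a true distance function).
    return np.sum(x**2, axis=-1)

def grad_f(x):
    # Analytic gradient of the toy f; a network would use autodiff here.
    return 2.0 * x

def eikonal_loss(points):
    """Mean squared deviation of the gradient norm from 1 over sample points."""
    grad_norms = np.linalg.norm(grad_f(points), axis=-1)
    return np.mean((grad_norms - 1.0) ** 2)

pts = np.array([[0.5, 0.0], [0.25, 0.25]])
print(eikonal_loss(pts))  # nonzero: the toy f violates ||grad f|| = 1 away from the unit-gradient set
```

Because the loss acts on gradient norms rather than values, its gradient flow behaves like a nonlinear geometric PDE on the level sets of f, which is the lens through which the instability analysis proceeds.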