Speaker: Aditya Ranganath, PhD Candidate at UC Merced (EECS Program)
When: Friday, April 29, from 3:00 to 4:20pm (talk will start at 3:20pm)
Where: Granite Pass 135 or Zoom (https://ucmerced.zoom.us/j/83811243487?pwd=UUxsOVgrTnF5SlJzWkFLWXp3ZW5BZz09)
Title: Optimization methods for deep learning
Abstract: Deep learning involves generalizing from data by making predictions on unseen data. This is achieved by minimizing the empirical risk of estimation over the dataset using optimization techniques. Stochastic gradient descent (SGD) and other first-order variants, such as Adam and AdaGrad, are commonly used in deep learning to reduce this empirical risk because of their computational efficiency and low memory requirements. However, these methods do not exploit curvature information. Consequently, iterates can converge to saddle points and poor local minima. Second-order methods, on the other hand, are unpopular in the area of deep learning because of their time and computational demands.
In this talk we will discuss two methods: a second-order trust-region method and a quasi-Newton method in an adaptive regularized cubic setting. In the second-order trust-region approach, a modified conjugate gradient method is used in conjunction with a trust-region setting, which generates a sequence of quadratic subproblems. This approach employs an efficient Hessian-vector product that is cheap to compute and inexpensive to store. We apply this optimization technique to an image classification problem over the MNIST dataset, compare it against SGD, and discuss the performance of both methods. In the second method, we propose using a limited-memory symmetric rank-one quasi-Newton approach, which allows for indefinite Hessian approximations, enabling directions of negative curvature to be exploited. Furthermore, we use a modified adaptive regularization using cubics approach, which generates a sequence of cubic subproblems that have closed-form solutions. We investigate the performance of our proposed method on autoencoders and feed-forward neural network models and compare our approach to state-of-the-art first-order adaptive stochastic methods as well as other quasi-Newton methods.
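For readers unfamiliar with the symmetric rank-one (SR1) update mentioned in the abstract, the following is a minimal illustrative sketch, not the speaker's implementation. It uses the textbook full-memory SR1 formula rather than a limited-memory variant, and the function name `sr1_update`, the tolerance `tol`, and the toy vectors are assumptions chosen for illustration. It shows how a single update can produce an indefinite Hessian approximation when the observed curvature along a step is negative.

```python
import numpy as np


def sr1_update(B, s, y, tol=1e-8):
    """Textbook SR1 update of a Hessian approximation B (hypothetical helper).

    s is the step between iterates and y is the change in gradient.
    Returns B unchanged when the denominator is too small to be safe.
    """
    r = y - B @ s          # residual of the secant equation B s = y
    denom = r @ s
    # Common safeguard: skip the update if the denominator is tiny
    # relative to the sizes of r and s.
    if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom


# Toy data (assumed): start from the identity and observe negative
# curvature along the first coordinate direction, since s @ y < 0.
B0 = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([-2.0, 0.0])

B_new = sr1_update(B0, s, y)
eigenvalues = np.linalg.eigvalsh(B_new)

print(B_new)
print(eigenvalues)
```

In this toy case the updated matrix satisfies the secant equation `B_new @ s == y` and has one negative and one positive eigenvalue, so it is indefinite. A BFGS-style update would typically skip or damp such a pair in order to remain positive definite, which is why SR1 is the natural choice when negative-curvature directions are meant to be exploited.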
Aditya Ranganath is a PhD candidate in the Department of Electrical Engineering and Computer Science (EECS) at UC Merced. He grew up in New Delhi, India, and completed his Bachelor's in Electrical and Electronics Engineering at Anna University (Loyola-ICAM College of Engineering and Technology) in 2014. After working as a production and supply chain specialist at a robotics and warehouse automation company, he moved to California to pursue a Master's in Computer Science at UC Merced. In 2018, he began his PhD program under the guidance of Dr. Roummel Marcia and Dr. Mukesh Singhal. His research interests include numerical optimization for deep learning, image processing, and reinforcement learning. Over the summer of 2022, Aditya will be joining Meta Artificial Intelligence and Applied Research (MAIAR) as an Applied Research intern on its core machine learning team.