Recipe: Hessian eigenvector computation for PyTorch models


If you're interested in approximating Hessian-vector products efficiently for frontier-size models, this recent Anthropic paper describes a mechanism for doing so.

The method described does not explicitly compute the full Hessian matrix. Instead, it derives the top eigenvalues and eigenvectors of the Hessian. The implementation accumulates a large batch from a dataloader by concatenating **n_batches** of the typical batch size. This is an approximation to estimate the genuine loss/gradient on the complete dataset more closely. If you have a large and high-variance dataset, averaging gradients over multiple batches might be better. This is because the loss calculated from a single, accumulated batch may not be adequately representative of the entire dataset's true loss.

The idea/description of this method is fully taken from John Wentworth's Applied Linear Algebra lecture series, specifically Lecture 2.

Training deep neural networks involves navigating high-dimensional loss landscapes. Understanding the curvature of these landscapes via the Hessian of the loss function can provide insights into the optimization dynamics. However, computing the full Hessian can be prohibitively expensive. In this post, I describe a method (presented by John Wentworth in his lecture series) for efficiently computing the top eigenvalues and eigenvectors of the loss Hessian using PyTorch's autograd and SciPy's sparse linear algebra utilities.

## Hessian-vector product

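The core idea hinges upon the Hessian-vector product (HVP). Given a vector v, the HVP is defined as H⋅v, where H is the Hessian matrix. This product can be computed efficiently using automatic differentiation without forming the full Hessian. Because v is held constant, H⋅v equals the gradient of the scalar (∇L⋅v), so the process is: compute the gradient of the loss with respect to the parameters while keeping the computation graph, take its dot product with v, and differentiate that scalar with respect to the parameters once more.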
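The Hessian-vector product described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the post's original code: the function name `hessian_vector_product` and the toy quadratic loss are assumptions chosen so the result can be checked by hand (for a loss of 0.5·xᵀAx with symmetric A, the Hessian is exactly A).

```python
import torch

def hessian_vector_product(loss, params, v):
    """Return H @ v as a flat tensor, where H is the Hessian of `loss` w.r.t. `params`.

    `hessian_vector_product` is a hypothetical helper name used for this sketch.
    """
    # First backward pass: keep the graph so the gradient itself can be differentiated.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    # Second backward pass: the gradient of (grad . v) is H @ v because v is a constant.
    hv = torch.autograd.grad(flat_grad @ v, params)
    return torch.cat([h.reshape(-1) for h in hv])

# Toy check: a quadratic loss whose Hessian is the symmetric matrix A.
A = torch.tensor([[2.0, 1.0], [1.0, 3.0]])
x = torch.tensor([1.0, -1.0], requires_grad=True)
loss = 0.5 * x @ A @ x
v = torch.tensor([1.0, 2.0])

hv = hessian_vector_product(loss, [x], v)
print(hv)  # should match A @ v
```

Each call costs roughly two backward passes, and the full Hessian is never stored.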

## Lanczos Iteration and eigsh

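eigsh from scipy.sparse.linalg wraps ARPACK's implicitly restarted Lanczos method, which finds a few extremal eigenvalues and the corresponding eigenvectors of a symmetric matrix. By default it returns the eigenvalues of largest magnitude (which='LM'); pass which='LA' to get the largest algebraic ones instead. It requires only matrix-vector multiplication as its main computation, making it ideal for large matrices where full matrix factorizations are infeasible.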
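To make the eigsh call concrete, here is a small sketch on an explicit symmetric matrix, where the answer can be compared against a dense eigendecomposition. The random 50×50 matrix and the variable names are assumptions for illustration only.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

# Build a small random symmetric matrix (illustrative stand-in for a Hessian).
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
S = (M + M.T) / 2

# Ask for the 3 largest algebraic eigenvalues and their eigenvectors.
vals, vecs = eigsh(S, k=3, which="LA")

# Reference: full dense decomposition, feasible only because S is tiny.
dense_top = np.linalg.eigvalsh(S)[-3:]
print(np.sort(vals), dense_top)
```

For a real model the dense reference is unavailable, which is the reason for using an iterative solver in the first place.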

## Using LinearOperator

To interface with eigsh, we need a mechanism to represent our Hessian as a linear operator that supports matrix-vector multiplication. SciPy's LinearOperator serves this purpose, allowing us to define a matrix implicitly by its action on vectors without forming the matrix explicitly.
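As a small illustration of defining a matrix only through its action on vectors, the sketch below wraps a diagonal operator in a LinearOperator and hands it to eigsh. The diagonal operator and the names `diag`, `matvec`, and `op` are made up for this example; in the Hessian setting, `matvec` would compute a Hessian-vector product instead.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

n = 100
diag = np.arange(1, n + 1, dtype=np.float64)  # implicit matrix has eigenvalues 1..100

def matvec(v):
    # Apply the implicit diagonal matrix to v without ever building an n x n array.
    # ravel() guards against v arriving with shape (n, 1).
    return diag * np.ravel(v)

op = LinearOperator((n, n), matvec=matvec, dtype=np.float64)

# eigsh only ever calls op.matvec, so the matrix is never materialized.
vals, vecs = eigsh(op, k=2, which="LA")
print(np.sort(vals))
```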

## Implementation

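Given a PyTorch model, loss function, and training data, the approach is to:

1. Concatenate `n_batches` batches from the dataloader into one large batch and compute the loss on it.
2. Define a function that computes the Hessian-vector product of that loss for a given vector using autograd.
3. Wrap that function in a SciPy LinearOperator.
4. Pass the operator to eigsh to obtain the top eigenvalues and eigenvectors.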
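Putting the pieces together, the approach might look roughly like the sketch below. This is not the original gist: the function name `top_hessian_eigs`, its parameters, and the toy linear-regression setup are assumptions made for illustration. The toy model runs in float64 so that round-off in the matrix-vector product does not interfere with eigsh's default tolerance; for float32 models, loosening `tol` may be needed.

```python
import itertools
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh
from torch.utils.data import DataLoader, TensorDataset

def top_hessian_eigs(model, loss_fn, dataloader, n_batches=4, k=2):
    """Sketch: top-k (largest algebraic) eigenpairs of the loss Hessian.

    `top_hessian_eigs` is a hypothetical name, not the author's function.
    """
    # Concatenate n_batches mini-batches into one large batch.
    batches = list(itertools.islice(dataloader, n_batches))
    inputs = torch.cat([x for x, _ in batches])
    targets = torch.cat([y for _, y in batches])

    params = [p for p in model.parameters() if p.requires_grad]
    n_params = sum(p.numel() for p in params)

    # Compute the gradient once, keeping the graph for repeated differentiation.
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    def matvec(v):
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        v_t = torch.as_tensor(np.ravel(v), dtype=flat_grad.dtype, device=flat_grad.device)
        hv = torch.autograd.grad(flat_grad @ v_t, params, retain_graph=True)
        return torch.cat([h.reshape(-1) for h in hv]).cpu().numpy().astype(np.float64)

    op = LinearOperator((n_params, n_params), matvec=matvec, dtype=np.float64)
    return eigsh(op, k=k, which="LA")

# Toy usage: linear regression with MSE loss (6 parameters: 5 weights + 1 bias).
torch.manual_seed(0)
X = torch.randn(64, 5, dtype=torch.float64)
y = torch.randn(64, 1, dtype=torch.float64)
loader = DataLoader(TensorDataset(X, y), batch_size=16)
model = torch.nn.Linear(5, 1).double()

vals, vecs = top_hessian_eigs(model, torch.nn.MSELoss(), loader, n_batches=4, k=2)
print(vals)
```

Computing the gradient graph once and reusing it with `retain_graph=True` keeps each matrix-vector product to a single backward pass, at the cost of holding the graph for the large batch in memory.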

## Appendix: Python code

You can also find this code as a GitHub gist here.