Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Prizes for matrix completion problems

17Mihai Nica

6midco

5Zygi Straznickas

7Zygi Straznickas

3paulfchristiano

5blf

3paulfchristiano

1LGS

1blf

2blf

1Zygi Straznickas

2paulfchristiano

4paulfchristiano

4MondSemmel

4habryka

2paulfchristiano

2habryka

1dr_s

5habryka

1awg

4Robert_AIZI

5paulfchristiano

8Thomas Sepulchre

2paulfchristiano

5Jacob_Hilton

2Thomas Sepulchre

3paulfchristiano

1John_R_Ramsden

2Thomas Sepulchre

3CBiddulph

2Alexander Bistagne

2midco

1jkim2

4Mark Xu

1rokosbasilisk

1midco

1Holoraven

1Thomas Sepulchre

1midco

1midco

1RomanS

5paulfchristiano

1RomanS

3faul_sname

3John Maar

2faul_sname

3Thomas Sepulchre

1dr_s

5paulfchristiano

2faul_sname

1dr_s

1jkim2

New Comment

52 comments, sorted by Click to highlight new comments since: Today at 10:54 PM

I made a video going through the solution in the easiest case of 3x3 matrices. This might help people trying to understand the problem and/or get people interested https://youtu.be/0d66hPBC38Y

I think logarithmic dependence on epsilon will be difficult, especially for problem one. To demonstrate this, consider the problem of approximating the maximum eigenvalue of a sparse (let's say m = O(n)) PSD matrix with trace at most n. The obvious algorithms to try aren't fast enough:

- power iteration / Lanczos algorithm compute in time O(mk) using sparse matrix-vector multiplies with polynomial precision in k. Thus, such methods exhibit the undesirable polynomial dependence in epsilon.
- repeated squaring computes in time using dense matrix multiplication with exponential precision in k. Though such methods are logarithmic in k, the dense matrix multiplication steps render the algorithm too slow.

Intuitively, power iteration-style methods exploit sparsity by taking matrix-vector products exclusively, which yields slow (polynomial in epsilon) convergence.

To argue that problem one is difficult, I claim that one natural candidate completion is the all-zero completion, resulting in a sparse matrix. This suggests (but does not prove!) that the problem of determining existence of PSD completions requires determining whether the resulting sparse completion is PSD - equivalently, whether its minimum eigenvalue is nonnegative. The reduction to the max eigenvalue problem then follows by taking spectral shifts ( for ).

It's unclear to me whether problem two has a similar dependence, but given the similarity of the problems I'm skeptical that logarithmic dependence in epsilon is possible. If anyone has ideas for a similar reduction I'd love to discuss!

(comment based on work co-authored w @Somsubhro Bagchi)

note: Som and I have discussed a similar objection with @paulfchristiano, who seemed ok with solutions with convergence polynomial in epsilon but still O(mn) in the "typical case" (eg. for problem 1, with enough precision to determine (non)existence of PSD completions). After constraining all diagonal elements of A to be 1, we're somewhat optimistic such weaker convergence is attainable.

Question 1 is very likely NP-hard. https://arxiv.org/abs/1203.6602 shows:

We consider the decision problem asking whether a partial rational symmetric matrix with an all-ones diagonal can be completed to a full positive semidefinite matrix of rank at most k. We show that this problem is $\NP$-hard for any fixed integer k≥2.

I'm still stretching my graph-theory muscles to see if this technically holds for (and so precisely implies the NP-hardness of Question 1.) But even without that, for applied use cases we can just fix to a very large number to see that practical algorithms are unlikely.

It can be decided in time by semidefinite programming, and I'd guess that most researchers would expect it can be done in time (which is the runtime of PSD testing, i.e. the case when ).

More generally, there's a large literature on positive semidefinite completions which is mostly concerned with finding completions with particular properties. Sorry for not better flagging the state of this problem. Those techniques don't seem super promising for getting to .

I think it can be done in , where I recall for non-expert's convenience that is the exponent of matrix multiplication / inverse / PSD testing / etc. (all are identical). Let be the space of matrices and let be the -dimensional vector space of matrices with zeros in all non-specified entries of the problem. The maximum-determinant completion is the (only?) one whose inverse is in . Consider the map and its projection where we zero out all of the other entries. The function can be evaluated in time . We wish to solve . This should be doable using a Picard or Newton iteration, with a number of steps that depends on the desired precision.

Would it be useful if I try to spell this out more precisely? Of course, this would not be enough to reach in the small case. Side-note: The drawback of having posted this question in multiple places at the same time is that the discussion is fragmented. I could move the comment to mathoverflow if you think it is better.

Agree about the downside of fragmented discussion, I left some time for MO discussion before posting the bounty but I don't know how much it helped.

I agree that some approach like that seems like it could work for . I'm not familiar with those techniques and so don't know a technique that needs iterations rather than or . For Newton iteration it seems like you need some kind of bound on the condition number.

But I haven't thought about it much, since it seems like something fundamentally different is needed to break . I believe that one of those might just work out of the box.

Wait which algorithm for semidefinite programming are you using? The ones I've seen look like they should translate to a runtime even slower than For example the one here:

https://arxiv.org/pdf/2009.10217.pdf

Also, do you have a source for the runtime of PSD testing being ? I assume no lower bound is known, i.e. I doubt PSD testing is hard for matrix multiplication. Am I wrong about that?

That's a good question. From what I've seen, PSD testing can be done by trying to make a Cholesky decomposition (writing the matrix as with lower-triangular) and seeing if it fails. The Cholesky decomposition is an decomposition in which the lower-triangular and upper-triangular are simply taken to have the same diagonal entries, so PSD testing should have the same complexity as decomposition. Wikipedia quotes Bunch and Hopcroft 1974 who show that decomposition can be done in by Strassen, and presumably the more modern matrix multiplication algorithms also give an improvement for .

I also doubt that PSD testing is hard for matrix multiplication, even though you can get farther than you'd think. Given a positive-definite matrix whose inverse we are interested in, consider the block matrix . It is positive-definite if and only if all principal minors are positive. The minors that are minors of are positive by assumption, and the bigger minors are equal to times minors of , so altogether the big matrix is positive-definite iff is. Continuing in this direction, we can get in time (times ) any specific component of . This is not enough at all to get the full inverse.

There is a simple intuition for why PSD testing cannot be hard for matrix multiplication or inversion: regardless of how you do it and what matrix you apply it to, it only gives you one bit of information. Getting even just one bit of information about each matrix element of the result requires applications of PSD testing. The only way out would be if one only needed to apply PSD testing to tiny matrices.

Thanks for the clarification! I assumed the question was using the bit computation model (biased by my own experience), in which case the complexity of SDP in the general case still seems to be unknown (https://math.stackexchange.com/questions/2619096/can-every-semidefinite-program-be-solved-in-polynomial-time)

We intend to leave this prize open until the end of September. At that point we will distribute prizes (probably just small prizes for useful arguments and algorithms, but no full solution).

I now pretty strongly suspect that the version of problem 1 with logarithmic dependence on is not solvable. We would award a prize for an algorithm running in time which can distinguish matrices with no PSD completion from those with a completion where the ratio of min to max eigenvalue is at least . And of course a lower bound is still fair game.

That said, I don't expect any new submissions to win prizes and so wouldn't recommend that anyone start working on it.

Asking for some clarifications:

1. For both problems, should the solution work for an adversarially chosen set of m entries?

2. For both problems, can we read more entries of the matrix if it helps our solution? In particular can we WLOG assume we know the diagonal entries in case that helps in some way.

- Yes, the algorithm should work for adversarial inputs.
- For the second problem you can effectively read more entries since you can compute any given entry of in time . For the first problem you can't read more entries. But you can assume WLOG that you have the diagonal---if you don't have any given diagonal entry, you might as well just set it to infinity in the completion, effectively removing that whole row and column from the problem. So we can just remove all rows and columns for which you don't have the diagonal, and be left with a matrix where all diagonal entries are known.

I agree with your example. I think you can remove the row/column if there are no known 0s on the diagonal, and I also think you can first WLOG remove any rows and columns with 0s on the diagonal (since if there are any other known non-zero entries in their row or column then the matrix has no PSD completions).

I'm also happy to just assume the diagonal entries are known. I think these are pretty weird corner cases.

It's not quite this simple, the same issue arises if every PSD completion of the known-diagonal minor has zero determinant (e.g. ((?, 1, 2), (1, 1, 1), (2, 1, 1))). But I think in that case making the remaining diagonal entries large enough still makes the eigenvalues at least −ε, which is good enough.

What matters is not whether or not there is another 0 on the diagonal, but whether or not there is another PSD non definite matrix on the diagonal. For example, in the comment from **Jacob_Hilton, **they introduce the 2x2 matrix ((1,1),(1,1)), with eigenvalues [0,1], which is a PSD non definite.

I agree with assuming that all diagonal entries are known. You can even assume that all entries are 1 on the diagonal WLOG.

Yes indeed. One definition of a PSD matrix is some matrix such that, for any vector , (so, it defines some kind of scalar product).

If , then you can always divide the whole row/column by , which is equivalent to applying some scaling, this won't change the fact that is PSD or not.

If , then if you try the vector , you can check that , thus the matrix isn't PSD.

Love to see a well-defined, open mathematical problem whose solution could help make some progress on AI alignment! It's like a little taste of not being a pre-paradigmic field. Maybe someday, we'll have lots of problems like this that can engage the broader math/CS community, that don't involve so much vague speculation and philosophy :)

I understand this bounty is roughly closed, but the page is still listed as bounty active and my attempt took me only a couple hours.

Problem 2 is solvable when there are simple restrictions on the entries chosen.

Restriction 1: No diagonal entries need to be correct. Solution: Let C = AAT For each entry c_ij we construct the matrix with only c_ii, c_ij, c_ji , c_jj and all else zero. Clearly this matrix is constructible in O(n) work. So all entries can be constructed in this way. Note if c_ij and c_ji are requested, only construct one matrix for the pair. Repeating this for all m does O(nm) work. We then sum all such matricies to get our estimate E. Only the diagonal will possibly need summing so O(nm) work is done. Clearly each constructed matrix is positive semidefinite as it is a principal submatrix of C. The sum is also positive semidefinite because the individual matricies are positive semidefinite.

Restriction 2: Only 1 entry per row-column number is needed other than the main diagonal. The above solution still works.

This comment proposes a strategy for (2) I'll call *Active Matrix Completion (AMC)*, which allows querying of elements of given some cost/budget. In our case, each query takes O(n), so we can afford O(m) total queries.

Some notes:

- O(n) seems really fast - for instance, evaluating the bilinear form takes O(n^2). This makes me suspect any solution to (2) will rely on some form of AMC.
- AMC exists in literature - this paper gives a nice introduction and (I think) proves that O(n*rank(A)) queries suffice.
- One strategy that seems intuitively appealing is querying enough elements to guarantee that the resulting adjacency graph is chordal, which (I think) allows for efficient PSD completion (see Vandenberghe; algorithm 10.2). Thus, the following proposition is stronger than (2):

PROPOSITION: given a graph G = (V, E) with n nodes and m edges, can we efficiently (in O(mn) time) determine a small (with at most Cm edges for universal constant C) chordal completion of G?

The proposition seems "too good to be true", especially since chordal completion seems like a hard computational problem (NP-complete, no great approximations are known), so I'm most interested in easy counterexamples. For instance, a "small" graph with only "large" chordal completions would confirm the falsehood of the proposition.

So, uh, any graph theory experts around?

(I'm not the OP but) absolutely not. The problem is about incomplete matrices, and the idea is to get an upper bound linked to m, i.e. an upper bound linked to how incomplete the matrix is. If you rephrase the question 1 as O(n^3), this is a completely different question, because now you only care about how big the matrix is, not how complete it is.

Also, since question 1 can be achieved with some gaussian pivot in O(n^3), and it also implies being able to tell whether a matrix is PSD or not, I think the best known complexity wrt n is indeed O(n^3).

As a side note, GPT-4 correctly identifies both problems as open:

1: ...Determining whether a partial matrix has a PSD completion is a challenging problem, and finding an efficient algorithm with time complexity O(nm) is an open question. The problem can be formulated as a semidefinite programming (SDP) problem or as a convex optimization problem, but the complexity of these approaches is usually higher than O(nm). There is no known algorithm that can determine whether a partial matrix has a PSD completion in time complexity O(nm), nor is there a known proof that this problem is at least as hard as dense matrix multiplication or PSD testing. Further research in the field of optimization and linear algebra is needed to provide a more definitive answer.

2: ...It is important to note that finding an algorithm that can solve this problem with time complexity ~O(nm) remains an open question in the field of optimization and linear algebra.

Initially, it has provided a solution for the first problem. I asked it: "please check your answer for any errors", and then it apologized for the erroneous solution, and made the aforementioned assessment that the problem is still open.

With the second problem, it was a somewhat similar story: it answered with a partial solution involving a Cholesky decomposition, and then clarified that the problem is open.

Seems to be worth talking with GPT-4 on the matter. Perhaps it could give some useful hints.

When I tried asking about this problem, and referenced the above comment about being able to assume that all entries on the diagonal are positive, GPT-4 said the following:

If an n×n matrix is a symmetric positive semidefinite (PSD) matrix, it can be interpreted as a Gram matrix representing the dot products of a set of n vectors, and this configuration has 0 stress.

To understand this, let's first consider what a Gram matrix is. Given a set of n vectors v_1, v_2, ..., v_n in a d-dimensional space, the Gram matrix G is an n×n matrix where the entry G(i, j) is the dot product of vectors v_i and v_j:

G(i, j) = v_i • v_j

A Gram matrix is always symmetric (i.e., G(i, j) = G(j, i)) and positive semidefinite.

Now, let's consider the stress function in the context of multidimensional scaling (MDS). The stress function measures the difference between the given dissimilarities (distances) and the distances in the lower-dimensional space. When the stress is 0, this means that the distances in the lower-dimensional space exactly match the given dissimilarities.

Is GPT-4 correct? If so, am I interpreting it correctly that this problem could be rephrased as "given an incomplete set of m desired pairwise distances between n points, determine whether there exists some configuration of those points in an n-dimensional space such that the pairwise distances between those points are exactly as desired"?

Since we are only given m distances maybe a numerical algorithm for MDS might work. For example we might start with a random nxn matrix and minimize the strain by gradient descent. For the desired runtime this would just have to converge in $O(m/n)$ steps. I'm not sure if this is realistic though. Also there might be better numerical algorithms for this specific problem. Has anyone looked into this yet?

I think running a single strain minimization iteration on `m`

points in `n`

dimensions takes O(m*n) steps. So there would need to be some reason to expect that it would converge in some constant (though possibly large) number of steps.

Unless you're saying "for each node, run the strain minimization step until it converges, then do the same for each subsequent node". I don't know if the greedy algorithm *works* there, but if it does then maybe?

Also I kinda expect that if there's something that works in `O(n*m*log(m))`

that's probably fine.

(and yeah, "try the greedy exact solution for each node" was my "dumb solution" attempt).

I think you can convert between the two representations in `O(m)`

time, which would mean that any algorithm that solves either version in `O(n*m)`

solves both in `O(n*m)`

.

Do you have some large positive and negative examples of the kinds of sparse matrix you're trying to check for the existence of a PSD completion on, or alternatively a method for generating such examples with a known ground truth? I have a *really dumb* idea for a possible algorithm here (that shamelessly exploits the *exact shape of this problem* in a way that probably doesn't generalize to being useful for broader problems like MDS) that I think would complete in approximately the time constraints you're looking for. It almost certainly won't work, but I think it's at least worth an hour of my time to *check* and figure out why (especially since I'm trying to improve my linear algebra skills *anyway*).

Edit: there's the obvious approach, which I'm trying, of "start with only 1s on the diagonal and then keep adding random entries until it no longer has a PSD completion, then removing random entries until it does, and repeat to build a test set" but I doubt that covers the interesting corners of the problem space.

Edit 2: the *really* dumb thing does not work. I think I haven't ruled out that a slightly less dumb approach could work though?

Edit 3: never mind, my really dumb "solution" requires inverting a matrix that is, in the worst case, nxn, if e.g. you have an input that looks like

```
1 n n n n n n
n 1 - - - - n
n - 1 - - - n
n - - 1 - - n
n - - - 1 - n
n - - - - 1 n
n n n n n n 1
```

you'll have to invert 6 2x2 matrices and one each of 3x3 to 7x7 matrices.

Here are two self-contained algorithmic questions that have come up in our research. We're offering a bounty of $5k for a solution to either of them—either an algorithm, or a lower bound under any hardness assumption that has appeared in the literature.

Question 1 (existence of PSD completions): given m entries of an n×n matrix, including the diagonal, can we tell in time ~O(nm) whether it has any (real, symmetric) positive semidefinite completion? Proving that this task is at least as hard as dense matrix multiplication or PSD testing would count as a resolution.Question 2 (fast “approximate squaring”):given A∈Rn×n and a set of m=Ω(n) entries of AAT, can I findsomePSD matrix that agrees with AAT in those m entries in time ~O(nm)?We'll pay $5k for a solution to either problem. The offer is open for each problem for 3 months or until the problem gets solved (whichever happens first). Winners are welcome to publish solutions independently. Otherwise, if the result ends up being a significant part of a paper, we’ll invite them to be a coauthor.

We’ll also consider smaller prizes for partial progress, or anything that we find helpful for either solving the problem or realizing we should give up on it.

To understand the motivation for these questions, you can read our paper on Formalizing the presumption of independence and in particular Appendix D.7.2. ARC is trying to find efficient heuristic estimators as a formalization of defeasible reasoning about quantities like the variance of a neural network's output. These two questions are very closely related to one of the simplest cases where we haven't yet found any reasonable linear time heuristic estimator.

We don’t expect to receive many incorrect proposals, but if we receive more than 5 we may start applying a higher standard in order to save our time. If we can’t understand a solution quickly, we may ask you to provide more details, and if we still can’t understand it we may reject it. We expect a correct solution to be about as clear and easy to verify as a paper published at STOC.

For both problems, it’s OK if we incorrectly treat a matrix as PSD as long as all of its eigenvalues are at least −ε for a small constant ε. ~O hides polylogarithmic factors in n, ε, and the max matrix entry. Feel free to ask for other clarifications on our question on Math Overflow, on Facebook, or by email.

To submit a solution, send an email to prize@alignment.org.