Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Here are two self-contained algorithmic questions that have come up in our research. We're offering a bounty of $5k for a solution to either of them—either an algorithm, or a lower bound under any hardness assumption that has appeared in the literature.

Question 1 (existence of PSD completions): given entries of an matrix, including the diagonal, can we tell in time whether it has any (real, symmetric) positive semidefinite completion? Proving that this task is at least as hard as dense matrix multiplication or PSD testing would count as a resolution.

Question 2 (fast “approximate squaring”): given and a set of entries of , can I find some PSD matrix that agrees with in those m entries in time ?

We'll pay $5k for a solution to either problem. The offer is open for each problem for 3 months or until the problem gets solved (whichever happens first). Winners are welcome to publish solutions independently. Otherwise, if the result ends up being a significant part of a paper, we’ll invite them to be a coauthor.

We’ll also consider smaller prizes for partial progress, or anything that we find helpful for either solving the problem or realizing we should give up on it.

To understand the motivation for these questions, you can read our paper on Formalizing the presumption of independence and in particular Appendix D.7.2. ARC is trying to find efficient heuristic estimators as a formalization of defeasible reasoning about quantities like the variance of a neural network's output. These two questions are very closely related to one of the simplest cases where we haven't yet found any reasonable linear time heuristic estimator.

We don’t expect to receive many incorrect proposals, but if we receive more than 5 we may start applying a higher standard in order to save our time. If we can’t understand a solution quickly, we may ask you to provide more details, and if we still can’t understand it we may reject it. We expect a correct solution to be about as clear and easy to verify as a paper published at STOC.

For both problems, it’s OK if we incorrectly treat a matrix as PSD as long as all of its eigenvalues are at least for a small constant . hides polylogarithmic factors in , , and the max matrix entry. Feel free to ask for other clarifications on our question on Math Overflow, on Facebook, or by email.

To submit a solution, send an email to prize@alignment.org.

New to LessWrong?

New Comment
52 comments, sorted by Click to highlight new comments since: Today at 10:54 PM

I made a video going through the solution in the easiest case of 3x3 matrices. This might help people trying to understand the problem and/or get people interested https://youtu.be/0d66hPBC38Y

I think logarithmic dependence on epsilon will be difficult, especially for problem one. To demonstrate this, consider the problem of approximating the maximum eigenvalue of a sparse (let's say m = O(n)) PSD matrix with trace at most n. The obvious algorithms to try aren't fast enough:

  1. power iteration / Lanczos algorithm compute  in time O(mk) using sparse matrix-vector multiplies with polynomial precision in k. Thus, such methods exhibit the undesirable polynomial dependence in epsilon.
  2. repeated squaring computes  in time  using dense matrix multiplication with exponential precision in k. Though such methods are logarithmic in k, the dense matrix multiplication steps render the algorithm too slow.

Intuitively, power iteration-style methods exploit sparsity by taking matrix-vector products exclusively, which yields slow (polynomial in epsilon) convergence.

To argue that problem one is difficult, I claim that one natural candidate completion is the all-zero completion, resulting in a sparse matrix. This suggests (but does not prove!) that the problem of determining existence of PSD completions requires determining whether the resulting sparse completion is PSD - equivalently, whether its minimum eigenvalue is nonnegative. The reduction to the max eigenvalue problem then follows by taking spectral shifts ( for ).

It's unclear to me whether problem two has a similar dependence, but given the similarity of the problems I'm skeptical that logarithmic dependence in epsilon is possible. If anyone has ideas for a similar reduction I'd love to discuss!

(comment based on work co-authored w @Somsubhro Bagchi)

note: Som and I have discussed a similar objection with @paulfchristiano, who seemed ok with solutions with convergence polynomial in epsilon but still O(mn) in the "typical case" (eg. for problem 1, with enough precision to determine (non)existence of PSD completions). After constraining all diagonal elements of A to be 1, we're somewhat optimistic such weaker convergence is attainable.

Question 1 is very likely NP-hard. https://arxiv.org/abs/1203.6602 shows:

We consider the decision problem asking whether a partial rational symmetric matrix with an all-ones diagonal can be completed to a full positive semidefinite matrix of rank at most k. We show that this problem is $\NP$-hard for any fixed integer k≥2. 

I'm still stretching my graph-theory muscles to see if this technically holds for  (and so precisely implies the NP-hardness of Question 1.) But even without that, for applied use cases we can just fix   to a very large number to see that practical algorithms are unlikely.

Nope, I’m withdrawing this answer. I looked closer into the proof and I think it’s only meaningful asymptotically, in a low rank regime. The technique doesn’t work for analyzing full-rank completions.

It can be decided in  time by semidefinite programming, and I'd guess that most researchers would expect it can be done in time  (which is the runtime of PSD testing, i.e. the case when ).

More generally, there's a large literature on positive semidefinite completions which is mostly concerned with finding completions with particular properties. Sorry for not better flagging the state of this problem. Those techniques don't seem super promising for getting to .

I think it can be done in , where I recall for non-expert's convenience that  is the exponent of matrix multiplication / inverse / PSD testing / etc. (all are identical). Let  be the space of  matrices and let  be the -dimensional vector space of matrices with zeros in all non-specified entries of the problem.  The maximum-determinant completion is the (only?) one whose inverse is in .  Consider the map  and its projection  where we zero out all of the other entries.  The function  can be evaluated in time .  We wish to solve .  This should be doable using a Picard or Newton iteration, with a number of steps that depends on the desired precision.

Would it be useful if I try to spell this out more precisely?  Of course, this would not be enough to reach  in the small  case.  Side-note: The drawback of having posted this question in multiple places at the same time is that the discussion is fragmented.  I could move the comment to mathoverflow if you think it is better.

Agree about the downside of fragmented discussion, I left some time for MO discussion before posting the bounty but I don't know how much it helped.

I agree that some approach like that seems like it could work for . I'm not familiar with those techniques and so don't know a technique that needs  iterations rather than  or . For Newton iteration it seems like you need some kind of bound on the condition number.

But I haven't thought about it much, since it seems like something fundamentally different is needed to break . I believe that one of those might just work out of the box.

Wait which algorithm for semidefinite programming are you using? The ones I've seen look like they should translate to a runtime even slower than  For example the one here:

https://arxiv.org/pdf/2009.10217.pdf

Also, do you have a source for the runtime of PSD testing being ? I assume no lower bound is known, i.e. I doubt PSD testing is hard for matrix multiplication. Am I wrong about that?

That's a good question.  From what I've seen, PSD testing can be done by trying to make a Cholesky decomposition (writing the matrix as  with  lower-triangular) and seeing if it fails.  The Cholesky decomposition is an  decomposition in which the lower-triangular  and upper-triangular  are simply taken to have the same diagonal entries, so PSD testing should have the same complexity as  decomposition.  Wikipedia quotes Bunch and Hopcroft 1974 who show that  decomposition can be done in  by Strassen, and presumably the more modern matrix multiplication algorithms also give an improvement for .

I also doubt that PSD testing is hard for matrix multiplication, even though you can get farther than you'd think.  Given a positive-definite matrix  whose inverse we are interested in, consider the  block matrix .  It is positive-definite if and only if all principal minors are positive.  The minors that are minors of  are positive by assumption, and the bigger minors are equal to  times minors of , so altogether the big matrix is positive-definite iff  is.  Continuing in this direction, we can get in  time (times ) any specific component of .  This is not enough at all to get the full inverse.

There is a simple intuition for why PSD testing cannot be hard for matrix multiplication or inversion: regardless of how you do it and what matrix you apply it to, it only gives you one bit of information.  Getting even just one bit of information about each matrix element of the result requires  applications of PSD testing.  The only way out would be if one only needed to apply PSD testing to tiny matrices.

Thanks for the clarification! I assumed the question was using the bit computation model (biased by my own experience), in which case the complexity of SDP in the general case still seems to be unknown (https://math.stackexchange.com/questions/2619096/can-every-semidefinite-program-be-solved-in-polynomial-time)

Yeah, I can believe the problem is NP hard if you need to distinguish  eigenvalues in time  rather than . I don't understand exact arithmetic very well but it seems wild (e.g my understanding is that no polynomial time algorithm is known for testing ).

We intend to leave this prize open until the end of September. At that point we will distribute prizes (probably just small prizes for useful arguments and algorithms, but no full solution).

I now pretty strongly suspect that the version of problem 1 with logarithmic dependence on  is not solvable. We would award a prize for an algorithm running in time  which can distinguish matrices with no PSD completion from those with a completion where the ratio of min to max eigenvalue is at least . And of course a lower bound is still fair game.

That said, I don't expect any new submissions to win prizes and so wouldn't recommend that anyone start working on it.

Are the math formulas (= presumably LaTeX) in this post broken for everyone, or just for me? (That is, I see strings full of backslashes, like "").

Yeah, looks like it broke this morning, and I just fixed it again. Having RSS imports with LaTeX and non-admin editors is a bit fickle at the moment (since we have to sanitize the HTML to avoid security vulnerabilities). 

Sorry to create headache for you guys, thanks for handling imports!

No worries! Sorry for the fickleness. I just made a PR to handle non-admin editors more robustly.

Oh, yeah, this happens when switching back and forth from the LessWrong editor to Markdown and vice-versa. Love if someone fixed it, also because in LessWrong editor I don't know how or if you can make block equations - just inline ones.

Command + M for block equations (it says so in the help text that shows up in an empty LW editor text box).

And huh, I am surprised that LaTeX doesn't translate correctly. I feel like I've used that recently. Can you ping us on Intercom with a context where it didn't work? 

Gah, I assumed this was some notation I'd never seen before. (Then again, even if this was written out in clear LaTeX I wouldn't understand it.)

Asking for some clarifications:
1. For both problems, should the solution work for an adversarially chosen set of m entries?
2. For both problems, can we read more entries of the matrix if it helps our solution? In particular can we WLOG assume we know the diagonal entries in case that helps in some way.

  1. Yes, the algorithm should work for adversarial inputs.
  2. For the second problem you can effectively read more entries since you can compute any given entry of  in time . For the first problem you can't read more entries. But you can assume WLOG that you have the diagonal---if you don't have any given diagonal entry, you might as well just set it to infinity in the completion, effectively removing that whole row and column from the problem. So we can just remove all rows and columns for which you don't have the diagonal, and be left with a matrix where all diagonal entries are known.

I'm not sure this is true. Consider the 2x2 matrix ((?, 1),(1,0)). Removing the first row and the first column leaves you with ((0)), which is a PSD 1x1 matrix. That being said, there is no value of ? for which the 2x2 matrix is PSD.

I agree with your example. I think you can remove the row/column if there are no known 0s on the diagonal, and I also think you can first WLOG remove any rows and columns with 0s on the diagonal (since if there are any other known non-zero entries in their row or column then the matrix has no PSD completions).

I'm also happy to just assume the diagonal entries are known. I think these are pretty weird corner cases.

It's not quite this simple, the same issue arises if every PSD completion of the known-diagonal minor has zero determinant (e.g. ((?, 1, 2), (1, 1, 1), (2, 1, 1))). But I think in that case making the remaining diagonal entries large enough still makes the eigenvalues at least −ε, which is good enough.

What matters is not whether or not there is another 0 on the diagonal, but whether or not there is another PSD non definite matrix on the diagonal. For example, in the comment from Jacob_Hilton, they introduce the 2x2 matrix ((1,1),(1,1)), with eigenvalues [0,1], which is a PSD non definite. 

I agree with assuming that all diagonal entries are known. You can even assume that all entries are 1 on the diagonal WLOG.

I changed the statement to assume the diagonal is known, I agree this way is cleaner and prevents us from having to think about weird corner cases.

Just to spell it out, am I right in thinking that you can scale all the matrix elements so that each entry on the diagonal is the sign of its original value, i.e. -1 or 1, but if the resulting diagonal contains any -1s then it can't be PSD?

Yes indeed. One definition of a PSD matrix is some matrix  such that, for any vector  (so, it defines some kind of scalar product).

If , then you can always divide the whole row/column by , which is equivalent to applying some scaling, this won't change the fact that  is PSD or not.

If , then if you try the vector , you can check that , thus the matrix isn't PSD.

Love to see a well-defined, open mathematical problem whose solution could help make some progress on AI alignment! It's like a little taste of not being a pre-paradigmic field. Maybe someday, we'll have lots of problems like this that can engage the broader math/CS community, that don't involve so much vague speculation and philosophy :)

I understand this bounty is roughly closed, but the page is still listed as bounty active and my attempt took me only a couple hours.

Problem 2 is solvable when there are simple restrictions on the entries chosen.

Restriction 1: No diagonal entries need to be correct. Solution: Let C = AAT For each entry c_ij we construct the matrix with only c_ii, c_ij, c_ji , c_jj and all else zero. Clearly this matrix is constructible in O(n) work. So all entries can be constructed in this way. Note if c_ij and c_ji are requested, only construct one matrix for the pair. Repeating this for all m does O(nm) work. We then sum all such matricies to get our estimate E. Only the diagonal will possibly need summing so O(nm) work is done. Clearly each constructed matrix is positive semidefinite as it is a principal submatrix of C. The sum is also positive semidefinite because the individual matricies are positive semidefinite.

Restriction 2: Only 1 entry per row-column number is needed other than the main diagonal. The above solution still works.

This comment proposes a strategy for (2) I'll call Active Matrix Completion (AMC), which allows querying of elements of  given some cost/budget. In our case, each query takes O(n), so we can afford O(m) total queries.

Some notes:

  • O(n) seems really fast - for instance, evaluating the bilinear form  takes O(n^2). This makes me suspect any solution to (2) will rely on some form of AMC.
  • AMC exists in literature - this paper gives a nice introduction and (I think) proves that O(n*rank(A)) queries suffice.
  • One strategy that seems intuitively appealing is querying enough elements to guarantee that the resulting adjacency graph is chordal, which (I think) allows for efficient PSD completion (see Vandenberghe; algorithm 10.2). Thus, the following proposition is stronger than (2):

PROPOSITION: given a graph G = (V, E) with n nodes and m edges, can we efficiently (in O(mn) time) determine a small (with at most Cm edges for universal constant C) chordal completion of G?

The proposition seems "too good to be true", especially since chordal completion seems like a hard computational problem (NP-complete, no great approximations are known), so I'm most interested in easy counterexamples. For instance, a "small" graph with only "large" chordal completions would confirm the falsehood of the proposition.

So, uh, any graph theory experts around?

I'm currently working on this problem and feel like I'm making headway. Wondering if the bounty is still active?

The bounty is still active. (I work at ARC)

I don't see any recent publications for paul christiano related to this. So i guess the problem[s] is still open.

(possibly trivial) observation: the generalization replacing "entries" with "linear constraints" seems almost as easy? I'd strongly suspect impossibility of the former implies impossibility of the latter (thus showing equivalence)

In Question 1, can I consider that O(n^3)=O(nm) since m=Ω(n) can't be greater than n^2?

(I'm not the OP but) absolutely not. The problem is about incomplete matrices, and the idea is to get an upper bound linked to m, i.e. an upper bound linked to how incomplete the matrix is. If you rephrase the question 1 as O(n^3), this is a completely different question, because now you only care about how big the matrix is, not how complete it is. 

Also, since question 1 can be achieved with some gaussian pivot in O(n^3), and it also implies being able to tell whether a matrix is PSD or not, I think the best known complexity wrt n is indeed O(n^3).

re: (1), you might be able to phrase the problem as convex optimization in the unknown parameters, which is probably \tilde{O}(n^2)

[This comment is no longer endorsed by its author]Reply

better: the problem can be phrased as a feasibility problem in semidefinite programming given m linear constraints; no idea if there are (randomized?) algorithms which run fast enough

[This comment is no longer endorsed by its author]Reply

As a side note, GPT-4 correctly identifies both problems as open:

1: ...Determining whether a partial matrix has a PSD completion is a challenging problem, and finding an efficient algorithm with time complexity O(nm) is an open question. The problem can be formulated as a semidefinite programming (SDP) problem or as a convex optimization problem, but the complexity of these approaches is usually higher than O(nm). There is no known algorithm that can determine whether a partial matrix has a PSD completion in time complexity O(nm), nor is there a known proof that this problem is at least as hard as dense matrix multiplication or PSD testing. Further research in the field of optimization and linear algebra is needed to provide a more definitive answer.

2: ...It is important to note that finding an algorithm that can solve this problem with time complexity ~O(nm) remains an open question in the field of optimization and linear algebra.

I don't think that anyone has ever mentioned problem #2 in the literature and so I would at least consider that answer somewhat misleading.

I strongly suspect that if we put in a similar-sounding but very easy problem GPT-4 would be just as likely to say that it's an open question.

Initially, it has provided a solution for the first problem. I asked it: "please check your answer for any errors", and then it apologized for the erroneous solution, and made the aforementioned assessment that the problem is still open.

With the second problem, it was a somewhat similar story: it answered with a partial solution involving a Cholesky decomposition, and then clarified that the problem is open. 

Seems to be worth talking with GPT-4 on the matter. Perhaps it could give some useful hints. 

When I tried asking about this problem, and referenced the above comment about being able to assume that all entries on the diagonal are positive, GPT-4 said the following:

If an n×n matrix is a symmetric positive semidefinite (PSD) matrix, it can be interpreted as a Gram matrix representing the dot products of a set of n vectors, and this configuration has 0 stress.

To understand this, let's first consider what a Gram matrix is. Given a set of n vectors v_1, v_2, ..., v_n in a d-dimensional space, the Gram matrix G is an n×n matrix where the entry G(i, j) is the dot product of vectors v_i and v_j:

G(i, j) = v_i • v_j

A Gram matrix is always symmetric (i.e., G(i, j) = G(j, i)) and positive semidefinite.

Now, let's consider the stress function in the context of multidimensional scaling (MDS). The stress function measures the difference between the given dissimilarities (distances) and the distances in the lower-dimensional space. When the stress is 0, this means that the distances in the lower-dimensional space exactly match the given dissimilarities.

Is GPT-4 correct? If so, am I interpreting it correctly that this problem could be rephrased as "given an incomplete set of m desired pairwise distances between n points, determine whether there exists some configuration of those points in an n-dimensional space such that the pairwise distances between those points are exactly as desired"?

Since we are only given m distances maybe a numerical algorithm for MDS might work. For example we might start with a random nxn matrix and minimize the strain by gradient descent. For the desired runtime this would just have to converge in $O(m/n)$ steps. I'm not sure if this is realistic though. Also there might be better numerical algorithms for this specific problem. Has anyone looked into this yet?

I think running a single strain minimization iteration on m points in n dimensions takes O(m*n) steps. So there would need to be some reason to expect that it would converge in some constant (though possibly large) number of steps.

Unless you're saying "for each node, run the strain minimization step until it converges, then do the same for each subsequent node". I don't know if the greedy algorithm works there, but if it does then maybe?

Also I kinda expect that if there's something that works in O(n*m*log(m)) that's probably fine.

(and yeah, "try the greedy exact solution for each node" was my "dumb solution" attempt).

I think your interpretation is correct, I'm not entirely sure. There might be n+1 points in total, because the diagonal coefficients give you the distance to 0 I think?

The Gram matrix bit is certainly correct, though I'm not sure how the distances come into it. The m constraints are constraints on dot products, so more angles (well, cosines) than distances between vectors.

If you are given the dot products exactly, that's equivalent to being given the distances exactly.

(I think this reduction goes the wrong way though, if I wanted to solve the problem about distances I'd translate it back into dot products.)

I think you can convert between the two representations in O(m) time, which would mean that any algorithm that solves either version in O(n*m) solves both in O(n*m).

Do you have some large positive and negative examples of the kinds of sparse matrix you're trying to check for the existence of a PSD completion on, or alternatively a method for generating such examples with a known ground truth? I have a really dumb idea for a possible algorithm here (that shamelessly exploits the exact shape of this problem in a way that probably doesn't generalize to being useful for broader problems like MDS) that I think would complete in approximately the time constraints you're looking for. It almost certainly won't work, but I think it's at least worth an hour of my time to check and figure out why (especially since I'm trying to improve my linear algebra skills anyway).

Edit: there's the obvious approach, which I'm trying, of "start with only 1s on the diagonal and then keep adding random entries until it no longer has a PSD completion, then removing random entries until it does, and repeat to build a test set" but I doubt that covers the interesting corners of the problem space.

Edit 2: the really dumb thing does not work. I think I haven't ruled out that a slightly less dumb approach could work though?

Edit 3: never mind, my really dumb "solution" requires inverting a matrix that is, in the worst case, nxn, if e.g. you have an input that looks like

1 n n n n n n
n 1 - - - - n
n - 1 - - - n
n - - 1 - - n
n - - - 1 - n
n - - - - 1 n
n n n n n n 1

you'll have to invert 6 2x2 matrices and one each of 3x3 to 7x7 matrices.

You mean as in, if I have two points and , then ?

Guess so, but yeah, I just don't tend to think of it immediately as a part of the distances rather than just vectors. It works though since you mentioned the diagonal elements are known.

[+][comment deleted]7mo10