Comments

Yeah, plus all the other stuff Alexander and Metz wrote about it, I guess.

It's just a figure of speech for the sorts of thing Alexander describes in Kolmogorov Complicity. More or less the same idea as "Safe Space" in the NYT piece's title—a venue or network where people can have the conversations they want about those ideas without getting yelled at or worse.

Mathematician Andrey Kolmogorov lived in the Soviet Union at a time when true freedom of thought was impossible. He reacted by saying whatever the Soviets wanted him to say about politics, while honorably pursuing truth in everything else. As a result, he not only made great discoveries, but gained enough status to protect other scientists, and to make occasional very careful forays into defending people who needed defending. He used his power to build an academic bubble where science could be done right and where minorities persecuted by the communist authorities (like Jews) could do their work in peace...

But politically-savvy Kolmogorov types can’t just build a bubble. They have to build a whisper network...

They have to serve as psychological support. People who disagree with an orthodoxy can start hating themselves – the classic example is the atheist raised religious who worries they’re an evil person or bound for Hell – and the faster they can be connected with other people, the more likely they are to get through.

They have to help people get through their edgelord phase as quickly as possible. “No, you’re not allowed to say this. Yes, it could be true. No, you’re not allowed to say this one either. Yes, that one also could be true as best we can tell. This thing here you actually are allowed to say still, and it’s pretty useful, so do try to push back on that and maybe we can defend some of the space we’ve still got left.”

They have to find at-risk thinkers who had started to identify holes in the orthodoxy, communicate that they might be right but that it could be dangerous to go public, fill in whatever gaps are necessary to make their worldview consistent again, prevent overcorrection, and communicate some intuitions about exactly which areas to avoid. For this purpose, they might occasionally let themselves be seen associating with slightly heretical positions, so that they stand out to proto-heretics as a good source of information. They might very occasionally make calculated strikes against orthodox overreach in order to relieve some of their own burdens. The rest of the time, they would just stay quiet and do good work in their own fields.

That section is framed with

Part of the appeal of Slate Star Codex, faithful readers said, was Mr. Siskind’s willingness to step outside acceptable topics. But he wrote in a wordy, often roundabout way that left many wondering what he really believed.

More broadly, part of the piece's thesis is that the SSC community is the epicenter of a creative and influential intellectual movement, some of whose strengths come from a high tolerance for entertaining weird or disreputable ideas.

Metz is trying to convey how Alexander makes space for these ideas without staking his own credibility on them. This is, for example, what Kolmogorov Complicity is about; it's also what Alexander says he's doing with the neoreactionaries in his leaked email. It seems clear that Metz did enough reporting to understand this.

The juxtaposition of "Scott aligns himself with Murray [on something]" and "Murray has deplorable beliefs" specifically serves that thesis. It also pattern-matches to a very clumsy smear, which I get the impression is triggering readers before they manage to appreciate how it relates to the thesis. That's unfortunate, because the “vague insinuation” is much less interesting and less defensible than the inference that Alexander is being strategic in bringing up Murray on a subject where it seems safe to agree with him.

Muireall · 1mo

In 2021, I was following these events and already less fond of Scott Alexander than most people here, and I still came away with the impression that Metz's main modes were bumbling and pattern-matching. At least that's the impression I've been carrying around until today. I find his answers here clear, thoughtful, and occasionally cutting, although I get the impression he leaves more forceful versions on the table for the sake of geniality. I'm wondering whether I absorbed some of the community's preconceptions or instinctive judgments about him or journalists in general.

I do get the stubbornness, but I read that mostly as his having been basically proven right (and having put in the work at the time to be so confident).

Answer by Muireall · Jan 25, 2024

In the 2D case, there's no escaping exponential decay of the autocorrelation function for any observable satisfying certain regularity properties. (I'm not sure if this is known to be true in higher dimensions. If it's not, then there could conceivably be traps with sub-exponential escape times or even attractors, but I'd be surprised if that's relevant here—I think it's just hard to prove.) Sticking to 2D, the question is just how the time constant in that exponent for the observable in question compares to 20 seconds.
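To make that comparison concrete, here's a minimal sketch of estimating the time constant from a sampled observable. `samples` and `dt` are placeholders for whatever simulated trajectory you'd actually have, so treat this as an illustration rather than a claim about the numbers.

```python
import numpy as np

def autocorrelation_time(samples, dt):
    """Fit an exponential to the normalized autocorrelation of `samples`
    (an observable sampled every `dt` seconds) and return the time constant."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf /= acf[0]
    # Fit only the lags before the first zero crossing, where a single
    # exponential is a reasonable description (for an oscillating observable
    # you'd want to fit the envelope instead).
    first_cross = int(np.argmax(acf <= 0)) or acf.size
    lags = np.arange(1, first_cross) * dt
    slope = np.polyfit(lags, np.log(acf[1:first_cross]), 1)[0]
    return -1.0 / slope

# e.g. tau = autocorrelation_time(left_right_count_difference, dt=1e-4)
# and the question is how tau compares to 20 seconds.
```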

The presence of persistent collective behavior is a decent intuition, but I'm not sure it saves you. I'd start by noting that for any analysis of large-scale structure—like a spectral analysis where you're looking at superpositions of harmonic sound waves—a perturbation to a single particle's initial position is a perturbation to the initial condition of every component in the spectral basis, and each of those component perturbations then grows exponentially.

In this case you can effectively decompose the system into "Lyapunov modes", each with its own exponent for the growth rate of perturbations, and, in fact, because the system is close to linear in the harmonic basis, the modes with the smallest exponents will look like the low-wave-vector harmonic modes. One of these, conveniently, looks like a "left-right density" mode. So the lifetime (or Q factor) of that first harmonic is somewhat relevant, but the actual left-right density difference still involves the sum of many harmonics (for example, with more nodes in the up-down dimension) that have larger exponents. These individually contribute less (given equipartition of initial energy, these modes spend relatively more of their energy in up-down motion and so affect left-right density less), but collectively they should be enough to scramble the left-right density observable in 20 seconds even with a long-lived first harmonic.

On the other hand, 1 mol in 1 m^3 is not very dense, which should tend to make modes longer-lived in general. So I'm not totally confident on this one without doing any calculations. Edit: Wait, no, I think it's the other way around. Speed of sound and viscosity are roughly constant with gas density and attenuation otherwise scales inversely with density. But I think it's still plausible that you have a 300 Hz mode with Q in the thousands.
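For a rough sense of scale, here's a back-of-envelope for the classical (Stokes-Kirchhoff) bulk attenuation limit alone, assuming an air-like gas at room temperature and neglecting bulk viscosity and wall losses. All of those are assumptions of mine rather than anything specified in the problem, and wall losses in particular would ordinarily pull Q well below this bulk-limited figure.

```python
import math

# Air-like gas at ~293 K; only the 1 mol per m^3 density is from the problem,
# the rest are generic assumed values.
M = 0.029        # kg/mol, molar mass
rho = 1.0 * M    # kg/m^3 at 1 mol per m^3
c = 343.0        # m/s, speed of sound (roughly density-independent)
mu = 1.8e-5      # Pa*s, shear viscosity (also roughly density-independent)
kappa = 0.026    # W/(m*K), thermal conductivity
cp = 1005.0      # J/(kg*K), specific heat at constant pressure
gamma = 1.4
omega = 2 * math.pi * 300.0   # the ~300 Hz mode in question

# Classical absorption: amplitude decays spatially as exp(-alpha * x).
delta = (4.0 / 3.0) * mu + (gamma - 1.0) * kappa / cp   # bulk viscosity dropped
alpha = omega**2 * delta / (2.0 * rho * c**3)

# A standing mode's energy then decays at a rate 2*alpha*c, so Q = omega/(2*alpha*c),
# proportional to rho; lower density means lower Q, as in the edit above.
Q = omega / (2.0 * alpha * c)
print(f"alpha ~ {alpha:.1e} 1/m, bulk-limited Q ~ {Q:.0f}")
```

The bulk limit alone comes out well above a thousand here, so whether the mode really keeps Q in the thousands mostly comes down to the losses this estimate leaves out.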

Related would be some refactoring of Deception Chess.

When I think about what I'd expect to see in experiments like that, I get curious about a sort of "baseline" set of experiments without deception or even verbal explanations. When can I distinguish the better of two chess engines more efficiently than playing them against each other and looking at the win/loss record? How much does it help to see the engines' analyses over just observing moves? 

How is this related? Well, how deep is Chess? Ratings range between, say, 800 and 3500, with 300 points being enough to distinguish players (human or computer) reasonably well. So we might say there are about 10 "levels" in practice, or that it has a rating depth of 10.

If Chess were Best-Of-30 ChessMove as described above, then ChessMove would have a rating depth a bit below 2 (just dividing by √30 ≈ 5.5). In other words, we'd expect it to be very hard to ever distinguish any pair of engines off a single recommended move—and difficult with any number of isolated observations, given our own error-prone human evaluation. If it's closer to Best-Of-30 Don'tBlunder, it's a little more complicated—usually you can't tell the difference because there basically is none, but on rare pivotal moves it will be nearly as easy to tell as when looking at a whole game.
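As a sanity check on that factor, here's a minimal Monte Carlo sketch of one way to cash out "Elo-like performance and rating per move": each move is an independent Gaussian performance draw, and the game goes to whichever side has the larger 30-move total. The distributions and sample sizes are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

n_moves = 30
d = 0.1            # per-move skill gap, in units of the per-move performance std
n_games = 200_000

# Each game: both players draw one performance per move; larger total wins.
a = rng.normal(d, 1.0, size=(n_games, n_moves)).sum(axis=1)
b = rng.normal(0.0, 1.0, size=(n_games, n_moves)).sum(axis=1)
p_win = (a > b).mean()

# Back out the single-draw gap that gives the same win probability
# (one draw each with gap d_eff is won with probability Phi(d_eff / sqrt(2))).
d_eff = np.sqrt(2.0) * norm.ppf(p_win)
print(f"amplification ~ {d_eff / d:.2f}, sqrt(30) ~ {np.sqrt(n_moves):.2f}")
```

The roughly 5.5x amplification of the per-move gap is where the divide-by-√30 above comes from.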

The solo version of the experiment looks like this:

  1. I find a chess engine with a rating around mine, and use it to analyze positions in games against other engines. Play a bunch of games to get a baseline "hybrid" rating for myself with that advisor.
  2. I do the same thing with a series of stronger chess engines, ideally each within a "level" of the last.
  3. I do the same thing with access to the output of two engines, and I'm blinded to which is which. (The blinding might require some care around, for example, timing, as well as openings.) In sub-experiment A, I only get top moves and their scores (there's a rough code sketch of this blinded setup after the list). In sub-experiment B, I can look at lines from the current position up to some depth. In sub-experiment C, I can use these engines however I want. For example, I can basically play them against each other if I want to run down my own clock doing it. (Because pairs might be within a level of one another, I can't be sure which is stronger from a single win/loss outcome. I'd hope to find more efficient ways of distinguishing them.)
  4. I repeat #3 with different random pairs of advisors.
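For concreteness, here's a rough sketch of the blinded step in sub-experiment A using the python-chess library. The engine paths, the fixed depth limit, and the bookkeeping are placeholder assumptions, not a worked-out protocol.

```python
import random

import chess
import chess.engine

# Placeholder paths; any two UCI engines would do.
ENGINE_PATHS = ["/path/to/engine_a", "/path/to/engine_b"]


def blinded_advice(board, engines, rng):
    """Top move and score from each advisor, presented in a shuffled order
    so that which engine is which stays hidden from me."""
    order = [0, 1]
    rng.shuffle(order)
    advice = []
    for idx in order:
        info = engines[idx].analyse(board, chess.engine.Limit(depth=12))
        # "pv" and "score" are reported by typical UCI engines during a depth search.
        advice.append((info["pv"][0], info["score"].pov(board.turn)))
    return advice, order  # `order` gets logged for later scoring, never shown to me


def main():
    rng = random.Random(0)
    engines = [chess.engine.SimpleEngine.popen_uci(path) for path in ENGINE_PATHS]
    board = chess.Board()
    try:
        advice, _order = blinded_advice(board, engines, rng)
        for move, score in advice:
            print(board.san(move), score)
    finally:
        for engine in engines:
            engine.quit()


if __name__ == "__main__":
    main()
```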

What I'd expect is that my ratings with pairs of advisors should be somewhere between my rating with the bad advisor and my rating with the good advisor. If I can successfully distinguish them, it's close to the latter. If I'm just guessing, it's close to the former (in the Don'tBlunder world) or to the midpoint (in the ChessMove world). I should have an easier time in sub-experiments B and C. Having a worse engine in the mix weighs me down relatively more (a) the closer the engines are to each other, and (b) the stronger both engines are compared to me.

The main question I'd hope might be answerable this way would be something like, "How do (a) and (b) trade off?" Which is easier to distinguish—1800 and 2100, or, say, 2700 and 3300? Will there be a ceiling beyond which I'm always just guessing? Might I tend to side with worse advisors because, being closer to my level, they agree with me?

It seems like we'd want some handle on these questions before asking how much worse outright deception can be.

(There's some trouble here because higher-ranked players are more likely to draw given a fixed rating difference. This itself is relatively Don'tBlunder-like, and it makes me wonder if it's possible to project how far our best engines are likely to be from perfect play. But it makes it harder to disentangle inability to draw distinctions in play above my level from "natural" indistinguishability. There are also more general issues in doing these experiments with computers—for example, weak engines tend to be weak in ways humans wouldn't be, and it's hard to calibrate ratings for superhuman play.)

(It might also be interesting to automate myself out of this experiment by choosing between recommendations using some simple scripted logic and evaluation by a relatively weak engine.)

Along the lines of what I wrote in the parent: even though I think there's potentially a related and fairly deep "worldview"-type crux (crux generator?) nearby when it comes to AI risk—are we in a ChessMove world or a Don'tBlunder world? (sorry, these are terrible names, since actual Chess moves are more like Don'tBlunder, which is itself horribly ugly)—I'm not particularly motivated to do this experiment, because I don't think any possible answer at this level of metaphor would be informative enough to shift anyone on more important questions.

I sometimes wonder how much we could learn from toy models of superhuman performance, in terms of what to expect from AI progress. I suspect the answer is "not much", but I figured I'd toss some thoughts out here, as much to discharge any need I feel to think about them further as to see if anyone has any good pointers here.

Like—when is performance about making super-smart moves, and when is it about consistently not blundering for as long as possible? My impression is that in Chess, something like "average centipawn loss" (according to some analysis engine) doesn't explain outcomes as well as "worst move per game". (I don't know the keywords to search for, but I relatedly found this neat paper which finds a power law for the difference between best and second-best moves in a position.) What does Go look like, in comparison?
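To pin down what I mean by those two statistics, here's a toy comparison on made-up per-move centipawn losses for two hypothetical games; none of these numbers come from real engine analysis.

```python
# Hypothetical per-move centipawn losses (evaluation drop relative to the
# engine's preferred move) for two games: game A bleeds small amounts steadily,
# game B is accurate except for one blunder.
game_a = [20, 15, 25, 10, 20, 15, 25, 10, 20, 15]
game_b = [2, 0, 3, 1, 0, 160, 2, 1, 3, 0]

for name, losses in [("A", game_a), ("B", game_b)]:
    acpl = sum(losses) / len(losses)
    worst = max(losses)
    print(f"game {name}: ACPL {acpl:.1f} cp, worst move {worst} cp")

# Both games have a similar average centipawn loss (about 17 cp) but very
# different worst moves; the question is which statistic tracks outcomes better.
```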

How deep are games? What's the longest chain of players such that each consistently beats the next? How much comes from the game itself being "deep" versus the game being made up of many repeated small contests? (E.g., the longest chain for best-of-9 Chess is going to be about 3 times longer than that for Chess, if the assumptions behind the rating system hold. Or, another example, is Chess better thought of as Best-Of-30 ChessMove with Elo-like performance and rating per move, or perhaps as Best-Of-30 Don'tBlunder with binary performance per move?)

Where do ceilings come from? Are there diminishing returns on driving down blunder probabilities given fixed deep uncertainties or external randomness? Is there such a thing as "perfect play", and when can we tell if we're approaching it? (Like—maybe there's some theoretically-motivated power law that matches a rating distribution until some cutoff at the extreme tail?)

What do real-world "games" and "rating distributions" look like in this light?

Muireall · 4mo

Many times have I heard people talk about ideas they thought up that are ‘super infohazardous’ and ‘may substantially advance capabilities’ and then later when I have been made privy to the idea, realized that they had, in fact, reinvented an idea that had been publicly available in the ML literature for several years with very mixed evidence for its success – hence why it was not widely used and known to the person coming up with the idea.

I’d be very interested if anyone has specific examples of ideas like this they could share (that are by now widely known or obviously not hazardous). I’m sympathetic to the sorts of things the article says, but I don’t actually have any picture of the class of ideas it’s talking about.

It sounds like you're saying that you can tell once someone's started transitioning, not that you can recognize trans people who haven't (or who haven't come out, at least not to a circle including you), right? Whether or not you're right, the spirit of this post includes the latter, too.

This reasoning is basically right, but the answer ends up being 5 for a relatively mundane reason.

If the time-averaged potential energy is k_B T / 2, so is the kinetic energy. Because damping is low, at some point in a cycle, you'll deterministically have the sum of the two in potential energy and nothing in kinetic energy. So you do have some variation getting averaged away.
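Here's a tiny sketch of that bookkeeping, sampling a 1D harmonic oscillator from the Boltzmann distribution; the parameters and units are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters, in units where kB = 1.
kB_T = 1.0
m, k = 1.0, 1.0
n = 1_000_000

# Equilibrium (Boltzmann) samples of position and momentum.
x = rng.normal(0.0, np.sqrt(kB_T / k), size=n)
p = rng.normal(0.0, np.sqrt(m * kB_T), size=n)

pe = 0.5 * k * x**2
ke = p**2 / (2.0 * m)

print(f"<PE> = {pe.mean():.3f}, <KE> = {ke.mean():.3f}, <E> = {(pe + ke).mean():.3f}")
# Expect <PE> = <KE> = kB*T/2 and <E> = kB*T. With low damping, at the turning
# point of a given cycle all of E sits momentarily in potential energy, so the
# per-cycle peak potential energy averages about twice the mean potential energy.
```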

More generally, while the relaxation timescale is the relevant timescale here, I also wanted to introduce an idea about very fast measurement events like the closing of the electrical circuit. If you have observables correlated on short timescales, then measurements faster than that won't necessarily follow expectations from naive equilibrium thinking.
