Covert side channels like you're suggesting would probably be a related and often helpful thing for someone trying to do what OP is talking about, but I think the side channels are distinct from the things they can be used for.

This concept in radio communications would be "spread spectrum", reducing the signal intensity or duration in any given part of the spectrum and using a wider band/more channels.  See especially military spread spectrum comms and radars.  E.g. this technique has been used to frustrate simple techniques for identifying the location of a radio transmitter, to avoid jamming, and to defeat radar warning/missile warning systems on jets.
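A toy sketch of the frequency-hopping flavor of spread spectrum (the channel count, hop count, and pre-shared-seed scheme here are all invented for illustration; real systems use cryptographic sequence generators and tight time synchronization, not Python's `random`):

```python
import random

def hop_schedule(shared_seed, n_channels, n_hops):
    """Pseudorandom channel sequence derived from a seed both ends share.
    Transmitter and receiver compute the same schedule independently,
    while an eavesdropper without the seed sees only brief, scattered
    bursts spread across the whole band."""
    rng = random.Random(shared_seed)
    return [rng.randrange(n_channels) for _ in range(n_hops)]

seed = 0xC0FFEE  # pre-shared secret (illustrative)
tx = hop_schedule(seed, n_channels=64, n_hops=100)
rx = hop_schedule(seed, n_channels=64, n_hops=100)
assert tx == rx  # both ends stay synchronized, hop for hop
# Dwelling only ~1/64th of the time on any one channel is what frustrates
# narrowband jamming and simple direction-finding gear.
```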

It's pretty easy to find reasons why everything will hopefully be fine, or AI hopefully won't FOOM, or we otherwise needn't do anything inconvenient to get good outcomes.  It's proving considerably harder (from my outside-the-field view) to prove alignment, or prove upper bounds on the rate of improvement, or prove much of anything else that would be cause to stop ringing the alarm.

FWIW I'm considerably less worried than I was when the Sequences were originally written.  The paradigms that have taken off since do seem a lot more compatible with straightforward training solutions that look much less alien than expected.  There are plausible scenarios where we fail at solving alignment and still get something tolerably human-shaped, and none of those scenarios previously seemed plausible.  That optimism just doesn't take it below the stop-worrying threshold.

Admittedly I skimmed large portions of that, but I'd like to take a crack at bridging some of that inferential distance with a short description of the model I've been using, whereby I keep all the concerns you brought up straight but also don't have to choke on pronouns.

Categories of Men and Women are useful in a wide variety of areas and point at a real thing.  There's a region in the middle where these categories overlap and lack clean boundaries - while both genetics and birth sex are undeniable and straightforward fact in almost all cases (~98% IIRC), they don't make the wide-ranging good predictions you'd otherwise expect in this region.  I've mentally been calling this the "gender/sex/identity is complicated" region.  Within this region, carefully consider which category is more relevant and go with that; other times a weighted average may be more appropriate.

By way of example: if I want to infer likely skill-sets, hobbies, or interests for someone trans, I'm probably looking at either their pre-transition category, or a weighted average based on years before vs. after transition.

On the other hand if I'm considering how a friend or conversation partner might prefer to be treated, I'd almost certainly be correct to infer based on claimed/stated gender until I know more.

On the one hand I can definitely see why those threads got under your skin (and I'm shocked The Thoughts You Cannot Think didn't get a link); not the finest showing in clear thinking.  Ultimately though I'm skeptical that we should treat pronouns as making some deep claim about the structure of person-space along the axis of sex.  If anything, the fact that there's conflict at all should serve to highlight that there's a large region (maybe as much as 20% of the population?) where this isn't cut and dried and simple rules aren't making good predictions.  Looking at that structure, there's a decent if not airtight case for treating pronouns as you would any other nicknames or abbreviations - namely, acceptable insofar as the referent finds the name acceptable.  There are places where a "no pseudonyms allowed, no exceptions" rule should and does trump "preferred moniker"/"no name-calling", but Twitter clearly isn't one.

I think a key distinction here is that any of this only helps if people care more about the truth of the issue at hand than about whatever realpolitik considerations the issue has tangentially gotten pulled into.  And yeah, absent "unreasonable levels of political savvy", academics are mostly relying on academic issues usually being far enough from the icky world of politics to be openly discussed, at least outside of a few seriously diseased disciplines where the rot is well and truly set in.  The powers that be seem to only care about the truth of an issue when it starts directly impinging on their day-to-day; people seem to find it noteworthy when this isn't true of a given leader.

I don't think this will ever be fully predictable.  E.g. in the US I don't think anyone really saw the magnitude of the backlash against election workers, academics, and security folks coming until it became headline news.  And arguably that's what a near-miss looks like.

This is very much what I want my headlines to look like.  

Personally, my preferred mode of consumption would be an AM email newsletter like Axios or Morning Brew.

The resolution dates on the markets seem important on several of the headlines and were noticeably missing from the body.

"Crimea land bridge 22% chance of being cut [this year/campaign season], down from 34% according to Insight"

Notice how different that would read with the time horizon on there vs. leaving it unqualified.  The other big question an update like that raises is "what changed?"

Interesting follow-up: how long do they take to break out of the bad equilibrium if all start there? How about if we choose a less extreme bad equilibrium (say 80 degrees)?

Looking ahead multiple moves seems sufficient to break the equilibrium, but for the stated assumption that the other players also have deeply flawed models of your behavior, which assume you're using a different strategy than you actually are (the shared one, including punishment). There does seem to be something fishy/circular about baking an assumption about the other players' strategies into a player's own strategy while omitting any ability to update.

Not sure I'm following the setup and notation closely enough to argue that one way or the other, as far as the order in which we're saying the agent receives evidence and has to commit to actions.  Above I was considering the simplest case: 1 bit of evidence in, 1 bit of action out, repeat.

I'm pretty sure that could be extended to get that one-small-key/update-that-unlocks-the-whole-puzzle sort of effect, where the model clicks all at once. As you say, though, I'm not sure that gets to the heart of the matter regarding the bound; it may show that no such bound exists on the margin (the last piece can be much more valuable on the margin than all the prior pieces of evidence), but not necessarily in a way that violates the proposed bound overall.  Maybe we have to see that last piece as unlocking some bounded amount of value from your prior observations.

It's possible to construct a counterexample where there's a step from guessing at random to perfect knowledge after an arbitrary number of observed bits; the first n-1 bits of evidence are worthless alone, and the nth bit lets you perfectly predict the next bit and all future bits.

Consider, for example, shifting bits one at a time into the input of a known hash function that's been initialized with an unknown value (of known width), where I ask you to guess a specified bit of the output.  In the idealized case, you know nothing about the output until you learn the final input bit (once all the unknown bits have shifted out), because they're perfectly mixed; after that you'll guess every future bit correctly.
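A minimal sketch of that construction, with SHA-256 standing in for the idealized hash and the register width, secret, and bit strings all invented for illustration:

```python
import hashlib

WIDTH = 16  # known register width, in bits

def next_output_bit(bits_shifted_in):
    """Low bit of a known hash of the register's current WIDTH bits."""
    state = bits_shifted_in[-WIDTH:]
    return hashlib.sha256(state.encode()).digest()[0] & 1

secret = "1011001110001011"    # unknown initial register contents (16 bits)
public = "011010010110100101"  # bits the observer watches being shifted in

stream = secret  # the true input history: secret bits first, then public ones
all_correct = True
for k, bit in enumerate(public, start=1):
    stream += bit
    truth = next_output_bit(stream)
    if k >= WIDTH:
        # All secret bits have shifted out, so the observer can reconstruct
        # the register state from public bits alone and predict perfectly.
        all_correct &= (next_output_bit(public[:k]) == truth)

print(all_correct)  # True: perfect prediction once the last secret bit exits
```

Before the WIDTH-th public bit arrives, the register still contains secret bits, so (idealizing the hash as a perfect mixer) the observer's guesses are no better than chance; the step to perfect prediction happens on a single bit.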

Seems like the pathological cases can be arbitrarily messy.
