Maybe it's an apple of discord thing? You claim to devote resources to a good cause, and all the other causes take it as an insult?

If you really want to create widespread awareness of the broad definition, the thing to do would be to use the term in all the ways you currently wouldn't.

E.g. "The murderer realized his phone's GPS history posed a significant infohazard, as it could be used to connect him to the crime."

If Bostrom's paper is our Schelling point, 'infohazard' encompasses much more than just the collectively-destructive smallpox-y sense.

Here's the definition from the paper.

Information hazard: A risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agent to cause harm.

'Harm' here does not mean 'net harm'. There's a whole section on 'Adversarial Risks', cases where information can harm one party by benefitting another party:

In competitive situations, one person’s information can cause harm to another even if no intention to cause harm is present. Example: The rival job applicant knew more and got the job.

ETA: localdeity's comment below points out that it's a pretty bad idea to have a term that colloquially means 'information we should all want suppressed' but technically also means 'information I want suppressed'. This isn't just pointless pedantry.

I agree that there's a real sense in which the genome cannot 'directly' influence the things on the bulleted list. But I don't think 'hardcoded circuitry' is the relevant kind of 'direct'.

Instead, I think we should be asking whether genetic changes can produce isolated effects on things on that list.

E.g. if there can be a gene whose only observable-without-a-brain-scan effect is to make its carriers think differently about seeking power, that would indicate that the genome has fine-grained control at the level of concepts like 'seeking power'. I think this would put us in horn 1 or 2 of the trilemma, no matter how indirect the mechanism for that control.

(I suppose the difficult part of testing this would be verifying the 'isolated' part.)

Important update from reading the paper: Figure A3 (the objective and subjective outcomes chart) is biased against the cash-receiving groups and can't be taken at face value. Getting money did not make everything worse. The authors recognize this; it's why they say there was no effect on the objective outcomes (I previously thought they were just being cowards about the error bars).

The bias is from an attrition effect: basically, control-group members with bad outcomes disproportionately dropped out of the trial. Search for 'attrition' in the paper to see their discussion on this.

This doesn't erase the study; the authors account for this and remain confident that the cash transfers didn't have significant positive impacts. But they conclude that most or all of the apparent negative impacts are probably illusory.
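
To make the attrition effect concrete, here's a toy simulation. Every number in it is made up for illustration and none of it comes from the paper: the function name `simulate`, the dropout probabilities, and the "bad outcome means below zero" rule are all my assumptions. Both groups draw outcomes from the same distribution, so the true effect of cash is zero by construction, but control members with bad outcomes are more likely to vanish before measurement.

```python
import random
from statistics import mean

def simulate(n=20000, dropout_if_bad=0.6, dropout_otherwise=0.1, seed=0):
    """Toy model of differential attrition (hypothetical parameters)."""
    rng = random.Random(seed)
    # Same outcome distribution in both groups: the true effect is zero.
    control = [rng.gauss(0, 1) for _ in range(n)]
    cash = [rng.gauss(0, 1) for _ in range(n)]

    # Control members with bad outcomes (< 0) drop out more often.
    def control_stays(outcome):
        p_drop = dropout_if_bad if outcome < 0 else dropout_otherwise
        return rng.random() >= p_drop

    control_observed = [y for y in control if control_stays(y)]
    # Cash recipients drop out at the same low rate regardless of outcome.
    cash_observed = [y for y in cash if rng.random() >= dropout_otherwise]

    true_gap = mean(cash) - mean(control)
    observed_gap = mean(cash_observed) - mean(control_observed)
    return true_gap, observed_gap

true_gap, observed_gap = simulate()
print(f"true cash-minus-control gap:     {true_gap:+.3f}")
print(f"observed cash-minus-control gap: {observed_gap:+.3f}")
```

The observed gap comes out clearly negative even though the true gap is near zero, which is the same direction as the bias described above: the surviving control group looks better than it really was, so the cash groups look worse by comparison.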

Note that after day 120 or so, all three groups' balances decline together. Not sure what that's about.

The latter issue might become more tractable now that we better understand how and why representations are forming, so we could potentially distinguish surprisal about form and surprisal about content.

I would count that as substantial progress on the opaqueness problem.

The ideal gas law describes relations between macroscopic gas properties like temperature, volume and pressure. E.g. "if you raise the temperature and keep volume the same, pressure will go up". The gas is actually made up of a huge number of individual particles, each with its own position and velocity at any one time, but trying to understand the gas's behavior by looking at a long list of particle positions/velocities is hopeless.

Looking at a list of neural network weights is analogous to looking at particle positions/velocities. This post claims there are quantities analogous to pressure/volume/temperature for a neural network (AFAICT it does not offer an intuitive description of what they are).
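
For the gas side of the analogy only, here's a minimal sketch of collapsing a long list of particle velocities into two macroscopic numbers. It uses the standard kinetic-theory relations for an ideal monatomic gas (mean kinetic energy per particle is 3/2 kT, and P = N m ⟨v²⟩ / 3V). The function names, the roughly-helium particle mass, the particle count, and the one-litre box are all assumptions I picked for illustration; nothing here is from the post.

```python
import random

K_B = 1.380649e-23  # Boltzmann constant, J/K

def sample_velocities(n, mass, temperature, rng):
    """Draw n velocity vectors from a Maxwell-Boltzmann distribution."""
    sigma = (K_B * temperature / mass) ** 0.5  # per-component std dev
    return [(rng.gauss(0, sigma), rng.gauss(0, sigma), rng.gauss(0, sigma))
            for _ in range(n)]

def macro_from_micro(velocities, mass, volume):
    """Summarize the microstate as (temperature, pressure)."""
    n = len(velocities)
    mean_sq_speed = sum(vx * vx + vy * vy + vz * vz
                        for vx, vy, vz in velocities) / n
    temperature = mass * mean_sq_speed / (3 * K_B)      # from (3/2) k T
    pressure = n * mass * mean_sq_speed / (3 * volume)  # kinetic theory
    return temperature, pressure

rng = random.Random(0)
mass = 6.6e-27   # kg, roughly one helium atom (assumed)
volume = 1e-3    # m^3, a one-litre box (assumed)

cold = sample_velocities(10000, mass, 300.0, rng)
hot = sample_velocities(10000, mass, 600.0, rng)

t_cold, p_cold = macro_from_micro(cold, mass, volume)
t_hot, p_hot = macro_from_micro(hot, mass, volume)
print(f"cold: T = {t_cold:.1f} K, P = {p_cold:.3e} Pa")
print(f"hot:  T = {t_hot:.1f} K, P = {p_hot:.3e} Pa")
```

Thirty thousand velocity components per gas become two numbers, and those two numbers obey PV = NkT: at fixed volume, the hotter sample has the higher pressure. The neural network claim is that some similarly compact summary of the weights exists.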

I've downvoted this comment; in light of your edit, I'll explain why. Basically, I think it's technically true but unhelpful.

There is indeed "no mystery in Americans getting fatter if we condition on the trajectory of mean calorie intake", but that's a very silly thing to condition on. I think your comment reads as if you think it's a reasonable thing to condition on.

I see in your comments downthread that you don't actually intend to take the 'increased calorie intake is the root cause' position. All I can say is that in my subjective judgement, this comment really sounds like you are taking that position and is therefore a bad comment.

(And I actually gave it an agreement upvote because I think it's all technically true.)

I agree that (1) is an important consideration for AI going forward, but I don't think it really applies until the AI has a definite goal. AFAICT the goal in developing systems like GPT is mostly 'to see what they can do'.

I don't fault anybody for GPT completing anachronistic counterfactuals; they're fun and interesting. It's a feature, not a bug. You could equally call it an alignment failure if GPT-4 started being a wet blanket and giving completions like:

Prompt: "In response to the Pearl Harbor attacks, Otto von Bismarck said" Completion: "nothing, because he was dead."

In contrast, a system like IBM Watson has a goal of producing correct answers, making it unambiguous what the aligned answer would be.

To be clear, I think the contest still works—I just think the 'surprisingness' condition hides a lot of complexity wrt what we expect in the first place.
