# Shortform Content

People are not being careful enough about what they mean when they say "simulator" and it's leading to some extremely unscientific claims. Use of the "superposition" terminology is particularly egregious.

I just wanted to put a record of this statement into the ether so I can refer back to it and say I told you so.

# Reflections on Bay Area visit

GPT-4 generated TL;DR (mostly endorsed but eh):

1. The beliefs of prominent AI safety researchers may not be as well-founded as expected, and people should be cautious about taking their beliefs too seriously.
2. There is a tendency for people to overestimate their own knowledge and confidence in their expertise.
3. Social status plays a significant role in the community, with some individuals treated like "popular kids."
4. Important decisions are often made in casual social settings, such as lunches and parties.
5. Geographical separation of com
...

I think that the magnitude of the AI alignment problem has been ridiculously overblown & our ability to solve it widely underestimated.

I've been publicly called stupid before, but never as often as by the "AI is a significant existential risk" crowd.

That's OK, I'm used to it.

Devastating and utter communication failure?

Fixing the ticker-tape problem, or the disconnect between how we write and how we read

Between the tedious wash steps of the experiment I'm running, I've been tinkering with Python. The result is aiRead.

aiRead integrates the ideas about active reading I've accumulated over the last four years. Although its ChatGPT integration is its most powerful feature, this comment is about an insight I've gleaned by using its ticker-tape display feature.

Mostly, I sit down at my desk to read articles on my computer screen. I click a link, and what appears is a column of ...

As far as I can tell, people typically use the orthogonality thesis to argue that smart agents could have any motivations. But the orthogonality thesis is stronger than that, and its extra content is false - there are some goals that are too complicated for a dumb agent to have, because the agent couldn't understand those goals. I think people should instead directly defend the claim that smart agents could have arbitrary goals.

I no longer endorse this claim about what the orthogonality thesis says.

# Concision is especially important for public speakers

If I were giving a talk in front of 200 people, making it 1 minute longer than necessary would waste ~3 hours of the audience's time in total (200 person-minutes), so I should be willing to spend up to 3 hours making it more concise.

In "95%-ile isn't that good", Dan Luu writes:

Most people consider doing 30 practice runs for a talk to be absurd, a totally obsessive amount of practice, but I think Gary Bernhardt has it right when he says that, if you're giving a 30-minute talk to a 300 person audience, that's 150 person-hours watching your talk, so it's not obviously unreasonable to spend 15 hours practicing (and 30 practice runs will probably be less than 15 hours since you can cut a number of the runs short and/or repeatedly practice problem sections).
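The person-hours arithmetic behind both estimates above can be checked with a minimal sketch. The helper name `audience_hours` is made up for illustration; the inputs are the numbers quoted in the text.

```python
def audience_hours(attendees: int, minutes: float) -> float:
    """Total audience time, in person-hours, taken up by `minutes` of talk."""
    return attendees * minutes / 60


# One unnecessary minute in front of 200 people.
wasted = audience_hours(attendees=200, minutes=1)

# Gary Bernhardt's example: a 30-minute talk to a 300-person audience.
total = audience_hours(attendees=300, minutes=30)

print(f"wasted: {wasted:.2f} person-hours")  # 200 / 60, roughly 3.33
print(f"total:  {total:.0f} person-hours")   # 300 * 30 / 60 = 150
```

Under this accounting, the break-even preparation time scales linearly with audience size, which is why the same talk can justify very different amounts of practice.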

“On average, buildings that are being blasted with a firehose right now are significantly more likely to be on fire than the typical structure, but this does not mean we should ban fire departments as a clear fire hazard.” Byrne Hobart

Would it be possible to use a huge model (e.g. an LLM) to interpret smaller networks, and output human-readable explanations? Is anyone working on something along these lines?

I'm aware Kayla Lewis is working on something similar (but not quite the same thing) on a small scale. In my understanding, from reading her tweets, she's using a network to predict the outputs of another network by reading its activations.

In a previous post, I described my current alignment research agenda, formalizing abstractions of computations. One among several open questions I listed was whether unique minimal abstractions always exist. It turns out that (within the context of my current framework), the answer is yes.

I had a complete post on this written up (which I've copied below), but it turns out that the result is completely trivial if we make a fairly harmless assumption: The information we want the abstraction to contain is only a function of the output of the computation, not ...

I still don't get the goal of what you are trying to do (the puzzle this work should clarify), which I feel like I should've. As a shot in the dark, maybe abstract interpretation [https://courses.cs.washington.edu/courses/cse501/15sp/papers/jones.pdf][1] in general and abstracted abstract machines [https://arxiv.org/abs/1105.1743][2] in particular might be useful for something here.

1. ND Jones, F Nielson (1994) Abstract Interpretation: A Semantics-Based Tool for Program Analysis
2. D Van Horn, M Might (2011) Abstracting Abstract Machines: A Systematic Approach to Higher-Order Program Analysis

3 Erik Jenner 1d
Thanks for the pointers, I hadn't seen the Abstracting Abstract Machines paper before. If you mean you specifically don't get the goal of minimal abstractions under this partial order: I'm much less convinced they're useful for anything than I used to be, currently not sure. If you mean you don't get the goal of the entire agenda, as described in the earlier agenda post [https://www.alignmentforum.org/posts/L8LHBTMvhLDpxDaqv/research-agenda-formalizing-abstractions-of-computations-1]: I'm currently mostly thinking about mechanistic anomaly detection. Maybe it's not legible right now how that would work using abstractions, I'll write up more on that once I have some experimental results (or maybe earlier). (But happy to answer specific questions in the meantime.)

I meant the general agenda. For abstract interpretation, I think the relevant point is that quotienting a state space is not necessarily a good way of expressing abstractions about it, for some sense of "abstraction" (the main thing I don't understand is the reasons for your choice of what to consider abstraction). Many things want a set of subspaces (like a topology, or a logic of propositions) instead of a partition, so that a point of the space doesn't admit a unique "abstracted value" (as in equivalence class it belongs to), but instead has many "abstr...

random thought: are the most useful posts typically karma approximately 10, and 40 votes to get there? what if it was possible to sort by controversial? maybe only for some users or something? what sorts of sort constraints are interesting in terms of incentivizing discussion vs agreement? blah blah etc

Why don't most AI researchers engage with Less Wrong? What valuable criticism can be learnt from it, and how can it be pragmatically changed?

My girlfriend just returned from a major machine learning conference. She judged less than 1/18 of the content was dedicated to AI safety rather than capability, despite an increasing number of the people at the conference being confident of AGI in the future (like, roughly 10-20 years, though people avoided nailing down a specific number). And the safety talk was more of a shower thought.

And yet, Less Wrong and ...

-1 LVSN 16d
You write really long paragraphs. My sense of style is to keep paragraphs at 1200 characters or less at all times, and the mean average paragraph no larger than 840 characters after excluding sub-160 character paragraphs from the averaged set. I am sorry that I am not good enough to read your text in its current form; I hope your post reaches people who are.
6 Portia 16d
Thank you for the feedback, and I am sorry. ADHD. The main question was basically: why do you think AI researchers generally do not engage with Less Wrong/MIRI/Eliezer, what are good reasons behind that that should be taken to heart as valuable learning experiences, and what are bullshit reasons that can and should still be addressed, considered challenges to hack? I just see a massive discrepancy between what is going on in this community here and the people actually working in AI implementation and policy; they feel like completely separate spheres. I see problems in both spheres, as well as my own sphere (academic philosophy), and immense potential for mutual gain if cooperation and respect were deepened, and would like to merge them, and wonder how. I do not see the primary challenge in making a good argument as to why this would be good. I see the primary challenge as a social hacking challenge which includes passing tests set by another group you do not agree with.

This is a huge practical issue that seems to not get enough thought, and I'm glad you're thinking about it. I agree with your summary of one way forward. I think there's another PR front; many educated people outside of the relevant fields are becoming concerned.

It sounds like the ML researchers at that conference are mostly familiar with MIRI style work. And they actually agree with Yudkowsky that it's a dead end. There's a newer tradition of safety work focused on deep networks. That's what you mostly see in the Alignment Forum. And it's what you see in ...

OpenAI says they are using GPT-4 internally: "We’ve also been using GPT-4 internally, with great impact on functions like support, sales, content moderation, and programming. We also are using it to assist humans in evaluating AI outputs, starting the second phase in our alignment strategy." https://openai.com/research/gpt-4

Does this mean what I think it means? That they are using this AI to analyse and optimise the code the AI itself runs on? Does anyone know if OpenAI have confirmed or denied this, or given information on safeguards that are in plac...

"We also are using it to assist humans in evaluating AI outputs, starting the second phase in our alignment strategy."

Probably something along the lines of RLAIF? Anthropic's Claude might be more robustly tuned because of this, though GPT-4 might already have similar things as part of its own training.

Does severe vitamin C deficiency (i.e. scurvy) lead to oxytocin depletion?

According to Wikipedia

The activity of the PAM enzyme [necessary for releasing oxytocin from the neuron] system is dependent upon vitamin C (ascorbate), which is a necessary vitamin cofactor.

I.e. if you don't have enough vitamin C, your neurons can't release oxytocin. Common sensically, this should lead to some psychological/neurological problems, maybe with empathy/bonding/social cognition?

Quick googling "scurvy mental problems" or "vitamin C deficiency mental symptoms" doesn't r...

1 Mateusz Bagiński 3d
Huh, you're right. I thought most fruits have enough to cover daily requirements.
1 Carl Feynman 3d
Googling for "scurvy low mood", I find plenty of sources that indicate that scurvy is accompanied by "mood swings — often irritability and depression". IIRC, this has been remarked upon for at least two hundred years.

That's also what this meta-analysis found but I was mostly wondering about social cognition deficits (though looking back I see it's not clear in the original shortform)

1. April 9 Williamsburg VA Meetup, AI, and thank goodness Spring is almost upon us

If you find yourself in or near Williamsburg, Virginia on 2023/04/09 come join for a Virginia Rationalists Meetup and the Williamsburg 2nd Sundays Art & Music Festival.

This week was a bit overwhelming in AI news, with GPT-4 releasing, new Midjourney, Stanford's Alpaca, more AI offerings from Google, Microsoft 365 Copilot, and honestly a bunch more things. I've spent too much time already talking with the GPT-4 version of ChatGPT given how long it's actually been available...

Arbitrary incompleteness invites gameability, and arbitrary specificity invites exceptioncraft.

0 Dagon 3d
Exceptioncraft is seeking results within a set of constraints that don't make the path to those results obvious.  Engineering and gaming are just other words for understanding the constraints deeply enough to find the paths to desired (by the engineer) results.  Powered heavier-than-air flight is gaming the rules of physics, utilizing non-obvious aerodynamic properties to overcome gravity.  Using hold-to-maturity accounting to bypass rules on risk/capitalization is financial engineering in search of profits.   The words you choose are political, with embedded intentional beliefs, not definitional and objective about the actions themselves.
1 LVSN 3d
Yes. Well now that was out of left-field! People don't normally say that without having a broader disagreement at play. I suppose you have a more-objective reform-to-my-words prepared to offer me? My point about the letter of the law being more superficial than the spirit seems like a robust observation, and I think my choice of words accurately, impartially, and non-misleadingly preserves that observation;  until you have a specific argument against the objectivity, your response amounts to an ambiguously adversarially-worded request to imagine I was systematically wrong and report back my change of mind. I would like you to point my imagination in a promising direction; a direction that seems promising for producing a shift in belief.

Yeah, I suspect we mostly agree, and I apologize for looking to find points of contention.

Could someone please ELI5 why using a CNOT gate (if the target qubit was initially zero) does not violate the no-cloning theorem?

EDIT:

Oh, I think I got it. The forbidden thing is to have a state "copied and not entangled". CNOT gate creates a state that is "copied and entangled", which is okay, because you can only measure it once (if you measure either the original or the copy, the state of the other one collapses). The forbidden thing is to have a copy that you could measure independently (e.g. you could measure the copy without collapsing the original).

5 Joey Marcellino 4d
Just to (hopefully) make the distinction a bit more clear: A true copying operation would take |psi1>|0> to |psi1>|psi1>; that's to say, it would take as input one qubit in an arbitrary quantum state and a second qubit in |0>, and output two qubits in the same arbitrary quantum state that the first qubit was in. For our example, we'll take |psi1> to be an equal superposition of 0 and 1: |psi1> = |0> + |1> (ignoring normalization). If CNOT is a copying operation, it should take (|0> + |1>)|0> to (|0> + |1>)(|0> + |1>) = |00> + |01> + |10> + |11>. But as you noticed, what it actually does is create an entangled state (in this case, a Bell state) that looks like |00> + |11>. So in some sense yes, the forbidden thing is to have a state copied and not entangled, but more importantly in this case CNOT just doesn't copy the state, so there's no tension with the no-cloning theorem.
4 Viliam 3d
Thank you! Some context: I am a "quantum autodidact", and I am currently reading a book Q is for Quantum [https://www.qisforquantum.org/], which is a very gentle, beginner-friendly introduction to quantum computing. I was thinking how it relates to the things I have read before, and then I noticed that I was confused. I looked at Wikipedia, which said that CNOT does not violate the no-cloning theorem... but I didn't understand the explanation why. I think I get it now. |00> + |11> is not a copy (looking at one qubit collapses the other), |00> + |01> + |10> + |11> would be a copy (looking at one qubit would still leave the other as |0> + |1>).
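The distinction drawn in this thread can be checked numerically. Below is a minimal sketch, assuming NumPy is available; the variable names and the `schmidt_rank` helper are illustrative and not taken from any of the sources mentioned. It applies CNOT to (|0> + |1>)|0> and compares the result with the state a true copier would have to produce.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)  # normalized |0> + |1>

# CNOT with the first qubit as control and the second as target,
# in the basis order |00>, |01>, |10>, |11>.
CNOT = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

cnot_out = CNOT @ np.kron(plus, ket0)  # what CNOT actually produces
true_copy = np.kron(plus, plus)        # what a cloner would have to produce


def schmidt_rank(state: np.ndarray) -> int:
    """Rank of the 2x2 amplitude matrix: 1 for a product state, 2 if entangled."""
    return int(np.linalg.matrix_rank(state.reshape(2, 2)))


print(cnot_out)                 # amplitudes only on |00> and |11>
print(true_copy)                # equal amplitudes on all four basis states
print(schmidt_rank(cnot_out))   # 2: entangled (Bell state)
print(schmidt_rank(true_copy))  # 1: two independent copies

# CNOT does copy the basis states themselves, e.g. |1>|0> -> |1>|1>.
print(np.allclose(CNOT @ np.kron(ket1, ket0), np.kron(ket1, ket1)))
```

The last line shows why CNOT looks like a copier at first glance: it duplicates |0> and |1> individually, but linearity then forces a superposition to come out as |00> + |11> rather than as a product of two copies.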

I recommend this article by the discoverers of the no-cloning theorem for a popular science magazine over the Wikipedia page for anyone trying to understand it.

Are we heading towards a new financial crisis?

Mark to market changes since 2009, combined with the recent significant interest hikes, seem to make bank balance sheets "unreliable".

Mark to market changes broadly mean that banks can have certain assets on their balance sheet, and the value of the asset is set via mark to model (usually meaning it's marked down as worth face value).

Banks traditionally have a ton of bonds on their balance sheet, and a lot of those are governed by mark to model and not mark to market.

Interest rates go up a lot, which leads to...

2 Gerald Monroe 3d
The feds. Note the basis for my statement is that you can think of a Treasury note as an exchangeable paper you can barter for its face value of 7B or so. So by the Fed giving 10 billion to the bank and taking the paper they are adding (10-7) 3B in new cash. I may be totally wrong because I don't understand all the mechanics, derivatives, and so on that this operation actually involves.
1 JNS 3d
That's not how it works. The 10B are new money, unless they came from someone other than the Fed (notes are not money).

See the barter argument. Also yeah the Fed will probably issue a new note for 10B which removes exactly that from the economy.

Academic philosophers are better than average at evaluating object-level arguments for some claim. They don't seem to be very good at thinking about what rationalization in search implies about the arguments that come up. Compared to academic philosophers, rationalists strike me as especially appreciating filtered evidence and its significance to your world model.

If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with n...

Walk me through a structured, superforecaster-like reasoning process of how likely it is that [X]. Define and use empirically testable definitions of [X]. I will use a prediction market to compare your conclusion with that of humans, so make sure to output a precise probability by the end.