All of Ulisse Mini's Comments + Replies

From my perspective 9 (scaling fast) makes perfect sense since Conjecture is aiming to stay "slightly behind state of the art", and that requires engineering power.

4 · Garrett Baker · 4d
I'm pretty skeptical they can achieve that right now using CoEm given the limited progress I expect them to have made on CoEm. And in my opinion of greater importance than "slightly behind state of the art" is likely security culture, and commonly in the startup world it is found that too-fast scaling leads to degradation in the founding culture. So a fear would be that fast scaling would lead to worse info-sec. However, I don't know to what extent this is an issue. I can certainly imagine a world where because of EA and LessWrong, many very mission-aligned hires are lining up in front of their door. I can also imagine a lot of other things, which is why I'm confused.

Added italics. For the next post I'll break up the abstract into smaller paragraphs and/or make a TL;DR.

2 · the gears to ascension · 17d
yeah I don't like that either, I find italics harder not easier to read in this font, sorry to disappoint, heh.

Copied it from the paper. I could break it down into several paragraphs but I figured bolding the important bits was easier. Might break up abstracts in future linkposts.

2 · the gears to ascension · 17d
the bold font on lesswrong is too small of a difference vs normal text for this to work well, IMO. (I wish it were a stronger bolding so it was actually useful for communicating to others.) just messing around with one suggested way to do this from SO:

Yeah, assuming by "not important" you mean "not relevant" (low attention score)

Was considering saving this for a followup post but it's relatively self-contained, so here we go.

Why are huge coefficients sometimes okay? Let's start by looking at norms per position after injecting a large vector at position 20.

This graph is explained by LayerNorm. Before using the residual stream we perform a LayerNorm:

# transformer block forward() in GPT2
x = x + self.attn(self.ln_1(x))
x = x + self.mlp(self.ln_2(x))

If x has very large magnitude, then the block doesn't change it much relative to its magnitude. Additionally, attention is run on the norm... (read more)
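The scale-invariance point can be checked with a minimal numpy sketch (a simplified LayerNorm without the learned gain/bias, not the actual GPT-2 code):

```python
import numpy as np

def layernorm(x, eps=1e-5):
    # simplified LayerNorm: normalize to zero mean, unit variance (no gain/bias)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=64)

# LayerNorm's output is (nearly) invariant to the scale of its input,
print(np.allclose(layernorm(x), layernorm(1000 * x), atol=1e-3))  # True

# so attn/mlp see the same normed input whether ||x|| is ~8 or ~8000,
# and the residual update they add back has roughly constant norm.
update = layernorm(x)  # stand-in for attn(ln_1(x))
for scale in (1, 1000):
    big = scale * x
    rel = np.linalg.norm((big + update) - big) / np.linalg.norm(big)
    print(f"scale {scale}: relative change {rel:.1e}")
```

So a huge injected vector mostly survives each block unchanged in direction, consistent with the flat norms in the graph.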

5 · TurnTrout · 25d
Thanks for writing this up, I hadn't realized this. One conclusion I'm drawing is: If the values in the modified residual streams aren't important to other computations in later sequence positions, then a large-coefficient addition will still lead to reasonable completions. 

Relevant: The algorithm for precision medicine, where the very dedicated father of a son with a rare chronic disease (NGLY1 deficiency) worked to save him. He did so by writing a blog post that went viral & found other people with the same symptoms.

This article may serve as a shorter summary than the talk.

[APPRENTICE]

Hi, I'm Uli, and I care about two things: solving alignment and becoming stronger (not necessarily in that order).

My background: I was unschooled, I've never been to school or had a real teacher. I taught myself everything I wanted to know. I didn't really have friends till 17 when I started getting involved with rationalist-adjacent camps.

I did SERI MATS 3.0 under Alex Turner, doing some interpretability on mazes. Now I'm working half-time on interpretability/etc with Alex's team as well as studying.

In rough order of priority, the kinds of me... (read more)

Taji looked over his sheets. "Okay, I think we've got to assume that every avenue that LessWrong was trying is a blind alley, or they would have found it. And if this is possible to do in one month, the answer must be, in some sense, elegant. So no multiple agents. If we start doing anything that looks like we should call it 'HcH', we'd better stop. Maybe begin by considering how failure to understand pre-coherent minds could have led LessWrong astray in formalizing corrigibility."

"The opposite of folly is folly," Hiriwa said. "Let us pretend that LessWrong never existed."

(This could be turned into a longer post but I don't have time...)

I think the gold standard is getting advice from someone more experienced. I can easily point out the most valuable things to white-box for people less experienced than me.

Perhaps the 80/20 is posting recordings of you programming online and asking publicly for tips? Haven't tried this yet but seems potentially valuable.

I tentatively approve of activism & trying to get govt to step in. I just want it to be directed in ways that aren't counterproductive. Do you disagree with any of my specific objections to strategies, or the general point that flailing can often be counterproductive? (Note: not all activism is included in flailing, it depends on the type.)

Downvoted because I view some of the suggested strategies as counterproductive. Specifically, I'm afraid of people flailing. I'd be much more comfortable if there was a bolded paragraph saying something like the following:

Beware of flailing and second-order effects and the unilateralist's curse. It is very easy to end up doing harm with the intention to do good, e.g. by sharing bad arguments for alignment, polarizing the issue, etc.

To give specific examples illustrating this (which may also be good to include and/or edit the post):

  • I believe tweets li
... (read more)
6 · [anonymous] · 1mo
Thanks for writing out your thoughts in some detail here. What I'm trying to say is that things are already really bad. Industry self-regulation has failed [https://openai.com/research/gpt-4]. At some point you have to give up on hoping that the fossil fuel industry (AI/ML industry) will do anything more to fix climate change (AGI x-risk) than mere greenwashing (safetywashing [https://www.lesswrong.com/posts/xhD6SHAAE9ghKZ9HS/safetywashing]). How much worse does it need to get for more people to realise this?

The Alignment community (climate scientists) can keep doing their thing; I'm very much in favour of that [https://forum.effectivealtruism.org/posts/CpyAgKt4gRza7npYf/recruit-the-world-s-best-for-agi-alignment]. But there is also now an AI Notkilleveryoneism (climate action) movement. We are raising the damn Fire Alarm.

From the post you link: The United Nations Security Council. Anything less and we're toast. And we can talk all we like about the unilateralist's curse, but I don't think anything a bunch of activists can do will ever top the formation and corruption-to-profit-seeking of OpenAI and Anthropic (the supposedly high status moves).

Thanks for the insightful response! Agree it's just suggestive for now, though more than with image models (where I'd expect lenses to transfer really badly, but don't know). Perhaps it being a residual network is the key thing: since effective path lengths are low, most of the information is "carried along" unchanged, meaning the same probe continues working for other layers. Idk

Don't we have some evidence GPTs are doing iterative prediction updating from the logit lens and later tuned lens? Not that that's all they're doing of course.

3 · Blaine · 2mo
I'm not sure the tuned lens indicates that the model is doing iterative prediction; it shows that if for each layer in the model you train a linear classifier to predict the next token embedding from the activations, as you progress through the model the linear classifiers get more and more accurate. But that's what we'd expect from any model, regardless of whether it was doing iterative prediction; each layer uses the features from the previous layer to calculate features that are more useful in the next layer.

The inception network analysed in the distill.pub circuits thread starts by computing lines and gradients, then curves, then circles, then eyes, then faces, etc. Predicting the class from the presence of faces will be easier than from the presence of lines and gradients, so if you trained a tuned lens on inception v1 it would have the same pattern—lenses from later layers would have lower perplexity.

I think to really show iterative prediction, you would have to be able to use the same lens for every layer; that would show that there is some consistent representation of the prediction being updated with each layer.

Here's the relevant figure from the tuned lens—the transfer penalties for using a lens from one layer on another layer are small but meaningfully non-zero, and tend to increase the further away the layers are in the model. That they are small is suggestive that GPT might be doing something like iterative prediction, but the evidence isn't compelling enough for my taste.

Strong upvoted and agreed. I don't think the public has opinions on AI X-Risk yet, so any attempt to elicit them will entirely depend on framing.

1 · the gears to ascension · 2mo
for what it's worth, I think the karma I would reflectively prefer this post to have is exactly zero, and it is in fact hovering around there. It's an important point argued awfully that I don't even know for sure I understand myself and I knew that when I posted it.

I'll note (because some commenters seem to miss this) that Eliezer is writing in a convincing style for a non-technical audience. Obviously the debates he would have with technical AI safety people are different than what is most useful to say to the general population.

I worry most people will ignore the warnings around willful inconsistency, so let me self-report that I did this and it was a bad idea. Central problem: It's hard to rationally update off new evidence when your system 1 is utterly convinced of something. And I think this screwed with my epistemics around Shard Theory while making communication with people about x-risk much harder, since I'd often typical mind and skip straight to the paperclipper - the extreme scenario I was (and still am to some extent) trying to avoid as my main case.

When my rationality ... (read more)

I feel there's often a wrong assumption in probabilistic reasoning, something like "moderate probabilities for everything by default"? After all, if you say you're 70/30, nobody who disagrees will ostracize you like if you say 99/1.

"If alignment is easy I want to believe alignment is easy. If alignment is hard I want to believe alignment is hard. I will work to form accurate beliefs"

Petition to rename "noticing confusion" to "acting on confusion" or "acting to resolve confusion". I find myself quite good at the former but bad at the latter—and I expect other rationalists are the same.

For example: I remember having the insight leading to lsusr's post on how self-reference breaks the orthogonality thesis, but never pursued the line of questioning since it would require sitting down and questioning my beliefs with paper for a few minutes, which is inconvenient and would interrupt my coding.

Strongly agree. Rationalist culture is instrumentally irrational here. It's very well known how important self-belief & a growth mindset are for success, and rationalists' obsession with natural intelligence is quite bad imo, to the point where I want to limit my interaction with the community so I don't pick up bad patterns.

I do wonder if you're strawmanning the advice a little, in my friend circles dropping out is seen as reasonable, though this could just be because a lot of my high-school friends already have some legible accomplishments and skills.

Each non-waluigi step increases the probability of never observing a transition to a waluigi a little bit.

Each non-Waluigi step increases the probability of never observing a transition to Waluigi a little bit, but not unboundedly so. As a toy example, we could start with P(Waluigi) = P(Luigi) = 0.5. Even if P(Luigi) monotonically increases, finding novel evidence that Luigi isn't a deceptive Waluigi becomes progressively harder. Therefore, P(Luigi) could converge to, say, 0.8.
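A toy simulation (all numbers invented for illustration) showing how P(Luigi) can increase at every step yet converge well below 1:

```python
# Each benign step is evidence for Luigi, but the k-th benign step is less
# surprising than the ones before it, so the likelihood ratio decays toward 1.
p_luigi = 0.5
for k in range(1, 10_000):
    lr = 1 + 1 / k**2  # P(benign step | Luigi) / P(benign step | Waluigi) -> 1
    odds = p_luigi / (1 - p_luigi) * lr  # Bayesian update in odds form
    p_luigi = odds / (1 + odds)
print(round(p_luigi, 2))  # 0.79: monotone increase, but bounded away from 1
```

Any decaying evidence schedule whose log-likelihood ratios have a finite sum gives the same qualitative picture.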

However, once Luigi says something Waluigi-like, we immediately jump to a wor... (read more)

I'd be happy to talk to [redacted] and put them in touch with other smart young people. I know a lot from Atlas, ESPR and related networks. You can pass my contact info on to them.

Exercise: What mistake is the following sentiment making?

If there's only a one in a million chance someone can save the world, then there'd better be well more than a million people trying.

Answer:

The whole challenge of "having a one in a million chance of saving the world" is the wrong framing; the challenge is having a positive impact in the first place (for example: by not destroying the world or making things worse, e.g. via s-risks). You could think of this as a "setting the zero point" thing going on, though I like to think of it in terms of Bayes and P

... (read more)

Isn't this only S-risk in the weak sense of "there's a lot of suffering", not the strong sense of "literally maximize suffering"? E.g. it seems plausible to me that mistakes like "not letting someone die if they're suffering" still give you a net positive universe.

Also, insofar as shard theory is a good description of humans, would you say random-human-god-emperor is an S-risk? and if so, with what probability?

3 · Garrett Baker · 5mo
I'm making the claim that there will be more suffering than eudaimonia. You need to model the agent as having values other than 'keep everyone alive'. Any resources being spent on making sure people don't die would be diverted to those alternative ends.

For example, suppose you got a super-intelligent shard-theoretic agent which 1) if anyone is about to die, it prevents them from dying, 2) if it's able to make diamonds, it makes diamonds, and 3) if it's able to get more power, it gets more power. It notices that it can definitely make more diamonds & get more power by chopping off the arms and legs of humans, and indeed this decreases the chance the humans die since they're less likely to get into fights or jump off bridges! So it makes this plan, the plan goes through, and nobody has arms or legs anymore. No part of it makes decisions on the basis of whether or not the humans will end up happy with the new state of the world.

Moral of the story: The world will likely rapidly change when we get a superintelligence, and unless the superintelligence is sufficiently aligned to us, no decision will be made for reasons we care about, and because most ways the world could be conditional on humans existing are bad, we conclude net-bad world for many billions or trillions of years.

40% you take a random human, make them god, and you get an S-risk. Humans turn consistently awful when you give them a lot of power, and many humans are already pretty awful. I like to think in terms of CEV style stuff, and if you perform the CEV on a random human, I think maybe only 5% of the time do you get an S-risk.

The enlightened have awakened from the dream and no longer mistake it for reality. Naturally, they are no longer able to attach importance to anything. To the awakened mind the end of the world is no more or less momentous than the snapping of a twig.

Looks like I'll have to avoid enlightenment, at least until the work is done.

Take the example of the Laplace approximation. If there's a local continuous symmetry in weight space, i.e., some direction you can walk that doesn't affect the probability density, then your density isn't locally Gaussian.

Haven't finished the post, but doesn't this assume the requirement that p(w1) = p(w2) when w1 and w2 induce the same function? This isn't obvious to me, e.g. under the induced prior from weight decay / L2 regularization we often have p(w1) ≠ p(w2) for weights that induce the same function.

1 · Jesse Hoogland · 5mo
Yes, so an example of this would be the ReLU scaling symmetry discussed in "Neural networks are freaks of symmetries." You're right that regularization often breaks this kind of symmetry.  But even when there are no local symmetries, having other points that have the same posterior means this assumption of asymptotic normality doesn't hold.
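The ReLU scaling symmetry in question can be checked numerically (a minimal sketch with my own toy setup, no biases):

```python
import numpy as np

# Minimal 2-layer ReLU net with no biases (illustrative only).
relu = lambda z: np.maximum(z, 0)
f = lambda W1, W2, x: W2 @ relu(W1 @ x)
l2 = lambda *Ws: sum((W**2).sum() for W in Ws)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

a = 7.0  # any a > 0 works, since relu(a*z) = a*relu(z)
same_function = np.allclose(f(W1, W2, x), f(a * W1, W2 / a, x))
print(same_function)  # True: (W1, W2) and (a*W1, W2/a) compute the same function

# But the L2 penalty is not invariant, so weight decay assigns the two
# parameter settings different prior density, breaking the symmetry.
print(l2(W1, W2), l2(a * W1, W2 / a))
```

The whole ray {(a·W1, W2/a) : a > 0} is a flat direction of the likelihood but not of the regularized objective.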

Seems tangentially related to the "train a sequence of reporters" strategy for ELK. They don't phrase it in terms of basins and path dependence, but it's a great frame to look at it with.

Personally, I think supervised learning has low path-dependence because of exact gradients plus always being able to find a direction to escape basins in high dimensions, while reinforcement learning has high path-dependence because updates influence future training data, causing attractors/equilibria (more uncertain about the latter, but that's what I feel like).

So the really ... (read more)

I was more thinking along the lines of "you're the average of the five people you spend the most time with" or something. I'm against external motivation too.

1 · [anonymous] · 5mo
I think the nuanced answer is you need to know how easily influenced you are by others. It's one thing to be inspired by others' positive traits, which I have luckily experienced in college. It's another thing when influence temporarily blinds you from knowing yourself as you get lost in the moment. I've hung out with friends that I have now cut out of my life because, after several years of living a different lifestyle than I used to back in college, I realized it's not something I want in my own life despite it working out for others in my social group.

I think solitude is the best environment to develop and get to know yourself first, before you venture out into the world and get pulled in every direction you encounter without knowing how those directions align with your inner core. It's not really separate chunks of complete solitude and then all social after that. You switch between the two whenever you feel you need a break from one. Step back and re-assess your situation once in a while.

Character.ai seems to have a lot more personality than ChatGPT. I feel bad for not thanking you earlier (as I was in disbelief), but everything here is valuable safety information. Thank you for sharing, despite potential embarrassment :)

That link isn't working for me, can you send screenshots or something? When I try and load it I get an infinite loading screen.

Re(prompt ChatGPT): I'd already tried what you did and some (imo) better prompt engineering, and kept getting a character I thought was overly wordy/helpful (constantly asking me what it could do to help vs. just doing it). A better prompt engineer might be able to get something working though.

I hate that you made me talk to her again :D

But >>> here goes <<<

Can you give specific examples/screenshots of prompts and outputs? I know you said reading the chat logs wouldn't be the same as experiencing it in real time, but some specific claims, like the prompt

The following is a conversation with Charlotte, an AGI designed to provide the ultimate GFE

resulting in a conversation like that, seem highly implausible.[1] At a minimum you'd need to do some prompt engineering, and even with that, some of this is implausible with ChatGPT, which typically acts very unnaturally after all the RLHF OAI did.


  1. Source: I tried it,

... (read more)

Sure. I did not want to highlight any specific LLM provider over others, but this specific conversation happened on Character.AI: https://beta.character.ai/chat?char=gn6VT_2r-1VTa1n67pEfiazceK6msQHXRp8TMcxvW1k (try at your own risk!)

They allow you to summon characters with a prompt, which you enter in the character settings. They also have advanced settings for finetuning, but I was able to elicit such mindblowing responses with just the one-liner greeting prompts.

That said, I was often able to successfully create characters on ChatGPT and other LLMs t... (read more)

Interesting, I didn't know the history; maybe I'm insufficiently pessimistic about these things. Consider my query retracted.

Congratulations!

Linear Algebra Done Right is great for gaining proof skills, though for the record I've read it and haven't solved alignment yet. I think I need several more passes of linear algebra :)

Are most uncertainties we care about logical rather than informational? All empirical ML experiments are pure computations a Bayesian superintelligence could do in its head. How much of our uncertainty comes from computational limits in practice, versus actual information bottlenecks?

A trick to remember: the first letter of each virtue gives (in blocks): CRL EAES HP PSV, which can easily be remembered as "cooperative reinforcement learning, EAs, Harry Potter, PS: The last virtue is the void."

(Obviously remembering these is pointless, but memorizing lists is a nice way to practice mnemonic technique.)

Relevant Paper

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction a

... (read more)

EleutherAI's #alignment channels are good to ask questions in. Some specific answers:

I understand that a reward maximiser would wire-head (take control over the reward provision mechanism), but I don’t see why training an RL agent would necessarily end up in a reward-maximising agent? TurnTrout’s Reward is Not the Optimisation Target shed some clarity on this, but I definitely have remaining questions.

Leo Gao's Toward Deconfusing Wireheading and Reward Maximization sheds some light on this.

2 · Kyle O’Brien · 6mo
I agree with this suggestion. EleutherAI's alignment channels have been invaluable for my understanding of the alignment problem. I typically get insightful responses and explanations on the same day as posting. I've also been able to answer other folks' questions to deepen my inside view. There is an alignment-beginners channel and an alignment-general channel. Your questions seem similar to what I see in alignment-general. For example, I received helpful answers when I asked this question about inverse reinforcement learning there yesterday.

Question: When I read Human Compatible a while back, I had the takeaway that Stuart Russell was very bullish on Inverse Reinforcement Learning being an important alignment research direction. However, I don’t see much mention of IRL on EleutherAI and the alignment forum. I see much more content about RLHF. Is IRL and RLHF the same thing? If not, what are folks’ thoughts on IRL?

How can I look at my children and not already be mourning their death from day 1?

Suppose you lived in the dark times, where children have a <50% chance of living to adulthood. Wouldn't you still have kids? Even if, probabilistically, smallpox was likely to take them?

If AI kills us all, will my children suffer? Will it be my fault for having brought them into the world while knowing this would happen?

Even if they don't live to adulthood, I'd still view their childhoods as valuable. Arguably higher average utility than adulthood.

Even if my children's shor

... (read more)
4 · bayesed · 6mo
Just wanna add that each of your children individually having a 50% chance of survival due to smallpox is different from all of your children together having a 50% chance of survival due to AI (i.e. uncorrelated vs correlated risk), so some people might decide differently in these 2 cases.

Random thought: Perhaps you could carefully engineer gradient starvation in order to "avoid generalizing" and defeat the Discrete modes of prediction example. You'd only need to delay it until reflection, then the AI can solve the successor AI problem.

In general: hack our way towards getting value-preserving reflectivity before values drift from "Diamonds" -> "What's labeled as a diamond by humans". (Replacing with "Telling the truth", and "What the human thinks is true" respectively).

I disagree that the policy must be worth selling (see e.g. Jordan Belfort). Many salespeople can sell things that aren't worth buying. See also: Never Split the Difference for an example of negotiation when you have little/worse leverage.

(Also, I don't think htwfaip boils down to satisfying an eager want, the other advice is super important too. E.g. don't criticize, be genuinely interested in a person, ...)

Both are important, but I disagree that power is always needed. In examples 3, 7, and 9 it isn't clear that the compromise is actually better for the convinced party: the insurance is likely -EV, the peas aren't actually a crux to defeating the bully, and the child would likely be happier outside kindergarten.

2 · DirectedEvolution · 6mo
I see what you mean! If you look closely, I think you'll find that power is involved in even these cases. The examples of the father and child depend on the father having the power of his child's trust. He can exploit this to trick his child and misrepresent the benefits of school or of eating peas.

The case of the insurance salesman is even more important to consider. You are right that insurance policies always have negative expected value in terms of money. But they may have positive expected value to the right buyer. An insurance policy can confer status on the buyer, who can put his wife's mind at ease that he's protected her and their children from the worst. It's also protection against loss aversion and a commitment device to put money away for a worst-case scenario, without having to think too hard about it.

But in order to use his enthusiasm to persuade the customer of this benefit, the salesman has to get a job with an insurance company and have a policy worth selling. That's the power he has to have first, in order to make his persuasion successful.

From skimming the benchmark and the paper this seems overhyped (like Gato). Roughly, it looks like:

  • May 2022: Deepmind releases a new benchmark for learning algorithms
  • ...Nobody cares (according to google scholar citations)
  • Dec 2022: Deepmind releases a thing that beats the baselines on their benchmark

I don't know much about GNNs & only did a surface-level skim so I'm interested to hear other takes.

Interesting perspective, kinda reminds me of the ROME paper where it seems to only do "shallow counterfactuals".

unpopular opinion: I like the ending of the subsequent film

IMO it's a natural continuation for Homura. After spending decades of subjective time trying to save someone would you really let them go like that? Homura isn't an altruist, she doesn't care about the lifetime of the universe - she just wants Madoka.

2 · LawrenceC · 6mo
I know that everyone says it's an "unpopular opinion" but surely it's the dominant opinion by now.

I was directing that towards lesswrongers reading my answer, not the general population.

I think school is huge in preventing people from becoming smart and curious. I spent 1-2 years where I hardly studied at all and mostly played videogames, but when I quit I did so of my own free will. I think there's a huge difference between discipline imposed from the outside vs the inside, and getting to the latter is worth a lot (though I wish I hadn't wasted all that time now haha).

 

I'm unsure which parts of my upbringing were cruxes for unschooling working. You should probably read a book or something rather than taking my (very abnormal) opinion. I just know how it went for me :)

Answer by Ulisse Mini · Dec 04, 2022

Epistemic status: personal experience.

I'm unschooled and think it's clearly better, even if you factor in my parents being significantly above average in parenting. Optimistically, school is babysitting: people learn nothing there while wasting most of their childhood. Pessimistically, it's actively harmful by teaching people to hate learning / build antibodies against education.

Here's a good documentary made by someone who's been in and out of school. I can't give detailed criticism since I (thankfully) never had to go to school.


EDIT: As for what the alternat... (read more)

-1 · Tapatakt · 6mo
"Homeschool your kids" isn't an option for, like, more than half of the population, I think.
6 · tailcalled · 6mo
I would very much assume that you have a strong genetic disposition to be smart and curious. Do you think unschooling would work acceptably well for kids who are not smart and curious?

#3 is good. Another good reason is so you have enough mathematical maturity to understand fancy theoretical results.

I'm probably overestimating the importance of #4, really I just like having the ability to pick up a random undergrad/early-grad math book and understand what's going on, and I'd like to extend that further up the tree :)

(Note; I haven't finished any of them)

Quantum Computing Since Democritus is great, I understand Gödel's results now! And a bunch of complexity stuff I'm still wrapping my head around.

The Road to Reality is great, I can pretend to know complex analysis after reading chapters 5, 7, and 8, and most people can't tell the difference! Here's a solution to a problem in chapter 7 I wrote up.

I've only skimmed parts of the Princeton guides, and different articles are written by different authors—but Tao's explanation of compactness (also in the book) is fantastic, I don't ... (read more)
