Ulisse Mini

Born too late to explore Earth; born too early to explore the galaxy; born at just the right time to save humanity.

https://uli.rocks/about


From my perspective, 9 (scaling fast) makes perfect sense, since Conjecture is aiming to stay "slightly behind state of the art", and that requires engineering power.

Added italics. For the next post I'll break up the abstract into smaller paragraphs and/or make a TL;DR.

Copied it from the paper. I could break it down into several paragraphs but I figured bolding the important bits was easier. Might break up abstracts in future linkposts.

Yeah, assuming by "not important" you mean "not relevant" (low attention score).

Was considering saving this for a followup post but it's relatively self-contained, so here we go.

Why are huge coefficients sometimes okay? Let's start by looking at norms per position after injecting a large vector at position 20.

This graph is explained by LayerNorm. Before using the residual stream we perform a LayerNorm:

```python
# transformer block forward() in GPT-2
x = x + self.attn(self.ln_1(x))
x = x + self.mlp(self.ln_2(x))
```

If x has a very large magnitude, then the block doesn't change it much relative to its magnitude. Additionally, attention is run on the normalized x, meaning only the "unscaled" version of x is moved between positions.
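A minimal numpy sketch of both points (not the original experiment's code; `layer_norm` here omits the learned scale/bias, and the 1000x coefficient and dimension 768 are illustrative assumptions):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize to zero mean, unit variance (learned scale/bias omitted).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=768)            # a "normal"-scale residual stream vector
big_x = 1000 * x                    # stream after injecting a huge coefficient
update = rng.normal(size=768)       # a typical block output, O(1) magnitude

# The block's additive update is negligible relative to the huge stream:
rel_change = np.linalg.norm(update) / np.linalg.norm(big_x)
print(rel_change)                   # ~0.001

# LayerNorm is scale-invariant, so attention sees the same input either way:
print(np.allclose(layer_norm(x), layer_norm(big_x), atol=1e-3))  # True
```

So scaling the stream by a huge coefficient neither changes what attention reads nor lets later blocks meaningfully overwrite it.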

As expected, we see a convergence in probability along each token position when we look with the tuned lens.

You can see how for positions 1 & 2 the output distribution is decided at layer 20: since we overwrote the residual stream with a huge coefficient, all the LayerNorm'd outputs we add afterwards are tiny in comparison. Then in the final LayerNorm we get ln(bigcoeff*diff + small) ~= ln(bigcoeff*diff) ~= ln(diff).
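The final-LayerNorm approximation can be checked numerically. A sketch under assumed magnitudes (bigcoeff = 1000, a unit-scale `small`; `layer_norm` again omits the learned parameters):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

rng = np.random.default_rng(1)
diff = rng.normal(size=768)    # the injected direction (hypothetical)
small = rng.normal(size=768)   # sum of later blocks' contributions, O(1)
bigcoeff = 1000.0

out = layer_norm(bigcoeff * diff + small)
ref = layer_norm(diff)

# Cosine similarity between ln(bigcoeff*diff + small) and ln(diff):
cos = out @ ref / (np.linalg.norm(out) * np.linalg.norm(ref))
print(cos)  # ~1.0: the final LN output is dominated by the injected direction
```

The later contributions perturb the direction by roughly ||small|| / (bigcoeff * ||diff||) ~ 0.001 radians, so the unembedding sees essentially ln(diff).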

Relevant: The algorithm for precision medicine, where a very dedicated father diagnosed his son's rare chronic disease (NGLY1 deficiency) in order to save him. He did so by writing a blog post that went viral and found other people with the same symptoms.

This article may serve as a shorter summary than the talk.

[APPRENTICE]

Hi I'm Uli and I care about two things: Solving alignment and becoming stronger (not necessarily in that order).

My background: I was unschooled; I've never been to school or had a real teacher. I taught myself everything I wanted to know. I didn't really have friends until 17, when I started getting involved with rationalist-adjacent camps.

I did SERI MATS 3.0 under Alex Turner, doing some interpretability on mazes. Now I'm working half-time on interpretability/etc with Alex's team, as well as studying.

In rough order of priority, the kinds of mentorship I'm looking for:

  1. Drill Sergeant: I want to improve my general capabilities. There are many obvious things I'm not doing enough of, and my general discipline could be improved a lot too. Akrasia is just a problem to be solved, and one I'll be embarrassed if I haven't ~fully solved by 20. There is much more I could put here; instead I'll list a few related thoughts:
    1. Meditation is mind-training; why isn't everyone doing it? Is the world that inadequate?[1]
    2. Introspection tells me the rationalist community has been bad for my thinking in some ways: lots of groupthink, overconfident cached thoughts about alignment, etc.
    3. I'm pretty bad at deliberating once and then focusing medium-term. Too many things started and not enough finished. Working on fixing this.
    4. (The list goes on...)
  2. Skills I've neglected: I know relatively little of the sciences, haven't written much outside of math, and know essentially no history or other such subjects.
  3. Skills I'm better at: I want to get really good at machine learning, programming, and applied math. Think 10x ML Engineer/Researcher.
  4. Alignment Theory. I have this pretty well covered, and think the potential costs from groupthink and priming outweigh additional depth here. I've already read too much LessWrong.

 

[MENTOR]

I am very good at learning when I want to be[2]. If you would like someone to yell at you for using obviously inefficient learning strategies (which you probably are using), I can do that.

I can also introduce bored high-schoolers to interesting people their age, and give advice related to the stuff I'm good at.

Too busy for intensive mentorship, but async messaging plus maybe a call every week or so could work.

  1. ^

    Semiconsistently meditating an hour a day, plus walking meditation when traveling. Currently around stage 3-4 in The Mind Illuminated terms (for those not familiar, this is dogshit).

  2. ^

    Which sadly hasn't been the case this past year as much as it used to be. I've been getting distracted by research and random small projects instead of absorbing fountains of knowledge. In the process of fixing this now.

Taji looked over his sheets. "Okay, I think we've got to assume that every avenue that LessWrong was trying is a blind alley, or they would have found it. And if this is possible to do in one month, the answer must be, in some sense, elegant. So no multiple agents. If we start doing anything that looks like we should call it 'HcH', we'd better stop. Maybe begin by considering how failure to understand pre-coherent minds could have led LessWrong astray in formalizing corrigibility."

"The opposite of folly is folly," Hiriwa said. "Let us pretend that LessWrong never existed."

(This could be turned into a longer post but I don't have time...)

I think the gold standard is getting advice from someone more experienced. I can easily point out the most valuable things to white-box for people less experienced than me.

Perhaps the 80/20 is posting recordings of yourself programming online and asking publicly for tips? Haven't tried this yet, but it seems potentially valuable.

I tentatively approve of activism & trying to get government to step in; I just want it directed in ways that aren't counterproductive. Do you disagree with any of my specific objections to strategies, or with the general point that flailing can often be counterproductive? (Note: not all activism is included in flailing; it depends on the type.)
