
TheManxLoiner

Comments

Compressed Computation is (probably) not Computation in Superposition
TheManxLoiner · 20d · 10

Is the code for the experiments open source?

TheManxLoiner's Shortform
TheManxLoiner · 3mo · 110

In Sakana AI's paper on AI Scientist v-2, they claim that the system is independent of human code. Based on a quick skim, I think this is wrong/deceptive. I wrote up my thoughts here: https://lovkush.substack.com/p/are-sakana-lying-about-the-independence

The main trigger was this line in the system prompt for idea generation: "Ensure that the proposal can be done starting from the provided codebase."

Top AI safety newsletters, books, podcasts, etc – new AISafety.com resource
TheManxLoiner · 4mo · 60

Substacks:
- https://aievaluation.substack.com/
- https://peterwildeford.substack.com/
- https://www.exponentialview.co/
- https://milesbrundage.substack.com/


Podcasts:
- Cognitive Revolution. https://www.cognitiverevolution.ai/tag/episodes/
- Doom Debates. https://www.youtube.com/@DoomDebates
- AI Policy Podcast. https://www.csis.org/podcasts/ai-policy-podcast

Worth checking this too: https://forum.effectivealtruism.org/posts/5Hk96JqpEaEAyCEud/how-do-you-follow-ai-safety-news 

Conditional Importance in Toy Models of Superposition
TheManxLoiner · 4mo · 10

Vague thoughts/intuitions:

  • I think using the word "importance" is misleading, or at least makes it harder to reason about the connection between this toy scenario and real text data. In real comedy/drama there are patterns in the data that let me/the model deduce whether it is comedy or drama, and hence allow me to focus on the conditionally important features.
  • Phrasing the task as follows helps me: you will be given 20 random numbers x1 to x20, and I want you to find projections that can recover x1 to x20. Half the time I will ignore your answers for x1 to x10, and the other half of the time x11 to x20; it is totally random which half of the numbers I will ignore. x_i and x_{10+i} get the same reward, and the reward decreases for bigger i. Now I find it easier to understand the model: the "obvious" strategy is to make sure I can reproduce x1 and x11, then x2 and x12, and so on, putting little weight on x10 and x20. Alternatively, this is equivalent to having fixed importance of (0.7, 0.49, ..., 0.7, 0.49, ...) without any conditioning. (A rough code sketch of this reframing is below the list.)
  • A follow-up I'd be interested in is whether the conditional importance is deducible from the data, e.g. x is a "comedy" if x1 + ... + x20 > 0, or if x1 > 0. With the same architecture I'd predict getting the same results though...? Not sure how the model could make use of this pattern.
  • And contrary to Charlie, I personally found the experiment crucial to understanding the informal argument. Shows how different people think!
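To make the reframing concrete, here is a rough sketch of how I picture the setup. This is my own toy implementation, not code from the post, and the hidden size, decay rate and training details are all invented.

```python
# Toy sketch of the reframed task: reconstruct 20 numbers through a small
# bottleneck, where half of the coordinates are randomly ignored each sample
# and importance decays with index within each half. (My own guesses, not the
# post's actual code or hyperparameters.)
import torch

n_features, n_hidden, batch = 20, 6, 1024   # invented sizes
decay = 0.7                                  # invented per-index decay
half = n_features // 2

W = torch.nn.Parameter(torch.randn(n_features, n_hidden) * 0.1)
opt = torch.optim.Adam([W], lr=1e-3)

# importance (0.7, 0.49, ...) for x1..x10, repeated for x11..x20
importance = decay ** torch.arange(1, half + 1).float()
importance = torch.cat([importance, importance])

for step in range(5_000):
    x = torch.rand(batch, n_features)        # the 20 random numbers
    recon = torch.relu(x @ W @ W.T)          # project down, then back up

    # Each sample is randomly "comedy" (only x1..x10 scored) or "drama"
    # (only x11..x20 scored).
    is_comedy = torch.rand(batch, 1) < 0.5
    mask = torch.where(is_comedy,
                       torch.cat([torch.ones(half), torch.zeros(half)]),
                       torch.cat([torch.zeros(half), torch.ones(half)]))

    loss = (mask * importance * (x - recon) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In expectation the mask multiplies every coordinate's importance by 0.5, which is the sense in which this is equivalent to fixed importances (0.7, 0.49, ..., 0.7, 0.49, ...) without any conditioning.
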
Thoughts on Toy Models of Superposition
TheManxLoiner · 4mo · 10

there are features such as X_1 which are perfectly recovered

Just to check: in the toy scenario, we assume the features in R^n are the coordinates in the default basis, so we have n features X_1, ..., X_n?

 

Separately, do you have intuition for why they allow the network to learn b? Why not set b to zero too?
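For reference, this is the architecture I have in mind when asking this (a rough sketch with made-up sizes, based on my reading of the paper):

```python
# Sketch of the toy model: n features are the coordinates of x in the default
# basis, compressed to m < n hidden dimensions and reconstructed with a learned
# bias b (the bias asked about above). Sizes are made up.
import torch

n, m = 20, 5
W = torch.nn.Parameter(torch.randn(m, n) * 0.1)
b = torch.nn.Parameter(torch.zeros(n))   # could instead be fixed at zero

def reconstruct(x):                      # x: (batch, n)
    h = x @ W.T                          # (batch, m) compressed representation
    return torch.relu(h @ W + b)         # (batch, n) reconstruction of the features
```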

[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
TheManxLoiner · 5mo · 21

If you’d like to increase the probability of me writing up a “Concrete open problems in computational sparsity” LessWrong post

I'd like this!

Shallow review of technical AI safety, 2024
TheManxLoiner · 6mo · 41

I think this is missing from the list: the Whole Brain Architecture Initiative. https://wba-initiative.org/en/25057/

TheManxLoiner's Shortform
TheManxLoiner · 7mo · 10

Should LessWrong have an anonymous mode? When reading a post or comments, is it useful to have the username or does that introduce bias?

I had this thought after reading this review of LessWrong: https://nathanpmyoung.substack.com/p/lesswrong-expectations-vs-reality

Visual demonstration of Optimizer's curse
TheManxLoiner · 7mo · 31

Sounds sensible to me!

Visual demonstration of Optimizer's curse
TheManxLoiner · 7mo · 10

What do we mean by U−V?

I think the setting is:

  • We have a true value function V
  • We have a process to learn an estimate of V. We run this process once and we get U
  • We then ask an AI system to act so as to maximize U (its estimate of human values)

So in this context, U−V is just a fixed function measuring the error between the learnt values and true values.

I think the confusion could be from using the term U to represent both a single instance and the random variable/process.
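A quick numerical sketch of the point (my own illustration, not from the post): if U = V + noise and we pick the option that maximizes U, then U - V at the chosen option is positive on average, which is the optimizer's curse.

```python
# Toy illustration: the option with the highest estimated value U tends to be
# one whose estimation error (U - V) happens to be positive.
import numpy as np

rng = np.random.default_rng(0)
n_options, n_trials = 100, 10_000
gaps = []
for _ in range(n_trials):
    V = rng.normal(size=n_options)        # true values of the options
    U = V + rng.normal(size=n_options)    # a single learned/noisy estimate of V
    best = np.argmax(U)                   # the option an AI maximizing U would pick
    gaps.append(U[best] - V[best])        # error of the estimate at the chosen option

print(np.mean(gaps))  # clearly positive on average
```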

Posts

- 13 · Adding noise to a sandbagging model can reveal its true capabilities · 5d · 1
- 24 · Two flaws in the Machiavelli Benchmark · 5mo · 0
- 9 · Liron Shapira vs Ken Stanley on Doom Debates. A review · 6mo · 0
- 3 · TheManxLoiner's Shortform · 7mo · 6
- 9 · How to make evals for the AISI evals bounty · 7mo · 0
- 5 · Scattered thoughts on what it means for an LLM to believe · 8mo · 4
- 47 · AI as a powerful meme, via CGP Grey · 9mo · 8
- 26 · Distillation of 'Do language models plan for future tokens' · 1y · 2
- 2 · How to build a data center, by Construction Physics · 1y · 0
- 3 · AI Safety Institute's Inspect hello world example for AI evals · 1y · 0