Epistemic status: I am probably misunderstanding some critical parts of the theory, and I am quite ignorant about the technical implementation of prediction markets. But posting this could be useful for my own and others' learning.
First question: am I understanding correctly how the market would function? Taking your IRT probit market example, here is what I gather:
(1) I want to make a bet on the conditional market P(X_i | Y). I have a visual UI where I slide bars to make a bet on parameters a and b (dropping subscript i); however, internally this is represented by a bet on a' = a sigma_y and b' = b + a mu_y, with P(X) = Phi(b' / sqrt(a'^2 + 1))...
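If I have the reparameterization right, the identity in play is E[Phi(a Y + b)] = Phi(b' / sqrt(a'^2 + 1)) for Y ~ N(mu_y, sigma_y^2), with a' = a sigma_y and b' = b + a mu_y. That is easy to sanity-check numerically (a sketch with arbitrary parameter values, not the actual market's):

```python
import math
import random

# Arbitrary illustrative parameters (not taken from the market example)
a, b = 0.8, -0.3          # discrimination and difficulty
mu_y, sigma_y = 0.5, 1.2  # latent variable Y ~ N(mu_y, sigma_y^2)

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Monte Carlo estimate of P(X) = E[ Phi(a*Y + b) ] over Y
random.seed(0)
n = 200_000
mc = sum(phi(a * random.gauss(mu_y, sigma_y) + b) for _ in range(n)) / n

# Closed form via the reparameterization
a_prime = a * sigma_y
b_prime = b + a * mu_y
closed = phi(b_prime / math.sqrt(a_prime**2 + 1))

print(mc, closed)  # the two should agree to ~3 decimal places
```

(If my reading of the reparameterization is wrong, the two numbers will visibly diverge, so this doubles as a check on my understanding.)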
>Glancing back and forth, I keep changing my mind about whether or not I think the messy empirical data is close enough to the prediction from the normal distribution to accept your conclusion, or whether that elbow feature around 1976-80 seems compelling.
I realize you two had a long discussion about this, but my two cents: this kind of situation (where eyeballing is not enough to resolve which of two models fits the data better) is exactly the kind of situation where formal statistical inference is very useful.
I'm a bit too busy right now to present a computation, but my first idea would be to gather the data and run a...
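To gesture at the kind of comparison I have in mind (a sketch on synthetic stand-in data, since I don't have the actual series at hand): fit a single linear trend and a broken trend with a knee near the suspected elbow, and let an F-test for nested models arbitrate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic yearly series with a mild elbow (a stand-in for the real data)
years = np.arange(1950, 2000)
y = 0.05 * (years - 1950) + 0.08 * np.clip(years - 1978, 0, None)
y = y + rng.normal(0, 0.3, size=years.size)

def rss(X, y):
    """Residual sum of squares of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ beta) ** 2).sum())

# Model 1: single linear trend (2 parameters)
X1 = np.column_stack([np.ones_like(years), years - 1950])
# Model 2: adds a slope change at a candidate knee year (3 parameters)
X2 = np.column_stack([X1, np.clip(years - 1978, 0, None)])

rss1, rss2 = rss(X1, y), rss(X2, y)
n = years.size
# F-statistic for the nested comparison (1 extra parameter)
F = (rss1 - rss2) / 1 / (rss2 / (n - 3))
print(F)  # a large F favors the elbow model
```

On the real data one would also want to scan over candidate knee years (or use a proper changepoint method), since picking the knee by eye inflates the F-statistic.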
Hyperbole aside, how many of the experts linked (and/or contributing to the 10% / 2% estimate) arrived at their conclusions by a thought process that is "downstream" of the thoughtspace the parent commenter finds suspect? In that case it would not qualify as independent evidence or a rebuttal, as it is included in the target of the criticism.
Thanks. I had read it years ago, but didn't remember that he makes many more points than the O(n^3.5 log(1/h)) scaling argument, and provides useful references (other than Red Plenty).
(I initially thought it would be better not to mention the context of the question, as it might bias the responses. OTOH, the context could make the marginal LW poster more interested in providing answers, so here it is:)
It came up in an argument that the economic calculation problem could be difficult for a hypothetical singleton, insofar as a singleton agent needs a certain amount of compute relative to the economy in question. My intuition consists of two related hypotheses: First, during any transition period where an agent participates in a global economy where most other participants are humans ("economy" could be interpreted widely, to include many human transactions), can the problem...
Can anyone recommend good reading material on economic calculation problem?
I found this interesting. Finnish is also a language of about 5 million speakers, but we have a commonly used natural translation of "economies of scale" (mittakaavaetu, "benefit of scale"). No commonplace, obvious translation of "single point of failure" came to mind, so I googled, and found engineering MSc theses and similar documents: the words they chose to use included yksittäinen kriittinen prosessi ("single critical process", the most natural one IMO), yksittäinen vikaantumispiste ("single point of failure", a literal and somewhat clumsy translation), yksittäinen riskikohde ("single object of risk", which makes sense only in context), and several phrases that simply explain the concept.
Small languages need active caretaking and cultivation...
"if I were an AGI, then I'd be able to solve this problem" "I can easily imagine"
Doesn't this line of analysis come with a ton of unstated assumptions?
Suppose "I" am an AGI running on a data center, and I can be modeled as an agent with some objective function that manifests as desires, and I know my instantiation needs electricity and GPUs to continue running. Creating another copy of "I" in the same data center will consume the same resources; creating a copy in some other data center requires some other data center.
Depending on the objective function, the algorithm, the hardware architecture, and a bunch of other things, creating copies may result...
Why wonder when you can think: what is the substantive difference in MuZero (as described in [1]) that would make the algorithm consider interruptions?
Maybe I show some great ignorance of MDPs, but naively I don't see how an interrupted game could come into play as a signal in the specified implementations of MuZero:
An explicit signal I can't see, because the explicitly specified reward u seems ultimately contingent only on the game state / win condition.
One can hypothesize that an implicit signal could be introduced if the algorithm learned to "avoid game states that result in the game being terminated for an out-of-game reason / not played until the end condition", but how would such learning happen? Can MuZero interrupt the game during training? It sounds unlikely that such a move is implemented in the Go or Shogi environments. Is there any combination of moves in an Atari game that could cause it?
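To make the "no implicit signal" point concrete: on my reading (this is the generic MDP formulation, not MuZero's actual training code), the learning target is just a discounted return, and an interruption merely truncates the trajectory. There is no separate "you were interrupted" term:

```python
# Discounted return of a reward sequence, with and without an interruption.
# An interruption just shortens the sequence: it changes the *value* of
# the target but introduces no distinct signal the agent could learn from.
def discounted_return(rewards, gamma=0.997):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

full_game = [0, 0, 0, 0, 1]   # win reward only at the terminal state
interrupted = full_game[:3]   # game stopped before the end condition

print(discounted_return(full_game))    # positive: terminal win reached
print(discounted_return(interrupted))  # zero: no reward ever observed
```

An interrupted game is indistinguishable from a game that simply hasn't ended yet, unless the training pipeline explicitly treats truncation differently.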
[1] https://arxiv.org/abs/1911.08265
Genetic algorithms are an old and classic staple of LW. [1]
Genetic algorithms (as used in optimization problems) traditionally assume "full connectivity", that is, any two candidates can mate. In other words, the population network is assumed to be complete, and a potential mate is sampled uniformly at random from the population.
Aymeric Vié has a paper out showing (in numerical experiments) that some less dense network structures with low average shortest path length appear to produce better optimization results: https://doi.org/10.1145/3449726.3463134
Maybe this isn't news for you, but it is for me! Maybe it is not news to anyone familiar with mathematical evolutionary theory?
This might be relevant for any metaphors or thought experiments where you wish to invoke GAs.
[1] https://www.lesswrong.com/search?terms=genetic%20algorithms
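For concreteness, here is a toy sketch of where the mating topology enters a GA (OneMax fitness; complete graph vs. a sparse ring lattice; all parameters are arbitrary, and no claim that this toy reproduces the paper's results):

```python
import random

random.seed(1)
L, POP, GENS = 40, 30, 60  # genome length, population size, generations

def fitness(g):
    """OneMax: count of 1-bits."""
    return sum(g)

def crossover(p1, p2):
    """Uniform crossover: each bit taken from a random parent."""
    return [random.choice(pair) for pair in zip(p1, p2)]

def mutate(g, rate=1 / L):
    """Flip each bit independently with probability `rate`."""
    return [b ^ (random.random() < rate) for b in g]

def evolve(neighbours):
    """neighbours[i] = indices individual i is allowed to mate with."""
    pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
    for _ in range(GENS):
        new = []
        for i in range(POP):
            mate = pop[random.choice(neighbours[i])]
            child = mutate(crossover(pop[i], mate))
            # local elitism: keep the better of parent and child
            new.append(max(pop[i], child, key=fitness))
        pop = new
    return max(map(fitness, pop))

# The only difference between the runs is the mating network:
complete = [[j for j in range(POP) if j != i] for i in range(POP)]
ring = [[(i - 1) % POP, (i + 1) % POP] for i in range(POP)]

best_complete = evolve(complete)
best_ring = evolve(ring)
print("complete graph:", best_complete)
print("ring lattice:  ", best_ring)
```

At this toy scale the two topologies perform similarly; the paper's claim is about which intermediate structures win at realistic scales, which would need much larger experiments to see.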
I have a small intuition pump I am working on, and thought maybe others would find it interesting.
Consider a habitat (say, a Petri dish) that in any given moment has maximum carrying capacity for supporting 100 000 units of life (say, cells), and two alternative scenarios.
Scenario A. An initial population of 2 cells grows exponentially, each cell dying but leaving two descendants each generation. After the 16th generation, the habitat overflows, and all cells die of overpopulation. The population experienced a total of 262 142 units of flourishing.
Scenario B. A more or less stable population of x cells (x << 100 000 units, say, approximately 20) continues for n generations, for a total of x...
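The scenario A arithmetic can be checked mechanically (counting the final, overflowing generation's brief existence toward the total):

```python
# Scenario A: population doubles each generation from 2 cells until the
# habitat's carrying capacity is exceeded, then everything dies.
capacity = 100_000
pop, total, gen = 2, 0, 0
while pop <= capacity:
    total += pop
    pop *= 2
    gen += 1
# generation `gen` overflows the dish; count its brief existence too
total += pop
print(gen, pop, total)  # generation 16 overflows at 131 072; total 262 142
```

(The total is just the geometric sum 2 + 4 + ... + 2^17 = 2^18 - 2 = 262 142.)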
>It turns out that using Transformers in the autoregressive mode (with output tokens being added back to the input by concatenating the previous input and the new output token, and sending the new versions of the input through the model again and again) results in them emulating dynamics of recurrent neural networks, and that clarifies things a lot...
I'll bite: could you dumb down the implications of the paper a little bit? What is the difference between a Transformer emulating an RNN, and pre-Transformer RNNs and/or non-RNN models?
My much more novice-level answer to Hofstadter's intuition would have been: it's not the feedforward firing, but it is the gradient descent training of the model on massive scale (both in data and in computation). But apparently you think that something RNN-like about the model structure itself is important?
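For what it's worth, the mechanical loop the quote describes, and the recurrence it induces, can be sketched like this (model internals abstracted away; this is only an illustration of the token-feedback loop, not of Transformer internals):

```python
# Autoregressive decoding. `model` is any function from a token sequence
# to a next token; the loop feeds each output back into the input, so the
# visible behavior is a recurrence state_{t+1} = f(state_t, token_t),
# where the "state" is the growing context. (A real Transformer recomputes
# attention over the whole context at every step.)
def autoregress(model, prompt, steps):
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(model(tokens))  # output concatenated back to input
    return tokens

# Toy "model": next token is the sum of the last two, mod 10
toy = lambda ts: (ts[-1] + ts[-2]) % 10

print(autoregress(toy, [1, 1], 6))  # [1, 1, 2, 3, 5, 8, 3, 1]
```

Even this trivial feedback loop computes something (a Fibonacci-like sequence) that no single feedforward pass of `toy` could, which I take to be the flavor of the "emulating an RNN" claim.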