Comments

+1 I was really really upset safe.ai decided to use an established acronym for something very different

Answer by Jan Kulveit, Jun 17, 2022

Yes, check e.g. https://www.lesswrong.com/posts/H5iGhDhQBtoDpCBZ2/announcing-the-alignment-of-complex-systems-research-group or https://ai.objectives.institute/ or also, partially, https://www.pibbss.ai/

You won't find much of this on LessWrong, due to LW being an unfavorable environment for this line of thinking.

It's probably worth noting you seem to be empirically wrong: I'm pretty confident I'd be able to do >half of human jobs, with maybe ~3 weeks of training, if I was able to understand all human languages (obviously not in parallel!) Many others here would be able to do the same.

The criterion is not as hard as it seems, because there are many jobs, like cashiers, administrative workers, or assembly line workers, which are not that hard to learn.

It's probably worth noting that I take the opposite update from the covid crisis: it was much easier than expected to get governments to listen to us and do marginally more sensible things. With better preparation and larger resources, it would have been possible to cause an order of magnitude more sensible things to happen. Also, it's worth noting some governments were highly sensible and agentic about covid.

Similarly to johnswentworth: My current impression is that the core alignment problems are the same and manifest at all levels. Often the sub-human version just looks like a toy version of the scaled-up problem, and the main difference is that, in the sub-human version of the problem, you can often solve it for practical purposes by plugging in a human at some strategic spot. (While I don't think there are deep differences in the alignment problem space, I do think there are differences in the "alignment solutions" space, where you can use non-scalable solutions, or in risk space, where dangers are small due to the systems being stupid.)

I'm also unconvinced about some of the practical claims about differences for wildly superintelligent systems.

One crucial concern related to "what people want" is that this seems underdefined, unstable in interactions with wildly superintelligent systems, and prone to problems with scaling of values within systems where intelligence increases. By this line of reasoning, if the wildly superintelligent system is able to answer these sorts of questions for me "in a way I want", it very likely must already be aligned. So it feels like part of the worries was assumed away. Paraphrasing the questions about human values again, one may ask "how did you get to the state where you have this aligned wildly superintelligent system which is able to answer questions about human values, as opposed to e.g. overwriting what humans believe about themselves with its own non-human-aligned values?".

Ability to understand itself seems a special case of competence: I can imagine systems which are wildly superhuman in their ability to understand the rest of the world, but pretty mediocre at understanding themselves, e.g. due to some problems with recursion, self-references, reflections, or different kinds of computations being used at various levels of reasoning. As a result, it seems unclear whether the ability to clearly understand itself is a feature of all wildly super-human systems. (Toy counterexample: imagine a device which would connect someone in ancient Greece with our modern civilization, and our civilization dedicating about 10% of global GDP to answering questions from this guy. I would argue this device is for most practical purposes wildly superhuman compared to this individual guy in Greece, but at the same time bad at understanding itself)

Fundamentally inscrutable thoughts seem like something which you can study with present-day systems as toy models. E.g., why does AlphaZero believe something is a good go move? Why does a go grand-master believe something is a good move? What counts as a 'true explanation'? Who is the recipient of the explanation? Are you happy with an explanation of the algorithm like 'upon playing myriad games, my general function approximator estimates that the expected value of this branch of an unimaginably large choice tree is larger than for other branches'? If yes, why? If no, why not?

Inscrutable influence-seeking plans also seem to be a present problem. E.g., if there are already some complex influence-seeking patterns now, how would we notice?

Answer by Jan Kulveit, Feb 17, 2020

Getting oriented fast in complex/messy real world situations in fields in which you are not an expert

  • For example, now, one topic to get oriented in would be COVID; I think for a good thinker, it should be achievable to have a big-picture understanding of the situation comparable to a median epidemiologist after a few days of research
      • Where the point isn't to get an accurate forecast of some global variable which is asked about on Metaculus, but a gears-level model of what's going on / what are the current 'critical points' which will have outsized impact / ...
      • In my impression, compared to some of the 'LessWrong-style rationality', this is more heavily dependent on 'doing bounded rationality well' - that is, finding the most important bits / efficiently ignoring almost all information, in contrast to carefully weighing several hypotheses which you already have

Actually trying to change something in the world where the system you are interacting with has a significant level of complexity & a somewhat fast feedback loop (& it's not super-high-stakes)

  • A few examples of seemingly stupid things of this type I did
    • filed a lawsuit without the aid of a lawyer (in a low-stakes case)
    • repaired various devices with value much lower than the value of my time
    • tinkered with code in a language I don't know
    • tried to moderate a Wikipedia article on a highly controversial topic about which two groups of editors are fighting

One thing I'm a bit worried about in some versions of LW rationality, and which someone should write a post about, is something like 'missing opportunities to actually fight in non-super-high-stakes matters', in the martial arts metaphor.

I like the metaphor!

Just wanted to note: in my view the original LW Sequences are not functional as a stand-alone upgrade for almost any human mind, and you can empirically observe it: You can think about any LW meet-up group around the world as an experiment, and I think to a first approximation it's fair to say aspiring Rationalists running just on the Sequences do not win, and good stuff coming out of the rationalist community was critically dependent on the presence of minds like Eliezer & others. (This is not to say the Sequences are not useful in many ways.)

Answer by Jan Kulveit, Dec 14, 2019

I basically agree with Vanessa:

the correct rule is almost always: first think about the problem yourself, then go read everything about it that other people did, and then do a synthesis of everything you learned inside your mind.

Thinking about the problem myself first often helps me understand existing work, as it is easier to see the motivations, and solving solved problems is good training.

I would argue this is the case even in physics and math. (My background is in theoretical physics, and during my high-school years I took some pride in not remembering physics and re-deriving everything when needed. This stopped being a good approach for physics around 1940, and it somewhat backfired.)

The mistake members of "this community" (LW/rationality/AI safety) are sometimes making is skipping the second step / bouncing off the second step if it is actually hard.

The second mistake is not doing the third step in a proper way, which leads to a somewhat strange and insular culture which may be repulsive for external experts. (E.g. people partially crediting themselves for discoveries which are known to outsiders.)

Epistemic status: Wild guesses based on reading Del Giudice's Evolutionary Psychopathology and two papers trying to explain autism in terms of predictive processing. Still maybe better than the "tower hypothesis".

0. Let's think in terms of a two-parameter model, where one parameter tunes something like the capacity of the brain, which can be damaged due to mutations, disease, etc., and the other parameter is explained below.

1. Some of the genes that increase the risk of autism tune some parameter of how sensory prediction is handled, specifically, making the system expect higher precision from sensory inputs / be less adaptive about it. (Let's call it parameter p.)

2. Several hypotheses - Mildly increased p sounds like something which should be somewhat correlated with increased learning / higher intelligence;

  • something which can force the system to build more exact representations, notice more "rule violations", keep track of more patterns, etc.
  • (also if abstract concepts are subject to the same machinery as sensoria, it would be something like having higher precision in abstract/formal reasoning)

3. But note: tune it up even more, and the system starts to break; too much weight is put on sensory experience, and "normal world experience" becomes too surprising, which leads to seeking more repetitive behaviours and highly predictable environments. In the abstract domain, it becomes difficult to handle fluidity, rules which are vague and changing, ...

4. In the two-parameter space of capacity and something like surprisal handling, this creates a picture like this

  • the space of functional minds is white, the orange space is where things break (in practice the boundary is not sharp)
  • for functional minds, g is something like capacity c + 0.1 * p; for minds in the orange area this no longer holds and, on the contrary, increasing p makes the mind work worse
  • highly intelligent people can have higher values of p and still be quite functional
  • blue dotted area is what is diagnosed as autism; this group should be expected to have on average low g
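
The qualitative picture in the bullets above can be sketched as a toy function. This is only an illustrative sketch under loud assumptions: the linear boundary `functional_limit`, its slope `k`, and the `penalty` slope past the boundary are all hypothetical choices that are not given in the comment; only the `c + 0.1 * p` relation for functional minds comes from the text.

```python
def functional_limit(c, k=1.0):
    """Largest p that a mind of capacity c tolerates before things break.

    ASSUMPTION: the boundary is linear in c with slope k. The comment only
    says that higher-capacity minds can carry higher p.
    """
    return k * c


def g(c, p, k=1.0, penalty=1.0):
    """Toy general-ability score for capacity c and precision parameter p."""
    limit = functional_limit(c, k)
    if p <= limit:
        # Functional region (white): g is roughly c + 0.1 * p.
        return c + 0.1 * p
    # Broken region (orange): increasing p now makes the mind work worse.
    # ASSUMPTION: a linear decline, continuous at the boundary.
    return c + 0.1 * limit - penalty * (p - limit)


if __name__ == "__main__":
    for c, p in [(1.0, 0.5), (1.0, 2.0), (2.0, 2.0)]:
        print(f"c={c}, p={p}, g={g(c, p):.2f}")
```

With these made-up parameters, the same p = 2.0 sits past the boundary for a mind with c = 1.0 but inside the functional region for c = 2.0, which is the sense in which higher capacity can carry a higher p.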

Parts of the o.p. can be reinterpreted as

  • in this picture, some genes mean movement to the right; they are selected because of slight correlation with g
  • random mutations/infections/ etc. generally mean movement down
  • overall fitness profile of right-moving genes is somewhat complex (movement to the left or right is good or bad in different parts of the graph)

Even though this is simple, it makes some predictions (in the sense that the results are likely already somewhere in the literature; I just don't know, when writing this, whether they are true or not)

  • What happens if you move parameter p in the opposite direction? You get a mind less grounded in sensory inputs, with a stronger influence of 'downstream' predictions. In small quantities this would manifest as e.g. such people seeing "clouds resembling animals" more often. Move to the left much more, and the system also breaks down, via hallucinations, everyday experience seemingly fitting arbitrary explanations despite many details not fitting, etc. This sounds like some symptoms of schizophrenia; the model predicts mild movement in the direction of schizophrenia should decrease g a bit.

Note

With a map of brains/minds into a two-dimensional space, it is a priori obvious that it will fail to explain the original high-dimensional problem in many ways: many other dimensions are not orthogonal but actually "project" onto the space (e.g. something like "brain masculinisation" has a nonzero projection on p), and there are various regulatory systems, e.g. higher g means better ability to compensate via metacognition or social support.