Jan Kulveit

Posts

Sorted by New

Wiki Contributions

Comments

Is civilizational alignment on the table?

Yes, check eg https://www.lesswrong.com/posts/H5iGhDhQBtoDpCBZ2/announcing-the-alignment-of-complex-systems-research-group or https://ai.objectives.institute/ or also partially https://www.pibbss.ai/

You won't find much of this on LessWrong, due to LW being an unfavorable environment for this line of thinking.

The Problem With The Current State of AGI Definitions

It's probably worth noting you seem to be empirically wrong: I'm pretty confident I'd be able to do >half of human jobs, with maybe ~3 weeks of training, if I was able to understand all human languages (obviously not in parallel!) Many others here would be able to do the same.

The criterion is not as hard as it seems, because there are many jobs like cashiers or administratrative workers or assembly line workers which are not that hard to learn.

Intergenerational trauma impeding cooperative existential safety efforts

It's probably worth noting that I take the opposite update from the covid crisis: it was much easier to get governments listen to us and do marginally more sensible things than expected. With better preparation and larger resources, it would have been possible to cause order of magnitude more sensible things to happen. Also it's worth noting some governments were highly sensible and agentic about covid

The alignment problem in different capability regimes

Similary to johnswentworth: My current impression is core alignment problems are the same and manifest at all levels - often sub-human version just looks like a toy version of the scaled-up problem, and the main difference is, in the sub-human version problem, you can often solve it for practical purposes by plugging in human at some strategic spot. (While I don't think there are deep differences in the alignment problem space, I do think there are differences in the "alignment solutions" space, where you can use non-scalable solutions, or in risk space, where dangers being small due to the systems being stupid.)

I'm also unconvinced about some of practical claims about differences for wildly superintelligent systems. 

One crucial concern related to "what people want" is this seems underdefined, un-stable in interactions with wildly superintelligent systems, and prone to problems with scaling of values within systems where intelligence increases.  By this line of reasoning, if the wildly superintelligent system is able to answer me these sort of questions "in a way I want", it very likely must be already aligned. So it feels like part of the worries was assumed away. Paraphrasing the questions about human values again, one may ask "how did you get to the state where you have this aligned wildly superintelligent system which is able to answer questions about human values, as opposed to e.g. overwriting what humans believe about themselves by it's own non-human-aligned values?".

Ability to understand itself seems a special case of competence: I can imagine systems which are wildly superhuman in their ability to understand the rest of the world, but pretty mediocre at understanding themselves, e.g. due to some problems with recursion, self-references, reflections, or different kinds of computations being used at various levels of reasoning. As a result, it seems unclear whether the ability to clearly understand itself is a feature of all wildly super-human systems. (Toy counterexample: imagine a device which would connect someone in ancient Greece with our modern civilization, and our civilization dedicating about 10% of global GDP to answering questions from this guy. I would argue this device is for most practical purposes wildly superhuman compared to this individual guy in Greece, but at the same time bad at understanding itself)

Fundamentally inscrutable thoughts seems like something which you can study with present day systems as toy models. E.g., why does AlphaZero believe something is a good go move? Why does a go grand-master believe something is a good move? What counts as a 'true explanation'? Who is the recipient of the explanation? Are you happy with explanation of the algorithm like 'upon playing myriad games, my general functional approximator is approximating the expected value of this branch of an unimaginably large choice tree is larger than for other branches?'? If yes, why? If no, why not?

Inscrutable influence-seeking plans seem also a present problem. Eg, if there are already some complex influence-seeking patterns now, how would we notice? 

A 'Practice of Rationality' Sequence?

Getting oriented fast in complex/messy real world situations in fields in which you are not an expert

  • For example, now, one topic to get oriented in would be COVID; I think for a good thinker, it should be achievable to have big-picture understanding of the situation comparable to a median epidemiologist after few days of research
      • Where the point isn't to get an accurate forecast of some global variable which is asked on metaculus, but gears-level model of what's going on / what are the current 'critical points' which will have outsized impact / ...
      • In my impression, compared to some of the 'LessWrong-style rationality', this is more heavily dependent on 'doing bounded rationality well' - that is, finding the most important bits / efficiently ignoring almost all information, in contrast to carefully weighting several hypothesis which you already have

Actually trying to change something in the world where the system you are interacting with has significant level of complexity & somewhat fast feedback loop (&it's not super-high-stakes)

  • Few examples of seemingly stupid things of this type I did
    • filled a lawsuit without the aid of a lawyer (in low-stakes case)
    • repaired various devices with value much lower than value of my time
    • tinkering with code in a language I don't know
    • trying to moderate Wikipedia article on highly controversial topic about which two groups of editors are fighting

One thing I'm a bit worried about in some versions of LW rationality & someone should write a post about is something like ... 'missing opportunities to actually fight in non-super-high-stakes matters'', in the martial arts metaphor.

We run the Center for Applied Rationality, AMA

I like the metaphor!

Just wanted to note: in my view the original LW Sequences are not functional as a stand-alone upgrade for almost any human mind, and you can empirically observe it: You can think about any LW meet-up group around the world as an experiment, and I think to a first approximation it's fair to say aspiring Rationalists running just on the Sequences do not win, and good stuff coming out of the rationalist community was critically dependent of presence of minds Eliezer & others. (This is not say Sequences are not useful in many ways)

Under what circumstances is "don't look at existing research" good advice?

I basically agree with Vanessa:

the correct rule is almost always: first think about the problem yourself, then go read everything about it that other people did, and then do a synthesis of everything you learned inside your mind.

Thinking about the problem myself first often helps me understand existing work as it is easier to see the motivations, and solving solved problems is good as a training.

I would argue this is the case even in physics and math. (My background is in theoretical physics and during my high-school years I took some pride in not remembering physics and re-deriving everything when needed. It stopped being a good approach for physics ca since 1940 and somewhat backfired.)

The mistake members of "this community" (LW/rationality/AI safety) are sometimes making is skipping the second step / bouncing off the second step if it is actually hard.

Second mistake is not doing the third step in a proper way, which leads to somewhat strange and insular culture which may be repulsive for external experts. (E.g. people partially crediting themselves for discoveries which are know to outsiders)

Autism And Intelligence: Much More Than You Wanted To Know

Epistemic status: Wild guesses based on reading del Guidice's Evolutionary psychopathology and two papers trying to explain autism in terms of predictive processing. Still maybe better than the "tower hypothesis"

0. Let's think in terms of two parametric model, where one parameter tunes something like capacity of the brain, which can be damaged due to mutations, disease, etc., and the other parameter is explained bellow.

1. Some of the genes that increase risk of autism tune some parameter of how sensory prediction is handled, specifically, making the system to expect higher precision from sensory inputs/being less adaptive about it. (lets call it parameter p)

2. Several hypothesis - Mildly increased p sounds like something which should be somewhat correlated with increased learning / higher intelligence;

  • something which can force the system to build more exact representations, notice more "rule violations", keep track of more patterns, etc.
  • (also if abstract concepts are subject to the same machinery as sensoria, it would be something like having higher precision in abstract/formal reasoning)

3. But note: tune it up even more, and the system starts to break; too much weight is put on sensory experience, "normal world experience" becomes too surprising which leads to seek more repetitive behaviours and highly predictable environments. In the abstract, it becomes difficult to handle fluidity, rules which are vague and changing,...

4. In the two-parameter space of capacity and something like surprisal handling, this creates a picture like this

  • the space of functional minds is white, the orange space is where things break (in practice the boundary is not sharp)
  • for functional minds, g is something like capacity c + 0.1 * p; for minds in the orange area this no longer holds and on the contrary increasing p makes the the mind work worse
  • highly intelligent people can have higher values of p and still be quite functional
  • blue dotted area is what is diagnosed as autism; this group should be expected to have on average low g

Parts of the o.p. can be reinterpreted as

  • in this picture, some genes mean movement to the right; they are selected because of slight correlation with g
  • random mutations/infections/ etc. generally mean movement down
  • overall fitness profile of right-moving genes is somewhat complex (movement to the left or right is good or bad in different parts of the graph)

Even if this is simple, it makes some predictions (in the sense that the results are likely already somewhere in the literature, just I don't know whether this is true or not when writing this)

  • What happens if you move parameter p in the opposite direction? you get a mind less grounded in sensory inputs and stronger influence of 'downstream' predictions. In small quantities this would manifest as e.g. "clouds resembling animals" more for such people. Move to the left much more, and the system also breaks down, via hallucinations, everyday experience seemingly fitting arbitrary explanations despite many details not fitting, etc. This sounds like some symptoms of schizophrenia; the model predicts mild movement in the direction of schizophrenia should decrease g a bit

Note

With a map of brains/minds into two dimensional space it is a priori obvious that it will fail in explaining the original high-dimensional problem, in many ways; many other dimensions are not orthogonal but actually "project" to the space (e.g. something like "brain masculinisation" has nonzero projection on p), there are various regulatory system like g means better ability to compensate via metacognition, or social support.

Minimization of prediction error as a foundation for human values in AI alignment

Based on

(For example, if subagents are assigned credit based on who's active when actual reward is received, that's going to be incredibly myopic -- subagents who have long-term plans for achieving better reward through delayed gratification can be undercut by greedily shortsighted agents, because the credit assignment doesn't reward you for things that happen later; much like political terms of office making long-term policy difficult.)

it seems to me you have in mind a different model than me (sorry if my description was confusing). In my view, you have the world-modelling, "preference aggregation" and action generation done by the "predictive processing engine". The "subagenty" parts basically extract evolutionary relevant features of this (like:hunger level), and insert error signals not only about the current state, but about future plans. (Like: if the p.p. would be planning a trajectory which is harmful to the subagent, it would insert the error signal.).

Overall your first part seems to assume more something like reinforcement learning where parts are assigned credit for good planning. I would expect the opposite: one planning process which is "rewarded" by a committee.

parsimonious theory which matched observations well

With parsimony... predictive processing in my opinion explains a lot for a relatively simple and elegant model. On the theory side it's for example

  • how you can make a bayesian approximator using local computations
  • how hierarchical models can grow in an evolutionary plausible way
  • why predictions, why actions

On the how do things feel for humans from the inside, for example

  • some phenomena about attention
  • what is that feeling when you are e.g. missing the right word, or something seems out of place
  • what's up with psychedelics
  • & more

On the neuroscience side

  • my non-expert impression is the evidence that at least cortex is following the pattern that neurons at higher processing stages generate predictions that bias processing at lower levels is growing

I don't think predictive processing should try to explain all about humans. In one direction, animals are running on predictive processing as well, but are missing some crucial ingredient. In the opposite direction, simpler organisms had older control systems (eg hormones),we have them as well, and p.p. must be in some sense be stacked on top of that.

Why is there this stereotype that the more you can make rocket ships, the more likely you are to break down crying if the social rules about when and how you are allowed to make rocket ships are ambiguous?

This is likely sufficiently explained by the principle component of human mindspace stretching from mechanistic cognition to mentalistic cognition, does not need more explanations (https://slatestarcodex.com/2018/12/11/diametrical-model-of-autism-and-schizophrenia/)

Also I think there are multiple stereotypes of very smart people: eg Feynman or Einstein


Load More