Wiki Contributions


I don’t see why alignment inherently needs more realism than intelligence

I was focused solely on testing alignment. I'm pretty confused about how much realism is needed to produce alignment.

explicitly we are designing sims with dualistic physics (mind is a fully separate non-matter-based phenomenon)

I guess I should have realized that, but it did not seem obvious enough for me to notice.

So it doesn’t seem that much if any extra effort is required to get reasonable sized populations of agents

I'm skeptical. I don't know whether I'll manage to quantify my intuitions well enough to figure out how much we disagree here.

The AGI in simboxes shouldn’t be aware of concepts like utility functions, let alone the developers or the developers modifying their utility functions (or minds).

It seems likely that some AGIs would notice changes in behavior. I expect it to be hard to predict what they'll infer.

But now that I've thought a bit more about this, I don't see any likely path to an AGI finding a way to resist the change in utility function.

I've got modest positions in GOOGL, MSFT.

I've got a bit more than 5% of my portfolio in semiconductor and related stocks: KLAC, LSE:SMSN, MTRN, AOSL, ASYS, AMKR, TRT, SCIA. I'm likely to buy more sometime in the next year, but I'm being patient because we're likely at a poor part of an industry cycle.

Robotics seem likely to benefit from AI. I suspect the main winners will be companies that aren't yet public, as I'm not too impressed by the opportunities I see so far. I'm playing this mainly via tiny positions in LIDAR companies (INVZ, OUST, LAZR) and SYM.

I have a modest position in OSS.

I tried to invest in Conjecture, but they don't seem interested in small investors.

I suspect that testing is one of the more important bottlenecks.

I suspect that some current systems are safe enough if their caution is dialed up to the point where they're annoyingly slow 2% of the time, but that level of caution leaves them not quite reliable enough at reaching a destination to be competitive.

so long as their utility functions aren’t negatives of each other (or equally exotic in some other way)

Why doesn't setting some of the utility functions to red-team the others make them sufficiently antagonistic?

I agree with you at least 90% about what is heritable.

Yet I don't see how this shows that anything is wrong with shard theory. It seems fairly plausible that the cortex is randomly initialized, and that the effects of genetic differences on the cortex are indirect, via things such as subcortical hardwiring.

Or even something less direct. Playing basketball seems rather heritable. That might be mostly due to the influence of genes on height.

I see very little in your post that says anything about what is hard-coded.

ALLFED has been doing research recently into nuclear winter. This seems to be their most relevant publication so far. I haven't read it yet.

I'm trying to shift my focus more toward AI, due to the likelihood that it will have big impacts over the next decade.

I'd like newbies to see some encouragement to attend a CFAR workshop. But there's not much new to say on that topic, so it's hard to direct people's attention there.

The default outcome of debate does not look promising. But there's a good deal of room to improve on the default.

Maybe half the problem with public discourse is that people have social goals that distract them from reality. I'm not confident that AI researchers will be more truth-oriented, but I see plenty of room for hope.

Drexler's CAIS paper describes some approaches that are likely needed to make debate work. Section 25:

Optimized advice need not be optimized to induce its acceptance

Advice optimized to produce results may be manipulative, optimized to induce a client’s acceptance; advice optimized to produce results conditioned on its acceptance will be neutral in this regard.
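The distinction Drexler draws can be made concrete with a toy decision problem. This is my own sketch with made-up numbers, not anything from the paper: each candidate piece of advice has a payoff if followed and a probability of being accepted, and the two objectives pick different advice.

```python
# Hypothetical toy example: two pieces of advice with assumed payoffs
# and acceptance probabilities (numbers are illustrative only).
options = {
    "flattering": {"payoff_if_followed": 2.0, "p_accept": 0.9},
    "honest":     {"payoff_if_followed": 3.0, "p_accept": 0.5},
}

# Manipulative objective: maximize expected payoff *including* the odds
# of acceptance, which rewards advice engineered to be accepted.
manipulative = max(
    options,
    key=lambda k: options[k]["p_accept"] * options[k]["payoff_if_followed"],
)

# Drexler's alternative: optimize payoff *conditioned on* acceptance,
# so the acceptance probability drops out of the objective entirely.
neutral = max(options, key=lambda k: options[k]["payoff_if_followed"])

print(manipulative)  # "flattering" wins when acceptance is part of the objective
print(neutral)       # "honest" wins once acceptance is conditioned away
```

The point of the sketch is that the conditioning removes any incentive to shade the advice toward what the client wants to hear.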

Section 20:

Collusion among superintelligent oracles can readily be avoided

C1) To improve the quality of answers, it is natural to implement multiple, diverse (and implicitly competing) systems to propose alternatives.
C2) To identify low-quality or misleading answers, it is natural to employ diverse critics, any one of which could disrupt deceptive collusion.
C3) Systems of diverse, competing proposers and critics naturally implement both independent and adversarial objectives.
C4) It is natural to apply fixed (hence memory-free) system instantiations to multiple problems, incidentally yielding a series of history-blind, single-move decisions.
C5) It is natural to provide differentiated, task-relevant information to systems solving different problems, typically omitting knowledge of general circumstances.
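C1–C4 can be illustrated with a minimal proposer/critic ensemble. This is my own toy construction, not code from the CAIS paper; the function names (`make_proposer`, `make_critic`, `ensemble_answer`) and the numeric setup are hypothetical.

```python
import random

def make_proposer(seed):
    """Each proposer has its own bias, giving diversity (C1)."""
    rng = random.Random(seed)
    def propose(problem):
        # stateless per call, echoing the history-blind instantiation in C4
        return problem + rng.uniform(-1, 1)
    return propose

def make_critic(tolerance):
    """Critics with different tolerances give diverse checks (C2)."""
    def criticize(problem, answer):
        # a single honest veto is enough to block a colluding answer (C2, C3)
        return abs(answer - problem) > tolerance
    return criticize

def ensemble_answer(problem, proposers, critics):
    candidates = [p(problem) for p in proposers]
    # keep only candidates that no critic objects to
    accepted = [a for a in candidates
                if not any(c(problem, a) for c in critics)]
    if not accepted:
        return None
    return min(accepted, key=lambda a: abs(a - problem))

proposers = [make_proposer(s) for s in range(3)]
critics = [make_critic(1.5), make_critic(1.2)]
answer = ensemble_answer(5.0, proposers, critics)
```

The adversarial structure comes cheaply here, but as noted below, in realistic systems the diversity and information-partitioning have real costs.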

Some of these approaches are costly to implement. That might doom debate.

Success with debate likely depends on the complexity of key issues to be settled by debate, and/or the difficulty of empirically checking proposals.

Eliezer sometimes talks as if we'd be stuck evaluating proposals that are way too complex for humans to fully understand. I expect alignment can be achieved by evaluating some relatively simple, high-level principles. I expect we can reject proposals from AI debaters that are too complex, and select simpler proposals until we can understand them fairly well. But I won't be surprised if we're still plagued by doubts at the key junctures.

If Fen is using the most appropriate basic approach to forecasting growth, then his conclusions are correct.

Fen does not seem to be addressing the kind of models used in this OpenPhil report.

I see a difference between finding a new core of intelligence versus gradual improvements to the core(s) of intelligence that we're already familiar with.
