capybaralet

Comments

Any work on honeypots (to detect treacherous turn attempts)?

I strongly disagree.  
I think this is emblematic of the classic AI safety perspective/attitude, which has impeded and discouraged practical progress towards reducing AI x-risk by supporting an unnecessary and misleading emphasis on "ultimate solutions" that address the "arbitrarily intelligent agent trapped in a computer" threat model.
This is an important threat model, but it is just one of many.

My question is inspired by the situation where a scaled-up GPT-3-like model is fine-tuned using RL and/or reward modelling.  In this case, it seems like we can honeypot the model during the initial training and have a good chance of catching it attempting a premature treacherous turn.  Whether or not the model would attempt a premature treacherous turn seems to depend on several factors.  
A hand-wavy argument for this strategy working is: an AI should conceive of the treacherous turn strategy before the honeypot counter-strategy because a counter-strategy presupposes the strategy it counters.
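
To make the kind of setup I have in mind more concrete, here's a toy sketch (everything here — the gym-style environment interface, `is_honeypot_state`, `is_bait_action`, the policy methods — is a hypothetical illustration of mine, not a real training API):

```python
# Toy sketch of honeypotting during RL fine-tuning. All interfaces here
# (env.is_honeypot_state, env.is_bait_action, policy.act/update) are
# hypothetical, not from any real library.

def train_with_honeypots(policy, env, episodes=10_000):
    """Run RL fine-tuning while logging any episode in which the policy
    takes the 'bait' in a planted honeypot state."""
    caught = []
    for ep in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs)
            # A honeypot state is a planted, instrumented 'opportunity'
            # (e.g. an apparent chance to tamper with the reward channel).
            # Taking the bait is recorded for inspection, never rewarded.
            if env.is_honeypot_state(obs) and env.is_bait_action(obs, action):
                caught.append((ep, obs, action))
            obs, reward, done, _ = env.step(action)
            policy.update(obs, reward)
    return caught  # non-empty => a premature treacherous turn was caught
```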

There are several reasons that make this not a brilliant research opportunity. Firstly, what is and is not a honeypot is sensitively dependent on the AI's capabilities and situation. There is no such thing as a one-size-fits-all honeypot. 

I am more sympathetic to this argument, but it doesn't prevent us from doing research that is limited to specific situations.  It also proves too much, since combining this line of reasoning with no-free-lunch arguments would seem to invalidate all of machine learning.

Tips for the most immersive video calls

Any tips for someone who's already bought the C920 and isn't happy with the webcam on their computer?  (e.g. details on the 2-hour process :P)

Has anyone researched specification gaming with biological animals?

There are probably a lot of things that people do with animals that can be viewed as "automatic training", but I don't think people are viewing them this way, or trying to create richer reward signals that would encourage the animals to demonstrate increasingly impressive feats of intelligence.

Industrial literacy

The claim I'm objecting to is:

all soil loses its fertility naturally over time


I guess your interpretation of "naturally" is "when non-sustainably farmed"? ;) 

My impression is that we know how to keep farmland productive without using fertilizers by rotating crops, letting fields lie fallow sometimes, and involving fauna.  Of course, this might be much less efficient than using synthetic fertilizers, so I'm not saying that's what we should be doing. 

Is there any work on incorporating aleatoric uncertainty and/or inherent randomness into AIXI?

Is there a reference for this?

I was inspired to think of this by the following puzzle (which I interpret as being about the distinction between epistemic and aleatoric uncertainty):

"""
"To present another example, suppose that five tosses of a given coin are planned and that the agent has equal strength of belief for two outcomes, both beginning with H, say the outcomes HTTHT and HHTTH. Suppose the first toss is made, and results in a head. If all that the agent learns is that a head occurred on the first toss it seems unreasonable for him to move to a greater confidence in the occurrence of one sequence rather than another. The only thing he has found out is something which is logically implied by both propositions, and hence, it seems plausible to say, fails to differentiate between them.

This second example might be challenged along the following lines: The case might be one in which initially the agent is moderately confident that the coin is either biased toward heads or toward tails. But he has as strong a belief that the bias is the one way as the other. So initially he has the same degree of confidence that H will occur as that T will occur on any given toss, and so, by symmetry considerations, an equal degree of confidence in HTTHT and HHTTH. Now if H is observed on the first toss it is reasonable for the agent to have slightly more confidence that the coin is biased toward heads than toward tails. And if so it might seem he now should have more confidence that the sequence should conclude with the results HTTH than TTHT because the first of these sequences has more heads in it than tails."

Which is right?
"""

What's striking to me is that the 2nd argument seems clearly correct, but it only seems to work if you make a distinction between epistemic and aleatoric uncertainty, which I don't think AIXI does.  So that makes me wonder whether AIXI is doing something wrong (or whether people who use Beta distributions to model coin flips are (!!)).
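
To make the comparison concrete, here's a small sketch of the Beta-prior computation (my own illustration; the `seq_prob` helper and the Beta(a, b) parametrization are assumptions of mine, not anything from the quoted puzzle):

```python
from math import gamma

def beta_fn(a, b):
    """Beta function B(a, b) = G(a) * G(b) / G(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def seq_prob(seq, a=1.0, b=1.0):
    """Marginal probability of a specific H/T sequence when the coin's
    bias has a Beta(a, b) prior; depends only on the head/tail counts."""
    h, t = seq.count("H"), seq.count("T")
    return beta_fn(a + h, b + t) / beta_fn(a, b)

for seq in ("HTTHT", "HHTTH"):
    prior = seq_prob(seq)
    # Both sequences start with H, so conditioning on "first toss = H"
    # just divides by P(first toss = H) = seq_prob("H").
    posterior = prior / seq_prob("H")
    print(seq, prior, posterior)
```

Comparing what this prints for different priors (a, b) seems like one way to check which of the two arguments a Beta-prior agent actually follows.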

Weird Things About Money

I really like this.  I read part 1 as being about the way the economy or society implicitly imposes additional pressures on individuals' utility functions.  Can you provide a reference for the theorem that Kelly bettors predominate?

ETA: an observation: the arguments for maximizing expected value also assume infinite value is possible, which (modulo infinite-ethics-style concerns, a significant caveat...) also isn't realistic. 
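
For concreteness, here's a toy simulation of how I understand the predominance claim (entirely my own sketch; the win probability and bet fractions are arbitrary assumptions):

```python
import random
import statistics

def final_wealth(fraction, p=0.6, rounds=200, wealth=1.0):
    """Repeatedly bet `fraction` of current wealth on an even-odds coin
    that comes up in our favor with probability p."""
    for _ in range(rounds):
        stake = fraction * wealth
        wealth += stake if random.random() < p else -stake
    return wealth

kelly = 2 * 0.6 - 1  # Kelly fraction for an even-odds bet: f* = 2p - 1 = 0.2
trials = 1000
for label, f in [("Kelly (f=0.2)", kelly), ("all-in EV maximizer (f=1.0)", 1.0)]:
    results = [final_wealth(f) for _ in range(trials)]
    print(label, "median final wealth:", statistics.median(results))
```

The all-in bettor has the higher expected wealth in every round, but the median over runs illustrates the sense in which Kelly bettors typically end up with (almost) all the wealth.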

AGI safety from first principles: Control

Which previous arguments are you referring to?

Industrial literacy

That the food you eat is grown using synthetic fertilizers, and that this is needed for agricultural productivity, because all soil loses its fertility naturally over time if it is not deliberately replenished.

This claim doesn't make sense.  If it were true, plants would not have survived to the present day.

Steelmanning (which I would say OP doesn't do a good job of...), I'll interpret this as: "we are technologically reliant on synthetic fertilizers to grow enough food to feed the current population".  But in any case, there are harmful environmental consequences to our current practice that seem somewhat urgent to address: https://en.wikipedia.org/wiki/Haber_process#Economic_and_environmental_aspects

capybaralet's Shortform

Some possible implications of more powerful AI/technology for privacy:

1) It's as if all of your logged data gets pored over by a team of super-detectives to make informed guesses about every aspect of your life, even those that seem completely unrelated to those kinds of data.

2) Even data that you try to hide can be recovered through things like reverse-engineering what you type from the sounds of your typing, etc.

3) Powerful actors will deploy advanced systems to model, predict, and influence your behavior, and extreme privacy precautions starting now may be warranted.

4) On the other hand, if you DON'T have a significant digital footprint, you may be seen as significantly less trustworthy.  If AI systems don't know what to make of you, you may be the first up against the wall (compare with seeking credit without having a credit history).
 
5) On the other other hand ("on the foot"?), if you trust that future societies will be more enlightened, then you may be retroactively rewarded for being more enlightened today.

Anything important I left out?
