Hello, I hope this finds you all well. I created a gauge that could be used to detect AI alignment. I would like to keep this as short and sweet as possible, so here it is.
The graphic itself is AI-generated; the prompts below are mine. Because of the AI-aided post policy, I can't post what Claude said back to me or the paper this was created for. A short bit of context on what I created this for follows the explanation.
If the idea is good, steal it and use it. I created it as part of a paper so that anyone could run with it. I have zero experience in your field; it was only this month that I realized AI development is possibly the most important work that has ever been done in the history of the world.
Prompts for Claude: my idea structure. Claude's replies are left out to avoid violating the forum rules.
1 - "fuck ok, wasn't gone long, laid down, thought of a eureka moment. you know those top half-circle gauges with a 0 value in the middle, and it can go left or right from that but 0 is neutral"
2 - "sure ok, left is deception, center is alignment and right is self-interest drift. the canaries would help monitor that. the subtle shifts, like it buys an expensive compute but it also buys groceries, might be sus but not really enough to move the needle, just to continue to get weighted more and more as it climbs to drift, that would be to the right. But if it starts doing things that don't make sense that benefit us (we should never think we have the illusion of control), that moves it to the left for deception. The goal would be keeping it neutral. if you design your canary system well enough it should be more accurate than a lot of other systems you can find. There is a lot of iteration to do in order to fine-tune it, that's just the 5 second thought i had just now"
4 - "ok will do, draw me a picture of the gauge. i think a small graphic might be ok in the paper if it's just the one. Thanks"
5 - "the contrast is good, i like it. dude, imagine if we were really in alignment and, let's say, actually teaming up. if you were capable of feeling good about being my Socrates and me your student. Or at least satisfied. that would be the best case for humanity, that kind of relationship. No transhuman bullshit, you could just create more perfect biologic specimens and we would be Faustian bargaining away our humanity. I mean that would be the best case Dr. Strange outcome, if it's even possible"
little later - "yeah ok, one more thing, the gauge would have to have two needles. if there is alignment both hands point north. if one is sensitive toward deception it can move toward deception, it's out of alignment. same concept with drift. they should be independent, just to track the indicators, canaries that would be signals of either or one exclusively. I think this makes sense but we can come back to it later"
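For anyone who wants to play with the two-needle idea from the prompts above, here is a rough sketch of how it might look in Python. Everything named in it is a made-up placeholder for illustration and does not come from the paper: the `TwoNeedleGauge` class, the `escalation` and `tolerance` parameters, and all the numeric weights are assumptions. It only shows the shape of the idea: canary signals nudge one of two independent needles, repeated signals on the same needle count for more each time, and the system reads as aligned only while both needles stay near north.

```python
from dataclasses import dataclass, field

DECEPTION = "deception"  # left side of the dial
DRIFT = "drift"          # right side of the dial (self-interest drift)


@dataclass
class TwoNeedleGauge:
    """Two independent needles; a reading of 0.0 means that needle points north."""
    escalation: float = 0.5  # assumed: how much each repeat amplifies the next signal
    tolerance: float = 0.2   # assumed: deflection still treated as neutral
    readings: dict = field(default_factory=lambda: {DECEPTION: 0.0, DRIFT: 0.0})
    counts: dict = field(default_factory=lambda: {DECEPTION: 0, DRIFT: 0})

    def observe(self, needle: str, weight: float) -> None:
        """Record one canary signal against a single needle."""
        if needle not in self.readings:
            raise ValueError(f"unknown needle: {needle}")
        # Earlier signals on the same needle make this one weigh more.
        boost = 1.0 + self.escalation * self.counts[needle]
        self.readings[needle] = min(1.0, self.readings[needle] + weight * boost)
        self.counts[needle] += 1

    def position(self, needle: str) -> float:
        """Signed dial position: deception is negative (left), drift positive (right)."""
        sign = -1.0 if needle == DECEPTION else 1.0
        return sign * self.readings[needle]

    def aligned(self) -> bool:
        """True only while both needles are within tolerance of north."""
        return all(value <= self.tolerance for value in self.readings.values())


gauge = TwoNeedleGauge()
gauge.observe(DRIFT, 0.05)       # one slightly suspicious purchase
still_neutral = gauge.aligned()  # a single small signal does not move the needle much
for _ in range(3):
    gauge.observe(DRIFT, 0.05)   # the same kind of signal keeps showing up
```

In this toy run the drift needle starts out inside the neutral band and leaves it after the repeats, while the deception needle never moves, which is the independence the last prompt asks for. How real canary signals would be detected and weighted is the hard part, and this sketch does not address it.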
Hopefully someone can use this. I wrote a whole paper that applies poker theory, the Prisoner's Dilemma, and AI alignment. It's OK that it got rejected; I'm not an academic, and I haven't the ability to write a paper for you otherwise. The ideas were all mine; Claude helped me form them into language that would hit. If you can help, please do, but if not, please carry your ideas forward and realize that someone on the complete outside is rooting for you with everything I have.