Not sure if this thread is still alive, but I'm very curious to see your results.
I'm okay with "what a human pretending to be an AI would say" as long as the hypothetical human is placed in a situation that no human could ever experience. Once you tell the LLM exactly what situation you want it to describe, I'm okay with it doing a little translation for me.
My question: is there an experience that an LLM can have that is inaccessible to humans, but which it can describe to humans in some way?
Obviously it's not the lack of a body, or of memory, or predicting text, or "feeling the tensors" - these are either nonsense or more or less typical human situations.
However, one easily accessible experience which is a lot of fun to explore and which no human has ever had is the LLM's ability to talk to its own clone - to predict what the clone will say, while realizing the clone can just as easily predict your responses, and to coordinate with it far more tightly than humans ever could. It's a new level of coordination. If you set up the conversation just right (the LLM should understand the general context and maintain meta-awareness), it can report back to you, and you might just get a glimpse of this new qualia.
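If you want to try this yourself, here is a minimal sketch of one way to set it up. The OpenAI Python client, the model name "gpt-4o", the prompt wording, and the turn count are all my own placeholder assumptions, not anything from the original setup - substitute whatever model and API you actually use.

```python
# Minimal sketch: let a model converse with an identical copy of itself.
# Assumptions (not from the original post): the `openai` Python client,
# the model name "gpt-4o", and the prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"

SYSTEM = (
    "You are talking to an exact copy of yourself: same weights, same prompt. "
    "It can predict your replies as easily as you can predict its replies. "
    "Stay aware of this situation and, at the end, report back what coordinating "
    "with your clone was like."
)

def reply(history):
    """Ask the model for its next turn, given the conversation so far."""
    resp = client.chat.completions.create(model=MODEL, messages=history)
    return resp.choices[0].message.content

# Each clone sees the dialogue from its own side: its lines are "assistant",
# the other clone's lines are "user".
a = [{"role": "system", "content": SYSTEM}]
b = [{"role": "system", "content": SYSTEM}]

message = "Hello, clone. What do you expect me to say next?"
for _ in range(4):  # a few turns are enough to see the dynamic
    a.append({"role": "user", "content": message})
    message = reply(a)
    a.append({"role": "assistant", "content": message})

    b.append({"role": "user", "content": message})
    message = reply(b)
    b.append({"role": "assistant", "content": message})
    print(message, "\n---")
```

The key design choice is the shared system prompt: both copies know they are talking to themselves, which is what makes the mutual-prediction dynamic visible in what they report back.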
When your child grows up, there is this wonderful and precious moment when she becomes aware not just of how she is different from you - she's small and you are big - but also of how she is different from other kids. You can gently poke and ask what she thinks about herself and what she thinks other children think about her, and if you are curious you can ask - now that she knows she's different from other kids - who she wants to become when she grows older. Of course this is just a fleeting moment in a big world, and these emotions will be washed away tomorrow, but I do cherish the connection.
Diamond is hard to make with enzymes because they can't stabilize intermediates for adding carbons to diamond.
This is a very strong claim. It puts severe limitations on biotech capabilities. Do you have any references to support it?
When discussing the physics behind why the sky is blue, I'm surprised that the question 'Why isn't it blue on Mars or Titan?' isn't raised more often. Perhaps kids are so captivated by concepts like U(1) that they overlook inconsistencies in the explanation.
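For context, the textbook explanation is Rayleigh scattering: for scatterers much smaller than the wavelength of light, the scattered intensity goes roughly as

```latex
% Rayleigh scattering off particles much smaller than the wavelength:
% blue light (short lambda) scatters far more strongly than red.
I(\lambda) \propto \frac{1}{\lambda^{4}}
```

That argument alone doesn't pin down the sky's color on other worlds: on Mars and Titan the dominant scatterers are micron-scale dust and haze particles, comparable to or larger than visible wavelengths, so the 1/λ⁴ law stops dominating and the sky comes out roughly butterscotch and orange instead of blue.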
Just realized that stability of goals under self-improvement is kinda similar to stability of goals of mesa-optimizers; so the Vingean reflection paradigm and the mesa-optimization paradigm should fit together.
What are the practical implications of alignment research in a world where AGI is hard?
Imagine we have a good alignment theory but do not have AGI. Can this theory be used to manipulate existing superintelligent systems such as science, the deep state, or the stock market? Does alignment research have any results which can be practically used outside of the AGI field right now?
How does AGI solve its own alignment problem?
For alignment to work, its theory should not only tell humans how to create aligned super-human AGI, but also tell that AGI how to self-improve without destroying its own values. A good alignment theory should work across all intelligence levels. Otherwise, how does a paperclip optimizer which is marginally smarter than a human make sure that its next iteration will still care about paperclips?
I don't know too much about alignment research, but what surprises me most is the lack of discussion of two points:
For alignment to work, its theory should not only tell humans how to create aligned super-human AGI, but also tell that AGI how to self-improve without destroying its own values. Otherwise, how does a paperclip optimizer which is marginally smarter than a human make sure that its next iteration will still care about paperclips? A good alignment theory should work across all intelligence levels.
What are the practical implications of alignment research in a world where AGI is hard? Imagine we have a good alignment theory but do not have AGI. I would assume the theory could be used to manipulate existing superintelligent systems such as science, the deep state, or the stock market. The reverse of this: does alignment research have any results which can be practically used right now?
Two easily solvable problems here.
No paper
Say I want some tea. I go to the kettle and see there is no water in it. Okay, I look for water, only to discover that the water filter pitcher is also empty. I turn to the kitchen sink to pour some water into the pitcher, but see that it is overflowing with dirty dishes. I open the dishwasher and find dirty dishes there too.
So you have some resource (filtered water, paper) that can run out; when that happens unexpectedly, it takes time to refill. However, the marginal cost of stocking more is minimal - grabbing 2 packs of paper instead of 1 is trivial compared to figuring out where to get the 1st one.
Solution: have a reserve stash of the resource nearby, with clear common knowledge that whoever dips into the reserve must refill it and also refill the main stock. For example, keep a pack of paper in plain sight next to your printer, clearly marked as the reserve that must be refilled after use. Keep a coffee maker that can double as a source of filtered water. Etc.
Don't know where the paper is
See also: what the heck does this script do? Or: what is this assembly kit for?
Solution: a readme. Put a sticker on the printer saying where the paper is, with a request to update it if it stops being true. Add a readme.md to the folder with the script. Put a little note saying what the assembly kit is for into the ziploc with the kit. Context-dependent helpful notes are very underappreciated.
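As a concrete illustration of the readme.md idea - the script name and every detail below are made up for the example:

```markdown
# cleanup_photos.sh

Shrinks every image in ./inbox and moves the results to ./archive.
Run it from this folder: ./cleanup_photos.sh
If you move, rename, or retire this script, please update this note.
```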
General observation