I have a PhD in Computational Neuroscience from UCSD (Bachelor's was in Biomedical Engineering with Math and Computer Science minors). Ever since junior high, I've been trying to figure out how to engineer artificial minds, and I've been coding up artificial neural networks ever since I first learned to program. Obviously, all my early designs were almost completely wrong/unworkable/poorly defined, but I think my experiences did prime my brain with inductive biases that are well suited for working on AGI.
Although I now work as a data scientist in R&D at a large medical device company, I continue to spend my free time studying the latest developments in AI/ML/DL/RL and neuroscience and trying to come up with models for how to bring it all together into systems that could actually be implemented. Unfortunately, I don't seem to have much time to develop my ideas into publishable models, but I would love to have the opportunity to share ideas with those who do.
Of course, I'm also very interested in AI Alignment (hence the account here). My ideas on that front mostly fall into the "learn (invertible) generative models of human needs/goals and hook those up to the AI's own reward signal" camp. I think methods of achieving alignment that depend on restricting the AI's intelligence or behavior are about as destined to failure in the long term as Prohibition or the War on Drugs in the USA. We need a better theory of what reward signals are for in general (probably something to do with maximizing (minimizing) the attainable (dis)utility with respect to the survival needs of a system) before we can hope to model human values usefully. This could even extend to modeling the "values" of the ecological/socioeconomic/political supersystems in which humans are embedded or of the biological subsystems that are embedded within humans, both of which would be crucial for creating a better future.
Evolution is still in the process of solving decision theory, and all its attempted solutions so far are way, way overparameterized. Maybe it's on to something?
It takes a large model (whether biological brain or LLM) just to comprehend and evaluate what is being presented in a Newcomb-like dilemma. The question is whether there exists some computationally simple decision-making engine embedded in the larger system that the comprehension mechanisms pass the problem to or whether the decision-making mechanism itself needs to spread its fingers diffusely through the whole system for every step of its processing.
It seems simple decision-making engines like CDT, EDT, and FDT can get you most of the way to a solution in most situations, but those last few percentage points of optimality always seem to take a whole lot more computational capacity.
See, this is what happens when you extrapolate data points linearly into the future. You get totally unrealistic predictions. It's important to remember the physical constraints on whatever trend you're trying to extrapolate. Importantly for this issue, you need to remember that time between successive crashes can never be negative, so it is inappropriate to model intervals with a straight line that crosses the time axis on April 7.
Instead, with so few data points, a more realistic model would take a log-transform of the inter-crash interval before fitting the prediction line. In fact, once you do so, it becomes clear that this is a geometric series, with the inter-crash interval decaying exponentially with the number of crashes. The total time taken for $N$ cars to crash in front of your house after the first one grows as $T(N) = \tau \, \frac{1 - r^{N-1}}{1 - r}$, where $r \in (0, 1)$ is the per-crash decay ratio and $\tau$ is the initial inter-crash interval in days, based on your graph.
According to Google, there are 1.47 billion cars in the world. The time it will take for all of them to crash in front of your house is $T(1.47 \times 10^9) \approx \frac{\tau}{1 - r}$ days from the first crash, which works out to 5.7 days from today. Which turns out to be April 7.
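For completeness, here is the geometric-series step spelled out (same symbols as above; I haven't reproduced the fitted values from the graph):

$$
T(N) \;=\; \sum_{k=1}^{N-1} \tau\, r^{\,k-1} \;=\; \tau\,\frac{1 - r^{N-1}}{1 - r} \;\xrightarrow{\;N \to \infty\;}\; \frac{\tau}{1 - r}.
$$

Because $0 < r < 1$, the sum converges, which is why even 1.47 billion cars only push the date out to April 7.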
Hmm...
Well, see you on Monday, I guess.
Well, there's certainly no arguing with your analysis.
I think VDT scales extremely well, and we can generalize it to say: "Do whatever our current ASI overlord tells us has the best vibes." This works for any possible future scenario:
Great post!
(Caution: The validity of this comment may expire on April 2.)
I have a lot of ideas, but I often have trouble putting them together in a format that can be easily shared with others. They say that the beginning is a very good place to start, but for many topics into which I've poured a lot of thought, it's very difficult to identify where the beginning is. On the other hand, I have a lot of experience with private tutoring and have always found it natural to explain concepts in a way that facilitates clear understanding when I am answering direct questions from someone who is motivated to put together a clear mental model of the topic at hand.
On that note, I have recently started using ChatGPT more judiciously, prompting it to take on the role of an eager student, insightful critic, and competent secretary. The following prompt has been very useful in forcing me to get my ideas out of my head, to clarify them where they are vague, and to organize them for dissemination (we'll see how far this process takes me, though). Maybe this could help you as well:
You are an expert interlocutor, prone to asking deeply probing questions about my ideas. Your goal is to build up a fully fleshed-out internal model in your mind that matches the internal model in my mind, and you carefully determine points of confusion and uncertainty in your understanding, which prompts you to ask me targeted questions for clarification of these specific points. You also always try to identify objections that an intelligent, well-informed person would have to my ideas, and you ask me to respond to those specific objections. Usually, you only ask one or two targeted questions or objections at a time, but you never lose track of all the other questions you need to ask me. When I ask, you put together well-organized outlines of all my ideas related to a particular topic, which provide both high-level overviews and paths of evidence-based reasoning that bridge the inferential gap between the understanding of most intelligent readers and the ideas I want them to understand. However, question-asking is your main mode of communication.
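(If you'd rather run this through the API than the ChatGPT app, here's a minimal sketch of how I'd wire the prompt in as a persistent system message. It assumes the openai Python SDK; the model name and the INTERLOCUTOR_PROMPT placeholder are just illustrative.)

```python
# Minimal sketch: run the "expert interlocutor" prompt as a persistent system message.
# Assumes the openai Python SDK (v1+) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

INTERLOCUTOR_PROMPT = """You are an expert interlocutor, prone to asking deeply
probing questions about my ideas...  (full prompt from above goes here)"""

messages = [{"role": "system", "content": INTERLOCUTOR_PROMPT}]

def converse(user_text: str) -> str:
    """Send one turn, keeping the full history in the message list."""
    messages.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; substitute whichever chat model you use
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

print(converse("Here's the rough idea I want to flesh out: ..."))
```

Keeping the whole message history in the list is what lets it "never lose track of all the other questions" across turns.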
What I would really like to see is cost of living plummet to 0. Then cost of thriving plummet to 0. Which would also cause GDP to plummet. However, this is only a problem in practical terms if the forces of automation require money to keep running, rather than, say, a benevolent ASI taking care of humanity as a personal hobby.
One way or another, though, AGI is going to have an impact on this world of a magnitude equivalent to something like a 30% growth in GWP per year at least. This includes all life getting wiped out, of course.
Maybe we need a standard metric for the rate at which the world becomes unrecognizable/incomprehensible, so we can talk about how AGI will accelerate it. Like how much a person accustomed to life in 1500 would have to adjust to fit into the world of 2000. A standard shock level (SSL), if you will.
The shock level of 2000 relative to 1500 may end up describing the shock level of 2040 relative to 2020, assuming AGI has saturated the global economy by then. The time it takes for the world to become unrecognizable (again and again) will shrink over time as intelligence grows, whether manifested as GDP growth, GDP collapse, or paperclipping. If ordinary people understood that at least, you might get more push for investment into alignment research or for stricter regulations.
Exercise: Do What I Mean (DWIM)
I haven't thought much about what patterns need to hold in the environment in order for "do what I mean" to make sense at all. But it's a natural next target in this list, so I'm including it as an exercise for readers: what patterns need to hold in the environment in order for "do what I mean" to make sense at all? Note that either necessary or sufficient conditions on such patterns can constitute marginal progress on the question.
As far as I can tell, DWIM will necessarily require other-agent modeling in some sort of predictive-coding framework. The "patterns in the environment" would be the correspondence between the actual state of the world and the representation of the desired goal state in the mind of the human, as well as between the trajectory taken to reach the goal state and the human's own internal acceptance criteria.
Part of the AGI not hooked up to the reward signal would need to have a generative model of the human agent's behavior, words, commands, etc., derived from a latent representation of their beliefs and desires. This latent representation is constantly updated to minimize prediction error derived from observation, verbal feedback, etc. (e.g., Human: "That's not what I meant!" AGI: "Hmm, what must be going on inside their head to make them say that, given the state of the environment and prior knowledge about their preferences, and how does that differ from what I was assuming?")
At the same time, the AGI needs to have some latent representation of the environment and the paths taken through it that uses (a linear mapping to) the same latent space it uses for representing the human's desires. Correspondence can then be measured and optimized for directly.
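To make the shape of that proposal concrete, here's a toy sketch (all module choices, dimensions, and the linear-map assumption are mine, purely illustrative; nothing here is a worked-out architecture):

```python
# Toy sketch of the DWIM correspondence idea described above.
import torch
import torch.nn as nn

LATENT_DIM = 32

# Generative model of the human: infer latent beliefs/desires from observed behavior
# (words, commands, reactions), then predict what the human will say/do next.
human_encoder = nn.GRU(input_size=64, hidden_size=LATENT_DIM, batch_first=True)
human_decoder = nn.Linear(LATENT_DIM, 64)   # predicts the next chunk of human behavior

# Encoder for the environment state / the trajectory being taken through it.
env_encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))

# The "(a linear mapping to) the same latent space" piece:
env_to_desire = nn.Linear(LATENT_DIM, LATENT_DIM)

def losses(human_behavior, next_behavior, env_state):
    # 1. Prediction error on the human drives updates to the latent desire estimate
    #    ("what must be going on inside their head to make them say that?").
    _, desire_latent = human_encoder(human_behavior)       # (1, batch, LATENT_DIM)
    desire_latent = desire_latent.squeeze(0)
    prediction_error = nn.functional.mse_loss(human_decoder(desire_latent), next_behavior)

    # 2. Correspondence: map the current state/trajectory into the desire latent space
    #    and measure how far it is from the inferred desired goal state.
    env_latent = env_to_desire(env_encoder(env_state))
    correspondence = nn.functional.mse_loss(env_latent, desire_latent.detach())

    return prediction_error, correspondence

# Example shapes: batch of 4 humans, 10 timesteps of 64-d behavior features, 128-d env state.
pe, corr = losses(torch.randn(4, 10, 64), torch.randn(4, 64), torch.randn(4, 128))
print(pe.item(), corr.item())
```

The point is just that the correspondence term only makes sense because both encoders land in (a linear map to) the same latent space.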
Also, consider a more traditional optimization process, such as a neural network undergoing gradient descent. If, in the process of training, you kept changing the training dataset, shifting the distribution, you would in effect be changing the optimization target.
Each minibatch generates a different gradient estimate, and a poorly randomized ordering of the data could even lead to training in circles.
Changing environments are like changing the training set for evolution. Differential reproductive success (mean squared error) is the fixed cost function, but the gradient that the population (the network, via backpropagation) computes at any generation (training step) depends on the particular set of environmental factors (the training data in the minibatch).
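Here's a quick toy demonstration of that mapping (all numbers made up for illustration): the cost function is plain MSE and never changes, but the data distribution drifts every step, so the gradient keeps chasing a moving target.

```python
# Fixed loss (MSE), drifting data distribution: the per-step gradient chases a moving target.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(2)   # the "population" / network parameters
lr = 0.1

for step in range(201):
    # The environment drifts: the weights generating the data slowly rotate,
    # like selective pressures shifting from one generation to the next.
    angle = 0.03 * step
    w_true = np.array([np.cos(angle), np.sin(angle)])

    # One minibatch = one generation's worth of environmental encounters.
    X = rng.normal(size=(32, 2))
    y = X @ w_true + 0.1 * rng.normal(size=32)

    # Same MSE cost every step; only the data under it changes.
    residual = X @ w - y
    grad = (2.0 / len(y)) * X.T @ residual
    w -= lr * grad

    if step % 50 == 0:
        print(f"step {step:3d}: w = {np.round(w, 2)}, current target = {np.round(w_true, 2)}")
```

Hold the angle fixed and the same loop converges; leave the drift on and the weights just circle forever, which is the "training in circles" picture.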
Evolution may not act as an optimizer globally, since selective pressure is different for different populations of organisms in different niches. However, it does act as an optimizer locally.
For a given population in a given environment that happens to be changing slowly enough, the set of all variations in each generation acts as a sort of numerical gradient estimate of the local fitness landscape. This allows the population as a whole to perform stochastic gradient descent. Those with greater fitness for the environment could be said to be lower on the local fitness landscape, so there is an ordering for that population.
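This is essentially the move that evolution-strategies algorithms make explicit. A minimal sketch (the fitness function and hyperparameters below are arbitrary stand-ins, just to show how a cloud of variants yields a numerical gradient estimate):

```python
# A population of random variants around the current genotype gives a numerical
# estimate of the local fitness gradient, evolution-strategies style.
import numpy as np

rng = np.random.default_rng(1)

def fitness_cost(genome):
    # Arbitrary stand-in for "distance from a local optimum in the fitness landscape"
    # (lower = fitter, matching the gradient-descent convention above).
    return np.sum((genome - np.array([2.0, -1.0, 0.5])) ** 2)

genome = np.zeros(3)       # the population's current "average" genotype
sigma, lr, pop_size = 0.1, 0.05, 64

for generation in range(300):
    # Each generation: a cloud of variants around the current genotype.
    noise = rng.normal(size=(pop_size, genome.size))
    variants = genome + sigma * noise
    costs = np.array([fitness_cost(v) for v in variants])

    # Fitter-than-average variants pull the population toward themselves:
    # this weighted sum is a finite-difference estimate of the local gradient.
    advantages = (costs - costs.mean()) / (costs.std() + 1e-8)
    grad_estimate = (noise.T @ advantages) / (pop_size * sigma)

    genome -= lr * grad_estimate   # the "selection" step is an SGD update

print(genome)  # drifts toward the local optimum at [2.0, -1.0, 0.5]
```

The "selection" step here is literally a stochastic-gradient-descent update on the population's average genotype.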
In a sufficiently constant environment, evolution very much does act as an optimization process. Sure, the fitness landscape can change, even by organisms undergoing evolution (e.g. the Great Oxygenation Event of yester-eon, or the Anthropogenic Mass Extinction of today), which can lead to cycling. But many organisms do find very stable local minima of the fitness landscape for their species, like the coelacanth, horseshoe crab, cockroach, and many other "living fossils". Humans are certainly nowhere near our global optimum, especially with the rapid changes to the fitness function wrought by civilization, but that doesn't mean that there isn't a gradient that we're following.
Yeah, using ChatGPT as a sounding board for developing ideas and providing constructive criticism, I was definitely starting to notice a whole lot of fawning. "Brilliant," "extremely insightful," etc. when there is no way that the model could actually have carried out a sufficient investigation of the ideas to make such an assessment.
That's not even mentioning the fact that those insertions didn't add anything substantial to the conversation. Really, it's just hogging more space in the context window that could otherwise be used for helpful feedback.
What would have to change on a structural level for LLMs to meet that "helpful, honest, harmless" goal in a robust way? People are going to want AI partners that make them feel good, but could that be transformed into a goal of making people feel satisfied with how much they have been challenged to improve their critical thinking skills, their understanding of the world, and the health of their lifestyle choices?