I recently interviewed with Epoch, and as part of a paid work trial they wanted me to write up a blog post about something interesting related to machine learning trends. This is what I came up with:
I should point out that the logic of the degrowth movement follows from a relatively straightforward analysis of available resources vs. first world consumption levels. Our world can only sustain 7 billion human beings because the vast majority of them live not at first world levels of consumption, but third world levels, which many would argue to be unfair and an unsustainable pyramid scheme. If you work out the numbers, if everyone had the quality of life of a typical American citizen, taking into account things like meat consumption to arable land, energy usage, etc., then the Earth would be able to sustain only about 1-3 billion such people. Degrowth thus follows logically if you believe that all the people around the world should eventually be able to live comfortable, first world lives.
I'll also point out that socialism is, like liberalism, a child of the Enlightenment and general beliefs that reason and science could be used to solve political and economic problems. Say what you will about the failed socialist experiments of the 20th century, but the idea that government should be able to engineer society to function better than the ad-hoc arrangement that is capitalism, is very much an Enlightenment rationalist, materialist, and positivist position that can be traced to Jean-Jacques Rousseau, Charles Fourier, and other philosophes before Karl Marx came along and made it particularly popular. Marxism in particular, at least claims to be "scientific socialism", and historically emphasized reason and science, to the extent that most Marxist states were officially atheist (something you might like given your concerns about religions).
In practice, many modern social policies, such as the welfare state, Medicare, public pensions, etc., are heavily influenced by socialist thinking and put in place in part as a response by liberal democracies to the threat of the state socialist model during the Cold War. No country in the world runs on laissez-faire capitalism, we all utilize mixed market economies with varying degrees of public and private ownership. The U.S. still has a substantial public sector, just as China, an ostensibly Marxist Leninist society in theory, has a substantial private sector (albeit with public ownership of the "commanding heights" of the economy). It seems that all societies in the world eventually compromised in similar ways to achieve reasonably functional economies balanced with the need to avoid potential class conflict. This convergence is probably not accidental.
If you're truly more concerned with truth seeking than tribal affiliations, you should be aware of your own tribe, which as far as I can tell, is western, liberal, and democratic. Even if you honestly believe in the moral truth of the western liberal democratic intellectual tradition, you should still be aware that it is, in some sense, a tribe. A very powerful one that is arguably predominant in the world right now, but a tribe nonetheless, with its inherent biases (or priors at least) and propaganda.
Just some thoughts.
I'm using the number calculated by Ray Kurzweil for his book, the Age of Spiritual Machines from 1999. To get that figure, you need 100 billion neurons firing every 5 ms, or 200 Hz. That is based on the maximum firing rate given refractory periods. In actuality, average firing rates are usually lower than that, so in all likelihood the difference isn't actually six orders of magnitude. In particular, I should point out that six orders of magnitude is referring to the difference between this hypothetical maximum firing brain and the most powerful supercomputer, not the most energy efficient supercomputer.
The difference between the hypothetical maximum firing brain and the most energy efficient supercomputer (at 26 GigaFlops/watt) is only three orders of magnitude. For the average brain firing at the speed that you suggest, it's probably closer to two orders of magnitude. Which would mean that the average human brain is probably one order of magnitude away from the Landauer limit.
This also assumes that its neurons and not synapses that should be the relevant multiplier.
Okay, so I contacted 80,000 hours, as well as some EA friends for advice. Still waiting for their replies.
I did hear from an EA who suggested that if I don't work on it, someone else who is less EA-aligned will take the position instead, so in fact, it's slightly net positive for myself to be in the industry, although I'm uncertain whether or not AI capability is actually funding constrained rather than personal constrained.
Also, would it be possible to mitigate the net negative by choosing to deliberately avoid capability research and just take an ML engineering job at a lower tier company that is unlikely to develop AGI before others and just work on applying existing ML tech to solving practical problems?
I previously worked as a machine learning scientist but left the industry a couple of years ago to explore other career opportunities. I'm wondering at this point whether or not to consider switching back into the field. In particular, in case I cannot find work related to AI safety, would working on something related to AI capability be a net positive or net negative impact overall?
Even further research shows the most recent Nvidia RTX 3090 is actually slightly more efficient than the 1660 Ti, at 36 TeraFlops, 350 watts, and 2.2 kg, which works out to 0.0001 PetaFlops/Watt and 0.016 PetaFlops/kg. Once again, they're within an order of magnitude of the supercomputers.
So, I did some more research, and the general view is that GPUs are more power efficient in terms of Flops/watt than CPUs, and the most power efficient of those right now is the Nvidia 1660 Ti, which comes to 11 TeraFlops at 120 watts, so 0.000092 PetaFlops/Watt, which is about 6x more efficient than Fugaku. It also weighs about 0.87 kg, which works out to 0.0126 PetaFlops/kg, which is about 7x more efficient than Fugaku. These numbers are still within an order of magnitude, and also don't take into account the overhead costs of things like cooling, case, and CPU/memory required to coordinate the GPUs in the server rack that one would assume you would need.
I used the supercomputers because the numbers were a bit easier to get from the Top500 and Green500 lists, and I also thought that their numbers include the various overhead costs to run the full system, already packaged into neat figures.
Another thought is that maybe Less Wrong itself, if it were to expand in size and become large enough to roughly represent humanity, could be used as such a dataset.
So, I had a thought. The glory system idea that I posted about earlier, if it leads to a successful, vibrant democratic community forum, could actually serve as a kind of dataset for value learning. If each post has a number attached to it that indicates the aggregated approval of human beings, this can serve as a rough proxy for a kind of utility or Coherent Aggregated Volition.
Given that individual examples will probably be quite noisy, but averaged across a large amount of posts, it could function as a real world dataset, with the post content being the input, and the post's vote tally being the output label. You could then train a supervised learning classifier or regressor that could then be used to guide a Friendly AI model, like a trained conscience.
This admittedly would not be provably Friendly, but as a vector of attack for the value learning problem, it is relatively straightforward to implement and probably more feasible in the short-run than anything else I've encountered.
The average human lifespan is about 70 years or approximately 2.2 billion seconds. The average human brain contains about 86 billion neurons or roughly 100 trillion synaptic connections. In comparison, something like GPT-3 has 175 billion parameters and 500 billion tokens of data. Assuming very crudely weight/synapse and token/second of experience equivalence, we can see that the human model's ratio of parameters to data is much greater than GPT-3, to the point that humans have significantly more parameters than timesteps (100 trillion to 2.2 billion), while GPT-3 has significantly fewer parameters than timesteps (175 billion to 500 billion). Given the information gain per timestep is different for the two models, but as I said, these are crude approximations meant to convey the ballpark relative difference.
This means basically that humans are much more prone to overfitting the data, and in particular, memorizing individual data points. Hence why humans experience episodic memory of unique events. It's not clear that GPT-3 has the capacity in terms of parameters to memorize its training data with that level of clarity, and arguably this is why such models seem less sample efficient. A human can learn from a single example by memorizing it and retrieving it later when relevant. GPT-3 has to see it enough times in the training data for SGD to update the weights sufficiently that the general concept is embedded in the highly compressed information model.
It's thus, not certain whether or not existing ML models are sample inefficient because of the algorithms being used, or if its because they just don't have enough parameters yet, and increased efficiency will emerge from scaling further.