Prometheus


A man asks one of the members of the tribe to find him some kindling so that he may start a fire. A few hours pass, and the second man returns, walking with a large elephant.

“I asked for kindling,” says the first.

“Yes,” says the second.

“Where is it?” asks the first, trying to ignore the large pachyderm in the room.

The second gestures at the elephant, grinning.

“That’s an elephant.”

“I see that you are uninformed. You see, elephants are quite combustible, despite their appearance. Once the heat reaches the right temperature, their skin, muscles, all of it will burn. Right down to their bones.”

“What is the ignition temperature for an elephant?”

“I don't know, perhaps 300-400°C.”

The first held up two stones.

“This is all I have to start a fire,” he says. “It will only create a few sparks at best… I’m not even sure how I can get it to consistently do that much, given how hard this will be for people thousands of years from now to replicate.”

“That is the challenge.” The second nodded solemnly. “I’m glad you understand the scope of this. We will have to search for ways to generate sparks at 400° so that we can solve the Elephant Kindling Problem.”

“I think I know why you chose the elephant. I think you didn’t initially understand that almost everything is combustible, but that you only notice things are combustible once you pay enough attention to them. You looked around the Savanna, and didn't understand that dry leaves would be far more combustible, and your eyes immediately went to the elephant. Because elephants are interesting. They’re big and have trunks. Working on an Elephant Problem just felt way more interesting than a Dry Leaves Problem, so you focused all of your attention on elephants, using the excuse that elephants are technically combustible, failing to see the elegant beauty in the efficient combustibility of leaves and their low ignition temperature.”

“Leaves might be combustible. But think of how fast they burn out. And how many you would have to gather to start a fire. An elephant is very big. It might take longer to get it properly lit, but once you do, you will have several tons of kindling! You could start any number of fires with it!”

“Would you have really made these conclusions if you had searched all the possible combustible materials in the Savanna, instead of immediately focusing on elephants?”

“Listen, we can’t waste too much time on search. There are thousands of things in the Savanna! If we tested the combustibility and ignition temperature of every single one of them, we’d never get around to starting any fires. Are elephants the most combustible things in the Universe? Probably not. But should I waste time testing every possible material instead of focusing on how to get one material to burn? We have finite time, and finite resources to search for combustible materials. It’s better to pick one and figure out how to do it well.”

“I still think you only chose elephants because they’re big and interesting.”

“I imagine that ‘big’ and ‘useful as kindling material’ are not orthogonal. We shouldn’t get distracted by the small, easy problems, such as how to burn leaves. These are low-hanging fruit that anyone can pick. But my surveys of the tribe have found that figuring out the combustibility of elephants remains extremely neglected.”

“What about the guy who brought me a giraffe yesterday?”

“A giraffe is not an elephant! I doubt anything useful will ever come from giraffe combustibility. Their necks are so long that they will not even fit inside our caves!”

“What I am saying is that others have brought in big, interesting-looking animals, and tried to figure out how to turn them into kindling. Sure, no one else is working on the Elephant Kindling Problem. But that’s also what the guy with the giraffe said, and the zebra, and the python.”

“Excuse me,” said a third, poking his head into the cave, “but the Python Kindling Problem is very different from the Elephant one. Elephants are too girthy to be useful. But with a python, you can roll it into a coil, which will make it extremely efficient kindling material.”

The second scratched his chin for a moment, looking a bit troubled.

“What if we combined the two?” he asked. “If we wound the python around a leg of the elephant, the heat could be transferred somewhat efficiently.”

“No, no, no,” argued the third. “I agree combining these two problems might be useful. But it would be far better to just cut the trunk off the elephant, and intertwine it with the python. This could be very useful, since elephant hide is very thick and might burn slower. This gives us the pros of a fast-burning amount of kindling, mixed with a more sustained blaze from the elephant.”

“Might I interject,” said a fourth, who had been watching quietly from the corner but now stepped forward. “I have been hard at work on the Giraffe Kindling Problem, but I think that we are actually working on similar things. The main issue has always been the necks. They simply won’t fit inside the cave. We need a solution that works in all edge cases, after all. If it’s raining, we can’t start a fire outside. But if we use the python and the elephant trunk to tie the neck of the giraffe against the rest of its body, we could fit the whole thing in!”

“I think this will be a very fruitful collaboration,” said the second. “While at first it seemed as though we were all working on different problems, it turns out that by combining them, we have found an elegant solution.”

“But we still can’t generate sparks hot enough to combust any of them!” shouted the first. “All you’ve done is make this even more complicated and messy!”

“I am aware it might seem that way to a novice,” said the second. “But we have all gained great knowledge in our own domains. And now it is time for our fields to evolve into a true science. We are not amateurs anymore, simply playing around with fire. We are now establishing expertise, creating sub-domains, arriving at a general consensus of the problem and its underlying structure! To an outsider, it will probably look daunting. But so does every scientific field once it matures. And we will continue to break new ground by standing on the shoulders of elephants!”

“Giraffes,” corrected the fourth.

I strongly doubt we can predict the climate in 2100. An actual prediction would require a model that also incorporates the possibility of nuclear fusion, geoengineering, AGIs altering the atmosphere, etc.

I got into AI at the worst time possible

2023 marks the year AI Safety went mainstream. And though I am happy it is finally getting more attention, and finally has highly talented people who want to work in it, personally it could not have been worse for my professional life. This isn’t a thing I normally talk about, because it’s a very weird thing to complain about. I rarely permit myself to even complain about it internally. But I can’t stop the nagging sensation that if I had just pivoted to alignment research one year sooner than I did, everything would have been radically easier for me.

I hate saturated industries. I hate hyped-up industries. I hate fields that constantly make the news and gain mainstream attention. This was one of the major reasons why I had to leave the crypto scene, because it had become so saturated with attention, grift, and hype that I found it completely unbearable. Alignment and AGI were among those things almost no one even knew about, and even fewer talked about, which made them ideal for me. I was happy with the idea of doing work that might never be appreciated or understood by the rest of the world.

Since 2015, I had planned to get involved, but at the time I had no technical experience or background. So I went to college, majoring in Computer Science. Working on AI and what would later be called “Alignment” was always the plan, though. I remember having a shelf in my college dorm, which I used to represent all my life goals and priorities: AI occupied the absolute top. My mentality, however, was that I needed to establish myself enough, and earn enough money, before I could transition to it. I thought I had all the time in the world.

Eventually, I got frustrated with myself for dragging my feet for so long. So in Fall 2022, I quit my job in cybersecurity, accepted a grant from the Long Term Future Fund, and prepared to spend a year skilling up to do alignment research. I felt fulfilled. Where my brain normally nagged me about not doing enough, or how I should be working on something more important, I finally felt content. I was finally doing it. I was finally working on the Extremely Neglected, Yet Conveniently Super Important Thing.

And then ChatGPT came out two months later, and even my mother was talking about AI.

If I believed in fate, I would say it seems as though I was meant to enter AI and Alignment during the early days. I enjoy fields where almost nothing has been figured out. I hate prestige. I embrace the weird, and hate any field that starts worrying about its reputation. I’m not a careerist. I can imagine many alternative worlds where I got in early, maybe ~2012 (I’ve been around the typical LessWrong/rationalist/transhumanist group for my entire adult life). I’d get in, start to figure out the early stuff, identify some of the early assumptions and problems, and then get out once 2022/2023 came around. It’s the weirdest sensation to feel like I’m too old to join the field now, and also feel as though I’ve been part of the field for 10+ years. I’m pretty sure I’m just 1-2 degrees from literally everyone in the field.

The shock of the field/community going from something almost no one was talking about to something even the friggin’ Pope is weighing in on is something I think I’m still trying to adjust to. Some part of me keeps hoping the bubble will burst, AI will “hit a wall”, marking the second time in history Gary Marcus was right about something, and I’ll feel as though the field has enough space to operate in again. As it stands now, I don’t really know what place it has for me. It is no longer the Extremely Neglected, Yet Conveniently Super Important Thing, but instead just the Super Important Thing. When I was briefly running an AI startup (don’t ask), I was getting 300+ applicants for each role we were hiring for. We never once advertised the roles, but they somehow found them anyway, and applied in swarms. Whenever I get a rejection email from an AI Safety org, I’m usually told they receive somewhere in the range of 400-700 applications for every given job. That’s, at best, a 0.25% chance of acceptance: substantially lower than Harvard. It becomes difficult for me to answer why I’m still trying to get into such an incredibly competitive field, when literally doing anything else would be easier. “It’s super important” is not exactly making sense as a defense at this point, since there are obviously other talented people who would get the job if I didn’t.

I think it’s that I could start to see the shape of what I could have had, and what I could have been. It’s vanity. Part of me really loved the idea of working on the Extremely Neglected, Yet Conveniently Super Important Thing. And now I have a hard time going back to working on literally anything else, because anything else could never hope to be remotely as important. And at the same time, despite the huge amount of new interest in alignment, and the huge amount of new talent interested in contributing to it, somehow the field still feels undersaturated. In a market-driven field, we would see more jobs and roles growing as the overall interest in working in the field did, since interest normally correlates with growth in consumers/investors/etc. Except we’re not seeing that. Despite everything, by most measurements, there seem to still be fewer than 1000 people working on it full-time, maybe as low as ~300, depending on what you count.

So I oscillate between thinking I should just move on to other things and thinking I absolutely should be working on this at all costs. It’s made worse by sometimes briefly doing temp work for an underfunded org, sometimes getting to the final interview stage for big labs, and overall thinking that doing the Super Important Thing is just around the corner… and for all I know, it might be. It’s really hard for me to tell if this is a situation where it’s smart for me to be persistent, or if being persistent is dragging me ever-closer to permanent unemployment, endless poverty/homelessness/whatever-my-brain-is-feeling-paranoid-about… which isn’t made easier by the fact that, if the AI train does keep going, my previous jobs in software engineering and cybersecurity will probably not be coming back.

Not totally sure what I’m trying to get out of writing this. Maybe someone has advice about what I should be doing next. Or maybe, after a year of my brain nagging me each day about how I should have gotten involved in the field sooner, I just wanted to admit that: despite wanting the world to be saved, despite wanting more people to be working on the Extremely Neglected, Yet Conveniently Super Important Thing, some selfish, not-too-bright, vain part of me is thinking “Oh, great. More competition.”

It probably began training in January and finished around early April. And they're now doing evals.

My birds are singing the same tune.


Going to the moon

Say you’re really, really worried about humans going to the moon. Don’t ask why, but you view it as an existential catastrophe. And you notice people building bigger and bigger airplanes, and warn that one day, someone will build an airplane that’s so big, and so fast, that it veers off course and lands on the moon, spelling doom. Some argue that going to the moon takes intentionality. That you can’t accidentally create something capable of going to the moon. But you say “Look at how big those planes are getting! We've gone from small fighter planes, to bombers, to jets in a short amount of time. We’re on a double exponential of plane tech, and it's just a matter of time before one of them will land on the moon!”

Contra Scheming AIs

There is a lot of attention on mesaoptimizers, deceptive alignment, and inner misalignment. I think a lot of this can fall under the umbrella of "scheming AIs": AIs that either become dangerous during training and escape, or else play nice until humans make the mistake of deploying them. Many have spoken about the lack of an indication that there's a "homunculus-in-a-box", and this is usually met with arguments that we wouldn't see such things manifest until AIs are at a certain level of capability, and at that point, it might be too late, making comparisons to owl eggs, or baby dragons. My perception is that getting something like a "scheming AI" or "homunculus-in-a-box" isn't impossible, and we could (and might) develop the means to do so in the future, but that it's a very, very different kind of thing than current models (even at superhuman level), and that it would take a degree of intentionality.

"To the best of my knowledge, Vernor did not get cryopreserved. He has no chance to see the future he envisioned so boldly and imaginatively. The near-future world of Rainbows End is very nearly here... Part of me is upset with myself for not pushing him to make cryonics arrangements. However, he knew about it and made his choice."

I agree that consequentialist reasoning is an assumption, and am divided about how consequentialist an ASI might be. Training a non-consequentialist ASI seems easier, and the way we train them seems to actually be optimizing against deep consequentialism (they're rewarded for getting better with each incremental step, not for something that might only be better 100 steps in advance). But, on the other hand, humans don't seem to have been heavily optimized for this either*, yet we're capable of forming multi-decade plans (even if sometimes poorly).

*Actually, the Machiavellian Intelligence Hypothesis does seem to involve optimizing for consequentialist reasoning (if I attack Person A, how will Person B react, etc.)

This is the kind of political reasoning that I've seen poisoning LW discourse lately, and it gets in the way of having actual discussions. Will posits essentially an impossibility proof (or, in its more humble form, a plausibility proof). I humor this being true, and state why the implications, even then, might not be what Will posits. The premise is based on alignment not being enough, so I operate on the premise of an aligned ASI, since the central claim is that "even if we align ASI it may still go wrong". The premise grants that the duration of time it is aligned is long enough for the ASI to act in the world (it seems mostly timescale agnostic), so I operate on that premise. My points are not about what is most likely to actually happen, the possibility of less-than-perfect alignment being dangerous, the AI having other goals it might seek over the wellbeing of humans, or how we should act based on the information we have.


I'm not sure who you are debating here, but it doesn't seem to be me.

First, I mentioned that this was an analogy, and mentioned that I dislike even using them, which I hope implied I was not making any kind of assertion of truth. Second, "works to protect" was not intended to mean "control all relevant outcomes of". I'm not sure why you would get that idea, but that certainly isn't what I think of first if someone says a person is "working to protect" something or someone. Soldiers defending a city from raiders are not violating control theory or the laws of physics. Third, the post is on the premise that "even if we created an aligned ASI", so I was working with the premise that the ASI could be aligned in a way that it deeply cared about humans. Fourth, I did not assert that it would stay aligned over time... the story was all about the ASI not remaining aligned. Fifth, I really don't think control theory is relevant here. Killing yourself to save a village does not break any laws of physics, and is well within most humans' control.

My ultimate point, in case it was lost, was that if we as human intelligences could figure out an ASI would not stay aligned, an ASI could also figure it out. If we, as humans, would not want this (and the ASI was aligned with what we want), then the ASI presumably would also not want this. If we would want to shut down an ASI before it became misaligned, the ASI (if it wants what we want) would also want this.

None of this requires disassembling black holes, breaking the laws of physics, or doing anything outside of that entity's control.