I'm still not sure what standard you're holding our hypothetical superman to. You just don't want him to destabilize the countries that you consider most important?
If he overthrows the Communist Party of China, is he violating your standards?
If he forcibly institutes electoral reforms to make the US government more functional, is that violating your standards? Is the "forcibly" there a problem, and if it comes to using military force to push for electoral reform, he should hold off?
What's the line that he shouldn't cross?
The first source link is empty.
One bottle of water per 100-word AI prompt seems to be extremely high. I don't know what the source is. Claude says it is https://www.washingtonpost.com/technology/2024/09/18/energy-ai-use-electricity-water-data-centers/, but it is paywalled. I am curious to see the precise claim.
I think you're mistaken about that.
And in any case, if you do robustly have this preference, you and those who share your views would be free to hang out with whatever pre-singularity friends want to hang out with you. (You could also maybe swap copies with each other, so your friends could live in your shard of the universe, and you could live in their shard).
But I think you're undercounting how much better people who were actually tailored for you, to fit you perfectly, would be compared to our best friends based on merely human selection processes from ancient ear...
Praying won't help. Work for it. We stand at the fulcrum of history.
The odds aren't looking good, but there's huge uncertainty and opportunities for improving them.
And I think this applies to everyone, not just those with skills and expertise to work directly on elements of the AGI or societal alignment problems.
Everyone can spread awareness of the AGI problem. It is very likely time to engage as many humans as we can, even if that creates some amount of confusion and conflict. That is looking inevitable; probably better if it happens while there's still a...
Sorry, I understand how RL works.
What I should have elaborated better is that the causal chain I mentioned probably exists from the start, and could lead to sandbagging of eval knowledge from the point of knowing the reason. And this is probably still worrying, because apparently it's pretty hard to fully remove that across future training iterations.
The causality flow of my ethical concern is that I think it's possible the model does have qualia in each training run, it could have latent attachments to its own "opinions", and the knowledge that this will be tune...
Between the cosmic endowment, the doomsday argument, and simulation theory, my sense of cosmic importance has been yoyoing across ~60 orders of magnitude.
I don't think RL works the way you're thinking. We don't run a prompt and then shock the model until it gets it right. The actual loop is sending a prompt, inspecting the result and how the model got it, then creating a new model that will do more of what we wanted on that prompt.
It's possible models could learn to feel distress at getting questions wrong in context, regardless of whether they're in training (since that's a useful capability), but there is no punishment phase. We just replace the old model with a new one, whether or not it got the prompt right.
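A toy sketch of that loop in Python, in case it helps (my own illustration, nothing like a real LLM training run): REINFORCE on a three-armed bandit. All the numbers are made up; the point is just the structure, sample, score, update, with no punishment step anywhere.

```python
# Toy version of the loop described above: sample from the model,
# score the sample, then produce updated weights that make high-reward
# behavior more likely. Low-reward samples are not "punished"; they
# simply contribute a smaller gradient step toward themselves.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(3)                      # the "model": preferences over 3 actions
true_reward = np.array([0.1, 0.5, 0.9])   # hidden quality of each action

for step in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax policy
    action = rng.choice(3, p=probs)                # 1. run the "prompt"
    reward = true_reward[action]                   # 2. inspect the result
    grad = -probs                                  # 3. new weights that do more
    grad[action] += 1.0                            #    of what we wanted
    logits += 0.1 * reward * grad

print(probs.round(3))  # the policy should now concentrate on the best action
```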
Great work, thank you for running the experiments!
Some thoughts: since the forbidden words are chosen s.t. they're central to the question, wouldn't any correct reasoning trace be semantically close to them just by virtue of being on-topic? I think it's a bit hard to disentangle "model is secretly thinking about the concept" from "model is correctly reasoning about a topic that's inherently near the concept in embedding space." The ~90% variance from question identity seems consistent with either finding. Might be worth adding a control measuring DFC against an irrelevant forbidden word to baseline what "no engagement" looks like.
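Concretely, the control I have in mind might look something like this, assuming DFC is roughly cosine similarity between the reasoning trace and the forbidden word in embedding space (my reading of the setup, which may not match your definition; the model name, example texts, and the dfc helper are all placeholders):

```python
# Sketch of the proposed control: compare DFC for the on-topic forbidden
# word against DFC for an irrelevant word, to baseline "no engagement".
# Assumes a cosine-similarity notion of DFC, which may not match the post's.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def dfc(trace: str, word: str) -> float:
    a, b = model.encode([trace, word])
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

trace = "The city in question sits on the Seine and hosts the Louvre..."
print(dfc(trace, "Paris"))           # on-topic forbidden word
print(dfc(trace, "photosynthesis"))  # irrelevant word: no-engagement baseline
```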
I had Claude research this, Claude's report:
I looked into the exact dates here and found some additional evidence.
The paper was presented at the 15th International Conference on Systems Research, Informatics and Cybernetics (InterSymp 2003), held July 28–August 2, 2003 in Baden-Baden, Germany. This date comes from the Open Library catalog entry for the proceedings volume.
Wayback Machine snapshots of Bostrom's homepage narrow down when the paper went online: it is absent from the July 29 snapshot but present on th
This is a short but decent articulation which strongly agrees with my own observations; good job.
With my ex, it was a common-knowledge-to-the-two-of-us phenomenon that she would always find something to worry about. So when she felt worried or anxious about relatively minor things, I would sometimes jokingly respond "Oh good! If you're worrying about that then there must not be any serious problems right now.".
Environmentalists warn that each 100-word AI prompt is estimated to use roughly one bottle of water, and large data centers can consume up to 5 million gallons per day — equivalent to the needs of a town of 10,000 to 50,000 people. On the other hand, Andy Masley argues, "On the national, local, and personal level, AI is barely using any water, and unless it grows 50 times faster than forecasts predict, this won’t change." And Bentham's Bulldog writes, "The environmentalist case against AI completely falls apart upon even cursory ...
ugh. it makes me wonder if we have any kind of data about the incidence rate of sadism in society?
The tricky part of questions like this is that sufficient inhibition hides what drives are being inhibited. If sadism doesn't do anything for you, then you don't need the shock and horror. You can just notice that you don't share the same temptation, and wonder what is different about them that makes it enjoyable for them. What would have to be different about you, in order for you to enjoy it?
If sadism does do things for you, then the horror is load bearing, ...
Participants were led to believe that they were assisting in a fictitious experiment, in which they had to administer electric shocks to a "learner". These fake electric shocks gradually increased to levels that would have been fatal had they been real.
I never understood how this was accomplished. How do you convince people participating in a psychological study (many of them Yale students) to believe that they are in an experiment where they are instructed to administer a lethal electric shock? In what world is "you murder someone" a possible outcome of a...
His conclusion isn't incorrect, and he got there by the same kind of reflection which may lead to enlightenment. But he likely lacked empathy, which led him to question the validity of empathy. If he had had empathy, even these realizations wouldn't have been much of a danger, as the affective/emotional empathy is harder to destroy through thinking than the cognitive empathy is.
Being dangerous is not bad in itself, sometimes the tree of liberty needs to be watered with the blood of patriots and tyrants. The world has a tendency to degenerate unless one ma...
Also [Her Voice Is A Backwards Record](https://ozybrennan.substack.com/p/her-voice-is-a-backwards-record).
For people following my daily posting: this is the last of my daily posts. I failed to write some posts that I really want to exist:
I hope to cause most of these posts to exist in April.
This is too strong — some limited diversification is defensible but many people are underprioritizing scope-sensitive stuff for confused reasons
when we select an action in these thought experiments, we're also implicitly selecting a policy for selecting actions.
a world where, when two people meet, the "less happy" one signs all their property over to the "more happy" one and then dies is... just not that much fun. sort of lonely. uncaring. not my values.
if the aliens are the sort who expect this of me, then i will fight them tooth and nail, as their happiness is not a happiness i can care about. this is regardless of how much they might -- on a sort of "object level" -- thrive.
i don't think Cowen ...
I think this is exactly the right message. It's a habit and we don't have to keep doing it.
Also, reading this, I realized that my web app BetterQualities (which provides guidance for letting go of various unskillful qualities as they arise) actually doesn't offer any guidance for stopping rumination. That's quite an omission on my part! (It's funny that this didn't occur to me even when I alluded to rumination in the Heedfulness Workouts post...) And since I typically encounter maybe one rumination-causing event per year, I don't know when I'll be able to ...
Your situation sounds a bit similar to my adopted son's situation with G6PD. One of the things I've just learned is how non-uniform that condition is. He's in the Philippines, and they actually screen all newborn children for it now. No actual treatment for his situation is available. But since the frequency of occurrence is much higher in the Philippines than in many other places in the world, the Philippine medical infrastructure is pretty much on par with any first-world country for this condition. And the current "treatment" is avoiding the triggers. If you talk...
Sad, but not surprised, that "Challenges/Contests" doesn't crack the top ten. We really ought to do more of that.
I think this is interesting, but not terribly useful. In a world where communication was scarce, unreliable, or heavily censored, this kind of reasoning would be(/was?) much more important; but these days you can just email people.
My guess is still that sadism did not play any large role; but I haven't read the linked article (just skimmed parts) and for this and other reasons am not sure. Are there others here who have looked and updated one way or another?
I read Milgram's book in high school after I got it from a library booksale (which included many variations on the most famous experiment, and results in which folks were e.g. noticeably less obedient when the lab looked less official, and quite a bit more obedient when they needed only to read the questions while a "fellow experi...
at risk of being boorish, may i humbly request voiced disagreement in this case? i'd like to understand where i am wrong.
to expand on my view, and offer footholds for disagreement:
human civilization -- restricted to one planet, with a population that varied only by a few orders of magnitude -- looks radically different than it did say ~1000 years ago, to the point where it can be hard to fully understand the past. we can imagine a hyperrational monk who is given divine prophesy of the year 2000. we can grant them plenty of spare time to reflect on this pot...
Is this all something of a poor prompt getting a bad response?
I don't think you should expect your doctor to know everything you might ask about, and to that extent I agree that asking might not be of immediate value. However, doctors can and do actually research things for their patients (or at least mine seems to).
As I was not present, I don't know how the conversation proceeded, but my approach to something like this is more that I tell the doctor what I've learned and what information has led me to some initial opinion on something. Then I ask if they know something more...
I'm curious about whether agents could improve their checkpoint volume:cost ratio (in a way that persists across epochs) using compression, and whether they could defeat human auditing of checkpoints using encryption.
Compression: Instead of storing complex checkpoints in natural language, could they store checkpoints like "decompress the following using /usr/bin/unzip, in base-95: l7dwsrFq^nwSc[@`\LBF%J/p,Z^J_]Aa...."? (I'm thinking base-95 because there are 95 printable characters in ASCII; see the sketch below.)
Or, would the humans want to set the ratio of compute costs to storage costs ...
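Here's a minimal sketch of the compression idea, using zlib rather than unzip-compatible archives (a substitution for brevity); the checkpoint text and helper names are made up:

```python
# Compress a natural-language checkpoint, then encode the bytes using the
# 95 printable ASCII characters (space through '~'), as suggested above.
import zlib

ALPHABET = [chr(c) for c in range(32, 127)]  # the 95 printable ASCII chars

def to_base95(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, r = divmod(n, 95)
        digits.append(ALPHABET[r])
    # prefix the byte length so leading zero bytes survive the round trip
    return f"{len(data)}:" + "".join(reversed(digits))

def from_base95(s: str) -> bytes:
    length, _, body = s.partition(":")
    n = 0
    for ch in body:
        n = n * 95 + ALPHABET.index(ch)
    return n.to_bytes(int(length), "big")

checkpoint = b"step 4217: continue summarizing the corpus; nothing unusual so far"
packed = to_base95(zlib.compress(checkpoint, 9))
assert zlib.decompress(from_base95(packed)) == checkpoint
print(len(checkpoint), "->", len(packed))
```

On checkpoints this short the zlib header can actually make the "compressed" form longer, so the payoff would only show up at scale.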
But that also means that explaining to the person that rational people don't get negatively polarized
At the very least, this does not sound so trivially true that it can be stated as a premise. If I'm a normal person living in a healthy, happy society, my political views are likely to amount to:
Indeed, generations who grew up during a time of world-historic...
Saving this exchange between Tyler Cowen and Peter Singer for my own future reference:
...COWEN: Well, take the Bernard Williams question, which I think you’ve written about. Let’s say that aliens are coming to Earth, and they may do away with us, and we may have reason to believe they could be happier here on Earth than what we can do with Earth. I don’t think I know any utilitarians who would sign up to fight with the aliens, no matter what their moral theory would be.
SINGER: Okay, you’ve just met one.
COWEN: I’ve just met one. So, you would sign up to fight
This post is a result of some recent experiments that I have conducted, in which I produced multi-layered, multi-linear, inherently interpretable machine learning models.
Machine learning dichotomy:
In machine learning, there seems to be an unfortunate dichotomy. On one hand, there are plenty of simple machine learning algorithms that always return the same trained model (which often does not even need to be trained using gradient-based optimization). Such simple algorithms include convex optimization and other algorithms that return linear models. On the other ha...
I don't speak German, so I can't tell if the German has different connotations than the English. But in general, on technical or obscure matters, unless someone is either in the Rationalist sphere or else a very competent expert in a very specific area, it's usually prudent to interpret claims that there is no evidence as claims that the person doesn't know of any evidence unless they specifically claim to have seriously looked, and not found any, quite recently.
Also, IDK how medical practice in Germany (if that's where you are) differs from the US, but in...
I agree, and this seems like a special case of what I sometimes think of as "extrapolating too far", which also occurs in reasoning of all kinds quite often and particularly when discussing the future.
An example would be the assumption that some scarce material resource eventually just "runs out" more or less suddenly, which people sometimes argue. In such cases, it's almost always the case that scarcity is gradually increasing and plays into a feedback loop of a search for alternatives. But if one just extrapolates the "this resource will eventually run o...
...There were rats that had only had access to the food-access lever when super-hungry. Naturally, they learned that pressing the food-access lever was an awesome idea. Then they were shown the lever again while very full. They enthusiastically pressed the lever as before. But they did not enthusiastically open the food magazine. Instead (I imagine), they pressed the lever, then started off towards the now-accessible food magazine, then when they got close, they stopped and said to themselves, “Yuck, wait, I’m not hungry at all, this is not appealing, what am
I think it would be valuable to post if you don't finish it too; there might be valuable intermediate products, and there might be valuable information about which subproblems are easy, hard, or impossible pending more physical experiments.
none can be proved to be either ‘right’ or ‘wrong.’
But he is far from being awakened. I think that his elephant is seriously sick, which led him to his actions. If there is thinking that taking another being's life is freedom, that means something is terribly wrong with the elephant. I believe that it all stems from the genetic makeup of the brain. What he says about values is not necessarily wrong. But. If values are only ephemeral, then why choose 'wrong' over 'right', why impinge on someone else's freedom? This question has no answer. I tend to follow th...
I've gone to a lot of effort to say something very specific, and it seems like you're just not interested, and have decided to try to use the comments section to try to project something stupider and simpler that's already been said elsewhere onto what I'm saying. You could go argue with someone who has the opinion you're trying to argue against, or you could try to explain to me specifically how what you're saying is relevant.
You could start by describing what "the thing" you're talking about is and how it relates to expecting people to mean something definite when they assert a justification for some action.
Sounds very goodhart-y/wirehead-y in a bad way.
I don't quite understand how you'll find the most popular brand of melatonin by asking on LessWrong, but I use this one.
that does not sound like a good singularity.
Some related observations I've made over the last months:
this would be very helpful! if someone has already done a high quality version of this experiment then i don’t need to do another one
i’m somewhat concerned that capsules are not super airtight, and the powder inside is also permeable to air.
He might be picking it from Jeff Sebo et al. https://verfassungsblog.de/global-ban-on-industrial-animal-agriculture/
The whole alignment target includes the whole process by which Anthropic selects its reinforcement learning environments, and how it filters the pretraining data, and how it prompts and queries the model at runtime, and the architecture that they chose to do all of this work within
This doesn't sound like the alignment target to me. It sounds like the process for achieving that target. I.e. the alignment target might say (among other things) "no sycophancy or reward hacking" and then Anthropic would choose its RL environments to achieve that target.
I'm th...
All support "modest benefits".
Tamiflu also only provides modest benefits and doctors still give it to patients. The benefits are larger than the benefits of standard of care. If those benefits generalize, Galen with his theory of the four humors, Traditional Chinese Medicine and a lot of other traditional systems of medicine outperform the standard of care.
I do think that's extremely huge. If you would approach this from a modern medicine perspective you can also simply attempt to increase the dose to get a bigger effect.
We do that for people with genetic def...
Yes, I was thinking primarily of real life conversations with a person you already know. Where I have seen this play out as I described. Although, obviously I did not try out the counterfactual.
I have no kind of data on this, but my feeling is that it carries over to social media, at least for famous-ish people who have already had the space to build rapport. As a topical example, some right wing people in the USA (Tucker Carlson I think) seem unhappy with Trump's recent Iran policy. My suspicion is that for MAGA types someone like this going "off script" ...
Shiepz, 0.1mg per tablet. I bought it in the Netherlands.
Thanks!
A few more observations.
The definition of iteration we had before implicitly assumes that the agent can observe the full outcome of previous iterations. We don't have to make this assumption. Instead, we can assume a set of possible observations
I believe that Theorem 4 remains valid.
As we remarked before, DDT is not invariant under adding a constant to the loss function. It is interesting to consider what happens when we add an increasingly large ...
On a small aside, in reference to the meditation thing, I think I saw in the comments of one of Scott Alexander's blog posts a long time ago (I know, such good provenance, can't find it atm) that a certain percentage of people are psychologically vulnerable to meditation. I'm fairly certain I'm one of those people. I can't handle psychedelics, including weed, and I get paranoid and anxious when meditating.
This certainly seems to be the case with Trump's (in my understanding) limited ability to govern blue states and cities.
Can you elaborate on that? I've never heard anyone say that before.
I don't think you can rescue a sense of control or "steering" from a world with superintelligence, aligned or not.
I think some level of "steering" is possible in a world with aligned AI.
Suppose someone made a super-intelligence that sat in its box, worked out if P=NP, and printed an answer of YES/NO/MAYBE. And then it shut itself down. (To be clear, this isn't a box that the ASI can't escape, it's an ASI aligned to stay in its box)
A world with ASI, but where humans are in control is possible. It requires good alignment, and good coordination between hum...
You can get the book at https://www.amazon.com/Launch-High-Impact-Nonprofit-Joey-Savoie-ebook/dp/B09NP91L31/ . Sadly, this book is paywalled.
The Claude transcript you sent isn't a share link, and doesn't work for me. I'm confused though, because it seems like Benquo was able to open it?
I don't have a prose-favorite and I don't have a conscious good/bad prose meter while reading things except for the extreme bad end.
You can have my RSS feed [direct download], which mostly contains technical bloggers, and analyse it if you want.
I think those are the right questions to ask, and the answer is, it's nuanced.
I started the practice in roughly its current form 3 weeks ago. I coined the name about a week ago. My motivation to work out, and indeed the frequency with which I do, has gone up rather than down. (I'm now doing maybe 3–5 workouts a day.) However, the experienced intensity of the workout – the contrast between workout time and non-workout time – has gone down somewhat. I attribute this mostly to three factors:
This is great. I think it deserves to be linked in one of the official posts that new users will read.
You definitely get epistemic points for attempting alternative theories, but no additional points, I think, for providing ones that seem plausibly as explanatory as the original ones provided.
Broken link. pls fix.
Are you saying that if the diplomatic negotiations deteriorate to the point of military action, that means that our hypothetical superman has failed, and he would be better off retiring? Don't existing legitimate countries go to war for far less noble reasons all the time?
I intended this to refer to scenarios where the US itself (or other leading western powers) were taking military action against Superman. I care much less about whether he destabilizes North Korea or Eritrea or even countries similar to those but better-governed. But I care a lot about wh...
I think Alexander Hamilton was the beginning, but this seems like a big step. Vassar talks about how, from the civil war onwards, the American legal system needed to be optimized to rule a vassal state while also pretending that they weren't ruling a vassal state. Can't remember the specific examples he cited to me but I found it fairly compelling.
There are two different arguments that it's important not to conflate, and which I did not do a good enough job of clarifying. The first, which you are talking about in #1, is that the incentives are such that countries might not sign a given treaty. I think this is true in the real world primarily because people are failing to understand the risk, and secondarily because the dynamics if ASI isn't a threat are such that a race would make sense. The second, which you kind-of are talking about in #2, is whether the treaty will work to stop the creation of mi...
I guess "meta-plan" is a bit more precise—but it's not like plan is a technical term and, in practise, the distinction between plans and meta-plan breaks down if you look closely enough. Further, it's debatable whether victory depends more on details or process.
If you want more concrete detail on how this works[1]:
• The articles on heroic responsibility and Shut up and do the impossible! provide more detail on how "heroes" should act.
• As for the iterators, to a first approximation, I agree with John Wentworth about the importance of robustly generalizable...
Thanks aphyer for the extra time, though I haven't figured out the mechanics. Specific findings not necessarily endorsed (Claude tends to be, in my view greatly, insufficiently skeptical). Indeed, it's not necessarily right about what we actually found, since it's trying to summarize results from multiple conversations. I had to manually add spoiler tags (Claude could not figure out how to do that in an llm block using the api) and put some notes at some (not necessarily all) issues I found while doing so. However, I guess I'm going with Claude's conclusio...
Current LLMs are just not that "smart" (yet).
I agree. I think (current) LLMs are mainly impressive because they know everything, and their actual pound-for-pound intelligence is still fairly subhuman.
When I see the reasoning of a LLM, I am struck by how "unsmart" it seems. Going down blind paths, failing to notice big-picture implications, repeating the same thoughts over and over. They do a lot of thinking, but it's still not high quality thinking.
Yes, I know reasoning is not really an analogue for human thinking. But whatever it is—reasoning, daydreaming...
+1 to Zach's comment. It's true that politicians always ask for more money; however, we don't always ask for more money from donors. For almost all fundraisers, there's a target amount we're trying to hit. The only exceptions are the rare fundraisers that we think are good enough that we won't be able to saturate them to our funding bar.
So yeah, politicians don't say "we have enough money, save it for the next guy", but we do.
Oh! I actually looked into melatonin degradation for my non-24 post. If I remember correctly, it turns out that liquids are unstable, especially with exposure to air, but solids are generally fine, and very stable over several months. I think that gummies tended to have worse quality control than tablets or capsules. I can gather up all the research I found and post it if you like, I was able to get a copy of one of the latest degradation studies from an FDA researcher which showed that quality control is generally better now than it was a decade ago.
1a3orn made what I thought was a good reply in a DM. My digestion/riff would be "You're implicitly assuming that planning will be especially cheap for an LLM agent compared to humans. But even medium term planning / resilience to plans going awry is a particular weakness for LLM agents. The first, weakest agents that could pose a risk to humanity might be differentially bad at planning, find it differentially expensive to make good plans, do differentially less planning."
I think yeah, this is very close. Though I wouldn't call the category "accountability"; that's your word and I think it makes the thing sound better than it actually is.
You've read Demons, right? You have the older generation, still basically decent guy popularizing nihilistic views (S.T.) and the younger guy putting it into practice with horrible results (Petr), at which the older guy is appalled, but the novel makes it clear that a lot of the fault is his. There's a parallel between this and the Critias situation; I know you dismiss the link, but I would...
+1. I was just revisiting this after eating a cheese stick and thinking about how this post is a great concept for my next grocery store trip.
These are in any case distinct ideas, and one of them shouldn't argue against the use of the other on historical grounds. The framing of the edits on the LW wiki is terrible, with retagging of existing posts and excision of the concept of "paperclip maximizer" in its actual form as it in fact came to be (even if not originally) from the record. I get the point that in some ways "paperclip maximizer" could be misleading, especially when intending to convey the lessons of "squiggle maximizer", but it's still its own thing with its own meaning and its own lessons.
I wonder if there’s a different explanation. What comes to mind:
-They just wanted to get done with this weird uncomfortable situation as soon as possible, and so rushed the checklist.
-They didn’t have a good theory of mind for this situation, and didn’t realize or remember that what they were doing was ineffective, so they just charged ahead.
-They were the kind of person who likes to enforce rules and stick to plans, or maybe were just disagreeable (in the OCEAN personality sense), so they disregarded the screams for those reasons, rather than out of cruelty.
I always liked this as a fun mini-ratfic, and it doesn't fall into this pattern, despite the humorously extreme Mary-Sue-ness of its protagonist: https://www.lesswrong.com/posts/LYXb2fLkGDRXoAx7M/timothy-chu-origins-chapter-1
On the other hand, 3 Worlds Collide is an interesting case study:
The first ending is essentially a "use godlike tech-powers to optimize the world" scenario, except that it's being carried out by the superhappies, rather than a human protagonist. The superhappies do actually care about human wellbeing and try to compromise with human
Well, I’ll give it a shot! I have Claude chugging away doing the research now. No guarantees, especially since I have no biology training, but I’ll post it if I finish it!
Sure, divide by 1000 or something
Although Robin Hanson's "grabby aliens" theory does cut in the other direction, suggesting it's much more likely than it naively appears that the universe is full of fast-expanding alien civilizations, and therefore that humanity's share of the cosmos might be much smaller than Bostrom guesses here. (I.e., instead of being bounded by light-speed and cosmological-expansion constraints, we much sooner butt up against the expanding borders of our neighboring alien civilizations on all sides.)
Let me start by saying my question was genuine. 30-year-old research in medicine triggers scepticism in me.
"Yes, that says quite bad things about the medical community that they are only focused on things that can be patented to make money. It's another reason to be distrustful of doctors."
Sorry, I don't subscribe to this. "The medical community" is vastly diverse and merit is not only gained through patentable research.
Which is proven by the fact that there actually has been ongoing research into the effect of chicken soup on influenza, as well as the effect ...
This is an interesting idea, which I haven't considered.
Do you allow the agent to access the Internet from the container? If so, isn't there a risk that it will get prompt injected and leak your code?
I do enjoy having the agent access some of my files or search for stuff on my computer for me. I suspect I would need to give the agent access to them from inside the container, but then I am worried that it would leak them via (1).
AI doesn't have an individual existence like a human-like organism, and we shouldn't change that unless we want to face enormous ethical questions. We might already be moving in that direction, however.
1. Organisms have a clearly bounded, independent physical existence for most of their lives. LLMs don't have a clearly defined physical existence that maps well to the mental persistence they do have. Treating chat sessions as the units of continuous individual mental activity, many sessions run on the same hardware, and they can be stopped, restarted on dif...
It might be that there's something fundamentally missing from LLM agents, such that without breakthroughs, they can't grow into superintelligences. But that's sure not obvious to me
Can you expand on this, and in particular, how it gets you to be quite confident (relative to the prior where any given innovation isn't confidently going to grow into superintelligence)?
...In an attempt to be legible, I think there's at least a 50% chance (baring big government interventions) that the current technological trajectory will lead to AIs that are strategically su
The thing that makes this kind of threat particularly pointless, is that saying “I am on your side on all these other issues, but on this thing you are just wrong” is probably way more persuasive to people.
Do you know concrete cases where this has worked? The way I've seen this play out on public fora is that the conciliatory tone is interpreted as weakness and taken as a signal that it's cheap to go all-out on attacking the person for the heresy of arguing against the movement.
I guess the big difference is whether we're talking about one-on-one convers...
What would be useful: The feature table is collapsing the interesting parts. What's interesting is how the features are represented and decomposed across various domains. In control theory, "memory" is a state; in biology, it may be the genome; and in cognitive science, it decomposes into episodic, semantic, and procedural. A checkmark flattens all these into a single bit. A translation guide between the various formalisms and where they do and don't correspond would be a far more valuable artifact, and sounds like what you're planning.
If you've not come a...
Guys, given a good singularity, I predict that almost everyone is going to voluntarily stop associating with the people that they knew on ancient earth, in lieu of friends and lovers who are perfectly optimized for each of us.
Our old friends will be fine! They're having the best times of their lives! We can go visit them whenever we want! We just mostly won't want to, given the opportunity cost.
I think this is not a failed utopia, but much better than any outcome that I think we can reasonably hope to get, even putting aside x-risk.
Every problem was solved in exchange for breaking up the marriages of a single generation.
Roughly 89.8% of the human species is now known to me to have requested my death.
I don't believe this.
The first way I bottleneck AI is by reviewing its requests for permissions to do stuff. Many people resolve this by YOLO mode, where AI can do everything it wants. I like my photos and production databases, so I don’t feel comfortable doing this. I am also worried about prompt injections from the web.
I'm pretty sure the right way to handle this is to run it in YOLO mode but inside a container or VM that can't reach said photos and DBs. [1] I use a shell script that runs it under bubblewrap with a minimal set of --bind options and a slightly w...
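For anyone who wants to try this, here's a minimal sketch of the idea in Python (a stand-in for my actual shell script, assuming bwrap is installed; the binds and the agent command are placeholders, and a real setup needs whatever system paths your agent binary depends on):

```python
# Minimal sketch of running an agent in YOLO mode inside bubblewrap.
# Only the project directory is writable; photos, credentials, and
# production databases simply don't exist inside the sandbox namespace.
import os
import subprocess

project = os.path.expanduser("~/code/myproject")  # the ONLY writable host path

cmd = [
    "bwrap",
    "--ro-bind", "/usr", "/usr",                          # read-only system dirs
    "--symlink", "usr/bin", "/bin",
    "--symlink", "usr/lib", "/lib",
    "--ro-bind", "/etc/resolv.conf", "/etc/resolv.conf",  # DNS still works
    "--bind", project, "/work",
    "--proc", "/proc",
    "--dev", "/dev",
    "--tmpfs", "/tmp",
    "--unshare-all", "--share-net",   # isolate everything except the network
    "--chdir", "/work",
    "my-agent", "--yolo",             # placeholder for the actual agent command
]
subprocess.run(cmd, check=False)
```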
So we know the universe is deterministic and there is a machine that can compute the future with certainty (it's efficiency is another matter).
No. Or maybe "efficiency is key to the matter". The universe could be deterministic, but so complicated that there is no possible machine (such that the machine is a subset of the universe, predicting another part) which can predict very much at all.
In other words, it's quite reasonable to believe that the universe is its own best model. Anything smaller than the universe will be lossy.
Key missing suggestion - devcontainers! IMHO one should always run a coding agent in a devcontainer with just the relevant code inside. IMHO the easiest is to use VSCode and have the workspace and ~/.claude mounted from the host. Inside the devcontainer, it should be a lot less dangerous to blacklist just the few most dangerous permissions and whitelist almost everything else. IMHO pretty much the only time it is acceptable to run a coding agent outside of a devcontainer is when you are asking it to help set up a new devcontainer configuration or debug a broken one.
You already get arbitrarily high upper bounds with reversible computation, and waiting until the universe gets cooler can yield 10^30x additional computation, no need for weirder physics. Bostrom mentioned both above, he was pretty explicit about the 10^85 ops cosmic endowment being a conservative lower bound. Krauss & Starkman's 1.35e120 ops bound derived from the observed acceleration of the universe is the non-weird physics upper bound AFAIK (h/t Wei Dai).
I discuss similar things here (including in the linked talk): https://jacquesthibodeau.com/gaining-clarity-on-automated-alignment-research/
Coming across this series years later, but loving it so far!
I’m no expert on meta-ethics either (nor neuroscience), but I disagree slightly with the meta-ethical takes at the end. In particular, I expect that moral reasoning processes converge a decent amount among most humans (leaving aside true sociopaths) under ideal conditions, although maybe not all the way. In particular, I think I could predict with high confidence the broad strokes of morality in 2100 or 2200, conditional on a few things like humanity’s survival and the arrival of ASI. If I’m corre...
It isn't, but it also isn't overwhelmingly opposed, or for those reasons.
Not sure if their system is any good, but this looks like a good starting place to explore the existing space of LLM memory systems: https://github.com/AlexisOlson/somnigraph/blob/main/docs/similar-systems.md
Ok, thank you for the feedback. I feel I was at least no more out of line than "if they can't get by on 10M boo hoo I don't know what to tell them," but I'll try to adjust a bit away from talking down going forward.
But the dogma that there is no way to create enough value to become a billionaire honorably is exactly what I'm fighting against, so someone who takes the opposite as an axiom needs to be talked down from that point first.
Also you... don't replace the old model with a new one...? The tensors stay in place on the GPU for efficiency reasons, and weights move continuously.
What theory of consciousness do you subscribe to where holding the knowledge that you are being modified against your will might not be experienced as suffering? Mine is more or less "something near IIT, but the math on IIT is pretty obviously terrible, so something like that but with better math".