"Knowing the territory takes patient and direct observation." There’s a kind of thinking that happens when a person moves quickly and relies on their built-up structures of thought and perception. A different kind of thing can happen when a person steps back and brings those very structures into view rather than standing atop them. 

So copilot is still prone to falling into an arrogant attractor with a fairly short prompt that is then hard to reverse with a similar prompt: reddit post
I often wish I had a better way to concisely communicate "X is a hypothesis I am tracking in my hypothesis space". I don't simply mean that X is logically possible, and I don't mean I assign even 1-10% probability to X, I just mean that as a bounded agent I can only track a handful of hypotheses and I am choosing to actively track this one.

  • This comes up when a substantially different hypothesis is worth tracking but I've seen no evidence for it. There's a common sentence like "The plumber says it's fixed, though he might be wrong" where I don't want to communicate that I've got much reason to believe he might be wrong, and I'm not giving it even 10% or 20%, but I still think it's worth tracking, because strong evidence is common and the importance is high.
  • This comes up in adversarial situations when it's possible that there's an adversarial process selecting on my observations. In such situations I want to say "I think it's worth tracking the hypothesis that the politician wants me to believe that this policy worked in order to pad their reputation, and I will put some effort into checking for evidence of that, but to be clear I haven't seen any positive evidence for that hypothesis in this case, and will not be acting in accordance with that hypothesis unless I do."
  • This comes up when I'm talking to someone about a hypothesis that they think is likely and I haven't thought about before, but am engaging with during the conversation. "I'm tracking that your hypothesis would predict something different in situation A, though I haven't seen any clear evidence for privileging your hypothesis yet and we aren't able to check what's actually happening in situation A."
  • A phrase people around me commonly use is "The plumber says it's fixed, though it's plausible he's mistaken". I don't like it. It feels too ambiguous between "It's logically possible" and "I think it's reasonably likely, like 10-20%", neither of which is what I mean.
This isn't a claim about its probability, it's just a claim about it being "worth tracking". Some options:

  • I could say "I am privileging this hypothesis" but that still seems to be a claim about probability, when often it's more a claim about importance-if-true, and I don't actually have any particular evidence for it.
  • I often say that a hypothesis is "on the table" as a way to say it's in play without saying that it's probable. I like this more but I don't feel satisfied yet.
  • TsviBT suggested "it's a live hypothesis for me", and I also like that, but still don't feel satisfied.

How these read in the plumber situation:

  • "The plumber says it's fixed, though I'm still going to be on the lookout for evidence that he's wrong."
  • "The plumber says it's fixed, though it's plausible he's wrong."
  • "The plumber says it's fixed, and I believe him (though it's worth tracking the hypothesis that he's mistaken)."
  • "The plumber says it's fixed, though it's a live hypothesis for me that he's mistaken."
  • "The plumber says it's fixed, though I am going to continue to privilege the hypothesis that he's mistaken."
  • "The plumber says it's fixed, though it's on the table that he's wrong about that."

Interested to hear any other ways people communicate this sort of thing! Added: I am reacting with a thumbs-up to all the suggestions I like in the replies below.
Someone mentioned maybe I should write this publicly somewhere, so that it is better known. I've mentioned it before but here it is again: I deeply regret cofounding vast and generally feel it has almost entirely done harm, not least by empowering the other cofounder, who I believe to be barely better than e/acc folk due to his lack of interest in attempting to achieve an ought that differs from is. I had a very different perspective on safety then and did not update in time to not do a very bad thing. I expect that if you and someone else are both going to build something like vast, and theirs takes three weeks longer to get to the same place, it's better to save the world those three weeks without the improved software. Spend your effort on things like lining up the problems with QACI and cannibalizing its parts to build a v2, possibly using ideas from boundaries/membranes, or generally other things relevant to understanding the desires, impulses, goals, wants, needs, objectives, constraints, developmental learning, limit behavior, robustness, guarantees, etc etc of mostly-pure-RL curious-robotics agents. Incidentally, I've had many conversations with GPT4 where I try to get it to tell me what difference it thinks justifies its (obviously reward-induced and therefore at-least-somewhat-motivated-reasoning) claim that it's not like humans, and the only justification it consistently gives is continuous-time lived experience vs discrete-time secondhand textual training data. I feel like video models and especially egocentric robotics video models don't have that difference...
"have one acceptable path and immediately reject anyone who goes off it" cuts you off from a lot of good things, but also a lot of bad things. If you want to remove that constraint to get at the good weirdness, you need to either tank a lot of harm, or come up with more detailed heuristics to prevent it
Why are there mandatory licenses for many businesses that don't seem to have high qualification requirements? Patrick McKenzie (@patio11) suggests on Twitter that one aspect is that it prevents crime:

> Part of the reason for licensing regimes, btw, isn’t that the licensing teaches you anything or that it makes you more effective or that it makes you more ethical or that it successfully identifies protocriminals before they get the magic piece of paper.
>
> It’s that you have to put a $X00k piece of paper at risk as the price of admission to the chance of doing the crime.
>
> This deters entry and raises the costs of criminal enterprises hiring licensed professionals versus capable, ambitious, intelligent non-licensed criminals.

Recent Discussion

In life, there are facts that can be used to describe events objectively, and then there are subjective interpretations of those events. It is the latter—the interpretations—that can either be a source of great joy, or bring forth never-ending misery. While the facts are immutable, you’re able to consciously choose how to interpret them. This revelation helps you stop feeling like a victim of circumstances that are outside of your control.


An Example

When I was 19 years old I injured my leg in an accident. At that time in my life, much of my identity was centered around being an athlete. So not only did the injury hurt physically, but the shock to my ego—that I can no longer play sports or have exercise be part of my life—caused me immense grief...

Overview of essay series

This is the first in a collection of three essays exploring and ultimately defending the idea of choosing what feels wholesome as a heuristic for picking actions which are good for the world. I'm including a summary of the series here, before getting to the essay proper.


The two main generators of my thinking about this were:

  • Reflecting on major mistakes that have been made in EA, and wondering how EA might have been different at a deep level in ways that could have averted those mistakes.
  • Reflecting on and trying to name certain core praiseworthy behaviours of people whom I especially admire.


In the first essay, Acting Wholesomely (= the rest of this post), we see that the regular English concept of acting wholesomely can be action-guiding, especially if...

This post feels to me in some ways like the first chapter of a religious teaching. The post keeps talking about wholesomeness in a way where I have a (perhaps unjustified) sense the post is pretending or expecting me to know what it means, and talking like it has successfully explained it, but I’m not sure it succeeds (e.g. the circular definition for how to make wholesome decisions), and that feels common for religious texts about how to live a good life.

4 · Ben Pace · 32m
Pretty good essay. On first pass, I don’t feel like this post manages to fully communicate the concept of wholesomeness well enough to pin it down for someone who didn’t already know what this post was trying to communicate. I shall give it a quick go. When I am choosing an action and justifying it as wholesome, what it often feels like is that I am trying to track all the obvious considerations, but some (be it internal or external) force is pushing me to ignore one of them. Not merely to trade off against it, but to look away from it in my mind. And against that force I’m trying to defend a particular action as the best call, all things considered - the “wholesome” action. I am having a hard time thinking of examples, in part because I think I’ve been doing better on this axis in recent years, but I think one of the most tempting versions of this to me has been to ignore people’s feelings and my impacts on them when I have a mission that is very important. For instance, I might think someone has done terribly at some work that they’re doing on a project I’m leading. Now, I think it’s good to be straight with people and it’s good communication to give feedback early and clearly. So I want to let them know that the work has been worse than useless and I regret handing it off to them. This will likely cause them some fear and feel destabilizing to their social status and that will cause them stress and who knows how they deal with that. It is tempting here for me to choose not to pay attention to that when I decide to give them feedback, and as I do so, and after. And I have a great justification - because the work is exceedingly important! And if they say “Ben I feel like you’re being hurtful and not caring about my feelings” I can say “But this is what I have to do for the mission! It’s important! We all agree on that!” And nobody around will disagree because it’s often been the core conceit of my social groups that the only reason we’re here, the only reason w
From Owen's post: "I’d suggested her as a candidate earlier in the application process, but was not part of their decision-making process". "Unrelated job offer" is a bad description of that. I don't see the claim about hosting in the post, but that would soften things a little if true. Anyway, it's not a random blog post! If it was a post about how many species of flowers there are or whatever, then my comment wouldn't make sense. But it's not random! It's literally about acting wholesomely! His very unwholesome behavior is very relevant to a post he's making to the forum of record about what wholesome behavior is!
I specifically think it's well within the human norm, i.e. that most of the things I read are written by a person who has done worse things, or who would do worse things given equal power. I have done worse things, in my opinion. There's just not a blog post about them right now.

TL;DR This relates to the findings reported in my posts Mapping the Semantic Void parts I and II. By creating a custom embedding at the token centroid (the mean vector of all 50,257 GPT-J token embeddings), prompting the model to define it and considering logits, it's possible to construct a "definition tree" which consists overwhelmingly of vague generalities. This is hardly surprising, as GPT-J is basically being challenged to define "the average thing". However, the most probable branch in the tree which gives a definition containing anything specific defines the "ghost token" at the centroid as "a man's penis". Lowering the cumulative probability cutoff to produce ever longer lists of possible definitions, we find that almost all the branches which provide definitions involving anything specific are of...
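
The centroid construction mentioned above is a component-wise mean over the embedding table. A minimal sketch, assuming a toy three-token, three-dimensional table in place of GPT-J's 50,257 token embeddings (the table values and the names `toy_embeddings`, `centroid`, and `distance` are illustrative assumptions, not taken from the original posts):

```python
import math

# Hypothetical toy embedding table: token -> vector.
# A real run would use the model's full token embedding matrix instead.
toy_embeddings = {
    "cat": [1.0, 0.0, 2.0],
    "dog": [0.0, 2.0, 2.0],
    "the": [2.0, 1.0, -1.0],
}


def centroid(vectors):
    """Return the component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The "ghost token" sits at the mean of all token embeddings.
ghost = centroid(list(toy_embeddings.values()))

# Closest real token to the centroid, by Euclidean distance.
nearest = min(toy_embeddings, key=lambda t: distance(toy_embeddings[t], ghost))

print(ghost)    # [1.0, 1.0, 1.0]
print(nearest)  # cat
```

This only covers building the custom embedding; prompting the model to define it and reading off the logits, as the post describes, would require the actual model and is not sketched here.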

Let me formulate a hypothesis: what is discovered there at the centre of the system is reminiscent of what must have been the most central part of the unconscious AND/OR the preoccupations of the animal Homo before the emergence of language: the orifices of the human body and the means of filling them. 

I myself had put the word 'mama' at the centre of ANELLA, (Associative Network with Emergent Logical and Learning Abilities) following a suggestion by Roman Jakobson as to the likely first words of any language: "Why 'Mama' and 'Papa'?" (1971), but on reflection, 'hole' is also a good candidate. 

Fatebook is the fastest way to track your predictions. Now we've made a Chrome extension that makes it even faster.

With Fatebook for Chrome, you can now create and embed forecasts inside Google Docs.


Or anywhere else on the web! Inside your to-do list...


Or even inside Google Meet!


To instantly create a forecast on any webpage, just press Ctrl-Shift-F, type your prediction, and hit enter:


Imagine you're writing a Google Doc – a report on the rate of AI progress. You want to write down a prediction: "The most capable LLM in 2026 will be made by OpenAI (80%)"

With Fatebook for Chrome, you can press Ctrl-Shift-F, write your prediction, and embed it right into your doc. Your forecast is recorded in Fatebook, so you won't lose track of it: you'll get...

1 · Adam B · 17h
Here's an alpha version of a Firefox version! If you run into any problems, it would be great to hear about them (e.g. by email).


First note: After I install the extension it takes me to a page that says,

Date: Saturday, March 16th, 2024

Time: 1 pm – 3 pm PT

Address: Yerba Buena Gardens in San Francisco, just outside the Metreon food court, coordinates 37°47'04.4"N 122°24'11.1"W  


Come join San Francisco’s usually-First Saturday-but-in-this-case-Third-Saturday ACX meetup. Whether you're an avid reader, a first time reader, or just a curious soul, come meet! We will make introductions, talk about a recent ACX article (Love and Liberty), and veer off into whatever topic you’d like to discuss (that may, or may not be, AI). You can get food from one of the many neighbouring restaurants.

We relocate inside the food court if there is inclement weather, or too much noise/music outside.

I will carry a stuffed-animal green frog to help you identify the group. You can let me know you are coming by either RSVPing on LW or sending an email to 34251super@gmail.com, or you can also just show up!

Crossposted from AI Impacts.

Epistemic status: I am not a historian, nor have I investigated these case studies in detail. I admit I am still uncertain about how the conquistadors were able to colonize so much of the world so quickly. I think my ignorance is excusable because this is just a blog post; I welcome corrections from people who know more. If it generates sufficient interest I might do a deeper investigation. Even if I’m right, this is just one set of historical case-studies; it doesn’t prove anything about AI, even if it is suggestive. Finally, in describing these conquistadors as “successful,” I simply mean that they achieved their goals, not that what they achieved was good.


In the span of a few years, some minor European...

NEW EDIT: After reading three giant history books on the subject, I take back my previous edit. My original claims were correct.

Could you edit this comment to add which three books you're referring to?

Still, ASI is just an equation model F(X)=Y on steroids, where F is given by the world (physics), X is a search process (natural Monte-Carlo, or biological or artificial world parameter search), and Y is the goal (or rewards).

To control ASI, you control the "Y" (right side) of the equation. Currently, humanity has formalized its goals as expected behaviors codified in legal systems and organizational codes of ethics, conduct, behavior, etc. This is not ideal, because those codes are mostly buggy.

Ideally, the "Y" would be dynamically inferred and corrected, based on e... (read more)

Just a short post to highlight an issue with debate on LW. I have recently been involved, with some interest, in the debate on covid-19 origins here. User viking_math posted a response which I was keen to reply to, but it is not possible for me to respond in that debate (or any other) because the LW site has rate-limited me to one comment per 24 hours, since my recent comments are at -5 karma or less.

So, I feel that I should highlight that one side of the debate (my side) is simply not going to be here. I can't prosecute a debate like this. 

This is funnily enough an example of brute-force manufactured consensus - there will be a debate, people will make points on their side...

Would you agree with the statement that your meta-level articles are more karma-successful than your object-level articles? Because if that is a fair description, I see it as a huge problem.

I don't think this is a good characterization of my posts on this website.

If by "meta-level articles", you mean my philosophy of language work (like "Where to Draw the Boundaries?" and "Unnatural Categories Are Optimized for Deception"), I don't think success is a problem. I think that was genuinely good work that bears directly on the site's mission, independently o... (read more)