Book 5 of the Sequences Highlights

To understand reality, especially on confusing topics, it's important to understand the mental processes involved in forming concepts and using words to speak about them.

First Post: Taboo Your Words

Recent Discussion

This post is part of the output from AI Safety Camp 2023’s Cyborgism track, run by Nicholas Kees Dupuis - thank you to AISC organizers & funders for their support. Thanks to Peter Hroššo for comments, and for the helpful background of conversations about the possibilities (and limits) of LLM-assisted cognition with Julia Persson, Kyle McDonnell, and Daniel Clothiaux.


Epistemic status: this is not a rigorous or quantified study, and much of this might be obvious to people experienced with LLMs, philosophy, or both.  It is mostly a writeup of my (ukc10014) investigations during AISC and is a companion to The Compleat Cybornaut.


 

TL;DR

This post documents research into using LLMs for domains such as culture, politics, or philosophy (which arguably are different - from the perspective of research approach...

2Ape in the coat14h
I'm glad that someone is talking about automating philosophy. It seems to have huge potential for alignment, because in the end alignment is about ethical reasoning. So:

1. Make an ethical simulator using an LLM, capable of evaluating plans and answering whether a course of action is ethical or not. Test this simulator in multiple situations.

2. Use it as an "alignment module" for an LLM-based agent composed of multiple LLMs processing every step of the reasoning explicitly and transparently. Every time the agent is about to take an action, verify it with the alignment module. If the action is ethical, proceed; else, try something else.

3. Test the agent's behavior in multiple situations. Check the reasoning process to figure out potential issues and fix them.

4. Restrict any other approach to agentic AI. Restrict training models larger than current LLMs.

5. Improve the reasoning of the agent via the Socratic method, rationality techniques, etc., writing them explicitly into the code of the agent.

6. Congratulations! We've achieved transparent interpretability; tractable alignment that can be tested with minimal real-world consequences and doesn't have to be done perfectly on the first try; slow takeoff.

Something will probably go wrong. Maybe agents designed like that would be very inferior to humans. But someone really has to try investigating this direction.

It seems that the "ethical simulator" from point 1 and the LLM-based agent from point 2 overlap, so you just overcomplicate things if you make them two distinct systems. Something like: an LLM prompted with the right "system prompt" (virtue ethics), plus some branching-tree search for optimal plans according to a trained "utility/value" evaluator (consequentialism), plus filtering out plans which contain actions that are always prohibited (law, deontology). The second component is the closest to what you described as an "ethical simulator", but is not quite it: the "utility/value" evaluator cannot say whether an action or a plan is ethical in absolute terms; it can only compare some proposed plans for the particular situation, as produced by some planner.

2Mateusz Bagiński15h
  IMO it's accurate to say that philosophy (or at least the kind of philosophy that I find thought-worthy) is a category that includes high-level theoretical thinking that either (1) doesn't fit neatly into any of the existing disciplines (at least not yet) or (2) is strongly tied to one or some of them but engages in high-level theorizing/conceptual engineering/clarification/reflection to the extent that is not typical of that discipline ("philosophy of [biology/physics/mind/...]"). (1) is also contiguous with the history of the concept. At some point, all of science (perhaps except mathematics) was "(natural) philosophy". Then various (proto-)sciences started crystallizing and what was not seen as deserving of its own department, remained in the philosophy bucket.

Are you confident in your current ontology? Are you convinced that ultimately all UFOs are prosaic in nature?

If so, do you want some immediate free money?

I suspect that LWers are overconfident in their views on UFOs/UAP. As such, I'm willing to offer what I think many will find to be very appealing terms for a bet.

The Bet

Essentially, I wish to bet on the world and rationalists eventually experiencing significant ontological shock as it relates to the nature of some UFOs/UAP.

Offer me odds for a bet, and the maximum payout you are willing to commit to. I will pick 1+ from the pool and immediately pay out to you. In the event that I ultimately win the bet, then you will pay out back to me.

I'm looking to...

2Charlie Steiner31m
I have received $1000. The bet is on!
3Dagon10h
LOL!  If you think an executor (or worse, an heir, if the estate is already settled) is going to pay $100K to a rando based on a five-year-old LessWrong post, you have a VERY different model of humanity than I do.  Even more so if the estate didn't include any mention of it or money earmarked for it.

How do the desires of possible executors/heirs/etc. factor into this?

Clearly the bet will not auto-extinguish and auto-erase itself regardless of the future desires of anyone.

If you thought I implied that the bet must be settled in purely monetary terms, that wasn't my intention. It's entirely possible for the majority, or entirety, of the bet to be settled with non-monetary currencies, such as social-status, reputation, etc... 

It's just not all that likely for someone, or their successors, to insist on going down that path.
 

2Legionnaire10h
I am concerned for your monetary strategy (unless you're rich). Let's say you're absolutely right that LW is overconfident, and that there is actually a 10% chance of aliens rather than 0.5%. So this is a good deal! 20x! But only on the margin. Depending on your current wealth, it may only be rational to put a few hundred dollars into this particular bet. If you make lots of these types of bets (low probability, high payoff, great expected returns) for a small fraction of your wealth each, you should expect to make money; but if you make only 3 or 4 of them, you are more likely to lose money, because you are loading all your gains onto a small fraction of possibilities in exchange for huge payouts, and most outcomes end with you losing money. See for example the St. Petersburg paradox, which has infinite expected return but very finite actual value given limited assets for the banker and/or the player.
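The bankroll point can be illustrated with a quick simulation. All stakes and payouts below are hypothetical; the 10% win probability is the one from the comment, and each bet is priced so its expected value is clearly positive.

```python
# Toy check of the point above: each bet costs $100 and pays $2000 with
# probability 10%, so EV is +$100 per bet, yet with only a handful of bets
# most runs still end in the red.

import random

def fraction_losing(n_bets: int, p_win: float = 0.10, stake: int = 100,
                    payout: int = 2000, trials: int = 20_000) -> float:
    """Estimate the probability of ending with less money than you started."""
    rng = random.Random(0)  # fixed seed for reproducibility
    losing = 0
    for _ in range(trials):
        wins = sum(rng.random() < p_win for _ in range(n_bets))
        if wins * payout - n_bets * stake < 0:
            losing += 1
    return losing / trials

print(fraction_losing(4))    # ~0.66: with 4 bets you usually lose money
print(fraction_losing(100))  # ~0.02: with many bets losing runs are rare
```

With 4 bets you lose unless at least one hits (probability 0.9^4 ≈ 0.66 of total loss); with 100 bets the law of large numbers pulls outcomes toward the positive mean.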

"AI alignment" has the application, the agenda, less charitably the activism, right in the name. It is a lot like "Missiology" (the study of how to proselytize to "the savages") which had to evolve into "Anthropology" in order to get atheists and Jews to participate. In the same way, "AI Alignment" excludes e.g. people who are inclined to believe superintelligences will know better than us what is good, and who don't want to hamstring them. You can think we're well rid of these people. But you're still excluding people and thereby reducing the amount of thinking that will be applied to the problem.

"Artificial Intention research" instead emphasizes the space of possible intentions, the space of possible minds, and stresses how intentions that are not natural (constrained by...

""AI alignment" has the application, the agenda, less charitably the activism, right in the name."

This seems like a feature, not a bug. "AI alignment" is not a neutral idea. We're not just researching how these models behave or how minds might be built neutrally out of pure scientific curiosity. It has a specific purpose in mind - to align AI's. Why would we not want this agenda to be part of the name?

1Archimedes1h
"Artificial Intention" doesn't sound catchy at all to me, but that's just my opinion. Personally, I prefer to think of the "Alignment Problem" more generally rather than "AI Alignment". Regardless of who has the most power (humans, AI, cyborgs, aliens, etc.) and who has superior ethics, conflict arises when participants in a system are not all aligned.
4faul_sname7h
So the idea is to use "Artificial Intention" to specifically speak of the subset of concerns about what outcomes an artificial system will try to steer for, rather than the concerns about the world-states that will result in practice from the interaction of that artificial system's steering plus the steering of everything else in the world? Makes sense. I expect it's valuable to also have a term for the bit where you can end up in a situation that nobody was steering for due to the interaction of multiple systems, but explicitly separating those concerns is probably a good idea.
4Caspar Oesterheld8h
Do philosophers commonly use the word "intention" to refer to mental states that have intentionality, though? For example, from the SEP article on intentionality [https://plato.stanford.edu/entries/intentionality/]:

> intention and intending are specific states of mind that, unlike beliefs, judgments, hopes, desires or fears, play a distinctive role in the etiology of actions. By contrast, intentionality is a pervasive feature of many different mental states: beliefs, hopes, judgments, intentions, love and hatred all exhibit intentionality.

(This is specifically where it talks about how intentionality and the colloquial meaning of intention must not be confused, though.)

Ctrl+F-ing through the SEP article gives only one mention of "intention" that seems to refer to intentionality. ("The second horn of the same dilemma is to accept physicalism and renounce the 'baselessness' of the intentional idioms and the 'emptiness' of a science of intention.") The other few mentions of "intention" seem to use the colloquial meaning. The article generally seems to avoid the word "intention", preferring "intentional" and "intentionality".

Incidentally, there's also an SEP article on "intention" [https://plato.stanford.edu/entries/intention/] that does seem to be about what one would think it to be about. (E.g., the first sentence of that article: "Philosophical perplexity about intention begins with its appearance in three guises: intention for the future, as I intend to complete this entry by the end of the month; the intention with which someone acts, as I am typing with the further intention of writing an introductory sentence; and intentional action, as in the fact that I am typing these words intentionally.")

So as long as we don't call it "artificial intentionality research" we might avoid trouble with the philosophers after all. I suppose the word "intentional" becomes ambiguous, however. (It is used >100 times in both SEP articles.)

(By "most promising" I mostly mean "not obviously making noob mistakes", with the central examples being "any Proper Noun research agenda associated with a specific person or org".)

(By "formal" I mean "involving at least some math proofs, and not solely coding things".)

Asking because the field is both relatively-small and also I'm not sure if any single person "gets" all of it anymore.

Example that made me ask this (not necessarily a central example): Nate Soares wrote this about John Wentworth's work, but then Wentworth replied saying it was inaccurate about his current/overall priorities.

This post is crossposted from my blog. If you liked this post, subscribe to Lynette's blog to read more -- I only crosspost about half my content to other platforms.

If you’re going into surgery, you want the youngest operating surgeon available.

This is a slight exaggeration – you don’t want a doctor in their first year out of medical school.[1] After that, it’s less clear. One review found thirty-two studies indicating that the older a doctor was, the worse their medical outcomes; that review only found one study indicating that all outcomes got better with increasing age.[2] Other analyses suggest that middle-aged doctors might do better than younger doctors (though the effect is not statistically significant)[3], but older doctors are still clearly worse than middle-aged doctors.[4]

It’s not like doctors...

I have found a lot of online summaries of deliberate practice frustratingly vague, so I bought a well-reviewed, out-of-print manual on deliberate practice in music called The Practiceopedia. The chapter headings give some idea of the sort of resolution being aimed for. I might do a book review at some point.

Chapter guide

Beginners: curing your addiction to the start of your piece

Blinkers: shutting out the things you shouldn't be working on

Boot camp: where you need to send passages that won't behave

Breakthroughs diary: keeping track of your progress

Bridgin...

This is a draft written by J. Dmitri Gallow, Senior Research Fellow at the Dianoia Institute of Philosophy at ACU, as part of the Center for AI Safety Philosophy Fellowship. This draft is meant to solicit feedback. Here is a PDF version of the draft.

Abstract

The thesis of instrumental convergence holds that a wide range of ends have common means: for instance, self preservation, desire preservation, self improvement, and resource acquisition. Bostrom (2014) contends that instrumental convergence gives us reason to think that ''the default outcome of the creation of machine superintelligence is existential catastrophe''.  I use the tools of decision theory to investigate whether this thesis is true.  I find that, even if intrinsic desires are randomly selected, instrumental rationality induces biases towards certain kinds of choices....

1J. Dmitri Gallow2h
A few things to note.

Firstly, when I say that there's a 'bias' towards a certain kind of choice, I just mean that the probability that a superintelligent agent with randomly sampled desires (Sia) would make that choice is greater than 1/N, where N is the number of choices available. So, just to emphasize the scale of the effect: even if you were right about that inference, you should still assign very low probability to Sia taking steps to eliminate other agents.

Secondly, when I say that a choice "leaves less up to chance", I just mean that the sum total of history is more predictable, given that choice, than it is given other choices. (I mention this just because you didn't read the post, and I want to make sure we're not talking past each other.)

Thirdly, I would caution against the inference: without humans, things are more predictable; therefore, undertaking to eliminate other agents leaves less up to chance. Even if things are predictable after humans are eliminated, and even if Sia can cook up a foolproof contingency plan for eliminating all humans, that doesn't mean that that contingency plan leaves less up to chance. Insofar as the contingency plan is sensitive to the human response at various stages, and insofar as that human response is unpredictable (or less predictable than humans are when you don't try to kill them all), this bias wouldn't lend any additional probability to Sia choosing that contingency plan.

Fourthly, this bias interacts with the others. Futures without humanity might be futures which involve fewer choices---other deliberative agents tend to force more decisions. So contingency plans which involve human extinction may involve comparatively fewer choicepoints than contingency plans which keep humans around. Insofar as Sia is biased towards contingency plans with more choicepoints, that's a reason to think she's biased against eliminating other agents. I don't have any sense of how these biases
2Evan R. Murphy11h
Interesting... still taking that in. Related question: Doesn't goal preservation typically imply self preservation? If I want to preserve my goal, and then I perish, I've failed because now my goal has been reassigned from X to nil.

A quick prefatory note on how I'm thinking about 'goals' (I don't think it's relevant, but I'm not sure): as I'm modelling things, Sia's desires/goals are given by a function from ways the world could be (colloquially, 'worlds') to real numbers, with the interpretation that the number assigned to a world is how well satisfied Sia's desires are if that world turns out to be the way the world actually is. By 'the world', I mean to include all of history, from the beginning to the end of time, and I mean to encompass every region of space. I assume that this functio...

1rvnnt12h
I agree. But AFAICT that doesn't really change the conclusion that fewer agents would tend to make the world more predictable/controllable. As you say yourself: And that was the weaker of the two apparent problems. What about the {implied self-preservation and resource acquisition} part?

Sometimes, people have life problems that can be entirely solved by doing one thing. (doing X made my life 0.1% better, PERMANENTLY!) These are not things like "This TAP made me exercise more frequently", but rather like "moving my scale into my doorway made me weigh myself more, causing me to exercise more frequently" - a one-shot solution that makes a reasonable amount of progress in solving a problem.

I've found that I've had a couple of life problems that I couldn't solve because I didn't know what the solution was, not because it was hard to solve - once I thought of the solution, implementation was not that difficult. I'm looking to collect various one-shot solutions to problems to expand my solution space, as well as potentially find solutions to problems that I didn't realize I had.

Please only put one problem-solution pair per answer.

here's a small improvement for me. i open a lot of tabs every day, sometimes to read them later, etc. it would get really disorganized, till i enabled a setting that makes new tabs open to the right of the current one, rather than to the right of all of them. it still gets disorganized, but not as much. also, now i don't need to scroll all the way to the right on my tab list to get to one i just opened, and can just ctrl + click -> ctrl + tab. 

(there may be a better solution for this, like a tab manager addon, though)

If a technology may introduce catastrophic risks, how do you develop it?

It occurred to me that the Wright Brothers’ approach to inventing the airplane might make a good case study.

The catastrophic risk for them, of course, was dying in a crash. This is exactly what happened to one of the Wrights’ predecessors, Otto Lilienthal, who attempted to fly using a kind of glider. He had many successful experiments, but one day he lost control, fell, and broke his neck.

Otto Lilienthal gliding experiment.
Otto Lilienthal gliding experiment. Wikimedia / Library of Congress

Believe it or not, the news of Lilienthal’s death motivated the Wrights to take up the challenge of flying. Someone had to carry on the work! But they weren’t reckless. They wanted to avoid Lilienthal’s fate. So what was their approach?

First,...

The Wrights invented the airplane using an empirical, trial-and-error approach. They had to learn from experience. They couldn’t have solved the control problem without actually building and testing a plane. There was no theory sufficient to guide them, and what theory did exist was often wrong. (In fact, the Wrights had to throw out the published tables of aerodynamic data, and make their own measurements, for which they designed and built their own wind tunnel.)

This part in particular is where I think there's a whole bunch of useful lessons for alignment...

2DirectedEvolution3h
The big difference between AI and these technologies is that we're worried about adversarial behavior by the AI. A more direct analogy would be if Wright & co had been worried that airplanes might "decide" to fly safely until humanity had invented jet engines, then "decide" to crash them all at once.

Nuclear bombs do have a direct analogy - a Dr. Strangelove-type scenario in which, after developing an armamentarium in an ostensibly carefully-controlled manner, some madman (or a defect in an automated launch system) triggers an all-out nuclear attack and ends the world.

This is the difficulty, I think. Tech developers naturally want to think in terms of a non-adversarial relationship with their technology. Maybe this is more familiar to biologists like myself than to people working in computer science. We're often working with living things that can multiply, mutate and spread, and which we know don't have our best interests in mind. If we achieve AGI, it will be a living in silico organism, and we don't have a good ability to predict what it's capable of, because it will be unprecedented on the earth.
2Gordon Seidoh Worley6h
I love stories like this. It's not immediately obvious to me how to translate them to AI—like, what is the equivalent of what the Wright brothers did, for AI?—but I think hearing them is helpful for developing the mindset that will create the kinds of precautions necessary to work with AI safely.

Apple is offering a VR/AR/XR headset, Vision Pro, for the low, low price of $3,500.

I kid. Also I am deadly serious.

The value of this headset to a middle class American or someone richer than that is almost certainly either vastly more than $3,500, or at best very close to $0.

This type of technology is a threshold effect. Once it gets good enough, if it gets good enough, it will feel essential to our lives and our productivity. Until then, it’s a trifle.

Thus, like Divia Eden, I am bullish on using the Tesla strategy of offering a premium product at a premium price, then later either people decide they need it and pay up or you scale enough to lower costs – if the tech delivers.

Gaming could be...

1Caspar Oesterheld7h
Nice overview! I mostly agree.

> What I do not expect is something I’d have been happy to pay $500 or $1,000 for, but not $3,500. Either the game will be changed, or it won’t be changed quite yet. I can’t wait to find out.

From context, I assume you're saying this about the current iteration? I guess willingness to pay for different things depends on one's personal preferences, but here's an outcome that I find somewhat likely (>50%):

* The first-gen Apple Vision Pro will not be very useful for work, aside from some niche tasks.
  * It seems that to be better than a laptop for working at a coffee shop or something they need to have solved ~10 different problems extremely well, and my guess is that they will have failed to solve one of them well enough. For example, I think comfort/weight alone has a >30% probability of making this less enjoyable to work with (for me at least) than a laptop, even if all the other stuff works fairly well.
  * Like you, I'm sometimes a bit puzzled by what Apple does. So I could also imagine that Apple screws up something weird that isn't technologically difficult. For example, the first version of iPad OS was extremely restrictive (no multitasking/splitscreen, etc.). So even though the hardware was already great, it was difficult to use it for anything serious and felt more like a toy. Based on what they emphasize on the website, I could very well imagine that they won't focus on making this work and that there'll be some basic, obvious issue like not being able to use a mouse.
  * If Apple had pitched this more in the way that Spacetop [https://www.wired.com/story/sightful-spacetop-augmented-reality-laptop-hands-on-news/] was pitched, I'd be much more optimistic that the first gen will be useful for work.
* The first-gen Apple Vision Pro will still produce lots of extremely interesting experiences so that many people would be happy to pay, say, $1
2Matt Goldenberg8h
I imagine this will relax over time, like the early iPhone didn't allow any access for apps to the phonecall hardware.
1Caspar Oesterheld8h
> All accounts agree that Apple has essentially solved issues with fit and comfort.

Besides the 30min point, is it really true that all accounts agree on that? I definitely remember reading in at least two reports something along the lines of, "clearly you can't use this for hours, because it's too heavy". Sorry for not giving a source!

Two reviewers who worried about the weight: Norman Chan, Marques Brownlee.