Quick Takes

I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:

  • By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
  • A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI
... (read more)
ryan_greenblatt (18h):
  • My current guess is that max good and max bad seem relatively balanced. (Perhaps max bad is 5x more bad/flop than max good in expectation.)
  • There are two different (substantial) sources of value/disvalue: interactions with other civilizations (mostly acausal, maybe also aliens) and what the AI itself terminally values.
  • On interactions with other civilizations, I'm relatively optimistic that commitment races and threats don't destroy as much value as acausal trade generates, on some general view like "actually going through with threats is a waste of resources". I also think it's very likely relatively easy to avoid precommitment issues via very basic precommitment approaches that seem (IMO) very natural. (Specifically, you can just commit to "once I understand what the right/reasonable precommitment process would have been, I'll act as though this was always the precommitment process I followed, regardless of my current epistemic state." I don't think it's obvious that this works, but I think it probably works fine in practice.)
  • On terminal value, I guess I don't see a strong story for extreme disvalue, as opposed to mostly expecting approximately no value with some chance of some value. Part of my view is that relatively "incidental" disvalue (like the sort you link to Daniel Kokotajlo discussing) is likely way less bad/flop than maximum good/flop.

Thank you for detailing your thoughts. Some differences for me:

  1. I'm also worried about unaligned AIs as competitors to aligned AIs/civilizations in the acausal economy/society. For example, suppose there are vulnerable AIs "out there" that can be manipulated/taken over via acausal means; an unaligned AI could compete with us (and with others with better values from our perspective) in the race to manipulate them.
  2. I'm perhaps less optimistic than you about commitment races.
  3. I have some credence on max good and max bad being not close to balanced, that additi
... (read more)
Quinn (19h):
Sure -- I agree; that's why I said "something adjacent to", because it had enough overlap in properties. I think my comment completely stands with a different word choice, I'm just not sure what word choice would do a better job.

My current main cruxes:

  1. Will AI get takeover capability? When?
  2. Single ASI or many AGIs?
  3. Will we solve technical alignment?
  4. Value alignment, intent alignment, or CEV?
  5. Defense>offense or offense>defense?
  6. Is a long-term pause achievable?

If there is reasonable consensus on any one of those, I'd much appreciate knowing about it. Otherwise, I think these should be research priorities.

I offer no consensus, but my own opinions:

Will AI get takeover capability? When?

0-5 years.

Single ASI or many AGIs?

There will be a first ASI that "rules the world" because its algorithm or architecture is so superior. If there are further ASIs, that will be because the first ASI wants there to be. 

Will we solve technical alignment?

Contingent. 

Value alignment, intent alignment, or CEV?

For an ASI you need the equivalent of CEV: values complete enough to govern an entire transhuman civilization. 

Defense>offense or offense>defense?

Of... (read more)

AGI doom by noise-cancelling headphones:

ML is already used to train what sound-waves to emit to cancel those from the environment. This works well with constant sound waves that are easy to predict, but not with high-entropy sounds like speech. Bose or Soundcloud or whoever train very hard on... (read more)
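The cancellation loop being described can be sketched with a classic LMS adaptive filter. This is an illustrative toy with synthetic "engine hum" and made-up parameters, not any vendor's actual pipeline; the point is that a stationary, predictable signal is quickly learned and cancelled:

```python
import numpy as np

n, taps, mu = 5000, 8, 0.01
t = np.arange(n)

# "Easy" case: a stationary engine-like hum, predictable from its own past.
noise = np.sin(0.07 * t) + 0.5 * np.sin(0.15 * t)

w = np.zeros(taps)
err = np.zeros(n)
for i in range(taps, n):
    x = noise[i - taps:i][::-1]   # recent past samples as the reference input
    y = w @ x                     # predicted next sample = the anti-noise to emit
    err[i] = noise[i] - y         # what the listener still hears
    w += mu * err[i] * x          # LMS weight update

before = np.mean(noise[:500] ** 2)
after = np.mean(err[-500:] ** 2)
print(f"noise power {before:.3f} -> residual {after:.2e}")
```

For speech-like, hard-to-predict input, the same loop leaves a much larger residual, which is the gap the joke's hypothetical headphone lab is "training very hard" to close.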


FWIW it was obvious to me

Seth Herd (11d):
Thanks! A joke explained will never get a laugh, but I did somehow get a cackling laugh from your explanation of the joke.

I think I didn't get it because I don't think the trend line breaks. If you made a good enough noise reducer, it might well develop smart and distinct enough simulations that one would gain control of the simulator, and potentially from there the world. See "A smart enough LLM might be deadly simply if you run it for long enough" if you want to hurt your head on this.

I've thought about it a little because it's interesting, but not a lot, because I think we are probably killed by agents we made deliberately long before we're killed by accidentally emerging ones.
faul_sname (15d):
Fixed, thanks
nim (6h):

I've found an interesting "bug" in my cognition: a reluctance to rate subjective experiences on a subjective scale useful for comparing them. When I fuzz this reluctance against many possible rating scales, I find that it seems to arise from the comparison-power itself.

The concrete case is that I've spun up a habit tracker on my phone and I'm trying to build a routine of gathering some trivial subjective-wellbeing and lifestyle-factor data into it. My prototype of this system includes tracking the high and low points of my mood through the day as recalled ... (read more)

dirk (22h):

I'm not alexithymic; I directly experience my emotions and have, additionally, introspective access to my preferences. However, some things manifest directly as preferences which, I have been shocked to realize in my old age, were in fact emotions all along. (In rare cases these are even stronger than the directly-felt ones, despite reliably seeming, on initial inspection, to be simply neutral metadata.)

Viliam (16h):
Specific examples would be nice. Not sure if I understand correctly, but I imagine something like this: You always choose A over B. You have been doing it for such a long time that you forgot why. Without reflecting on this directly, it just seems like there probably is a rational reason or something. But recently, either accidentally or by experiment, you chose B... and realized that experiencing B (or expecting to experience B) creates unpleasant emotions. So now you know that the emotions were the real cause of choosing A over B all that time. (This is probably wrong, but hey, people say that the best way to elicit an answer is to provide a wrong one.)

Here's an example for you: I used to turn the faucet on while going to the bathroom, thinking it was due simply to a preference for somewhat masking the sound of my elimination habits from my housemates. Then one day I walked into the bathroom listening to something-or-other via earphones, forgetting to turn the faucet on, only to realize about halfway through that apparently I didn't actually much care about such masking; previously, being able to hear myself just seemed to trigger some minor anxiety about it I'd failed to recognize, though its ab... (read more)

So the usual refrain from Zvi and others is that the specter of China beating us to the punch with AGI is not real because of limits on compute, etc. I think Zvi has tempered his position on this in light of Meta's promise to release the weights of its 400B+ model. Now there is word that SenseTime just released a model that beats GPT-4 Turbo on various metrics. Of course, maybe Meta chooses not to release its big model, and maybe SenseTime is bluffing -- I would point out, though, that Alibaba's Qwen model seems to do pretty okay in the arena. Anyway, my point is that I don't think the "what if China" argument can be dismissed as quickly as some people on here seem ready to do.

dirk (22h):

I'm against intuitive terminology [epistemic status: 60%] because it creates the illusion of transparency; opaque terms make it clear you're missing something, but if you already have an intuitive definition that differs from the author's it's easy to substitute yours in without realizing you've misunderstood.

cubefox (14h):
I agree. This is unfortunately often done in various fields of research where familiar terms are reused as technical terms. For example, in ordinary language "organic" means "of biological origin", while in chemistry "organic" describes a type of carbon compound. Those two definitions mostly coincide on Earth (most such compounds are of biological origin), but when astronomers announce they have found "organic" material on an asteroid this leads to confusion.

Also astronomers: anything heavier than helium is a "metal"

Research Writing Workflow: First figure stuff out

  • Do research and first figure stuff out, until you feel like you are not confused anymore.
  • Explain it to a person, or a camera, or ideally to a person and a camera.
    • If there are any hiccups, expand your understanding.
    • Ideally, as the last step, explain it to somebody you have never explained it to before.
  • Only once you have made a presentation without hiccups are you ready to write the post.
    • If you have a recording this is useful as a starting point.

I like the rough thoughts way though. I'm not here to like read a textbook.

Nathan and Carson's Manifold discussion.

As of the last edit my position is something like:

"Manifold could have handled this better, so as not to force everyone with large amounts of mana to act urgently when many were busy.

Beyond that they are attempting to satisfy two classes of people:

  • People who played to donate can donate the full value of their investments
  • People who played for fun now get the chance to turn their mana into money

To this end, and modulo the above hassle, this decision is good.

It is unclear to me whether there... (read more)


Nevertheless lots of people were hassled. That has real costs, both to them and to you. 

Nathan Young (13h):
If that were true, then there are many ways you could partially do that -- e.g. give people a set of tokens representing their mana at the time of the devaluation, and if at a future point you raise money, you could give them 10x those tokens back.
Nathan Young (14h):
I'm discussing with Carson. I might change my mind, but I don't know that I'll argue with both of you at once.

Have there been any great discoveries made by someone who wasn't particularly smart?

This seems worth knowing if you're considering pursuing a career with a low chance of high impact. Is there any hope for relatively ordinary people (like the average LW reader) to make great discoveries?

Various sailors made important discoveries back when geography was cutting-edge science.  And they don't seem particularly bright.

Vasco da Gama discovered that Africa was circumnavigable.

Columbus was wrong about the shape of the Earth, and he discovered America.  He died convinced that his newly discovered islands were just off the coast of Asia, so that's a negative sign for his intelligence (or a positive sign for his arrogance, which he had in plenty.)

Cortez discovered that the Aztecs were rich and easily conquered.

Of course, lots of other wou... (read more)

niplav (1d):
My best guess is that people in these categories were high in some other trait, e.g. patience, which allowed them to collect datasets or make careful experiments for quite a while, thus enabling others to make great discoveries. I'm thinking for example of Tycho Brahe, who is best known for 15 years of careful astronomical observation & data collection, or Gregor Mendel's 7-year-long experiments on peas. Same for Dmitri Belyaev and fox domestication. Of course I don't know their cognitive scores, but those don't seem like a bottleneck in their work. So the recipe to me looks like "find an unexplored data source that requires long-term observation to bear fruit, but would yield a lot of insight if studied closely, then investigate".
Gunnar_Zarncke (1d):
I asked ChatGPT, and it's difficult to get examples out of it. Even with additional drilling down and accusing it of not being inclusive of people with cognitive impairments, most of its examples are either pretty smart anyway, savants, or only from poor backgrounds. The only ones I could verify that fit are:

  • Richard Jones accidentally created the Slinky
  • Frank Epperson, as a child, invented the popsicle
  • George Crum inadvertently invented potato chips

I asked ChatGPT (in a separate chat) to estimate the IQ of all the inventors it listed, and it is clearly biased to estimate them high, precisely because of their inventions. It is difficult to estimate the IQ of people retroactively. There is also selection and availability bias.

I expect large parts of interpretability work could be safely automatable very soon (e.g. GPT-5 timelines) using (V)LM agents; see A Multimodal Automated Interpretability Agent for a prototype. 

Notably, MAIA (GPT-4V-based) seems approximately human-level on a bunch of interp tasks, while (overwhelmingly likely) being non-scheming (e.g. current models are bad at situational awareness and out-of-context reasoning) and basically-not-x-risky (e.g. bad at ARA).

Given the potential scalability of automated interp, I'd be excited to see plans to use large amo... (read more)

ryan_greenblatt (18h):
Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of internals. I agree that this model might help in performing various input/output experiments to determine what made a model do a given suspicious action.

Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of things.

This was indeed my impression (except for potentially using steering vectors, which I think are mentioned in one of the sections of 'Catching AIs red-handed'), but I think not using any internals might be overly conservative / might increase the monitoring/safety tax too much (I think this is probably true more broadly of the current control-agenda framing).

Bogdan Ionut Cirstea (1d):
Hey Jacques, sure, I'd be happy to chat!  
dirk (21h):

Sometimes a vague phrasing is not an inaccurate demarcation of a more precise concept, but an accurate demarcation of an imprecise concept.

cubefox (17h):

Yeah. It's possible to give quite accurate definitions of some vague concepts, because the words used in such definitions also express vague concepts. E.g. "cygnet" - "a young swan".

dkornai (19h):
I would say that if a concept is imprecise, more words [but good and precise words] have to be dedicated to faithfully representing the diffuse nature of the topic. If this larger faithful representation is compressed down to fewer words, that can lead to vague phrasing. I would therefore often view vague phrasing as a compression artefact, rather than a necessary outcome of translating certain types of concepts to words.

Today I learned that being successful can involve feelings of hopelessness.

When you are trying to solve a hard problem, where you have no idea if you can solve it, let alone if it is even solvable at all, your brain makes you feel bad. It makes you feel like giving up.

This is quite strange, because most of the time when I am in such a situation and manage to make a real effort anyway, I seem to surprise myself with how much progress I make. Empirically, this feeling of hopelessness does not seem to track the actual likelihood that you will completely fail.

Carl Feynman (3d):
I was depressed once for ten years and didn't realize that it was fixable. I thought it was normal to have no fun and be disagreeable and grumpy and out of sorts all the time. Now that I've fixed it, I'm much better off, and everyone around me is better off. I enjoy enjoyable activities, I'm pleasant to deal with, and I'm only out of sorts when I'm tired or hungry, as is normal.

If you think you might be depressed, you might be right, so try fixing it. The cost seems minor compared to the possible benefit (at least it was in my case). I don't think there's a high possibility of severe downside consequences, but I'm not a psychiatrist, so what do I know.

I had been depressed for a few weeks at a time in my teens and twenties and I thought I knew how to fix it: withdraw from stressful situations, plenty of sleep, long walks in the rain. (In one case I talked to a therapist, which didn't feel like it helped.) But then it crept up on me slowly in my forties, and in retrospect I spent ten years being depressed.

So fixing it started like this. I have a good friend at work, of many years standing. I'll call him Barkley, because that's not his name. I was riding in the car with my wife, complaining about some situation at work. My wife said "well, why don't you ask Barkley to help?" And I said "Ahh, Barkley doesn't care." And my wife said "What are you saying? Of course he cares about you." And I realized in that moment that I was detached from reality, that Barkley was a good friend who had done many good things for me, and yet my brain was saying he didn't care. And thus my brain was lying to me to make me miserable. So I think for a bit and say "I think I may be depressed." And my wife thinks (she told me later) "No duh, you're depressed. It's been obvious for years to people who know you." But she says "What would you like to do about it?" And I say, "I don't know, suffer I guess, do you have a better idea?" And she says "How about if I find you a
Johannes C. Mayer (2d):
This is useful. Now that I think about it, I do this. Specifically, I have extremely unrealistic assumptions about how much I can do, such that they are impossible to accomplish, and then I feel bad for not accomplishing the thing. I haven't tried to be mindful of that. The problem is that this is, I think, mainly subconscious. I don't have thoughts like "I am dumb" or "I am a failure", at least not in explicit language. I might have accidentally suppressed these and thought I had succeeded in not being harsh to myself, but maybe I only moved it to the subconscious level, where it is harder to debug.

I would highly recommend getting someone else to debug your subconscious for you.  At least it worked for me.  I don’t think it would be possible for me to have debugged myself.
 

My first therapist was highly directive. He'd say stuff like "Try noticing when you think X, and asking yourself what happened immediately before that. Report back next week." And he'd list agenda items and draw diagrams on a whiteboard. As an engineer, I loved it. My second therapist was more in the "providing supportive comments while I tal... (read more)

Fabien Roger (22h):

"List sorting does not play well with few-shot" mostly doesn't replicate with davinci-002.

When using length-10 lists (it crushes length-5 no matter the prompt), I get:

  • 32-shot, no fancy prompt: ~25%
  • 0-shot, fancy python prompt: ~60% 
  • 0-shot, no fancy prompt: ~60%

So few-shot hurts, but the fancy prompt does not seem to help. Code here.

I'm interested if anyone knows another case where a fancy prompt increases performance more than few-shot prompting, where a fancy prompt is a prompt that does not contain information that a human would use to solve the task. ... (read more)
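The linked code has the actual prompts; a minimal sketch of how such an eval might be structured (the prompt format and exact-match scoring here are my assumptions, not Fabien's exact setup):

```python
import random

def make_task(n=10, seed=0):
    """One list-sorting instance: an input list and its sorted target."""
    rng = random.Random(seed)
    xs = [rng.randint(0, 99) for _ in range(n)]
    return xs, sorted(xs)

def few_shot_prompt(examples, query):
    """k-shot prompt: solved examples followed by the query list."""
    lines = [f"Input: {xs}\nSorted: {ys}" for xs, ys in examples]
    lines.append(f"Input: {query}\nSorted:")
    return "\n\n".join(lines)

def exact_match(completion, target):
    """Score 1 iff the completion parses to exactly the sorted list."""
    try:
        return [int(t) for t in completion.strip("[] \n").split(",")] == target
    except ValueError:
        return False

examples = [make_task(seed=s) for s in range(32)]   # the 32-shot condition
query, target = make_task(seed=100)
prompt = few_shot_prompt(examples, query)
print(exact_match(str(target), target))  # a perfect completion scores 1
```

The "fancy python prompt" condition would swap the prompt builder for one that frames the task as code (e.g. `sorted([...]) =`), keeping the same scorer.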

American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/

https://www.apaonline.org/page/ai2050

https://ai2050.schmidtsciences.org/hard-problems/

dirk (1d):

Classic type of argument-gone-wrong (also IMO a way autistic 'hyperliteralism' or 'over-concreteness' can look in practice, though I expect that isn't always what's behind it): Ashton makes a meta-level point X based on Birch's meta point Y about object-level subject matter Z. Ashton thinks the topic of conversation is Y and Z is only relevant as the jumping-off point that sparked it, while Birch wanted to discuss Z and sees X as only relevant insofar as it pertains to Z. Birch explains that X is incorrect with respect to Z; Ashton, frustrated, reiterates ... (read more)

dirk (1d):

Meta/object level is one possible mixup but it doesn't need to be that. Alternative example, is/ought: Cedar objects to thing Y. Dusk explains that it happens because Z. Cedar reiterates that it shouldn't happen, Dusk clarifies that in fact it is the natural outcome of Z, and we're off once more.

I wish there were an option in the settings to opt out of seeing the LessWrong reacts. I personally find them quite distracting, and I'd like to be able to hover over text or highlight it without having to see the inline annotations. 

If you use uBlock Origin (or Adblock, or AdGuard, or anything else that uses EasyList syntax), you can add custom rules

lesswrong.com##.NamesAttachedReactionsCommentBottom-footerReactionsRow
lesswrong.com##.InlineReactHoverableHighlight-highlight:remove-class(InlineReactHoverableHighlight-highlight)

which will remove the reaction section underneath comments and the highlights corresponding to those reactions.

The former of these you can also do through the element picker.

mesaoptimizer (1d):
I use GreaterWrong as my front-end to interface with LessWrong, AlignmentForum, and the EA Forum. It is significantly less distracting and also doesn't make my ~decade old laptop scream in agony when multiple LW tabs are open on my browser.

Are people in rich countries happier on average than people in poor countries? (According to GPT-4, the academic consensus is that they are, but I'm not sure it's representing it correctly.) If so, why do suicide rates increase (or is that a false positive)? Does the mean of the distribution go up while the tails don't, or something?

peterbarnett (1d):
People in rich countries are generally happier than people in poor countries (this holds both for people who say they are "happy" or "very happy" and for self-reported life satisfaction); see many of the graphs at https://ourworldindata.org/happiness-and-life-satisfaction. In general it seems like richer countries also have lower suicide rates: "for every 1000 US dollar increase in the GDP per capita, suicide rates are reduced by 2%".
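Taken at face value, that elasticity compounds across GDP steps; a quick sanity check (hypothetical numbers, assuming the 2%-per-$1000 figure applies multiplicatively):

```python
def suicide_rate_multiplier(gdp_increase_usd):
    """Relative suicide rate after a rise in GDP per capita,
    assuming each $1000 step multiplies the rate by 0.98."""
    return 0.98 ** (gdp_increase_usd / 1000)

# e.g. a $10,000 rise in GDP per capita
print(round(suicide_rate_multiplier(10_000), 3))  # ~0.817, i.e. roughly an 18% reduction
```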
Viliam (1d):

Possible bias: when famous and rich people kill themselves, everyone discusses it, but when poor people kill themselves, no one notices?

Also, I wonder what technically counts as "suicide"? Is drinking yourself to death, or a "suicide by cop", or just generally overly risky behavior included? I assume not. And these seem to me like methods a poor person would choose, while the rich one would prefer a "cleaner" solution, such as a bullet or pills. So the reported suicide rates are probably skewed towards the legible, and the self-caused death rate of the poor could be much higher.

keltan (2d):

A potentially good way to avoid low-level criminals scamming your family and friends with a clone of your voice is to set a password that you each must exchange.

An extra layer of security might be to make the password offensive, an info hazard, or politically sensitive. That way, criminals with little technical expertise will have a harder time getting it past corporate language filters.

Good luck getting the voice model to parrot a basic meth recipe!

Good luck getting the voice model to parrot a basic meth recipe!

This is not particularly useful, plenty of voice models will happily parrot absolutely anything. The important part is not letting your phrase get out; there's work out there on designs for protocols for how to exchange sentences in a way that guarantees no leakage even if someone overhears.
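One simple shape such a protocol can take is challenge-response, so the shared secret itself is never spoken on the call (a toy sketch, not any specific published protocol; the secret and digest length here are arbitrary):

```python
import hashlib
import hmac
import secrets

# Shared secret agreed in person beforehand, never spoken on a call.
SECRET = b"exchanged-in-person-long-random-secret"

def challenge():
    """Verifier picks a fresh random nonce and says it aloud."""
    return secrets.token_hex(8)

def response(secret, nonce):
    """Prover answers with a short MAC of the nonce; the secret never leaves."""
    return hmac.new(secret, nonce.encode(), hashlib.sha256).hexdigest()[:6]

def verify(secret, nonce, answer):
    """Constant-time comparison, so the check itself leaks nothing."""
    return hmac.compare_digest(response(secret, nonce), answer)

nonce = challenge()
ans = response(SECRET, nonce)
# A genuine response verifies; replaying it against a fresh challenge
# almost surely fails, so an eavesdropper gains nothing reusable.
print(verify(SECRET, nonce, ans))
```

Humans can't compute HMACs in their heads, of course; the practical analogues are Dagon's suggestions below (shared-memory challenge-response, out-of-band confirmation).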

Dagon (2d):
Hmm. I don't doubt that targeted voice-mimicking scams exist (or will soon). I don't think memorable, reused passwords are likely to work well enough to foil them. Between forgetting (on the sender or receiver end), claimed ignorance ("Mom, I'm in jail and really need money, and I'm freaking out! No, I don't remember what we said the password would be"), and general social hurdles ("that's a weird thing to want"), I don't think it'll catch on.

Instead, I'd look to context-dependent auth (looking for more confidence when the ask is scammer-adjacent), challenge-response (remember our summer in Fiji?), 2FA (let me call the court to provide the bail), or just much more context (5 minutes of casual conversation with a friend or relative is likely hard to really fake, even if the voice is close).

But really, I recommend security mindset and understanding of authorization levels, even if authentication isn't the main worry. Most friends, even close ones, shouldn't be allowed to ask you to mail $500 in gift cards to a random address, even if they prove they are really themselves.
keltan (2d):
I now realize that my thinking may have been particularly brutal, and I may have skipped inferential steps. To clarify: if someone didn't know, or was reluctant to repeat, the password, I would end contact or request an in-person meeting. But to further clarify, that does not make your points invalid; I think it makes them stronger. If something is weird and risky, good luck convincing people to do it.

Check my math: how does Enovid compare to humming?

Nitric Oxide is an antimicrobial and immune booster. Normal nasal nitric oxide is 0.14ppm for women and 0.18ppm for men (sinus levels are 100x higher). journals.sagepub.com/doi/pdf/10.117…

Enovid is a nasal spray that produces NO. I had the damndest time quantifying Enovid, but this trial registration says 0.11ppm NO/hour. They deliver every 8h and I think that dose is amortized, so the true dose is 0.88. But maybe it's more complicated. I've got an email out to the PI but am not hopeful about a response ... (read more)
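The dose arithmetic above, made explicit (the 0.11 ppm/hour and 8-hour figures are the post's own reading of the trial registration, which the author flags as uncertain):

```python
rate_ppm_per_hour = 0.11   # NO release rate from the trial registration
hours_per_dose = 8         # Enovid is delivered every 8 hours
dose_ppm = rate_ppm_per_hour * hours_per_dose  # amortized total per dose

baseline_nasal_ppm = {"women": 0.14, "men": 0.18}  # normal nasal NO levels
print(round(dose_ppm, 2))                               # 0.88
print(round(dose_ppm / baseline_nasal_ppm["men"], 1))   # ~4.9x the male baseline
```

Note this is still tiny next to sinus levels, which the cited paper puts at ~100x nasal concentrations.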

kave (2d):
Enovid is also adding NO to the body, whereas humming is pulling it from the sinuses, right? (based on a quick skim of the paper). I found a consumer FeNO-measuring device for €550. I might be interested in contributing to a replication

I think that's their guess but they don't directly check here. 

I also suspect that it doesn't matter very much. 

  • The sinuses have so much NO compared to the nose that this probably doesn't materially lower sinus concentrations. 
  • The power of humming goes down with each breath but is fully restored in 3 minutes, suggesting that whatever change happens in the sinuses is restored quickly.
  • From my limited understanding of virology and immunology, alternating intensity of NO between sinuses and nose every three minutes is probably better than keeping
... (read more)